Magnan, Morris A; Maklebust, Joann
2008-01-01
To evaluate the effect of Web-based Braden Scale training on the reliability and precision of pressure ulcer risk assessments made by registered nurses (RNs) working in acute care settings. Pretest-posttest, 2-group, quasi-experimental design. Five hundred Braden Scale risk assessments were made on 102 acute care patients deemed to be at various levels of risk for pressure ulceration. Assessments were made by RNs working in acute care hospitals at 3 different medical centers where the Braden Scale was in regular daily use (2 medical centers) or new to the setting (1 medical center). The Braden Scale for Predicting Pressure Sore Risk was used to guide pressure ulcer risk assessments. A Web-based version of the Detroit Medical Center Braden Scale Computerized Training Module was used to teach nurses correct use of the Braden Scale and selection of risk-based pressure ulcer prevention interventions. In the aggregate, RNs generated reliable Braden Scale pressure ulcer risk assessments 65% of the time after training. The effect of Web-based Braden Scale training on reliability and precision of assessments varied according to familiarity with the scale. With training, new users of the scale made reliable assessments 84% of the time and significantly improved precision of their assessments. The reliability and precision of Braden Scale risk assessments made by its regular users were unaffected by training. Technology-assisted Braden Scale training improved both reliability and precision of risk assessments made by new users of the scale, but had virtually no effect on the reliability or precision of risk assessments made by regular users of the instrument. Further research is needed to determine best approaches for improving reliability and precision of Braden Scale assessments made by its regular users.
Mani, Suresh; Sharma, Shobha; Omar, Baharudin; Paungmali, Aatit; Joseph, Leonard
2017-04-01
Purpose: The purpose of this review is to systematically explore and summarise the validity and reliability of telerehabilitation (TR)-based physiotherapy assessment for musculoskeletal disorders. Method: A comprehensive systematic literature review was conducted using a number of electronic databases (PubMed, EMBASE, PsycINFO, Cochrane Library and CINAHL) for studies published between January 2000 and May 2015. Studies that examined the validity and the inter- and intra-rater reliabilities of TR-based physiotherapy assessment for musculoskeletal conditions were included. Two independent reviewers used the Quality Appraisal Tool for studies of diagnostic Reliability (QAREL) and the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool to assess the methodological quality of reliability and validity studies respectively. Results: A total of 898 hits were achieved, of which 11 articles based on inclusion criteria were reviewed. Nine studies explored the concurrent validity, inter- and intra-rater reliabilities, while two studies examined only the concurrent validity. Reviewed studies were moderate to good in methodological quality. The physiotherapy assessments such as pain, swelling, range of motion, muscle strength, balance, gait and functional assessment demonstrated good concurrent validity. However, the reported concurrent validity of lumbar spine posture, special orthopaedic tests, neurodynamic tests and scar assessments ranged from low to moderate. Conclusion: TR-based physiotherapy assessment was technically feasible with overall good concurrent validity and excellent reliability, except for lumbar spine posture, orthopaedic special tests, neurodynamic tests and scar assessment.
ERIC Educational Resources Information Center
Ghazali, Nor Hasnida Md
2016-01-01
A valid, reliable and practical instrument is needed to evaluate the implementation of the school-based assessment (SBA) system. The aim of this study is to develop and assess the validity and reliability of an instrument to measure the perception of teachers towards the SBA implementation in schools. The instrument is developed based on a…
ERIC Educational Resources Information Center
Chang, Chi-Cheng; Wu, Bing-Hong
2012-01-01
This study explored the reliability and validity of teacher assessment under a Web-based portfolio assessment environment (or Web-based teacher portfolio assessment). Participants were 72 eleventh graders taking the "Computer Application" course. The students perform portfolio creation, inspection, self- and peer-assessment using the Web-based…
Parts and Components Reliability Assessment: A Cost Effective Approach
NASA Technical Reports Server (NTRS)
Lee, Lydia
2009-01-01
System reliability assessment is a methodology which incorporates reliability analyses performed at parts and components level such as Reliability Prediction, Failure Modes and Effects Analysis (FMEA) and Fault Tree Analysis (FTA) to assess risks, perform design tradeoffs, and therefore, to ensure effective productivity and/or mission success. The system reliability is used to optimize the product design to accommodate today's mandated budget, manpower, and schedule constraints. Standard-based reliability assessment is an effective approach consisting of reliability predictions together with other reliability analyses for electronic, electrical, and electro-mechanical (EEE) complex parts and components of large systems based on failure rate estimates published by the United States (U.S.) military or commercial standards and handbooks. Many of these standards are globally accepted and recognized. The reliability assessment is especially useful during the initial stages when the system design is still in development and hard failure data is not yet available, or when manufacturers are not contractually obliged by their customers to publish the reliability estimates/predictions for their parts and components. This paper presents a methodology to assess system reliability using parts and components reliability estimates to ensure effective productivity and/or mission success efficiently, at low cost, and on a tight schedule.
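Note: a handbook-style reliability prediction of the kind this abstract describes is often sketched as a series model with constant part failure rates. The abstract does not state the model used, so the series and constant-rate assumptions, the function names, and the failure rates below are all hypothetical, for illustration only.

```python
import math

def series_system_reliability(failure_rates_per_hour, mission_hours):
    """Reliability of independent parts in series with constant failure rates.

    R(t) = exp(-(lambda_1 + ... + lambda_n) * t)
    """
    total_rate = sum(failure_rates_per_hour)
    return math.exp(-total_rate * mission_hours)

def series_system_mtbf(failure_rates_per_hour):
    """Mean time between failures for the same series model: 1 / sum(lambda_i)."""
    return 1.0 / sum(failure_rates_per_hour)

# Hypothetical part failure rates in failures per hour (not from any handbook).
part_failure_rates = [2e-6, 5e-7, 1.5e-6]
print(series_system_reliability(part_failure_rates, mission_hours=1000.0))
print(series_system_mtbf(part_failure_rates))
```

In practice the per-part rates would come from the published standards the abstract mentions, and redundant (non-series) structures would need a different combination rule.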
Towards early software reliability prediction for computer forensic tools (case study).
Abu Talib, Manar
2016-01-01
Versatility, flexibility and robustness are essential requirements for software forensic tools. Researchers and practitioners need to put more effort into assessing this type of tool. A Markov model is a robust means for analyzing and anticipating the functioning of an advanced component based system. It is used, for instance, to analyze the reliability of the state machines of real time reactive systems. This research extends the architecture-based software reliability prediction model for computer forensic tools, which is based on Markov chains and COSMIC-FFP. Basically, every part of the computer forensic tool is linked to a discrete time Markov chain. If this can be done, then a probabilistic analysis by Markov chains can be performed to analyze the reliability of the components and of the whole tool. The purposes of the proposed reliability assessment method are to evaluate the tool's reliability in the early phases of its development, to improve the reliability assessment process for large computer forensic tools over time, and to compare alternative tool designs. The reliability analysis can assist designers in choosing the most reliable topology for the components, which can maximize the reliability of the tool and meet the expected reliability level specified by the end-user. The approach of assessing component-based tool reliability in the COSMIC-FFP context is illustrated with the Forensic Toolkit Imager case study.
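Note: the idea of linking each component to a discrete-time Markov chain can be illustrated with a generic architecture-based calculation. The sketch below is not the model from this paper (the COSMIC-FFP part is not covered); it assumes each component has a known reliability and that control passes between components with known transition probabilities. All names and numbers are hypothetical.

```python
def system_reliability(component_reliability, transitions, start, final,
                       tol=1e-12, max_iter=10_000):
    """Probability that execution starting at `start` reaches and completes `final`.

    component_reliability: list of per-component success probabilities.
    transitions: dict {i: {j: probability that control passes from i to j}}.
    Solves v_i = R_i * sum_j P_ij * v_j, with v_final = R_final, by fixed-point iteration.
    """
    n = len(component_reliability)
    v = [0.0] * n
    for _ in range(max_iter):
        new_v = []
        for i in range(n):
            if i == final:
                new_v.append(component_reliability[i])
            else:
                onward = sum(p * v[j] for j, p in transitions.get(i, {}).items())
                new_v.append(component_reliability[i] * onward)
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v[start]
        v = new_v
    raise RuntimeError("iteration did not converge")

# Hypothetical three-component tool: 0 -> 1 (60%), 0 -> 2 (40%), 1 -> 2 (100%).
reliabilities = [0.99, 0.98, 0.97]
control_flow = {0: {1: 0.6, 2: 0.4}, 1: {2: 1.0}}
print(system_reliability(reliabilities, control_flow, start=0, final=2))
```

Comparing the result for alternative `control_flow` layouts is one simple way to compare candidate topologies, in the spirit of the design comparison the abstract mentions.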
O'Grady, Michael G; Dusing, Stacey C
2015-01-01
Play is vital for development. Infants and children learn through play. Traditional standardized developmental tests measure whether a child performs individual skills within controlled environments. Play-based assessments can measure skill performance during natural, child-driven play. The purpose of this study was to systematically review reliability, validity, and responsiveness of all play-based assessments that quantify motor and cognitive skills in children from birth to 36 months of age. Studies were identified from a literature search using PubMed, ERIC, CINAHL, and PsycINFO databases and the reference lists of included papers. Included studies investigated reliability, validity, or responsiveness of play-based assessments that measured motor and cognitive skills for children to 36 months of age. Two reviewers independently screened 40 studies for eligibility and inclusion. The reviewers independently extracted reliability, validity, and responsiveness data. They examined measurement properties and methodological quality of the included studies. Four current play-based assessment tools were identified in 8 included studies. Each play-based assessment tool measured motor and cognitive skills in a different way during play. Interrater reliability correlations ranged from .86 to .98 for motor development and from .23 to .90 for cognitive development. Test-retest reliability correlations ranged from .88 to .95 for motor development and from .45 to .91 for cognitive development. Structural validity correlations ranged from .62 to .90 for motor development and from .42 to .93 for cognitive development. One study assessed responsiveness to change in motor development. Most studies had small and poorly described samples. Lack of transparency in data management and statistical analysis was common. Play-based assessments have potential to be reliable and valid tools to assess cognitive and motor skills, but higher-quality research is needed. 
Psychometric properties should be considered for each play-based assessment before it is used in clinical and research practice. © 2015 American Physical Therapy Association.
ERIC Educational Resources Information Center
Chang, Chi-Cheng; Liang, Chaoyun; Chen, Yi-Hui
2013-01-01
This study explored the reliability and validity of Web-based portfolio self-assessment. Participants were 72 senior high school students enrolled in a computer application course. The students created learning portfolios, viewed peers' work, and performed self-assessment on the Web-based portfolio assessment system. The results indicated: 1)…
Leifker, Feea R.; Patterson, Thomas L.; Bowie, Christopher R.; Mausbach, Brent T.; Harvey, Philip D.
2010-01-01
Performance-based measures of the ability to perform social and everyday living skills are being more widely used to assess functional capacity in people with serious mental illnesses such as schizophrenia and bipolar disorder. Since they are also being used as outcome measures in pharmacological and cognitive remediation studies aimed at cognitive impairments in schizophrenia, understanding their measurement properties and potential sensitivity to change is important. In this study, the test-retest reliability, practice effects, and reliable change indices of two different performance-based functional capacity measures, the UCSD Performance-based skills assessment (UPSA) and Social skills performance assessment (SSPA) were examined over several different retest intervals in two different samples of people with schizophrenia (n’s=238 and 116) and a healthy comparison sample (n=109). These psychometric properties were compared to those of a neuropsychological assessment battery. Test-retest reliabilities of the long form of the UPSA ranged from r=.63 to r=.80 over follow-up periods up to 36 months in people with schizophrenia, while brief UPSA reliabilities ranged from r=.66 to r=.81. Test-retest reliability of the NP performance scores ranged from r=.77 to r=.79. Test-retest reliabilities of the UPSA were lower in healthy controls, while NP performance was slightly more reliable. SSPA test-retest reliability was lower. Practice effect sizes ranged from .05 to .16 for the UPSA and .07 to .19 for the NP assessment in patients, with HC having more practice effects. Reliable change intervals were consistent across NP and both FC measures, indicating equal potential for detection of change. These performance-based measures of functional capacity appear to have similar potential to be sensitive to change compared to NP performance in people with schizophrenia. PMID:20399613
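Note: the abstract does not give the formula used for its reliable change indices. A commonly used form (Jacobson and Truax) is sketched below as an assumption; it ignores practice effects, and the scores, standard deviation, and reliability are hypothetical.

```python
import math

def reliable_change_index(score_time1, score_time2, baseline_sd, test_retest_r):
    """Jacobson-Truax style RCI: change divided by the standard error of the difference.

    SEM     = SD * sqrt(1 - r)
    SE_diff = sqrt(2) * SEM
    """
    sem = baseline_sd * math.sqrt(1.0 - test_retest_r)
    se_diff = math.sqrt(2.0) * sem
    return (score_time2 - score_time1) / se_diff

# Hypothetical example: a 15-point gain, SD of 10, test-retest r of .80.
rci = reliable_change_index(50, 65, baseline_sd=10, test_retest_r=0.80)
print(rci, abs(rci) > 1.96)  # |RCI| > 1.96 is the usual 95% criterion
```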
Inter-rater reliability of an observation-based ergonomics assessment checklist for office workers.
Pereira, Michelle Jessica; Straker, Leon Melville; Comans, Tracy Anne; Johnston, Venerina
2016-12-01
To establish the inter-rater reliability of an observation-based ergonomics assessment checklist for computer workers. A 37-item (38-item if a laptop was part of the workstation) comprehensive observational ergonomics assessment checklist comparable to government guidelines and up to date with empirical evidence was developed. Two trained practitioners assessed full-time office workers performing their usual computer-based work and evaluated the suitability of workstations used. Practitioners assessed each participant consecutively. The order of assessors was randomised, and the second assessor was blinded to the findings of the first. Unadjusted kappa coefficients between the raters were obtained for the overall checklist and subsections that were formed from question-items relevant to specific workstation equipment. Twenty-seven office workers were recruited. The inter-rater reliability between two trained practitioners achieved moderate to good reliability for all except one checklist component. This checklist has mostly moderate to good reliability between two trained practitioners. Practitioner Summary: This reliable ergonomics assessment checklist for computer workers was designed using accessible government guidelines and supplemented with up-to-date evidence. Employers in Queensland (Australia) can fulfil legislative requirements by using this reliable checklist to identify and subsequently address potential risk factors for work-related injury to provide a safe working environment.
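Note: an unadjusted (unweighted) Cohen's kappa for two raters, as named in this abstract, can be computed as sketched below. The ratings are hypothetical and are not data from the study.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)

# Hypothetical checklist item rated for eight workstations.
first = ["ok", "ok", "ok", "risk", "risk", "risk", "ok", "ok"]
second = ["ok", "ok", "risk", "risk", "risk", "ok", "ok", "ok"]
print(cohen_kappa(first, second))
```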
Advanced reliability modeling of fault-tolerant computer-based systems
NASA Technical Reports Server (NTRS)
Bavuso, S. J.
1982-01-01
Two methodologies for the reliability assessment of fault tolerant digital computer based systems are discussed. The computer-aided reliability estimation 3 (CARE 3) and gate logic software simulation (GLOSS) are assessment technologies that were developed to mitigate a serious weakness in the design and evaluation process of ultrareliable digital systems. The weak link is based on the unavailability of a sufficiently powerful modeling technique for comparing the stochastic attributes of one system against others. Some of the more interesting attributes are reliability, system survival, safety, and mission success.
Lord, Sarah Peregrine; Can, Doğan; Yi, Michael; Marin, Rebeca; Dunn, Christopher W.; Imel, Zac E.; Georgiou, Panayiotis; Narayanan, Shrikanth; Steyvers, Mark; Atkins, David C.
2014-01-01
The current paper presents novel methods for collecting MISC data and accurately assessing reliability of behavior codes at the level of the utterance. The MISC 2.1 was used to rate MI interviews from five randomized trials targeting alcohol and drug use. Sessions were coded at the utterance-level. Utterance-based coding reliability was estimated using three methods and compared to traditional reliability estimates of session tallies. Session-level reliability was generally higher compared to reliability using utterance-based codes, suggesting that typical methods for MISC reliability may be biased. These novel methods in MI fidelity data collection and reliability assessment provided rich data for therapist feedback and further analyses. Beyond implications for fidelity coding, utterance-level coding schemes may elucidate important elements in the counselor-client interaction that could inform theories of change and the practice of MI. PMID:25242192
Lord, Sarah Peregrine; Can, Doğan; Yi, Michael; Marin, Rebeca; Dunn, Christopher W; Imel, Zac E; Georgiou, Panayiotis; Narayanan, Shrikanth; Steyvers, Mark; Atkins, David C
2015-02-01
The current paper presents novel methods for collecting MISC data and accurately assessing reliability of behavior codes at the level of the utterance. The MISC 2.1 was used to rate MI interviews from five randomized trials targeting alcohol and drug use. Sessions were coded at the utterance-level. Utterance-based coding reliability was estimated using three methods and compared to traditional reliability estimates of session tallies. Session-level reliability was generally higher compared to reliability using utterance-based codes, suggesting that typical methods for MISC reliability may be biased. These novel methods in MI fidelity data collection and reliability assessment provided rich data for therapist feedback and further analyses. Beyond implications for fidelity coding, utterance-level coding schemes may elucidate important elements in the counselor-client interaction that could inform theories of change and the practice of MI. Copyright © 2015 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Abdenov, A. Zh; Trushin, V. A.; Abdenova, G. A.
2018-01-01
The paper considers the questions of filling the relevant SIEM nodes based on calculations of objective assessments in order to improve the reliability of subjective expert assessments. The proposed methodology is necessary for the most accurate security risk assessment of information systems. This technique is also intended for the purpose of establishing real-time operational information protection in the enterprise information systems. Risk calculations are based on objective estimates of the adverse events implementation probabilities, predictions of the damage magnitude from information security violations. Calculations of objective assessments are necessary to increase the reliability of the proposed expert assessments.
Development of Internet-Based Tasks for the Executive Function Performance Test.
Rand, Debbie; Lee Ben-Haim, Keren; Malka, Rachel; Portnoy, Sigal
The Executive Function Performance Test (EFPT) is a reliable and valid performance-based tool to assess executive functions (EFs). This study's objective was to develop and verify two Internet-based tasks for the EFPT. A cross-sectional study assessed the alternate-form reliability of the Internet-based bill-paying and telephone-use tasks in healthy adults and people with subacute stroke (Study 1). It also sought to establish the tasks' criterion reliability for assessing EF deficits by correlating performance with that on the Trail Making Test in five groups: healthy young adults, healthy older adults, people with subacute stroke, people with chronic stroke, and young adults with attention deficit hyperactivity disorder (Study 2). The alternative-form reliability and initial construct validity for the Internet-based bill-paying task were verified. Criterion validity was established for both tasks. The Internet-based tasks are comparable to the original EFPT tasks and can be used for assessment of EF deficits. Copyright © 2018 by the American Occupational Therapy Association, Inc.
Infant polysomnography: reliability and validity of infant arousal assessment.
Crowell, David H; Kulp, Thomas D; Kapuniai, Linda E; Hunt, Carl E; Brooks, Lee J; Weese-Mayer, Debra E; Silvestri, Jean; Ward, Sally Davidson; Corwin, Michael; Tinsley, Larry; Peucker, Mark
2002-10-01
Infant arousal scoring based on the Atlas Task Force definition of transient EEG arousal was evaluated to determine (1) whether transient arousals can be identified and assessed reliably in infants and (2) whether arousal and no-arousal epochs scored previously by trained raters can be validated reliably by independent sleep experts. Phase I for inter- and intrarater reliability scoring was based on two datasets of sleep epochs selected randomly from nocturnal polysomnograms of healthy full-term, preterm, idiopathic apparent life-threatening event cases, and siblings of Sudden Infant Death Syndrome infants of 35 to 64 weeks postconceptional age. After training, test set 1 reliability was assessed and discrepancies identified. After retraining, test set 2 was scored by the same raters to determine interrater reliability. Later, three raters from the trained group rescored test set 2 to assess inter- and intrarater reliabilities. Interrater and intrarater reliability kappas, with 95% confidence intervals, ranged from substantial to almost perfect levels of agreement. Interrater reliabilities for spontaneous arousals were initially moderate and then substantial. During the validation phase, 315 previously scored epochs were presented to four sleep experts to rate as containing arousal or no-arousal events. Interrater expert agreements were diverse and considered as noninterpretable. Concordance in sleep experts' agreements, based on identification of the previously sampled arousal and no-arousal epochs, was used as a secondary evaluative technique. Results showed agreement by two or more experts on 86% of the Collaborative Home Infant Monitoring Evaluation Study arousal scored events. Conversely, only 1% of the Collaborative Home Infant Monitoring Evaluation Study-scored no-arousal epochs were rated as an arousal.
In summary, this study presents an empirically tested model with procedures and criteria for attaining improved reliability in transient EEG arousal assessments in infants using the modified Atlas Task Force standards. With training based on specific criteria, substantial inter- and intrarater agreement in identifying infant arousals was demonstrated. Corroborative validation results were too disparate for meaningful interpretation. Alternate evaluation based on concordance agreements supports reliance on infant EEG criteria for assessment. Results mandate additional confirmatory validation studies with specific training on infant EEG arousal assessment criteria.
Cai, Gaigai; Chen, Xuefeng; Li, Bing; Chen, Baojia; He, Zhengjia
2012-01-01
The reliability of cutting tools is critical to machining precision and production efficiency. The conventional statistic-based reliability assessment method aims at providing a general and overall estimation of reliability for a large population of identical units under given and fixed conditions. However, it has limited effectiveness in depicting the operational characteristics of a cutting tool. To overcome this limitation, this paper proposes an approach to assess the operation reliability of cutting tools. A proportional covariate model is introduced to construct the relationship between operation reliability and condition monitoring information. The wavelet packet transform and an improved distance evaluation technique are used to extract sensitive features from vibration signals, and a covariate function is constructed based on the proportional covariate model. Ultimately, the failure rate function of the cutting tool being assessed is calculated using the baseline covariate function obtained from a small sample of historical data. Experimental results and a comparative study show that the proposed method is effective for assessing the operation reliability of cutting tools. PMID:23201980
Michaelsen, Stella M; Rocha, André S; Knabben, Rodrigo J; Rodrigues, Luciano P; Fernandes, Claudia G C
2011-01-01
Recently, the reliability of the Brazilian version of the Fugl-Meyer Assessment (FMA) was assessed through the scoring given according to observations made by a single evaluator who applied the test. When different raters apply the scale, the reliability may depend on the interpretation given to the assessment sheet. In such cases, a clear administration manual is essential for ensuring homogeneity of application. To translate and adapt the French Canadian version of the FMA administration manual into Brazilian Portuguese and to evaluate the inter-rater reliability when different evaluators apply the FMA on the basis of the information contained in the manual. Eighteen adults (59±10 years) with chronic hemiparesis (38±35 months after a stroke) took part in this study. Eight patients participated in the first part of the study and 10 in the second part. Based on analyzing the results from part 1, an adapted version was developed, in which information and photos were added to illustrate the positions of the patient and evaluator. The inter-rater reliability was assessed using the intraclass correlation coefficient (ICC). The reliability of the FMA based on the adapted version of the manual was excellent for the total motor scores for the upper limbs (ICC=0.98) and lower limbs (ICC=0.90), as well as for movement sense (ICC=0.98) and upper and lower-limb passive range of motion (ICC=0.84 and 0.90, respectively). The reliability was moderate for tactile sensitivity (0.75). The joint pain assessment presented low reliability. The results showed that, except for pain assessment, application of the FMA based on the adapted version of the application manual for Brazilian Portuguese presented adequate inter-rater reliability.
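Note: the abstract reports intraclass correlation coefficients without naming the ICC form. The sketch below assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1); the ratings are illustrative and are not data from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a subjects-by-raters table (one row per subject)."""
    n = len(ratings)       # subjects
    k = len(ratings[0])    # raters
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Illustrative table: six subjects scored by four raters.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 3))
```

Large systematic differences between raters (column means) pull this form of the ICC down, which is why a clear administration manual matters when several evaluators apply a scale.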
Composite Reliability of a Workplace-Based Assessment Toolbox for Postgraduate Medical Education
ERIC Educational Resources Information Center
Moonen-van Loon, J. M. W.; Overeem, K.; Donkers, H. H. L. M.; van der Vleuten, C. P. M.; Driessen, E. W.
2013-01-01
In recent years, postgraduate assessment programmes around the world have embraced workplace-based assessment (WBA) and its related tools. Despite their widespread use, results of studies on the validity and reliability of these tools have been variable. Although in many countries decisions about residents' continuation of training and…
Assessing segment- and corridor-based travel-time reliability on urban freeways : final report.
DOT National Transportation Integrated Search
2016-09-01
Travel time and its reliability are intuitive performance measures for freeway traffic operations. The objective of this project was to quantify segment-based and corridor-based travel time reliability measures on urban freeways. To achieve this obje...
Murphy, Douglas J; Bruce, David A; Mercer, Stewart W; Eva, Kevin W
2009-05-01
To investigate the reliability and feasibility of six potential workplace-based assessment methods in general practice training: criterion audit, multi-source feedback from clinical and non-clinical colleagues, patient feedback (the CARE Measure), referral letters, significant event analysis, and video analysis of consultations. Performance of GP registrars (trainees) was evaluated with each tool to assess the reliabilities of the tools and feasibility, given raters and number of assessments needed. Participant experience of process determined by questionnaire. 171 GP registrars and their trainers, drawn from nine deaneries (representing all four countries in the UK), participated. The ability of each tool to differentiate between doctors (reliability) was assessed using generalisability theory. Decision studies were then conducted to determine the number of observations required to achieve an acceptably high reliability for "high-stakes assessment" using each instrument. Finally, descriptive statistics were used to summarise participants' ratings of their experience using these tools. Multi-source feedback from colleagues and patient feedback on consultations emerged as the two methods most likely to offer a reliable and feasible opinion of workplace performance. Reliability co-efficients of 0.8 were attainable with 41 CARE Measure patient questionnaires and six clinical and/or five non-clinical colleagues per doctor when assessed on two occasions. For the other four methods tested, 10 or more assessors were required per doctor in order to achieve a reliable assessment, making the feasibility of their use in high-stakes assessment extremely low. Participant feedback did not raise any major concerns regarding the acceptability, feasibility, or educational impact of the tools. 
The combination of patient and colleague views of doctors' performance, coupled with reliable competence measures, may offer a suitable evidence-base on which to monitor progress and completion of doctors' training in general practice.
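Note: a decision study of the kind described in this abstract asks how many observations are needed before the averaged score reaches a target reliability. The sketch below assumes the simplest one-facet design (persons crossed with observations); the variance components are hypothetical and do not reproduce the study's figures.

```python
def g_coefficient(person_variance, residual_variance, n_observations):
    """Generalisability coefficient for the mean of n observations per person."""
    return person_variance / (person_variance + residual_variance / n_observations)

def observations_needed(person_variance, residual_variance, target=0.8, max_n=10_000):
    """Smallest number of observations whose mean reaches the target coefficient."""
    for n in range(1, max_n + 1):
        if g_coefficient(person_variance, residual_variance, n) >= target:
            return n
    raise ValueError("target not reachable within max_n observations")

# Hypothetical variance components from a G-study.
print(observations_needed(person_variance=0.12, residual_variance=1.0, target=0.8))
```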
Wolf, Timothy J; Dahl, Abigail; Auen, Colleen; Doherty, Meghan
2017-07-01
The objective of this study was to evaluate the inter-rater reliability, test-retest reliability, concurrent validity, and discriminant validity of the Complex Task Performance Assessment (CTPA): an ecologically valid performance-based assessment of executive function. Community control participants (n = 20) and individuals with mild stroke (n = 14) participated in this study. All participants completed the CTPA and a battery of cognitive assessments at initial testing. The control participants completed the CTPA at two different times one week apart. The intra-class correlation coefficient (ICC) for inter-rater reliability for the total score on the CTPA was .991. The ICCs for all of the sub-scores of the CTPA were also high (.889-.977). The CTPA total score was significantly correlated with Condition 4 of the DKEFS Color-Word Interference Test (ρ = -.425) and the Wechsler Test of Adult Reading (ρ = -.493). Finally, there were significant differences between control subjects and individuals with mild stroke on the total score of the CTPA (p = .007) and all sub-scores except interpretation failures and total items incorrect. These results are also consistent with other current executive function performance-based assessments and indicate that the CTPA is a reliable and valid performance-based measure of executive function.
Mash, Bob; Derese, Anselme
2013-01-01
Background: Competency-based education and the validity and reliability of workplace-based assessment of postgraduate trainees have received increasing attention worldwide. Family medicine was recognised as a speciality in South Africa six years ago and a satisfactory portfolio of learning is a prerequisite to sit the national exit exam. A massive scaling up of the number of family physicians is needed in order to meet the health needs of the country. Aim: The aim of this study was to develop a reliable, robust and feasible portfolio assessment tool (PAT) for South Africa. Methods: Six raters each rated nine portfolios from the Stellenbosch University programme, using the PAT, to test for inter-rater reliability. This rating was repeated three months later to determine test–retest reliability. Following initial analysis and feedback the PAT was modified and the inter-rater reliability again assessed on nine new portfolios. An acceptable intra-class correlation was considered to be > 0.80. Results: The total score was found to be reliable, with a coefficient of 0.92. For test–retest reliability, the difference in mean total score was 1.7%, which was not statistically significant. Amongst the subsections, only assessment of the educational meetings and the logbook showed reliability coefficients > 0.80. Conclusion: This was the first attempt to develop a reliable, robust and feasible national portfolio assessment tool to assess postgraduate family medicine training in the South African context. The tool was reliable for the total score, but the low reliability of several sections in the PAT helped us to develop 12 recommendations regarding the use of the portfolio, the design of the PAT and the training of raters.
Reliability Assessment of a Robust Design Under Uncertainty for a 3-D Flexible Wing
NASA Technical Reports Server (NTRS)
Gumbert, Clyde R.; Hou, Gene J. -W.; Newman, Perry A.
2003-01-01
The paper presents reliability assessment results for the robust designs under uncertainty of a 3-D flexible wing previously reported by the authors. Reliability assessments (additional optimization problems) of the active constraints at the various probabilistic robust design points are obtained and compared with the constraint values or target constraint probabilities specified in the robust design. In addition, reliability-based sensitivity derivatives with respect to design variable mean values are also obtained and shown to agree with finite difference values. These derivatives allow one to perform reliability based design without having to obtain second-order sensitivity derivatives. However, an inner-loop optimization problem must be solved for each active constraint to find the most probable point on that constraint failure surface.
Vanwolleghem, Griet; Van Dyck, Delfien; Ducheyne, Fabian; De Bourdeaudhuij, Ilse; Cardon, Greet
2014-06-10
Google Street View provides a valuable and efficient alternative to observe the physical environment compared to on-site fieldwork. However, studies on the use, reliability and validity of Google Street View in a cycling-to-school context are lacking. We aimed to study the intra-, inter-rater reliability and criterion validity of EGA-Cycling (Environmental Google Street View Based Audit - Cycling to school), a newly developed audit using Google Street View to assess the physical environment along cycling routes to school. Parents (n = 52) of 11-to-12-year old Flemish children, who mostly cycled to school, completed a questionnaire and identified their child's cycling route to school on a street map. Fifty cycling routes of 11-to-12-year olds were identified and physical environmental characteristics along the identified routes were rated with EGA-Cycling (5 subscales; 37 items), based on Google Street View. To assess reliability, two researchers performed the audit. Criterion validity of the audit was examined by comparing the ratings based on Google Street View with ratings through on-site assessments. Intra-rater reliability was high (kappa range 0.47-1.00). Large variations in the inter-rater reliability (kappa range -0.03-1.00) and criterion validity scores (kappa range -0.06-1.00) were reported, with acceptable inter-rater reliability values for 43% of all items and acceptable criterion validity for 54% of all items. EGA-Cycling can be used to assess physical environmental characteristics along cycling routes to school. However, to assess the micro-environment specifically related to cycling, on-site assessments have to be added.
The Validation of a Case-Based, Cumulative Assessment and Progressions Examination
Coker, Adeola O.; Copeland, Jeffrey T.; Gottlieb, Helmut B.; Horlen, Cheryl; Smith, Helen E.; Urteaga, Elizabeth M.; Ramsinghani, Sushma; Zertuche, Alejandra; Maize, David
2016-01-01
Objective. To assess content and criterion validity, as well as reliability of an internally developed, case-based, cumulative, high-stakes third-year Annual Student Assessment and Progression Examination (P3 ASAP Exam). Methods. Content validity was assessed through the writing-reviewing process. Criterion validity was assessed by comparing student scores on the P3 ASAP Exam with the nationally validated Pharmacy Curriculum Outcomes Assessment (PCOA). Reliability was assessed with psychometric analysis comparing student performance over four years. Results. The P3 ASAP Exam showed content validity through representation of didactic courses and professional outcomes. Similar scores on the P3 ASAP Exam and PCOA with Pearson correlation coefficient established criterion validity. Consistent student performance using Kuder-Richardson coefficient (KR-20) since 2012 reflected reliability of the examination. Conclusion. Pharmacy schools can implement internally developed, high-stakes, cumulative progression examinations that are valid and reliable using a robust writing-reviewing process and psychometric analyses. PMID:26941435
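As an illustration of the reliability statistic named above, the following is a minimal sketch of the Kuder-Richardson formula 20 for dichotomously scored items. The score matrix is hypothetical, and the abstract does not say how the authors' psychometric software computed the coefficient.

```python
def kr20(responses):
    """KR-20 for a list of respondents, each a list of 0/1 item scores."""
    n = len(responses)
    k = len(responses[0])
    # Proportion of respondents answering each item correctly.
    p = [sum(row[j] for row in responses) / n for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    # Population variance of total scores, consistent with the p*q item variances.
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    return (k / (k - 1)) * (1 - sum_pq / var_total)

# Hypothetical scores: 4 students x 3 dichotomous items.
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(scores))
```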
Schiffman, Eric L.; Truelove, Edmond L.; Ohrbach, Richard; Anderson, Gary C.; John, Mike T.; List, Thomas; Look, John O.
2011-01-01
AIMS The purpose of the Research Diagnostic Criteria for Temporomandibular Disorders (RDC/TMD) Validation Project was to assess the diagnostic validity of this examination protocol. An overview is presented, including Axis I and II methodology and descriptive statistics for the study participant sample. This paper details the development of reliable methods to establish the reference standards for assessing criterion validity of the Axis I RDC/TMD diagnoses. Validity testing for the Axis II biobehavioral instruments was based on previously validated reference standards. METHODS The Axis I reference standards were based on the consensus of 2 criterion examiners independently performing a comprehensive history, clinical examination, and evaluation of imaging. Intersite reliability was assessed annually for criterion examiners and radiologists. Criterion exam reliability was also assessed within study sites. RESULTS Study participant demographics were comparable to those of participants in previous studies using the RDC/TMD. Diagnostic agreement of the criterion examiners with each other and with the consensus-based reference standards was excellent with all kappas ≥ 0.81, except for osteoarthrosis (moderate agreement, k = 0.53). Intrasite criterion exam agreement with reference standards was excellent (k ≥ 0.95). Intersite reliability of the radiologists for detecting computed tomography-disclosed osteoarthrosis and magnetic resonance imaging-disclosed disc displacement was good to excellent (k = 0.71 and 0.84, respectively). CONCLUSION The Validation Project study population was appropriate for assessing the reliability and validity of the RDC/TMD Axis I and II. The reference standards used to assess the validity of Axis I TMD were based on reliable and clinically credible methods. PMID:20213028
NASA Astrophysics Data System (ADS)
Li, Lin; Zeng, Li; Lin, Zi-Jing; Cazzell, Mary; Liu, Hanli
2015-05-01
Test-retest reliability of neuroimaging measurements is an important concern in the investigation of cognitive functions in the human brain. To date, intraclass correlation coefficients (ICCs), originally used in inter-rater reliability studies in behavioral sciences, have become commonly used metrics in reliability studies on neuroimaging and functional near-infrared spectroscopy (fNIRS). However, as there are six popular forms of ICC, how thoroughly one understands ICCs affects how appropriately one selects, uses, and interprets them in a reliability study. We first offer a brief review and tutorial on the statistical rationale of ICCs, including their underlying analysis of variance models and technical definitions, in the context of assessment on intertest reliability. Second, we provide general guidelines on the selection and interpretation of ICCs. Third, we illustrate the proposed approach by using an actual research study to assess intertest reliability of fNIRS-based, volumetric diffuse optical tomography of brain activities stimulated by a risk decision-making protocol. Last, special issues that may arise in reliability assessment using ICCs are discussed and solutions are suggested.
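To make the six forms concrete, here is a minimal sketch that derives them from the mean squares of a two-way analysis of variance. It assumes the common Shrout and Fleiss labelling, ICC(1,1) through ICC(3,k), which is presumably the set the abstract refers to. The data table is hypothetical.

```python
def icc_forms(ratings):
    """Six ICC forms from an n x k table (rows = subjects, columns = raters or sessions)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)                     # between-subjects mean square
    jms = ss_cols / (k - 1)                     # between-raters mean square
    ems = ss_err / ((n - 1) * (k - 1))          # residual mean square
    wms = (ss_total - ss_rows) / (n * (k - 1))  # within-subjects mean square

    return {
        # Single-measure forms
        "ICC(1,1)": (bms - wms) / (bms + (k - 1) * wms),
        "ICC(2,1)": (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n),
        "ICC(3,1)": (bms - ems) / (bms + (k - 1) * ems),
        # Average-measure forms
        "ICC(1,k)": (bms - wms) / bms,
        "ICC(2,k)": (bms - ems) / (bms + (jms - ems) / n),
        "ICC(3,k)": (bms - ems) / bms,
    }

# Hypothetical table: 4 subjects measured in 3 sessions.
table = [[7.0, 8.0, 7.5], [5.0, 5.5, 6.0], [9.0, 9.5, 9.0], [6.0, 7.0, 6.5]]
for name, value in icc_forms(table).items():
    print(name, round(value, 3))
```

The forms differ in whether rater (or session) variance counts as error and in whether a single measurement or the mean of k measurements is being judged, so the same table can yield quite different coefficients.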
Peterson, Eleanor B; Calhoun, Aaron W; Rider, Elizabeth A
2014-09-01
With increased recognition of the importance of sound communication skills and communication skills education, reliable assessment tools are essential. This study reports on the psychometric properties of an assessment tool based on the Kalamazoo Consensus Statement Essential Elements Communication Checklist. The Gap-Kalamazoo Communication Skills Assessment Form (GKCSAF), a modified version of an existing communication skills assessment tool, the Kalamazoo Essential Elements Communication Checklist-Adapted, was used to assess learners in a multidisciplinary, simulation-based communication skills educational program using multiple raters. 118 simulated conversations were available for analysis. Internal consistency and inter-rater reliability were determined by calculating a Cronbach's alpha score and intra-class correlation coefficients (ICC), respectively. The GKCSAF demonstrated high internal consistency with a Cronbach's alpha score of 0.844 (faculty raters) and 0.880 (peer observer raters), and high inter-rater reliability with an ICC of 0.830 (faculty raters) and 0.89 (peer observer raters). The Gap-Kalamazoo Communication Skills Assessment Form is a reliable method of assessing the communication skills of multidisciplinary learners using multi-rater methods within the learning environment. The Gap-Kalamazoo Communication Skills Assessment Form can be used by educational programs that wish to implement a reliable assessment and feedback system for a variety of learners. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
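The internal-consistency statistic reported above can be sketched as follows. This is a minimal illustration of the standard Cronbach's alpha formula on a hypothetical score matrix. It is not the authors' analysis script, and the item count is an assumption made for the example.

```python
from statistics import variance

def cronbach_alpha(scores):
    """scores: list of rated conversations, each a list of k item ratings."""
    k = len(scores[0])
    # Sample variance of each item across conversations.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the summed score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical ratings: 5 conversations x 3 checklist items on a 1-5 scale.
ratings = [[4, 5, 4], [3, 3, 2], [5, 5, 5], [2, 3, 2], [4, 4, 5]]
print(round(cronbach_alpha(ratings), 3))
```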
Probabilistic Assessment of National Wind Tunnel
NASA Technical Reports Server (NTRS)
Shah, A. R.; Shiao, M.; Chamis, C. C.
1996-01-01
A preliminary probabilistic structural assessment of the critical section of National Wind Tunnel (NWT) is performed using NESSUS (Numerical Evaluation of Stochastic Structures Under Stress) computer code. Thereby, the capabilities of NESSUS code have been demonstrated to address reliability issues of the NWT. Uncertainties in the geometry, material properties, loads and stiffener location on the NWT are considered to perform the reliability assessment. Probabilistic stress, frequency, buckling, fatigue and proof load analyses are performed. These analyses cover the major global and some local design requirements. Based on the assumed uncertainties, the results reveal the assurance of minimum 0.999 reliability for the NWT. Preliminary life prediction analysis results show that the life of the NWT is governed by the fatigue of welds. Also, reliability based proof test assessment is performed.
Schiffman, Eric L; Truelove, Edmond L; Ohrbach, Richard; Anderson, Gary C; John, Mike T; List, Thomas; Look, John O
2010-01-01
The purpose of the Research Diagnostic Criteria for Temporomandibular Disorders (RDC/TMD) Validation Project was to assess the diagnostic validity of this examination protocol. The aim of this article is to provide an overview of the project's methodology, descriptive statistics, and data for the study participant sample. This article also details the development of reliable methods to establish the reference standards for assessing criterion validity of the Axis I RDC/TMD diagnoses. The Axis I reference standards were based on the consensus of two criterion examiners independently performing a comprehensive history, clinical examination, and evaluation of imaging. Intersite reliability was assessed annually for criterion examiners and radiologists. Criterion examination reliability was also assessed within study sites. Study participant demographics were comparable to those of participants in previous studies using the RDC/TMD. Diagnostic agreement of the criterion examiners with each other and with the consensus-based reference standards was excellent with all kappas ≥ 0.81, except for osteoarthrosis (moderate agreement, k = 0.53). Intrasite criterion examiner agreement with reference standards was excellent (k ≥ 0.95). Intersite reliability of the radiologists for detecting computed tomography-disclosed osteoarthrosis and magnetic resonance imaging-disclosed disc displacement was good to excellent (k = 0.71 and 0.84, respectively). The Validation Project study population was appropriate for assessing the reliability and validity of the RDC/TMD Axis I and II. The reference standards used to assess the validity of Axis I TMD were based on reliable and clinically credible methods.
NASA Astrophysics Data System (ADS)
Chen, Fan; Huang, Shaoxiong; Ding, Jinjin; Ding, Jinjin; Gao, Bo; Xie, Yuguang; Wang, Xiaoming
2018-01-01
This paper proposes a fast reliability assessment method for distribution grids with distributed renewable energy generation. First, the Weibull distribution and the Beta distribution are used to describe the probability distribution characteristics of wind speed and solar irradiance respectively, and models of the wind farm, solar park and local load are built for reliability assessment. Then, based on power system production cost simulation, probability discretization and linearized power flow, an optimal power flow problem with the objective of minimizing the cost of conventional power generation is solved. Thus a reliability assessment for the distribution grid is carried out quickly and accurately. The Loss Of Load Probability (LOLP) and Expected Energy Not Supplied (EENS) are selected as the reliability indices. A simulation of the IEEE RBTS BUS6 system in MATLAB indicates that the fast reliability assessment method calculates the reliability indices much faster than the Monte Carlo method while maintaining accuracy.
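For orientation, the two reliability indices can be sketched with a plain Monte Carlo estimate, which is the baseline the paper compares against rather than its proposed fast method. Everything below is an assumption made for illustration: a single-bus system with no network constraints, a piecewise-linear turbine curve, and hypothetical distribution parameters and capacities.

```python
import random

def wind_power(v, rated_mw, cut_in=3.0, rated_speed=12.0, cut_out=25.0):
    """Simplified piecewise-linear turbine power curve (hypothetical parameters)."""
    if v < cut_in or v >= cut_out:
        return 0.0
    if v >= rated_speed:
        return rated_mw
    return rated_mw * (v - cut_in) / (rated_speed - cut_in)

def simulate(load_mw, conventional_mw, wind_rated_mw, solar_rated_mw,
             n_samples=20000, hours=8760, seed=0):
    """Return (LOLP, EENS in MWh per year) for a single-bus system."""
    rng = random.Random(seed)
    shortfall_count = 0
    shortfall_sum = 0.0
    for _ in range(n_samples):
        v = rng.weibullvariate(8.0, 2.0)        # wind speed: scale 8 m/s, shape 2
        irradiance = rng.betavariate(2.0, 2.0)  # normalised irradiance in [0, 1]
        supply = (conventional_mw
                  + wind_power(v, wind_rated_mw)
                  + solar_rated_mw * irradiance)
        deficit = load_mw - supply
        if deficit > 0:
            shortfall_count += 1
            shortfall_sum += deficit
    lolp = shortfall_count / n_samples
    eens = shortfall_sum / n_samples * hours
    return lolp, eens

lolp, eens = simulate(load_mw=10.0, conventional_mw=8.0,
                      wind_rated_mw=2.0, solar_rated_mw=2.0)
print(lolp, eens)
```

The sample count needed for stable estimates is what makes this baseline slow, which is the motivation the abstract gives for a faster method.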
Spaceflight tracking and data network operational reliability assessment for Skylab
NASA Technical Reports Server (NTRS)
Seneca, V. I.; Mlynarczyk, R. H.
1974-01-01
Data on the spaceflight communications equipment status during the Skylab mission were subjected to an operational reliability assessment. Reliability models were revised to reflect pertinent equipment changes accomplished prior to the beginning of the Skylab missions. Appropriate adjustments were made to fit the data to the models. The availabilities are based on the failure events resulting in the station's inability to support a function or functions, and the MTBFs are based on all events, including 'can support' and 'cannot support'. Data were received from eleven land-based stations and one ship.
Becker, Anne E.; Roberts, Andrea L.; Perloe, Alexandra; Bainivualiku, Asenaca; Richards, Lauren K.; Gilman, Stephen E.; Striegel-Moore, Ruth H.
2010-01-01
Objective The Global School-based Student Health Survey (GSHS) is an assessment for adolescent health risk behaviors and exposures, supported by the World Health Organization. Although already widely implemented—and intended for youth assessment across diverse ethnic and national contexts—no reliability data have yet been reported for GSHS-based assessment in any ethnicity or country-specific population. This study reports test-retest reliability for GSHS content adapted for a female adolescent ethnic Fijian study sample in Fiji. Design We adapted and translated GSHS content to assess health risk behaviors as part of a larger study investigating the impact of social transition on ethnic Fijian secondary schoolgirls in Fiji. In order to evaluate the performance of this measure for our ethnic Fijian study sample (n=523), we examined its test-retest reliability with kappa coefficients, % agreement, and prevalence estimates in a sub-sample (n=81). Reliability among strata defined by topic, age, and language was also examined. Results Average agreement between test and retest was 77%, and average Cohen's kappa was 0.47. Mean kappas for questions from core modules about alcohol use, tobacco use, and sexual behavior were substantial, and higher than those for modules relating to other risk behaviors. Conclusions Although test-retest reliability of responses within this country-specific version of GSHS content was substantial in several topical domains for this ethnic Fijian sample, only fair reliability for the module assessing dietary behaviors and other individual items suggests that population-specific psychometric evaluation is essential to interpreting language and country-specific GSHS data. PMID:20234961
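The two agreement statistics reported above can be sketched for a single questionnaire item as follows. The response vectors are hypothetical, and the study's stratified analysis is not reproduced here.

```python
from collections import Counter

def percent_agreement(test, retest):
    """Share of respondents giving the same answer on both occasions."""
    return sum(a == b for a, b in zip(test, retest)) / len(test)

def cohens_kappa(test, retest):
    """Agreement corrected for the agreement expected by chance."""
    n = len(test)
    p_obs = percent_agreement(test, retest)
    c1, c2 = Counter(test), Counter(retest)
    p_exp = sum(c1[c] * c2[c] for c in c1) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical yes/no answers (1 = yes) from 8 respondents, test and retest.
first = [1, 1, 0, 0, 0, 0, 0, 1]
second = [1, 0, 0, 0, 0, 0, 0, 1]
print(percent_agreement(first, second), cohens_kappa(first, second))
```

Because kappa discounts chance agreement, a high raw agreement can coexist with a much lower kappa, especially when one answer is rare.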
Integrated performance and reliability specification for digital avionics systems
NASA Technical Reports Server (NTRS)
Brehm, Eric W.; Goettge, Robert T.
1995-01-01
This paper describes an automated tool for performance and reliability assessment of digital avionics systems, called the Automated Design Tool Set (ADTS). ADTS is based on an integrated approach to design assessment that unifies traditional performance and reliability views of system designs, and that addresses interdependencies between performance and reliability behavior via exchange of parameters and results between mathematical models of each type. A multi-layer tool set architecture has been developed for ADTS that separates the concerns of system specification, model generation, and model solution. Performance and reliability models are generated automatically as a function of candidate system designs, and model results are expressed within the system specification. The layered approach helps deal with the inherent complexity of the design assessment process, and preserves long-term flexibility to accommodate a wide range of models and solution techniques within the tool set structure. ADTS research and development to date has focused on development of a language for specification of system designs as a basis for performance and reliability evaluation. A model generation and solution framework has also been developed for ADTS that will ultimately encompass an integrated set of analytic and simulation-based techniques for performance, reliability, and combined design assessment.
Reliability, Compliance, and Security in Web-Based Course Assessments
ERIC Educational Resources Information Center
Bonham, Scott
2008-01-01
Pre- and postcourse assessment has become a very important tool for education research in physics and other areas. The web offers an attractive alternative to in-class paper administration, but concerns about web-based administration include reliability due to changes in medium, student compliance rates, and test security, both question leakage…
Kumar, A; Bridgham, R; Potts, M; Gushurst, C; Hamp, M; Passal, D
2001-01-01
To determine consistency of assessment in a new paper case-based structured oral examination in a multi-community pediatrics clerkship, and to identify correctable problems in the administration of examination and assessment process. Nine paper case-based oral examinations were audio-taped. From audio-tapes five community coordinators scored examiner behaviors and graded student performance. Correlations among examiner behaviors scores were examined. Graphs identified grading patterns of evaluators. The effect of exam-giving on evaluators was assessed by t-test. Reliability of grades was calculated and the effect of reducing assessment problems was modeled. Exam-givers differed most in their "teaching-guiding" behavior, and this negatively correlated with student grades. Exam reliability was lowered mainly by evaluator differences in leniency and grading pattern; less important was absence of standardization in cases. While grade reliability was low in early use of the paper case-based oral examination, modeling of plausible effects of training and monitoring for greater uniformity in administration of the examination and assigning scores suggests that more adequate reliabilities can be attained.
Kvistgaard Olsen, Jack; Fener, Dilay Kesgin; Waehrens, Eva Elisabet; Wulf Christensen, Anton; Jespersen, Anders; Danneskiold-Samsøe, Bente; Bartels, Else Marie
2017-07-01
Computerized pneumatic cuff pressure algometry (CPA) using the DoloCuff is a new method for pain assessment. Intra- and inter-rater reliabilities have not yet been established. Our aim was to examine the inter- and intrarater reliabilities of DoloCuff measures in healthy subjects. Twenty healthy subjects (ages 20 to 29 years) were assessed three times at 24-hour intervals by two trained raters. Inter-rater reliability was established based on the first and second assessments, whereas intrarater reliability was based on the second and third assessments. Subjects were randomized 1:1 to first assessment at either rater 1 or rater 2. The variables of interest were pressure pain threshold (PT), pressure pain tolerance (PTol), and temporal summation index (TSI). Reliability was estimated by a two-way mixed intraclass correlation coefficient (ICC) absolute agreement analysis. Reliability was considered excellent if ICC > 0.75, fair to good if 0.4 < ICC < 0.75, and poor if ICC < 0.4. Bias and random errors between raters and assessments were evaluated using 95% confidence interval (CI) and Bland-Altman plots. Inter-rater reliability for PT, PTol, and TSI was 0.88 (95% CI: 0.69 to 0.95), 0.86 (95% CI: 0.65 to 0.95), and 0.81 (95% CI: 0.42 to 0.94), respectively. The intrarater reliability for PT, PTol, and TSI was 0.81 (95% CI: 0.53 to 0.92), 0.89 (95% CI: 0.74 to 0.96), and 0.75 (95% CI: 0.28 to 0.91), respectively. Inter-rater reliability was excellent for PT, PTol, and TSI. Similarly, the intrarater reliability for PT and PTol was excellent, while borderline excellent/good for TSI. Therefore, the DoloCuff can be used to obtain reliable measures of pressure pain parameters in healthy subjects. © 2016 World Institute of Pain.
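The interpretation bands and the Bland-Altman summary described above can be sketched as follows. The measurements are hypothetical. The abstract does not assign the exact boundary values 0.4 and 0.75 to a band, so placing them in the middle band is an assumption.

```python
from statistics import mean, stdev

def classify_icc(icc):
    """Bands quoted in the abstract; boundary values fall in the middle band (assumption)."""
    if icc > 0.75:
        return "excellent"
    if icc < 0.4:
        return "poor"
    return "fair to good"

def bland_altman(a, b):
    """Bias and 95% limits of agreement for paired measurements."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread

# Hypothetical pressure pain thresholds (kPa) from two raters on five subjects.
rater_1 = [22.0, 30.5, 18.0, 25.0, 27.5]
rater_2 = [23.0, 29.0, 19.5, 24.0, 28.0]
print(classify_icc(0.88), bland_altman(rater_1, rater_2))
```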
TVA-Based Assessment of Visual Attention Using Line-Drawings of Fruits and Vegetables
Wang, Tianlu; Gillebert, Celine R.
2018-01-01
Visuospatial attention and short-term memory allow us to prioritize, select, and briefly maintain part of the visual information that reaches our senses. These cognitive abilities are quantitatively accounted for by Bundesen’s theory of visual attention (TVA; Bundesen, 1990). Previous studies have suggested that TVA-based assessments are sensitive to inter-individual differences in spatial bias, visual short-term memory capacity, top-down control, and processing speed in healthy volunteers as well as in patients with various neurological and psychiatric conditions. However, most neuropsychological assessments of attention and executive functions, including TVA-based assessment, make use of alphanumeric stimuli and/or are performed verbally, which can pose difficulties for individuals who have troubles processing letters or numbers. Here we examined the reliability of TVA-based assessments when stimuli are used that are not alphanumeric, but instead based on line-drawings of fruits and vegetables. We compared five TVA parameters quantifying the aforementioned cognitive abilities, obtained by modeling accuracy data on a whole/partial report paradigm using conventional alphabet stimuli versus the food stimuli. Significant correlations were found for all TVA parameters, indicating a high parallel-form reliability. Split-half correlations assessing internal reliability, and correlations between predicted and observed data assessing goodness-of-fit were both significant. Our results provide an indication that line-drawings of fruits and vegetables can be used for a reliable assessment of attention and short-term memory. PMID:29535660
Reliability-based trajectory optimization using nonintrusive polynomial chaos for Mars entry mission
NASA Astrophysics Data System (ADS)
Huang, Yuechen; Li, Haiyang
2018-06-01
This paper presents the reliability-based sequential optimization (RBSO) method to settle the trajectory optimization problem with parametric uncertainties in entry dynamics for Mars entry mission. First, the deterministic entry trajectory optimization model is reviewed, and then the reliability-based optimization model is formulated. In addition, the modified sequential optimization method, in which the nonintrusive polynomial chaos expansion (PCE) method and the most probable point (MPP) searching method are employed, is proposed to solve the reliability-based optimization problem efficiently. The nonintrusive PCE method contributes to the transformation between the stochastic optimization (SO) and the deterministic optimization (DO) and to the approximation of trajectory solution efficiently. The MPP method, which is used for assessing the reliability of constraints satisfaction only up to the necessary level, is employed to further improve the computational efficiency. The cycle including SO, reliability assessment and constraints update is repeated in the RBSO until the reliability requirements of constraints satisfaction are satisfied. Finally, the RBSO is compared with the traditional DO and the traditional sequential optimization based on Monte Carlo (MC) simulation in a specific Mars entry mission to demonstrate the effectiveness and the efficiency of the proposed method.
Reliable Assessment with CyberTutor, a Web-Based Homework Tutor.
ERIC Educational Resources Information Center
Pritchard, David E.; Morote, Elsa-Sofia
This paper demonstrates that an electronic tutoring program can collect data that enables a far more reliable assessment of students' skills than a standard examination. CyberTutor, a Socratic electronic homework tutor, can effectively integrate instruction and assessment. CyberTutor assessment has about 62 times less variance due to random test…
Going DEEP: guidelines for building simulation-based team assessments.
Grand, James A; Pearce, Marina; Rench, Tara A; Chao, Georgia T; Fernandez, Rosemarie; Kozlowski, Steve W J
2013-05-01
Whether for team training, research or evaluation, making effective use of simulation-based technologies requires robust, reliable and accurate assessment tools. Extant literature on simulation-based assessment practices has primarily focused on scenario and instructional design; however, relatively little direct guidance has been provided regarding the challenging decisions and fundamental principles related to assessment development and implementation. The objective of this manuscript is to introduce a generalisable assessment framework supplemented by specific guidance on how to construct and ensure valid and reliable simulation-based team assessment tools. The recommendations reflect best practices in assessment and are designed to empower healthcare educators, professionals and researchers with the knowledge to design and employ valid and reliable simulation-based team assessments. Information and actionable recommendations associated with creating assessments of team processes (non-technical 'teamwork' activities) and performance (demonstration of technical proficiency) are presented which provide direct guidance on how to Distinguish the underlying competencies one aims to assess, Elaborate the measures used to capture team member behaviours during simulation activities, Establish the content validity of these measures and Proceduralise the measurement tools in a way that is systematically aligned with the goals of the simulation activity while maintaining methodological rigour (DEEP). The DEEP framework targets fundamental principles and critical activities that are important for effective assessment, and should benefit healthcare educators, professionals and researchers seeking to design or enhance any simulation-based assessment effort.
NASA Astrophysics Data System (ADS)
Serevina, V.; Muliyati, D.
2018-05-01
This research aims to develop a student performance assessment instrument, based on a scientific approach, that is valid and reliable in assessing the performance of students in a basic physics lab on Simple Harmonic Motion (SHM). This study uses the ADDIE model, consisting of the stages Analyze, Design, Development, Implementation, and Evaluation. The student performance assessment developed can be used to measure students’ skills in observing, asking, conducting experiments, associating and communicating experimental results, which are the ‘5M’ stages in a scientific approach. Each assessment item in the instrument was validated by the instrument expert and the evaluation, with the result that all assessment items are eligible for use (100% eligibility percentage). The instrument was then tested for quality of construction, material, and language by a panel (lecturers), with the results: construction aspect 85% (very good), material aspect 87.5% (very good), and language aspect 83% (very good). In the small group trial, the instrument reliability was 0.878 (high category), where r-table is 0.707. In the large group trial, the instrument reliability was 0.889 (high category), where r-table is 0.320. The instrument is declared valid and reliable at the 5% significance level. Based on the results of this research, it can be concluded that the student performance assessment instrument developed on the basis of the scientific approach is valid and reliable for assessing student skill in the SHM experimental activity.
Everett, Tobias C; Ng, Elaine; Power, Daniel; Marsh, Christopher; Tolchard, Stephen; Shadrina, Anna; Bould, Matthew D
2013-12-01
The use of simulation-based assessments for high-stakes physician examinations remains controversial. The Managing Emergencies in Paediatric Anaesthesia course uses simulation to teach evidence-based management of anesthesia crises to trainee anesthetists in the United Kingdom (UK) and Canada. In this study, we investigated the feasibility and reliability of custom-designed scenario-specific performance checklists and a global rating scale (GRS) assessing readiness for independent practice. After research ethics board approval, subjects were videoed managing simulated pediatric anesthesia crises in a single Canadian teaching hospital. Each subject was randomized to two of six different scenarios. All 60 scenarios were subsequently rated by four blinded raters (two in the UK, two in Canada) using the checklists and GRS. The actual and predicted reliability of the tools was calculated for different numbers of raters using the intraclass correlation coefficient (ICC) and the Spearman-Brown prophecy formula. Average measures ICCs ranged from 'substantial' to 'near perfect' (P ≤ 0.001). The reliability of the checklists and the GRS was similar. Single measures ICCs showed more variability than average measures ICCs. At least two raters would be required to achieve acceptable reliability. We have established the reliability of a GRS to assess the management of simulated crisis scenarios in pediatric anesthesia, and this tool is feasible within the setting of a research study. The global rating scale allows raters to make a judgement regarding a participant's readiness for independent practice. These tools may be used in future research examining simulation-based assessment. © 2013 John Wiley & Sons Ltd.
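The prediction step named above can be sketched with the Spearman-Brown prophecy formula, which estimates the reliability of an average over several raters from a single-rater coefficient. The input values are hypothetical, since the abstract does not report the single-measure ICCs or the threshold it treated as acceptable.

```python
import math

def spearman_brown(single_rater_r, n_raters):
    """Predicted reliability of the mean of n_raters ratings."""
    r = single_rater_r
    return n_raters * r / (1 + (n_raters - 1) * r)

def raters_needed(single_rater_r, target):
    """Smallest number of raters whose averaged rating reaches the target."""
    r = single_rater_r
    k = target * (1 - r) / (r * (1 - target))
    return max(1, math.ceil(k - 1e-9))  # small epsilon guards against float noise

# Hypothetical single-rater reliability of 0.5 and a target of 0.7.
print(spearman_brown(0.5, 2), raters_needed(0.5, 0.7))
```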
Test-retest reliability of sensor-based sit-to-stand measures in young and older adults.
Regterschot, G Ruben H; Zhang, Wei; Baldus, Heribert; Stevens, Martin; Zijlstra, Wiebren
2014-01-01
This study investigated test-retest reliability of sensor-based sit-to-stand (STS) peak power and other STS measures in young and older adults. In addition, test-retest reliability of the sensor method was compared to test-retest reliability of the Timed Up and Go Test (TUGT) and Five-Times-Sit-to-Stand Test (FTSST) in older adults. Ten healthy young female adults (20-23 years) and 31 older adults (21 females; 73-94 years) participated in two assessment sessions separated by 3-8 days. Vertical peak power was assessed during three (young adults) and five (older adults) normal and fast STS trials with a hybrid motion sensor worn on the hip. Older adults also performed the FTSST and TUGT. The average sensor-based STS peak power of the normal STS trials and the average sensor-based STS peak power of the fast STS trials showed excellent test-retest reliability in young adults (intra-class correlation (ICC)≥0.90; zero in 95% confidence interval of mean difference between test and retest (95%CI of D); standard error of measurement (SEM)≤6.7% of mean peak power) and older adults (ICC≥0.91; zero in 95%CI of D; SEM≤9.9%). Test-retest reliability of sensor-based STS peak power and TUGT (ICC=0.98; zero in 95%CI of D; SEM=8.5%) was comparable in older adults, test-retest reliability of the FTSST was lower (ICC=0.73; zero outside 95%CI of D; SEM=14.4%). Sensor-based STS peak power demonstrated excellent test-retest reliability and may therefore be useful for clinical assessment of functional status and fall risk. Copyright © 2014 Elsevier B.V. All rights reserved.
ERIC Educational Resources Information Center
Barbu, Otilia C.; Levine-Donnerstein, Deborah; Marx, Ronald W.; Yaden, David B., Jr.
2013-01-01
This study examined reliability and validity of the Devereux Early Childhood Assessment (DECA), based on samples of parents and teachers' ratings of 1,145 entering kindergartners in the Southwest. Confirmatory factor analysis showed that DECA presented good reliability and validity for manifest variables, corroborating previous findings. Three…
An Evaluation of Test Speededness in an Assessment for Third-Grade Gifted Students
ERIC Educational Resources Information Center
Hailey, Emily; Callahan, Carolyn M.; Azano, Amy; Moon, Tonya R.
2012-01-01
Reliability and validity are integral concepts in assessment design. Test speededness, the influence of time constraints on test taker performance, is often an overlooked threat to reliability and validity, especially in classroom-based testing. The purpose of this study is to evaluate the degree of test speededness of classroom-based assessments…
Hulteen, Ryan M; Lander, Natalie J; Morgan, Philip J; Barnett, Lisa M; Robertson, Samuel J; Lubans, David R
2015-10-01
It has been suggested that young people should develop competence in a variety of 'lifelong physical activities' to ensure that they can be active across the lifespan. The primary aim of this systematic review is to report the methodological properties, validity, reliability, and test duration of field-based measures that assess movement skill competency in lifelong physical activities. A secondary aim was to clearly define those characteristics unique to lifelong physical activities. A search of four electronic databases (Scopus, SPORTDiscus, ProQuest, and PubMed) was conducted between June 2014 and April 2015 with no date restrictions. Studies addressing the validity and/or reliability of lifelong physical activity tests were reviewed. Included articles were required to assess lifelong physical activities using process-oriented measures, as well as report either one type of validity or reliability. Assessment criteria for methodological quality were adapted from a checklist used in a previous review of sport skill outcome assessments. Movement skill assessments for eight different lifelong physical activities (badminton, cycling, dance, golf, racquetball, resistance training, swimming, and tennis) in 17 studies were identified for inclusion. Methodological quality, validity, reliability, and test duration (time to assess a single participant), for each article were assessed. Moderate to excellent reliability results were found in 16 of 17 studies, with 71% reporting inter-rater reliability and 41% reporting intra-rater reliability. Only four studies in this review reported test-retest reliability. Ten studies reported validity results; content validity was cited in 41% of these studies. Construct validity was reported in 24% of studies, while criterion validity was only reported in 12% of studies. Numerous assessments for lifelong physical activities may exist, yet only assessments for eight lifelong physical activities were included in this review. 
Generalizability of results may be more applicable if more heterogeneous samples are used in future research. Moderate to excellent levels of inter- and intra-rater reliability were reported in the majority of studies. However, future work should look to establish test-retest reliability. Validity was less commonly reported than reliability, and further types of validity other than content validity need to be established in future research. Specifically, predictive validity of 'lifelong physical activity' movement skill competency is needed to support the assertion that such activities provide the foundation for a lifetime of activity.
ERIC Educational Resources Information Center
Lee, Guemin; Park, In-Yong
2012-01-01
Previous assessments of the reliability of test scores for testlet-composed tests have indicated that item-based estimation methods overestimate reliability. This study was designed to address issues related to the extent to which item-based estimation methods overestimate the reliability of test scores composed of testlets and to compare several…
Konge, L; Vilmann, P; Clementsen, P; Annema, J T; Ringsted, C
2012-10-01
Fine-needle aspiration (FNA) guided by endoscopic ultrasonography (EUS) is important in mediastinal staging of non-small cell lung cancer (NSCLC). Training standards and implementation strategies of this technique are currently under discussion. The aim of this study was to explore the reliability and validity of a newly developed EUS Assessment Tool (EUSAT) designed to measure competence in EUS-FNA for mediastinal staging of NSCLC. A total of 30 patients with proven or suspected NSCLC underwent EUS-FNA for mediastinal staging by three trainees and three experienced physicians. Their performances were assessed prospectively by three experts in EUS under direct observation and again 2 months later in a blinded fashion using digital video-recordings. Based on the assessments, intra-rater reliability, inter-rater reliability, and construct validity were explored. The intra-rater reliability was good (Cronbach's α = 0.80), but comparison of results based on direct observations and blinded video-recordings indicated a significant bias favoring consultants (P = 0.022). Inter-rater reliability was very good (Cronbach's α = 0.93). However, one rater assessing five procedures or two raters each assessing four procedures were necessary to secure a generalizability coefficient of 0.80. The assessment tool demonstrated construct validity by discriminating between trainees and experienced physicians (P = 0.034). Competency in mediastinal staging of NSCLC using EUS and EUS-FNA can be assessed in a reliable and valid way using the EUSAT assessment tool. Measuring and defining competency and training requirements could improve EUS quality and benefit patient care.
Reliability Generalization of the Alcohol Use Disorder Identification Test.
ERIC Educational Resources Information Center
Shields, Alan L.; Caruso, John C.
2002-01-01
Evaluated the reliability of scores from the Alcohol Use Disorders Identification Test (AUDIT; J. Saunders and others, 1993) in a reliability generalization study based on 17 empirical journal articles. Results show AUDIT scores to be generally reliable for basic assessment. (SLD)
Fagbeja, Mofoluso A; Hill, Jennifer L; Chatterton, Tim J; Longhurst, James W S
2015-02-01
An assessment was conducted of the reliability of Scanning Imaging Absorption Spectrometer for Atmospheric Cartography (SCIAMACHY) satellite sensor measurements for interpolating tropospheric concentrations of carbon monoxide, considering the low-latitude climate of the Niger Delta region in Nigeria. Monthly SCIAMACHY carbon monoxide (CO) column measurements from January 2003 to December 2005 were interpolated using the ordinary kriging technique. The spatio-temporal variations observed in the reliability were based on proximity to the Atlantic Ocean, seasonal variations in the intensities of rainfall and relative humidity, the presence of dust particles from the Sahara desert, industrialization in Southwest Nigeria and biomass burning during the dry season in Northern Nigeria. Spatial reliabilities of 74% and 42% were observed for the inland and coastal areas, respectively. Temporally, average reliabilities of 61% and 55% occurred during the dry and wet seasons, respectively. Reliability in the inland and coastal areas was 72% and 38% during the wet season, and 75% and 46% during the dry season, respectively. Based on the results, the WFM-DOAS SCIAMACHY CO data product used for this study is relevant to the assessment of CO concentrations in low-latitude developing countries that cannot afford monitoring infrastructure because of its high cost. Although the SCIAMACHY sensor is no longer available, it provided cost-effective, reliable and accessible data that could support air quality assessment in developing countries.
Palmer, Kara K.
2017-01-01
Assessing children’s perceptions of their movement abilities (i.e., perceived competence) is traditionally done using picture scales: the Pictorial Scale of Perceived Competence and Acceptance for Young Children or the Pictorial Scale of Perceived Movement Skill Competence. Pictures fail to capture the temporal components of movement. To address this limitation, we created a digital-based instrument to assess perceived motor competence: the Digital Scale of Perceived Motor Competence. The purpose of this study was to determine the validity, reliability, and internal consistency of the Digital-based Scale of Perceived Motor Skill Competence. The Digital-based Scale of Perceived Motor Skill Competence is based on the twelve fundamental motor skills from the Test of Gross Motor Development-2nd Edition, with a layout and item structure similar to those of the Pictorial Scale of Perceived Movement Skill Competence. Face validity of the instrument was examined in Phase I (n = 56; Mage = 8.6 ± 0.7 years, 26 girls). Test-retest reliability and internal consistency were assessed in Phase II (n = 54, Mage = 8.7 ± 0.5 years, 26 girls). Intra-class correlations (ICC) and Cronbach’s alpha were computed to determine test-retest reliability and internal consistency for all twelve skills along with locomotor and object control subscales. The Digital Scale of Perceived Motor Competence demonstrates excellent test-retest reliability (ICC = 0.83, total; ICC = 0.77, locomotor; ICC = 0.79, object control) and acceptable/good internal consistency (α = 0.62, total; α = 0.57, locomotor; α = 0.49, object control). Findings provide evidence of the reliability of the three-level digital-based instrument of perceived motor competence for older children. PMID:29910408
Validation of a method for assessing resident physicians' quality improvement proposals.
Leenstra, James L; Beckman, Thomas J; Reed, Darcy A; Mundell, William C; Thomas, Kris G; Krajicek, Bryan J; Cha, Stephen S; Kolars, Joseph C; McDonald, Furman S
2007-09-01
Residency programs involve trainees in quality improvement (QI) projects to evaluate competency in systems-based practice and practice-based learning and improvement. Valid approaches to assess QI proposals are lacking. We developed an instrument for assessing resident QI proposals, the Quality Improvement Proposal Assessment Tool (QIPAT-7), and determined its validity and reliability. QIPAT-7 content was initially obtained from a national panel of QI experts. Through an iterative process, the instrument was refined, pilot-tested, and revised. Seven raters used the instrument to assess 45 resident QI proposals. Principal factor analysis was used to explore the dimensionality of instrument scores. Cronbach's alpha and intraclass correlations were calculated to determine internal consistency and interrater reliability, respectively. QIPAT-7 items comprised a single factor (eigenvalue = 3.4) suggesting a single assessment dimension. Interrater reliability for each item (range 0.79 to 0.93) and internal consistency reliability among the items (Cronbach's alpha = 0.87) were high. This method for assessing resident physician QI proposals is supported by content and internal structure validity evidence. QIPAT-7 is a useful tool for assessing resident QI proposals. Future research should determine the reliability of QIPAT-7 scores in other residency and fellowship training programs. Correlations should also be made between assessment scores and criteria for QI proposal success such as implementation of QI proposals, resident scholarly productivity, and improved patient outcomes.
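Cronbach's alpha, used above for internal consistency, can be computed directly from an items-by-subjects score table. The sketch below uses the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); the score matrix is hypothetical and is not QIPAT-7 data.

```python
import statistics


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows, one row per rated subject
    and one column per item (sample variances, n - 1 denominator)."""
    k = len(scores[0])                       # number of items
    item_vars = [statistics.variance(col) for col in zip(*scores)]
    total_var = statistics.variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


# Hypothetical ratings: 5 proposals scored on 3 items.
ratings = [
    [3, 4, 3],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
    [4, 4, 5],
]
print(round(cronbach_alpha(ratings), 3))
```

Alpha approaches 1 when the items rise and fall together across subjects, and falls toward 0 when the items are unrelated.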
Brouillette, Robert M; Foil, Heather; Fontenot, Stephanie; Correro, Anthony; Allen, Ray; Martin, Corby K; Bruce-Keller, Annadora J; Keller, Jeffrey N
2013-01-01
While considerable knowledge has been gained through the use of established cognitive and motor assessment tools, there is a considerable interest and need for the development of a battery of reliable and validated assessment tools that provide real-time and remote analysis of cognitive and motor function in the elderly. Smartphones appear to be an obvious choice for the development of these "next-generation" assessment tools for geriatric research, although to date no studies have reported on the use of smartphone-based applications for the study of cognition in the elderly. The primary focus of the current study was to assess the feasibility, reliability, and validity of a smartphone-based application for the assessment of cognitive function in the elderly. A total of 57 non-demented elderly individuals were administered a newly developed smartphone application-based Color-Shape Test (CST) in order to determine its utility in measuring cognitive processing speed in the elderly. Validity of this novel cognitive task was assessed by correlating performance on the CST with scores on widely accepted assessments of cognitive function. Scores on the CST were significantly correlated with global cognition (Mini-Mental State Exam: r = 0.515, p<0.0001) and multiple measures of processing speed and attention (Digit Span: r = 0.427, p<0.0001; Trail Making Test: r = -0.651, p<0.00001; Digit Symbol Test: r = 0.508, p<0.0001). The CST was not correlated with naming and verbal fluency tasks (Boston Naming Test, Vegetable/Animal Naming) or memory tasks (Logical Memory Test). Test re-test reliability was observed to be significant (r = 0.726; p = 0.02). Together, these data are the first to demonstrate the feasibility, reliability, and validity of using a smartphone-based application for the purpose of assessing cognitive function in the elderly. 
The importance of these findings for the establishment of smartphone-based assessment batteries of cognitive and motor function in the elderly is discussed.
ERIC Educational Resources Information Center
McGill, D. A.; van der Vleuten, C. P. M.; Clarke, M. J.
2011-01-01
Even though rater-based judgements of clinical competence are widely used, they are context sensitive and vary between individuals and institutions. To deal adequately with rater-judgement unreliability, evaluating the reliability of workplace rater-based assessments in the local context is essential. Using such an approach, the primary intention…
The Reliability and Sources of Error of Using Rubrics-Based Assessment for Student Projects
ERIC Educational Resources Information Center
Menéndez-Varela, José-Luis; Gregori-Giralt, Eva
2018-01-01
Rubrics are widely used in higher education to assess performance in project-based learning environments. To date, the sources of error that may affect their reliability have not been studied in depth. Using generalisability theory as its starting-point, this article analyses the influence of the assessors and the criteria of the rubrics on the…
Chen, J D; Sun, H L
1999-04-01
Objective. To dynamically assess and predict the reliability of equipment by making full use of the various test information generated during product development. Method. A new reliability growth assessment method based on the Army Materiel Systems Analysis Activity (AMSAA) model was developed. The method is composed of the AMSAA model and test data conversion technology. Result. The assessment and prediction results for a piece of space-borne equipment conformed to expectations. Conclusion. It is suggested that this method should be further researched and popularized.
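For orientation, the AMSAA (Crow) reliability growth model treats cumulative failures as a power law, N(t) = lambda * t^beta, with beta < 1 indicating growth. The sketch below shows only the textbook maximum-likelihood fit for a time-truncated test; it does not reproduce the authors' test data conversion step, which the abstract does not describe, and the failure times are hypothetical.

```python
import math


def fit_amsaa(failure_times, total_time):
    """MLE of the Crow-AMSAA parameters for a test truncated at total_time.

    beta = n / sum(ln(T / t_i)),  lambda = n / T**beta
    """
    n = len(failure_times)
    beta = n / sum(math.log(total_time / t) for t in failure_times)
    lam = n / total_time ** beta
    return lam, beta


def instantaneous_mtbf(lam, beta, t):
    """Reciprocal of the failure intensity lambda * beta * t**(beta - 1)."""
    return 1.0 / (lam * beta * t ** (beta - 1.0))


# Hypothetical cumulative failure times (hours) in a 1000-hour test.
times = [10.0, 50.0, 150.0, 400.0]
lam, beta = fit_amsaa(times, 1000.0)
print(beta, instantaneous_mtbf(lam, beta, 1000.0))
```

Because failures here arrive at widening intervals, the fitted beta falls below 1 and the instantaneous MTBF at the end of the test exceeds the simple average of total time divided by failure count.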
Reliability and risk assessment of structures
NASA Technical Reports Server (NTRS)
Chamis, C. C.
1991-01-01
Development of reliability and risk assessment of structural components and structures is a major activity at Lewis Research Center. It consists of five program elements: (1) probabilistic loads; (2) probabilistic finite element analysis; (3) probabilistic material behavior; (4) assessment of reliability and risk; and (5) probabilistic structural performance evaluation. Recent progress includes: (1) the evaluation of the various uncertainties in terms of cumulative distribution functions for various structural response variables based on known or assumed uncertainties in primitive structural variables; (2) evaluation of the failure probability; (3) reliability and risk-cost assessment; and (4) an outline of an emerging approach for eventual certification of man-rated structures by computational methods. Collectively, the results demonstrate that the structural durability/reliability of man-rated structural components and structures can be effectively evaluated by using formal probabilistic methods.
ERIC Educational Resources Information Center
Williams, Harriet G.; Pfeiffer, Karin A.; Dowda, Marsha; Jeter, Chevy; Jones, Shaverra; Pate, Russell R.
2009-01-01
The purpose of this study was to develop a valid and reliable tool for use in assessing motor skills in preschool children in field-based settings. The development of the Children's Activity and Movement in Preschool Study Motor Skills Protocol included evidence of its reliability and validity for use in field-based environments as part of large…
Using generalizability theory to develop clinical assessment protocols.
Preuss, Richard A
2013-04-01
Clinical assessment protocols must produce data that are reliable, with a clinically attainable minimal detectable change (MDC). In a reliability study, generalizability theory has 2 advantages over classical test theory. These advantages provide information that allows assessment protocols to be adjusted to match individual patient profiles. First, generalizability theory allows the user to simultaneously consider multiple sources of measurement error variance (facets). Second, it allows the user to generalize the findings of the main study across the different study facets and to recalculate the reliability and MDC based on different combinations of facet conditions. In doing so, clinical assessment protocols can be chosen based on minimizing the number of measures that must be taken to achieve a realistic MDC, using repeated measures to minimize the MDC, or simply based on the combination that best allows the clinician to monitor an individual patient's progress over a specified period of time.
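The idea of recalculating reliability and MDC for different numbers of repeated measures can be sketched for the simplest case. This assumes a single-facet design (persons crossed with trials), relative error only, and MDC95 = 1.96 * sqrt(2) * SEM; the variance components are hypothetical, and a real protocol would typically involve more facets than shown here.

```python
import math


def g_coefficient(var_person, var_residual, n_measures):
    """Generalizability coefficient when n_measures trials are averaged."""
    return var_person / (var_person + var_residual / n_measures)


def mdc95(var_residual, n_measures):
    """Minimal detectable change at 95% confidence for the averaged score."""
    sem = math.sqrt(var_residual / n_measures)
    return 1.96 * math.sqrt(2.0) * sem


# Hypothetical variance components from a reliability (G) study.
VAR_PERSON, VAR_RESIDUAL = 4.0, 1.0
for n in (1, 2, 4, 8):
    print(n,
          round(g_coefficient(VAR_PERSON, VAR_RESIDUAL, n), 3),
          round(mdc95(VAR_RESIDUAL, n), 3))
```

Reading down the printed rows shows the trade-off the abstract describes: each added measure raises the coefficient and lowers the MDC, with diminishing returns.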
NASA Astrophysics Data System (ADS)
Wallace, Jon Michael
2003-10-01
Reliability prediction of components operating in complex systems has historically been conducted in a statistically isolated manner. Current physics-based, i.e. mechanistic, component reliability approaches focus more on component-specific attributes and mathematical algorithms and not enough on the influence of the system. The result is that significant error can be introduced into the component reliability assessment process. The objective of this study is the development of a framework that infuses the needs and influence of the system into the process of conducting mechanistic-based component reliability assessments. The formulated framework consists of six primary steps. The first three steps, identification, decomposition, and synthesis, are primarily qualitative in nature and employ system reliability and safety engineering principles to construct an appropriate starting point for the component reliability assessment. The following two steps are the most unique. They involve a step to efficiently characterize and quantify the system-driven local parameter space and a subsequent step using this information to guide the reduction of the component parameter space. The local statistical space quantification step is accomplished using two proposed multivariate probability models: Multi-Response First Order Second Moment and Taylor-Based Inverse Transformation. Where existing joint probability models require preliminary distribution and correlation information of the responses, these models combine statistical information of the input parameters with an efficient sampling of the response analyses to produce the multi-response joint probability distribution. Parameter space reduction is accomplished using Approximate Canonical Correlation Analysis (ACCA) employed as a multi-response screening technique. 
The novelty of this approach is that each individual local parameter and even subsets of parameters representing entire contributing analyses can now be rank ordered with respect to their contribution to not just one response, but the entire vector of component responses simultaneously. The final step of the framework is the actual probabilistic assessment of the component. Although the same multivariate probability tools employed in the characterization step can be used for the component probability assessment, variations of this final step are given to allow for the utilization of existing probabilistic methods such as response surface Monte Carlo and Fast Probability Integration. The overall framework developed in this study is implemented to assess the finite-element based reliability prediction of a gas turbine airfoil involving several failure responses. Results of this implementation are compared to results generated using the conventional 'isolated' approach as well as a validation approach conducted through large sample Monte Carlo simulations. The framework resulted in a considerable improvement to the accuracy of the part reliability assessment and an improved understanding of the component failure behavior. Considerable statistical complexity in the form of joint non-normal behavior was found and accounted for using the framework. Future applications of the framework elements are discussed.
Reliability Evaluation of Machine Center Components Based on Cascading Failure Analysis
NASA Astrophysics Data System (ADS)
Zhang, Ying-Zhi; Liu, Jin-Tong; Shen, Gui-Xiang; Long, Zhe; Sun, Shu-Guang
2017-07-01
To address the problems that component reliability models exhibit deviation and that evaluation results are too low because traditional reliability evaluation of machine center components overlooks failure propagation, a new reliability evaluation method based on cascading failure analysis and failure influenced degree assessment is proposed. A directed graph model of cascading failure among components is established according to cascading failure mechanism analysis and graph theory. The failure influenced degrees of the system components are assessed by the adjacency matrix and its transposition, combined with the PageRank algorithm. Based on the comprehensive failure probability function and total probability formula, the inherent failure probability function is determined to realize the reliability evaluation of the system components. Finally, the method is applied to a machine center, and the results show the following: 1) The reliability evaluation values of the proposed method are at least 2.5% higher than those of the traditional method; 2) The difference between the comprehensive and inherent reliability of the system component presents a positive correlation with the failure influenced degree of the system component, which provides a theoretical basis for reliability allocation of machine center system.
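How a ranking algorithm can score components on a failure-propagation graph is sketched below. This is a generic power-iteration PageRank applied to an adjacency matrix and to its transpose, not the authors' exact formulation; the three-component chain, the damping factor, and the interpretation of the two score vectors are illustrative assumptions.

```python
def pagerank(adj, damping=0.85, iters=100):
    """Power-iteration PageRank; adj[i][j] = 1 means a failure of
    component i can propagate to component j."""
    n = len(adj)
    out_deg = [sum(row) for row in adj]
    rank = [1.0 / n] * n
    for _ in range(iters):
        # Rank held by nodes with no outgoing edges is spread uniformly.
        dangling = sum(rank[i] for i in range(n) if out_deg[i] == 0)
        new_rank = []
        for j in range(n):
            inflow = sum(rank[i] * adj[i][j] / out_deg[i]
                         for i in range(n) if out_deg[i] > 0)
            new_rank.append((1.0 - damping) / n
                            + damping * (inflow + dangling / n))
        rank = new_rank
    return rank


def transpose(adj):
    """Reverse every edge of the graph."""
    return [list(col) for col in zip(*adj)]


# Hypothetical cascade: component 0 -> component 1 -> component 2.
chain = [[0, 1, 0],
         [0, 0, 1],
         [0, 0, 0]]
influenced = pagerank(chain)             # high score: affected by others
influencing = pagerank(transpose(chain))  # high score: affects others
print(influenced, influencing)
```

On the original graph the downstream component collects the most rank, while on the transposed graph the upstream component does, which is one way to separate "being influenced" from "influencing".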
Leddy, Abigail L; Crowner, Beth E; Earhart, Gammon M
2011-01-01
Gait impairments, balance impairments, and falls are prevalent in individuals with Parkinson disease (PD). Although the Berg Balance Scale (BBS) can be considered the reference standard for the determination of fall risk, it has a noted ceiling effect. Development of ceiling-free measures that can assess balance and are good at discriminating "fallers" from "nonfallers" is needed. The purpose of this study was to compare the Functional Gait Assessment (FGA) and the Balance Evaluation Systems Test (BESTest) with the BBS among individuals with PD and evaluate the tests' reliability, validity, and discriminatory sensitivity and specificity for fallers versus nonfallers. This was an observational study of community-dwelling individuals with idiopathic PD. The BBS, FGA, and BESTest were administered to 80 individuals with PD. Interrater reliability (n=15) was assessed by 3 raters. Test-retest reliability was based on 2 tests of participants (n=24), 2 weeks apart. Intraclass correlation coefficients (2,1) were used to calculate reliability, and Spearman correlation coefficients were used to assess validity. Cutoff points, sensitivity, and specificity were based on receiver operating characteristic plots. Test-retest reliability was .80 for the BBS, .91 for the FGA, and .88 for the BESTest. Interrater reliability was greater than .93 for all 3 tests. The FGA and BESTest were correlated with the BBS (r=.78 and r=.87, respectively). Cutoff scores to identify fallers were 47/56 for the BBS, 15/30 for the FGA, and 69% for the BESTest. The overall accuracy (area under the curve) for the BBS, FGA, and BESTest was .79, .80, and .85, respectively. Fall reports were retrospective. Both the FGA and the BESTest have reliability and validity for assessing balance in individuals with PD. The BESTest is most sensitive for identifying fallers.
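The ICC(2,1) named in this abstract is the two-way random-effects, absolute-agreement, single-rater form. A minimal sketch of the usual mean-squares formula follows; the rating table is hypothetical and unrelated to the study data.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per subject and one column
    per rater (or per test occasion)."""
    n = len(ratings)            # subjects
    k = len(ratings[0])         # raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(col) / n for col in zip(*ratings)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                 # between-subjects mean square
    msc = ss_cols / (k - 1)                 # between-raters mean square
    mse = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical scores: the second rater scores every subject 1 point higher.
print(icc_2_1([[1, 2], [2, 3], [3, 4]]))
```

Because this form measures absolute agreement, a constant offset between raters lowers the coefficient even though the raters rank the subjects identically.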
Reliable and valid assessment of Lichtenstein hernia repair skills.
Carlsen, C G; Lindorff-Larsen, K; Funch-Jensen, P; Lund, L; Charles, P; Konge, L
2014-08-01
Lichtenstein hernia repair is a common surgical procedure and one of the first procedures performed by a surgical trainee. However, formal assessment tools developed for this procedure are few and sparsely validated. The aim of this study was to determine the reliability and validity of an assessment tool designed to measure surgical skills in Lichtenstein hernia repair. Key issues were identified through a focus group interview. On this basis, an assessment tool with eight items was designed. Ten surgeons and surgical trainees were video recorded while performing Lichtenstein hernia repair (four experts, three intermediates, and three novices). The videos were blindly and individually assessed by three raters (surgical consultants) using the assessment tool. Based on these assessments, validity and reliability were explored. The internal consistency of the items was high (Cronbach's alpha = 0.97). The inter-rater reliability was very good with an intra-class correlation coefficient (ICC) = 0.93. Generalizability analysis showed a coefficient above 0.8 even with one rater. The coefficient improved to 0.92 if three raters were used. One-way analysis of variance found a significant difference between the three groups which indicates construct validity, p < 0.001. Lichtenstein hernia repair skills can be assessed blindly by a single rater in a reliable and valid fashion with the new procedure-specific assessment tool. We recommend this tool for future assessment of trainees performing Lichtenstein hernia repair to ensure that the objectives of competency-based surgical training are met.
Kuehnapfel, Andreas; Ahnert, Peter; Loeffler, Markus; Scholz, Markus
2017-02-01
Body surface area is a physiological quantity relevant for many medical applications. In clinical practice, it is determined by empirical formulae. 3D laser-based anthropometry provides an easy and effective way to measure body surface area but is not ubiquitously available. We used data from laser-based anthropometry from a population-based study to assess validity of published and commonly used empirical formulae. We performed a large population-based study on adults collecting classical anthropometric measurements and 3D body surface assessments (N = 1435). We determined reliability of the 3D body surface assessment and validity of 18 different empirical formulae proposed in the literature. The performance of these formulae is studied in subsets of sex and BMI. Finally, improvements of parameter settings of formulae and adjustments for sex and BMI were considered. 3D body surface measurements show excellent intra- and inter-rater reliability of 0.998 (overall concordance correlation coefficient, OCCC was used as measure of agreement). Empirical formulae of Fujimoto and Watanabe, Shuter and Aslani and Sendroy and Cecchini performed best with excellent concordance with OCCC > 0.949 even in subgroups of sex and BMI. Re-parametrization of formulae and adjustment for sex and BMI slightly improved results. In adults, 3D laser-based body surface assessment is a reliable alternative to estimation by empirical formulae. However, there are empirical formulae showing excellent results even in subgroups of sex and BMI with only little room for improvement.
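The study reports agreement with the overall concordance correlation coefficient (OCCC), a multi-observer generalization. As a simpler illustration, the sketch below computes Lin's two-measurement concordance correlation coefficient, which is an assumption made for brevity and not the statistic the authors used; the sample values are hypothetical.

```python
def lins_ccc(x, y):
    """Lin's concordance correlation coefficient for paired measurements
    (population moments, n denominator)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((v - mean_x) ** 2 for v in x) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical body surface areas (m^2): scanner value vs. formula estimate.
measured = [1.62, 1.75, 1.88, 2.01, 2.15]
estimated = [1.60, 1.78, 1.85, 2.05, 2.12]
print(round(lins_ccc(measured, estimated), 4))
```

Unlike a Pearson correlation, this coefficient drops when one method is systematically shifted or scaled relative to the other, which is the property needed to judge whether a formula can stand in for a direct measurement.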
Cramer, Emily
2016-01-01
Hospital performance reports often include rankings of unit pressure ulcer rates. Differentiating among units on the basis of quality requires reliable measurement. Our objectives were to describe and apply methods for assessing reliability of hospital‐acquired pressure ulcer rates and evaluate a standard signal‐noise reliability measure as an indicator of precision of differentiation among units. Quarterly pressure ulcer data from 8,199 critical care, step‐down, medical, surgical, and medical‐surgical nursing units from 1,299 US hospitals were analyzed. Using beta‐binomial models, we estimated between‐unit variability (signal) and within‐unit variability (noise) in annual unit pressure ulcer rates. Signal‐noise reliability was computed as the ratio of between‐unit variability to the total of between‐ and within‐unit variability. To assess precision of differentiation among units based on ranked pressure ulcer rates, we simulated data to estimate the probabilities of a unit's observed pressure ulcer rate rank in a given sample falling within five and ten percentiles of its true rank, and the probabilities of units with ulcer rates in the highest quartile and highest decile being identified as such. We assessed the signal‐noise measure as an indicator of differentiation precision by computing its correlations with these probabilities. Pressure ulcer rates based on a single year of quarterly or weekly prevalence surveys were too susceptible to noise to allow for precise differentiation among units, and signal‐noise reliability was a poor indicator of precision of differentiation. To ensure precise differentiation on the basis of true differences, alternative methods of assessing reliability should be applied to measures purported to differentiate among providers or units based on quality. © 2016 The Authors. Research in Nursing & Health published by Wiley Periodicals, Inc. PMID:27223598
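The signal-noise ratio defined in this abstract can be written out in a few lines. The sketch below uses the stated ratio (between-unit variance over the sum of between- and within-unit variance) and, as a simplifying assumption that differs from the authors' beta-binomial estimation, approximates the within-unit variance by the binomial sampling variance p(1 - p)/n; all numbers are hypothetical.

```python
def binomial_noise(rate, n_patients):
    """Within-unit sampling variance of an observed rate,
    assuming simple binomial sampling (an illustrative simplification)."""
    return rate * (1.0 - rate) / n_patients


def signal_noise_reliability(var_between, var_within):
    """Between-unit variance as a share of total variance."""
    return var_between / (var_between + var_within)


# Hypothetical unit: 5% true ulcer rate, 100 patients surveyed per year,
# with a between-unit variance of 0.0004 across comparable units.
noise = binomial_noise(0.05, 100)
print(round(signal_noise_reliability(0.0004, noise), 3))
```

Surveying more patients shrinks the noise term and pushes the ratio toward 1, although the abstract cautions that a high ratio alone did not guarantee precise ranking of units.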
Palm, Peter; Josephson, Malin; Mathiassen, Svend Erik; Kjellberg, Katarina
2016-06-01
We evaluated the intra- and inter-observer reliability and criterion validity of an observation protocol, developed in an iterative process involving practicing ergonomists, for assessment of working technique during cash register work for the purpose of preventing upper extremity symptoms. Two ergonomists independently assessed 17 15-min videos of cash register work on two occasions each, as a basis for examining reliability. Criterion validity was assessed by comparing these assessments with meticulous video-based analyses by researchers. Intra-observer reliability was acceptable (i.e. proportional agreement >0.7 and kappa >0.4) for 10/10 questions. Inter-observer reliability was acceptable for only 3/10 questions. An acceptable inter-observer reliability combined with an acceptable criterion validity was obtained only for one working technique aspect, 'Quality of movements'. Thus, major elements of the cashiers' working technique could not be assessed with an acceptable accuracy from short periods of observations by one observer, such as often desired by practitioners. Practitioner Summary: We examined an observation protocol for assessing working technique in cash register work. It was feasible in use, but inter-observer reliability and criterion validity were generally not acceptable when working technique aspects were assessed from short periods of work. We recommend the protocol to be used for educational purposes only.
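The two acceptability criteria used here, proportional agreement and kappa, can be computed from paired ratings as follows. The abstract does not say which kappa variant was used, so the sketch assumes Cohen's unweighted kappa for two observers; the ratings are hypothetical.

```python
from collections import Counter


def proportional_agreement(a, b):
    """Share of items on which the two observers gave the same rating."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's unweighted kappa: agreement corrected for chance.
    Undefined (division by zero) if chance agreement equals 1."""
    n = len(a)
    observed = proportional_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[c] * count_b[c]
                   for c in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Hypothetical yes/no answers by two observers on one question, 8 videos.
obs_1 = [1, 1, 0, 0, 1, 0, 1, 1]
obs_2 = [1, 0, 0, 0, 1, 0, 1, 0]
print(proportional_agreement(obs_1, obs_2), cohens_kappa(obs_1, obs_2))
```

Reporting both statistics is informative because high raw agreement can coexist with a low kappa when one answer category dominates.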
Validity and reliability assessment of a peer evaluation method in team-based learning classes.
Yoon, Hyun Bae; Park, Wan Beom; Myung, Sun-Jung; Moon, Sang Hui; Park, Jun-Bean
2018-03-01
Team-based learning (TBL) is increasingly employed in medical education because of its potential to promote active group learning. In TBL, learners are usually asked to assess the contributions of peers within their group to ensure accountability. The purpose of this study is to assess the validity and reliability of a peer evaluation instrument that was used in TBL classes in a single medical school. A total of 141 students were divided into 18 groups in 11 TBL classes. The students were asked to evaluate their peers in the group based on evaluation criteria that were provided to them. We analyzed the comments that were written for the highest and lowest achievers to assess the validity of the peer evaluation instrument. The reliability of the instrument was assessed by examining the agreement among peer ratings within each group of students via intraclass correlation coefficient (ICC) analysis. Most of the students provided reasonable and understandable comments for the high and low achievers within their group, and most of those comments were compatible with the evaluation criteria. The average ICC of each group ranged from 0.390 to 0.863, and the overall average was 0.659. There was no significant difference in inter-rater reliability according to the number of members in the group or the timing of the evaluation within the course. The peer evaluation instrument that was used in the TBL classes was valid and reliable. Providing evaluation criteria and rules seemed to improve the validity and reliability of the instrument.
Silsupadol, Patima; Teja, Kunlanan; Lugade, Vipul
2017-10-01
The assessment of spatiotemporal gait parameters is a useful clinical indicator of health status. Unfortunately, most assessment tools require controlled laboratory environments which can be expensive and time consuming. As smartphones with embedded sensors are becoming ubiquitous, this technology can provide a cost-effective, easily deployable method for assessing gait. Therefore, the purpose of this study was to assess the reliability and validity of a smartphone-based accelerometer in quantifying spatiotemporal gait parameters when attached to the body or in a bag, belt, hand, and pocket. Thirty-four healthy adults were asked to walk at self-selected comfortable, slow, and fast speeds over a 10-m walkway while carrying a smartphone. Step length, step time, gait velocity, and cadence were computed from smartphone-based accelerometers and validated with GAITRite. Across all walking speeds, smartphone data had excellent reliability (ICC(2,1) ≥ 0.90) for the body and belt locations, with bag, hand, and pocket locations having good to excellent reliability (ICC(2,1) ≥ 0.69). Correlations between the smartphone-based and GAITRite-based systems were very high for the body (r=0.89, 0.98, 0.96, and 0.87 for step length, step time, gait velocity, and cadence, respectively). Similarly, Bland-Altman analysis demonstrated that the bias approached zero, particularly in the body, bag, and belt conditions under comfortable and fast speeds. Thus, smartphone-based assessments of gait are most valid when placed on the body, in a bag, or on a belt. The use of a smartphone to assess gait can provide relevant data to clinicians without encumbering the user and allow for data collection in the free-living environment. Copyright © 2017 Elsevier B.V. All rights reserved.
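The Bland-Altman analysis mentioned in this abstract reduces to the mean of the paired differences (the bias) and limits of agreement around it. The following is a minimal sketch with made-up gait velocities; the variable names and values are hypothetical, and the conventional 1.96 standard deviation limits are assumed rather than taken from the study.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias   = mean of the paired differences (a - b)
    limits = bias -/+ 1.96 * sample SD of the differences
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical gait velocities in m/s for five walks
smartphone = [1.20, 1.05, 1.32, 0.98, 1.15]
walkway = [1.18, 1.07, 1.30, 1.00, 1.14]

bias, lower, upper = bland_altman(smartphone, walkway)
```

A bias close to zero with narrow limits indicates that the two systems can be used interchangeably for that parameter; a high correlation alone does not show this.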
The Reliability of Encounter Cards to Assess the CanMEDs Roles
ERIC Educational Resources Information Center
Sherbino, Jonathan; Kulasegaram, Kulamakan; Worster, Andrew; Norman, Geoffrey R.
2013-01-01
The purpose of this study was to determine the reliability of a computer-based encounter card (EC) to assess medical students during an emergency medicine rotation. From April 2011 to March 2012, multiple physicians assessed an entire medical school class during their emergency medicine rotation using the CanMEDS framework. At the end of an…
Investigating the Validity and Reliability of the Vanderbilt Assessment of Leadership in Education
ERIC Educational Resources Information Center
Porter, Andrew C.; Polikoff, Morgan S.; Goldring, Ellen B.; Murphy, Joseph; Elliott, Stephen N.; May, Henry
2010-01-01
The Vanderbilt Assessment of Leadership in Education (VAL-ED) is a multirater assessment of principals' learning-centered leadership. The instrument was developed based on the Standards for Educational and Psychological Testing. In this article, we report on the validity and reliability evidence for the VAL-ED accumulated in a national field…
Can a Two-Question Test Be Reliable and Valid for Predicting Academic Outcomes?
ERIC Educational Resources Information Center
Bridgeman, Brent
2016-01-01
Scores on essay-based assessments that are part of standardized admissions tests are typically given relatively little weight in admissions decisions compared to the weight given to scores from multiple-choice assessments. Evidence is presented to suggest that more weight should be given to these assessments. The reliability of the writing scores…
Reliability of human-supervised formant-trajectory measurement for forensic voice comparison.
Zhang, Cuiling; Morrison, Geoffrey Stewart; Ochoa, Felipe; Enzinger, Ewald
2013-01-01
Acoustic-phonetic approaches to forensic voice comparison often include human-supervised measurement of vowel formants, but the reliability of such measurements is a matter of concern. This study assesses the within- and between-supervisor variability of three sets of formant-trajectory measurements made by each of four human supervisors. It also assesses the validity and reliability of forensic-voice-comparison systems based on these measurements. Each supervisor's formant-trajectory system was fused with a baseline mel-frequency cepstral-coefficient system, and performance was assessed relative to the baseline system. Substantial improvements in validity were found for all supervisors' systems, but some supervisors' systems were more reliable than others.
Web-Based Assessment of Visual and Visuospatial Symptoms in Parkinson's Disease
Amick, Melissa M.; Miller, Ivy N.; Neargarder, Sandy; Cronin-Golomb, Alice
2012-01-01
Visual and visuospatial dysfunction is prevalent in Parkinson's disease (PD). To promote assessment of these often overlooked symptoms, we adapted the PD Vision Questionnaire for Internet administration. The questionnaire evaluates visual and visuospatial symptoms, impairments in activities of daily living (ADLs), and motor symptoms. PD participants of mild to moderate motor severity (n = 24) and healthy control participants (HC, n = 23) completed the questionnaire in paper and web-based formats. Reliability was assessed by comparing responses across formats. Construct validity was evaluated by reference to performance on measures of vision, visuospatial cognition, ADLs, and motor symptoms. The web-based format showed excellent reliability with respect to the paper format for both groups (all P′s < 0.001; HC completing the visual and visuospatial section only). Demonstrating the construct validity of the web-based questionnaire, self-rated ADL and visual and visuospatial functioning were significantly associated with performance on objective measures of these abilities (all P′s < 0.01). The findings indicate that web-based administration may be a reliable and valid method of assessing visual and visuospatial and ADL functioning in PD. PMID:22530162
Eliasson, Kristina; Palm, Peter; Nyman, Teresia; Forsman, Mikael
2017-07-01
A common way to conduct practical risk assessments is to observe a job and report the observed long-term risks for musculoskeletal disorders. The aim of this study was to evaluate the inter- and intra-observer reliability of ergonomists' risk assessments without the support of an explicit risk assessment method. Twenty-one experienced ergonomists assessed the risk level (low, moderate, high risk) of eight upper body regions, as well as the global risk of 10 video recorded work tasks. Intra-observer reliability was assessed by having nine of the ergonomists repeat the procedure at least three weeks after the first assessment. The ergonomists made their risk assessments based on their own experience and knowledge. The statistical parameters of reliability included percentage agreement, kappa, linearly weighted kappa, intraclass correlation and Kendall's coefficient of concordance. The average inter-observer agreement of the global risk was 53% and the corresponding weighted kappa (κw) was 0.32, indicating fair reliability. The intra-observer agreement was 61%, with a weighted kappa (κw) of 0.41. This study indicates that risk assessments of the upper body, without the use of an explicit observational method, have non-acceptable reliability. It is therefore recommended to use systematic risk assessment methods to a higher degree. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
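For an ordered scale such as low, moderate, high, the linearly weighted kappa named in this abstract penalises a low/high disagreement more than a low/moderate one. The sketch below shows one common formulation using disagreement weights |i - j| / (k - 1); the function name and the ten ratings are hypothetical and do not reproduce the study's data.

```python
def linear_weighted_kappa(a, b, categories):
    """Linearly weighted kappa for two ratings on an ordered scale.

    kappa_w = 1 - (observed weighted disagreement / expected weighted
    disagreement), with weights |i - j| / (k - 1).
    """
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(a)
    # Observed joint counts of (rating a, rating b)
    counts = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        counts[index[x]][index[y]] += 1
    row = [sum(counts[i]) for i in range(k)]
    col = [sum(counts[i][j] for i in range(k)) for j in range(k)]
    d_obs = sum(
        abs(i - j) / (k - 1) * counts[i][j] / n
        for i in range(k) for j in range(k)
    )
    d_exp = sum(
        abs(i - j) / (k - 1) * row[i] * col[j] / (n * n)
        for i in range(k) for j in range(k)
    )
    if d_exp == 0:
        raise ValueError("kappa is undefined when only one category is used")
    return 1 - d_obs / d_exp


scale = ["low", "moderate", "high"]
# Hypothetical global risk ratings of 10 work tasks by two observers
observer_1 = ["low", "low", "moderate", "moderate", "moderate",
              "high", "high", "high", "low", "moderate"]
observer_2 = ["low", "moderate", "moderate", "high", "moderate",
              "high", "moderate", "high", "low", "low"]

kappa_w = linear_weighted_kappa(observer_1, observer_2, scale)
```

In this invented example the observers agree exactly on 6 of 10 tasks and never differ by more than one step, which is why the weighted kappa comes out higher than the raw agreement alone might suggest.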
The 20 GHz solid state transmitter design, IMPATT diode development and reliability assessment
NASA Technical Reports Server (NTRS)
Picone, S.; Cho, Y.; Asmus, J. R.
1984-01-01
A single drift gallium arsenide (GaAs) Schottky barrier IMPATT diode and related components were developed. The IMPATT diode reliability was assessed. A proof of concept solid state transmitter design and a technology assessment study were performed. The transmitter design utilizes technology which, upon implementation, will demonstrate readiness for development of a POC model within the 1982 time frame and will provide an information base for flight hardware capable of deployment in a 1985 to 1990 demonstrational 30/20 GHz satellite communication system. Life test data for Schottky barrier GaAs diodes and grown junction GaAs diodes are described. The results demonstrate the viability of GaAs IMPATTs as high performance, reliable RF power sources which, based on the recommendation made herein, will surpass device reliability requirements consistent with a ten year spaceborne solid state power amplifier mission.
Evaluation of high fidelity patient simulator in assessment of performance of anaesthetists.
Weller, J M; Bloch, M; Young, S; Maze, M; Oyesola, S; Wyner, J; Dob, D; Haire, K; Durbridge, J; Walker, T; Newble, D
2003-01-01
There is increasing emphasis on performance-based assessment of clinical competence. The High Fidelity Patient Simulator (HPS) may be useful for assessment of clinical practice in anaesthesia, but needs formal evaluation of validity, reliability, feasibility and effect on learning. We set out to assess the reliability of a global rating scale for scoring simulator performance in crisis management. Using a global rating scale, three judges independently rated videotapes of anaesthetists in simulated crises in the operating theatre. Five anaesthetists then independently rated subsets of these videotapes. There was good agreement between raters for medical management, behavioural attributes and overall performance. Agreement was high for both the initial judges and the five additional raters. Using a global scale to assess simulator performance, we found good inter-rater reliability for scoring performance in a crisis. We estimate that two judges should provide a reliable assessment. High fidelity simulation should be studied further for assessing clinical performance.
NDE detectability of fatigue type cracks in high strength alloys
NASA Technical Reports Server (NTRS)
Christner, B. K.; Rummel, W. D.
1983-01-01
Specimens suitable for investigating the reliability of production nondestructive evaluation (NDE) to detect tightly closed fatigue cracks in high strength alloys representative of those materials used in spacecraft engine/booster construction were produced. Inconel 718 was selected as representative of nickel base alloys and Haynes 188 was selected as representative of cobalt base alloys used in this application. Cleaning procedures were developed to ensure the reusability of the test specimens. A flaw detection reliability assessment of the fluorescent penetrant inspection method was performed using the test specimens, both to characterize their use for future reliability assessments and to provide additional NDE flaw detection reliability data for high strength alloys. The statistical analysis of the fluorescent penetrant inspection data was performed to determine the detection reliabilities for each inspection at a 90% probability/95% confidence level.
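A "90% probability/95% confidence" criterion can be read as requiring that the one-sided 95% lower confidence bound on the detection probability be at least 0.90. The abstract does not describe the statistical procedure used, so the sketch below assumes a simple binomial (Clopper-Pearson) bound on hit/miss data; the function names and flaw counts are illustrative only.

```python
from math import comb


def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def detection_lower_bound(hits, n, confidence=0.95):
    """One-sided Clopper-Pearson lower bound on the detection probability.

    Solves P(X >= hits | p) = 1 - confidence for p by bisection.
    """
    if not 0 <= hits <= n or n == 0:
        raise ValueError("need 0 <= hits <= n and n > 0")
    if hits == 0:
        return 0.0
    alpha = 1 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        # The upper tail grows with p, so move toward the crossing point
        if binomial_upper_tail(n, hits, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Illustrative case: every one of 29 flaws in a size class was detected
bound_29 = detection_lower_bound(29, 29)
meets_90_95 = bound_29 >= 0.90
```

With no misses the bound has the closed form 0.05 ** (1 / n), which is why 29 consecutive detections is the smallest sample that can satisfy a 90/95 criterion under this binomial assumption.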
Baker, Elizabeth A; Ledford, Cynthia H; Fogg, Louis; Way, David P; Park, Yoon Soo
2015-01-01
Construct: Clinical skills are used in the care of patients, including reporting, diagnostic reasoning, and decision-making skills. Written comprehensive new patient admission notes (H&Ps) are a ubiquitous part of student education but are underutilized in the assessment of clinical skills. The interpretive summary, differential diagnosis, explanation of reasoning, and alternatives (IDEA) assessment tool was developed to assess students' clinical skills using written comprehensive new patient admission notes. The validity evidence for assessment of clinical skills using clinical documentation following authentic patient encounters has not been well documented. Diagnostic justification tools and postencounter notes are described in the literature (1,2) but are based on standardized patient encounters. To our knowledge, the IDEA assessment tool is the first published tool that uses medical students' H&Ps to rate students' clinical skills. The IDEA assessment tool is a 15-item instrument that asks evaluators to rate students' reporting, diagnostic reasoning, and decision-making skills based on medical students' new patient admission notes. This study presents validity evidence in support of the IDEA assessment tool using Messick's unified framework, including content (theoretical framework), response process (interrater reliability), internal structure (factor analysis and internal-consistency reliability), and relationship to other variables. Validity evidence is based on results from four studies conducted between 2010 and 2013. First, the factor analysis (2010, n = 216) yielded a three-factor solution, measuring patient story, IDEA, and completeness, with reliabilities of .79, .88, and .79, respectively. Second, an initial interrater reliability study (2010) involving two raters demonstrated fair to moderate consensus (κ = .21-.56, ρ =.42-.79). 
Third, a second interrater reliability study (2011) with 22 trained raters also demonstrated fair to moderate agreement (intraclass correlations [ICCs] = .29-.67). There was moderate reliability for all three skill domains, including reporting skills (ICC = .53), diagnostic reasoning skills (ICC = .64), and decision-making skills (ICC = .63). Fourth, there was a significant correlation between IDEA rating scores (2010-2013) and final Internal Medicine clerkship grades (r = .24), 95% confidence interval (CI) [.15, .33]. The IDEA assessment tool is a novel tool with validity evidence to support its use in the assessment of students' reporting, diagnostic reasoning, and decision-making skills. The moderate reliability achieved supports formative or lower stakes summative uses rather than high-stakes summative judgments.
Issues in developing valid assessments of speech pathology students' performance in the workplace.
McAllister, Sue; Lincoln, Michelle; Ferguson, Alison; McAllister, Lindy
2010-01-01
Workplace-based learning is a critical component of professional preparation in speech pathology. A validated assessment of this learning is seen to be 'the gold standard', but it is difficult to develop because of design and validation issues. These issues include the role and nature of judgement in assessment, challenges in measuring quality, and the relationship between assessment and learning. Valid assessment of workplace-based performance needs to capture the development of competence over time and account for both occupation specific and generic competencies. This paper reviews important conceptual issues in the design of valid and reliable workplace-based assessments of competence including assessment content, process, impact on learning, measurement issues, and validation strategies. It then goes on to share what has been learned about quality assessment and validation of a workplace-based performance assessment using competency-based ratings. The outcomes of a four-year national development and validation of an assessment tool are described. A literature review of issues in conceptualizing, designing, and validating workplace-based assessments was conducted. Key factors to consider in the design of a new tool were identified and built into the cycle of design, trialling, and data analysis in the validation stages of the development process. This paper provides an accessible overview of factors to consider in the design and validation of workplace-based assessment tools. It presents strategies used in the development and national validation of a tool, COMPASS, used in every speech pathology programme in Australia, New Zealand, and Singapore. The paper also describes Rasch analysis, a model-based statistical approach which is useful for establishing validity and reliability of assessment tools.
Through careful attention to conceptual and design issues in the development and trialling of workplace-based assessments, it has been possible to develop the world's first valid and reliable national assessment tool for the assessment of performance in speech pathology.
Intersession reliability of fMRI activation for heat pain and motor tasks
Quiton, Raimi L.; Keaser, Michael L.; Zhuo, Jiachen; Gullapalli, Rao P.; Greenspan, Joel D.
2014-01-01
As the practice of conducting longitudinal fMRI studies to assess mechanisms of pain-reducing interventions becomes more common, there is a great need to assess the test–retest reliability of the pain-related BOLD fMRI signal across repeated sessions. This study quantitatively evaluated the reliability of heat pain-related BOLD fMRI brain responses in healthy volunteers across 3 sessions conducted on separate days using two measures: (1) intraclass correlation coefficients (ICC) calculated based on signal amplitude and (2) spatial overlap. The ICC analysis of pain-related BOLD fMRI responses showed fair-to-moderate intersession reliability in brain areas regarded as part of the cortical pain network. Areas with the highest intersession reliability based on the ICC analysis included the anterior midcingulate cortex, anterior insula, and second somatosensory cortex. Areas with the lowest intersession reliability based on the ICC analysis also showed low spatial reliability; these regions included pregenual anterior cingulate cortex, primary somatosensory cortex, and posterior insula. Thus, this study found regional differences in pain-related BOLD fMRI response reliability, which may provide useful information to guide longitudinal pain studies. A simple motor task (finger-thumb opposition) was performed by the same subjects in the same sessions as the painful heat stimuli were delivered. Intersession reliability of fMRI activation in cortical motor areas was comparable to previously published findings for both spatial overlap and ICC measures, providing support for the validity of the analytical approach used to assess intersession reliability of pain-related fMRI activation. 
A secondary finding of this study is that the use of standard ICC alone as a measure of reliability may not be sufficient, as the underlying variance structure of an fMRI dataset can result in inappropriately high ICC values; a method to eliminate these false positive results was used in this study and is recommended for future studies of test–retest reliability. PMID:25161897
Reliability Analysis of the Adult Mentoring Assessment for Extension Professionals
ERIC Educational Resources Information Center
Denny, Marina D'Abreau
2017-01-01
The Adult Mentoring Assessment for Extension Professionals will help mentors develop an accurate profile of their mentoring style with adult learners and identify areas of proficiency and deficiency based on six constructs--relationship, information, facilitation, confrontation, modeling, and vision. This article reports on the reliability of this…
Economos, Christina D; Sacheck, Jennifer M; Kwan Ho Chui, Kenneth; Irizarry, Laura; Guillemont, Juliette; Collins, Jessica J; Hyatt, Raymond R
2008-04-01
Interventions aiming to modify the dietary and physical activity behaviors of young children require precise and accurate measurement tools. As part of a larger community-based project, three school-based questionnaires were developed to assess (a) fruit and vegetable intake, (b) physical activity and television (TV) viewing, and (c) perceived parental support for diet and physical activity. Test-retest reliability was performed on all questionnaires and validity was measured for fruit and vegetable intake, physical activity, and TV viewing. Eighty-four school children (8.3 ± 1.1 years) were studied. Test-retest reliability was performed by administering questionnaires twice, 1 to 2 hours apart. Validity of the fruit and vegetable questionnaire was measured by direct observation, while the physical activity and TV questionnaire was validated by a parent phone interview. All three questionnaires yielded excellent test-retest reliability (P<0.001). The majority of fruit and vegetable questions and the questions regarding specific physical activities and TV viewing were valid. Low validity scores were found for questions on watching TV during breakfast or dinner. These questionnaires are reliable and valid tools to assess fruit and vegetable intake, physical activity, and TV viewing behaviors in early elementary school-aged children. Methods for assessment of children's TV viewing during meals should be further investigated because of parent-child discrepancies.
Methodology Series Module 9: Designing Questionnaires and Clinical Record Forms – Part II
Setia, Maninder Singh
2017-01-01
This article is a continuation of the previous module on designing questionnaires and clinical record form in which we have discussed some basic points about designing the questionnaire and clinical record forms. In this section, we will discuss the reliability and validity of questionnaires. The different types of validity are face validity, content validity, criterion validity, and construct validity. The different types of reliability are test-retest reliability, inter-rater reliability, and intra-rater reliability. Some of these parameters are assessed by subject area experts. However, statistical tests should be used for evaluation of other parameters. Once the questionnaire has been designed, the researcher should pilot test the questionnaire. The items in the questionnaire should be changed based on the feedback from the pilot study participants and the researcher's experience. After the basic structure of the questionnaire has been finalized, the researcher should assess the validity and reliability of the questionnaire or the scale. If an existing standard questionnaire is translated in the local language, the researcher should assess the reliability and validity of the translated questionnaire, and these values should be presented in the manuscript. The decision to use a self- or interviewer-administered, paper- or computer-based questionnaire depends on the nature of the questions, literacy levels of the target population, and resources. PMID:28584367
Janssen, Ellen M; Marshall, Deborah A; Hauber, A Brett; Bridges, John F P
2017-12-01
The recent endorsement of discrete-choice experiments (DCEs) and other stated-preference methods by regulatory and health technology assessment (HTA) agencies has placed a greater focus on demonstrating the validity and reliability of preference results. Areas covered: We present a practical overview of tests of validity and reliability that have been applied in the health DCE literature and explore other study qualities of DCEs. From the published literature, we identify a variety of methods to assess the validity and reliability of DCEs. We conceptualize these methods to create a conceptual model with four domains: measurement validity, measurement reliability, choice validity, and choice reliability. Each domain consists of three categories that can be assessed using one to four procedures (for a total of 24 tests). We present how these tests have been applied in the literature and direct readers to applications of these tests in the health DCE literature. Based on a stakeholder engagement exercise, we consider the importance of study characteristics beyond traditional concepts of validity and reliability. Expert commentary: We discuss study design considerations to assess the validity and reliability of a DCE, consider limitations to the current application of tests, and discuss future work to consider the quality of DCEs in healthcare.
A Laboratory Study on the Reliability Estimations of the Mini-CEX
ERIC Educational Resources Information Center
de Lima, Alberto Alves; Conde, Diego; Costabel, Juan; Corso, Juan; Van der Vleuten, Cees
2013-01-01
Reliability estimations of workplace-based assessments with the mini-CEX are typically based on real-life data. Estimations are based on the assumption of local independence: the object of the measurement should not be influenced by the measurement itself and samples should be completely independent. This is difficult to achieve. Furthermore, the…
Lau, Nathan; Jamieson, Greg A; Skraaning, Gyrd
2016-03-01
The Process Overview Measure is a query-based measure developed to assess operator situation awareness (SA) from monitoring process plants. A companion paper describes how the measure has been developed according to process plant properties and operator cognitive work. The Process Overview Measure demonstrated practicality, sensitivity, validity and reliability in two full-scope simulator experiments investigating dramatically different operational concepts. Practicality was assessed based on qualitative feedback of participants and researchers. The Process Overview Measure demonstrated sensitivity and validity by revealing significant effects of experimental manipulations that corroborated with other empirical results. The measure also demonstrated adequate inter-rater reliability and practicality for measuring SA in full-scope simulator settings based on data collected on process experts. Thus, full-scope simulator studies can employ the Process Overview Measure to reveal the impact of new control room technology and operational concepts on monitoring process plants. Practitioner Summary: The Process Overview Measure is a query-based measure that demonstrated practicality, sensitivity, validity and reliability for assessing operator situation awareness (SA) from monitoring process plants in representative settings.
Rosen, Jules; Mulsant, Benoit H; Marino, Patricia; Groening, Christopher; Young, Robert C; Fox, Debra
2008-10-30
Despite the importance of establishing shared scoring conventions and assessing interrater reliability in clinical trials in psychiatry, these elements are often overlooked. Obstacles to rater training and reliability testing include logistic difficulties in providing live training sessions, or mailing videotapes of patients to multiple sites and collecting the data for analysis. To address some of these obstacles, a web-based interactive video system was developed. It uses actors of diverse ages, gender and race to train raters how to score the Hamilton Depression Rating Scale and to assess interrater reliability. This system was tested with a group of experienced and novice raters within a single site. It was subsequently used to train raters of a federally funded multi-center clinical trial on scoring conventions and to test their interrater reliability. The advantages and limitations of using interactive video technology to improve the quality of clinical trials are discussed.
Rollover risk prediction of heavy vehicles by reliability index and empirical modelling
NASA Astrophysics Data System (ADS)
Sellami, Yamine; Imine, Hocine; Boubezoul, Abderrahmane; Cadiou, Jean-Charles
2018-03-01
This paper focuses on a combination of a reliability-based approach and an empirical modelling approach for rollover risk assessment of heavy vehicles. A reliability-based warning system is developed to alert the driver to a potential rollover before entering a bend. The idea behind the proposed methodology is to estimate the rollover risk by the probability that the vehicle load transfer ratio (LTR) exceeds a critical threshold. Accordingly, a so-called reliability index may be used as a measure to assess the vehicle's safe functioning. In the reliability method, computing the maximum LTR requires predicting the vehicle dynamics over the bend, which can in some cases be intractable or time-consuming. To reduce the reliability computation time, an empirical model is developed to substitute for the vehicle dynamics and rollover models. This is done using the support vector machine (SVM) algorithm. The preliminary results demonstrate the effectiveness of the proposed approach.
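The link between a reliability index and the probability that the LTR exceeds its critical threshold can be sketched in the simplest textbook case. The example assumes the maximum LTR over the bend is normally distributed with a known mean and standard deviation; that distributional assumption, the function names, and the numbers are all hypothetical, and the paper's SVM surrogate model is not reproduced here.

```python
from statistics import NormalDist


def reliability_index(mean_ltr, std_ltr, critical_ltr):
    """Reliability index beta for the limit state g = critical_ltr - LTR.

    Under the normal assumption, beta = (critical_ltr - mean) / std.
    """
    if std_ltr <= 0:
        raise ValueError("standard deviation must be positive")
    return (critical_ltr - mean_ltr) / std_ltr


def rollover_probability(beta):
    """P(LTR > critical_ltr) = Phi(-beta) for a normally distributed LTR."""
    return NormalDist().cdf(-beta)


# Hypothetical prediction for an upcoming bend
beta = reliability_index(mean_ltr=0.6, std_ltr=0.1, critical_ltr=0.9)
risk = rollover_probability(beta)
# A warning could be raised when the risk exceeds a chosen tolerance
warn_driver = risk > 1e-2
```

A larger beta means the predicted LTR sits further below the threshold in units of its own uncertainty, so the same index can be compared across bends and loading conditions.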
A low-cost, tablet-based option for prehospital neurologic assessment: The iTREAT Study.
Chapman Smith, Sherita N; Govindarajan, Prasanthi; Padrick, Matthew M; Lippman, Jason M; McMurry, Timothy L; Resler, Brian L; Keenan, Kevin; Gunnell, Brian S; Mehndiratta, Prachi; Chee, Christina Y; Cahill, Elizabeth A; Dietiker, Cameron; Cattell-Gordon, David C; Smith, Wade S; Perina, Debra G; Solenski, Nina J; Worrall, Bradford B; Southerland, Andrew M
2016-07-05
In this 2-center study, we assessed the technical feasibility and reliability of a low cost, tablet-based mobile telestroke option for ambulance transport and hypothesized that the NIH Stroke Scale (NIHSS) could be performed with similar reliability between remote and bedside examinations. We piloted our mobile telemedicine system in 2 geographic regions, central Virginia and the San Francisco Bay Area, utilizing commercial cellular networks for videoconferencing transmission. Standardized patients portrayed scripted stroke scenarios during ambulance transport and were evaluated by independent raters comparing bedside to remote mobile telestroke assessments. We used a mixed-effects regression model to determine intraclass correlation of the NIHSS between bedside and remote examinations (95% confidence interval). We conducted 27 ambulance runs at both sites and successfully completed the NIHSS for all prehospital assessments without prohibitive technical interruption. The mean difference between bedside (face-to-face) and remote (video) NIHSS scores was 0.25 (1.00 to -0.50). Overall, correlation of the NIHSS between bedside and mobile telestroke assessments was 0.96 (0.92-0.98). In the mixed-effects regression model, there were no statistically significant differences accounting for method of evaluation or differences between sites. Utilizing a low-cost, tablet-based platform and commercial cellular networks, we can reliably perform prehospital neurologic assessments in both rural and urban settings. Further research is needed to establish the reliability and validity of prehospital mobile telestroke assessment in live patients presenting with acute neurologic symptoms. © 2016 American Academy of Neurology.
A low-cost, tablet-based option for prehospital neurologic assessment
Chapman Smith, Sherita N.; Govindarajan, Prasanthi; Padrick, Matthew M.; Lippman, Jason M.; McMurry, Timothy L.; Resler, Brian L.; Keenan, Kevin; Gunnell, Brian S.; Mehndiratta, Prachi; Chee, Christina Y.; Cahill, Elizabeth A.; Dietiker, Cameron; Cattell-Gordon, David C.; Smith, Wade S.; Perina, Debra G.; Solenski, Nina J.; Worrall, Bradford B.
2016-01-01
Objectives: In this 2-center study, we assessed the technical feasibility and reliability of a low cost, tablet-based mobile telestroke option for ambulance transport and hypothesized that the NIH Stroke Scale (NIHSS) could be performed with similar reliability between remote and bedside examinations. Methods: We piloted our mobile telemedicine system in 2 geographic regions, central Virginia and the San Francisco Bay Area, utilizing commercial cellular networks for videoconferencing transmission. Standardized patients portrayed scripted stroke scenarios during ambulance transport and were evaluated by independent raters comparing bedside to remote mobile telestroke assessments. We used a mixed-effects regression model to determine intraclass correlation of the NIHSS between bedside and remote examinations (95% confidence interval). Results: We conducted 27 ambulance runs at both sites and successfully completed the NIHSS for all prehospital assessments without prohibitive technical interruption. The mean difference between bedside (face-to-face) and remote (video) NIHSS scores was 0.25 (1.00 to −0.50). Overall, correlation of the NIHSS between bedside and mobile telestroke assessments was 0.96 (0.92–0.98). In the mixed-effects regression model, there were no statistically significant differences accounting for method of evaluation or differences between sites. Conclusions: Utilizing a low-cost, tablet-based platform and commercial cellular networks, we can reliably perform prehospital neurologic assessments in both rural and urban settings. Further research is needed to establish the reliability and validity of prehospital mobile telestroke assessment in live patients presenting with acute neurologic symptoms. PMID:27281534
Electronic Quality of Life Assessment Using Computer-Adaptive Testing
2016-01-01
Background Quality of life (QoL) questionnaires are desirable for clinical practice but can be time-consuming to administer and interpret, making their widespread adoption difficult. Objective Our aim was to assess the performance of the World Health Organization Quality of Life (WHOQOL)-100 questionnaire as four item banks to facilitate adaptive testing using simulated computer adaptive tests (CATs) for physical, psychological, social, and environmental QoL. Methods We used data from the UK WHOQOL-100 questionnaire (N=320) to calibrate item banks using item response theory, which included psychometric assessments of differential item functioning, local dependency, unidimensionality, and reliability. We simulated CATs to assess the number of items administered before prespecified levels of reliability were met. Results The item banks (40 items) all displayed good model fit (P>.01) and were unidimensional (fewer than 5% of t tests significant), reliable (Person Separation Index>.70), and free from differential item functioning (no significant analysis of variance interaction) or local dependency (residual correlations < +.20). When matched for reliability, the item banks were between 45% and 75% shorter than paper-based WHOQOL measures. Across the four domains, a high standard of reliability (alpha>.90) could be gained with a median of 9 items. Conclusions Using CAT, simulated assessments were as reliable as paper-based forms of the WHOQOL with a fraction of the number of items. These properties suggest that these item banks are suitable for computerized adaptive assessment. These item banks have the potential for international development using existing alternative language versions of the WHOQOL items. PMID:27694100
Reliability Quantification of Advanced Stirling Convertor (ASC) Components
NASA Technical Reports Server (NTRS)
Shah, Ashwin R.; Korovaichuk, Igor; Zampino, Edward
2010-01-01
The Advanced Stirling Convertor (ASC) is intended to provide power for an unmanned planetary spacecraft and has an operational life requirement of 17 years. Over this 17-year mission, the ASC must provide power with desired performance and efficiency and require no corrective maintenance. Reliability demonstration testing for the ASC was found to be very limited due to schedule and resource constraints. Reliability demonstration must involve the application of analysis, system and component level testing, and simulation models, taken collectively. Therefore, computer simulation with limited test data verification is a viable approach to assess the reliability of ASC components. This approach is based on physics-of-failure mechanisms and involves the relationship among the design variables based on physics, mechanics, material behavior models, interaction of different components and their respective disciplines such as structures, materials, fluid, thermal, mechanical, electrical, etc. In addition, these models are based on the available test data, which can be updated, and analysis refined as more data and information become available. The failure mechanisms and causes of failure are included in the analysis, especially in light of the new information, in order to develop guidelines to improve design reliability and better operating controls to reduce the probability of failure. Quantified reliability assessment based on fundamental physical behavior of components and their relationship with other components has demonstrated itself to be a superior technique to conventional reliability approaches based on utilizing failure rates derived from similar equipment or simply expert judgment.
Reliability, Validity and Utility of a Multiple Intelligences Assessment for Career Planning.
ERIC Educational Resources Information Center
Shearer, C. Branton
"The Multiple Intelligences Developmental Assessment Scales" (MIDAS) is a self- (or other-) completed instrument which is based upon the theory of multiple intelligences. The validity, reliability, and utility data regarding the MIDAS are reported here. The measure consists of 7 main scales and 24 subscales which summarize a person's intellectual…
NASA Applications and Lessons Learned in Reliability Engineering
NASA Technical Reports Server (NTRS)
Safie, Fayssal M.; Fuller, Raymond P.
2011-01-01
Since the Shuttle Challenger accident in 1986, communities across NASA have been developing and extensively using quantitative reliability and risk assessment methods in their decision making process. This paper discusses several reliability engineering applications that NASA has used over the years to support the design, development, and operation of critical space flight hardware. Specifically, the paper discusses several reliability engineering applications used by NASA in areas such as risk management, inspection policies, components upgrades, reliability growth, integrated failure analysis, and physics based probabilistic engineering analysis. In each of these areas, the paper provides a brief discussion of a case study to demonstrate the value added and the criticality of reliability engineering in supporting NASA project and program decisions to fly safely. Examples of these case studies discussed are reliability based life limit extension of Space Shuttle Main Engine (SSME) hardware, reliability based inspection policies for Auxiliary Power Unit (APU) turbine disc, probabilistic structural engineering analysis for reliability prediction of the SSME alternate turbo-pump development, impact of ET foam reliability on the Space Shuttle System risk, and reliability based Space Shuttle upgrade for safety. Special attention is given in this paper to the physics based probabilistic engineering analysis applications and their critical role in evaluating the reliability of NASA development hardware including their potential use in a research and technology development environment.
Automated Portable Test System (APTS) - A performance envelope assessment tool
NASA Technical Reports Server (NTRS)
Kennedy, R. S.; Dunlap, W. P.; Jones, M. B.; Wilkes, R. L.; Bittner, A. C., Jr.
1985-01-01
The reliability and stability of microcomputer-based psychological tests are evaluated. The hardware, test programs, and system control of the Automated Portable Test System, which assesses human performance and subjective status, are described. Subjects were administered 11 pen-and-pencil and microcomputer-based tests for 10 sessions. The data reveal that nine of the 10 tests stabilized by the third administration; intertrial correlations were high and consistent. It is noted that the microcomputer-based tests display good psychometric properties in terms of differential stability and reliability.
Staggs, Vincent S; Cramer, Emily
2016-08-01
Hospital performance reports often include rankings of unit pressure ulcer rates. Differentiating among units on the basis of quality requires reliable measurement. Our objectives were to describe and apply methods for assessing reliability of hospital-acquired pressure ulcer rates and evaluate a standard signal-noise reliability measure as an indicator of precision of differentiation among units. Quarterly pressure ulcer data from 8,199 critical care, step-down, medical, surgical, and medical-surgical nursing units from 1,299 US hospitals were analyzed. Using beta-binomial models, we estimated between-unit variability (signal) and within-unit variability (noise) in annual unit pressure ulcer rates. Signal-noise reliability was computed as the ratio of between-unit variability to the total of between- and within-unit variability. To assess precision of differentiation among units based on ranked pressure ulcer rates, we simulated data to estimate the probabilities of a unit's observed pressure ulcer rate rank in a given sample falling within five and ten percentiles of its true rank, and the probabilities of units with ulcer rates in the highest quartile and highest decile being identified as such. We assessed the signal-noise measure as an indicator of differentiation precision by computing its correlations with these probabilities. Pressure ulcer rates based on a single year of quarterly or weekly prevalence surveys were too susceptible to noise to allow for precise differentiation among units, and signal-noise reliability was a poor indicator of precision of differentiation. To ensure precise differentiation on the basis of true differences, alternative methods of assessing reliability should be applied to measures purported to differentiate among providers or units based on quality. © 2016 The Authors. Research in Nursing & Health published by Wiley Periodicals, Inc.
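The signal-noise reliability ratio defined in the abstract above can be illustrated with a minimal sketch. The function name and the variance components below are hypothetical and are not taken from the study, which estimated its components with beta-binomial models.

```python
def signal_noise_reliability(between_var: float, within_var: float) -> float:
    """Return between-unit variance (signal) divided by total variance.

    Both arguments are assumed to be variance components that have
    already been estimated elsewhere; this helper only forms the ratio.
    """
    if between_var < 0 or within_var < 0:
        raise ValueError("variance components must be non-negative")
    total = between_var + within_var
    if total == 0:
        raise ValueError("total variance must be positive")
    return between_var / total


# Hypothetical components, for illustration only: noise three times the signal.
example_reliability = signal_noise_reliability(between_var=1.0, within_var=3.0)
```

A ratio near 1 means most of the observed variation in unit rates reflects differences between units; a ratio near 0 means it is mostly within-unit noise.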
Berger, Aaron J; Momeni, Arash; Ladd, Amy L
2014-04-01
Trapeziometacarpal, or thumb carpometacarpal (CMC), arthritis is a common problem with a variety of treatment options. Although widely used, the Eaton radiographic staging system for CMC arthritis is of questionable clinical utility, as disease severity does not predictably correlate with symptoms or treatment recommendations. A possible reason for this is that the classification itself may not be reliable, but the literature on this has not, to our knowledge, been systematically reviewed. We therefore performed a systematic review to determine the intra- and interobserver reliability of the Eaton staging system. We systematically reviewed English-language studies published between 1973 and 2013 to assess the degree of intra- and interobserver reliability of the Eaton classification for determining the stage of trapeziometacarpal joint arthritis and pantrapezial arthritis based on plain radiographic imaging. Search engines included: PubMed, Scopus(®), and CINAHL. Four studies, which included a total of 163 patients, met our inclusion criteria and were evaluated. The level of evidence of the studies included in this analysis was determined using the Oxford Centre for Evidence Based Medicine Levels of Evidence Classification by two independent observers. A limited number of studies have been performed to assess intra- and interobserver reliability of the Eaton classification system. The four studies included were determined to be Level 3b. These studies collectively indicate that the Eaton classification demonstrates poor to fair interobserver reliability (kappa values: 0.11-0.56) and fair to moderate intraobserver reliability (kappa values: 0.54-0.657). Review of the literature demonstrates that radiographs assist in the assessment of CMC joint disease, but there is not a reliable system for classification of disease severity. 
Currently, diagnosis and treatment of thumb CMC arthritis are based on the surgeon's qualitative assessment combining history, physical examination, and radiographic evaluation. Inconsistent agreement using the current common radiographic classification system suggests a need for better radiographic tools to quantify disease severity.
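The kappa values quoted in the abstract above measure agreement beyond chance. As a rough sketch, assuming the simplest case of two raters and unweighted Cohen's kappa (the reviewed studies may have used weighted or multi-rater variants), the statistic can be computed as follows. The function name and the stage labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same cases."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same non-empty set of cases")
    n = len(rater_a)
    # Observed proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical stage assignments for four radiographs, for illustration only.
kappa_example = cohen_kappa([1, 1, 2, 2], [1, 2, 1, 2])
```

In this made-up example the raters agree on half of the cases, which is exactly what their marginal frequencies predict by chance, so kappa is zero.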
Weyers, Simone; Jemi, Iman; Karger, André; Raski, Bianca; Rotthoff, Thomas; Pentzek, Michael; Mortsiefer, Achim
2016-01-01
Background: Imparting communication skills has been given great importance in medical curricula. In addition to standardized assessments, students should communicate with real patients in actual clinical situations during workplace-based assessments and receive structured feedback on their performance. The aim of this project was to pilot a formative testing method for workplace-based assessment. Our investigation centered in particular on whether or not physicians view the method as feasible and how high acceptance is among students. In addition, we assessed the reliability of the method. Method: As part of the project, 16 students held two consultations each with chronically ill patients at the medical practice where they were completing GP training. These consultations were video-recorded. The trained mentoring physician rated the student’s performance and provided feedback immediately following the consultations using the Berlin Global Rating scale (BGR). Two impartial, trained raters also evaluated the videos using BGR. For qualitative and quantitative analysis, information on how physicians and students viewed feasibility and their levels of acceptance was collected in written form in a partially standardized manner. To test for reliability, the test-retest reliability was calculated for both of the overall evaluations given by each rater. The inter-rater reliability was determined for the three evaluations of each individual consultation. Results: The formative assessment method was rated positively by both physicians and students. It is relatively easy to integrate into daily routines. Its significant value lies in the personal, structured and recurring feedback. The two overall scores for each patient consultation given by the two impartial raters correlate moderately. The degree of uniformity among the three raters in respect to the individual consultations is low. 
Discussion: Within the scope of this pilot project, only a small sample of physicians and students could be surveyed to a limited extent. There are indications that the assessment can be improved by integrating more information on medical context and student self-assessments. Despite the current limitations regarding test criteria, it is clear that workplace-based assessment of communication skills in the clinical setting is a valuable addition to the communication curricula of medical schools. PMID:27990466
Lyon, Aaron R; Pullmann, Michael D; Dorsey, Shannon; Martin, Prerna; Grigore, Alexandra A; Becker, Emily M; Jensen-Doss, Amanda
2018-05-11
Measurement-based care (MBC) is an increasingly popular, evidence-based practice, but there are no tools with established psychometrics to evaluate clinician use of MBC practices in mental health service delivery. The current study evaluated the reliability, validity, and factor structure of scores generated from a brief, standardized tool to measure MBC practices, the Current Assessment Practice Evaluation-Revised (CAPER). Survey data from a national sample of 479 mental health clinicians were used to conduct exploratory and confirmatory factor analyses, as well as reliability and validity analyses (e.g., relationships between CAPER subscales and clinician MBC attitudes). Analyses revealed competing two- and three-factor models. Regardless of the model used, scores from CAPER subscales demonstrated good reliability and convergent and divergent validity with MBC attitudes in the expected directions. The CAPER appears to be a psychometrically sound tool for assessing clinician MBC practices. Future directions for development and application of the tool are discussed.
Rahman, Mohd Nasrull Abdol; Mohamad, Siti Shafika
2017-01-01
Computer work is associated with musculoskeletal disorders (MSDs). Several methods have been developed to assess computer work risk factors related to MSDs. This review aims to give an overview of current techniques available for pen-and-paper-based observational methods in assessing ergonomic risk factors of computer work. We searched an electronic database for materials from 1992 until 2015. The selected methods were focused on computer work, pen-and-paper observational methods, office risk factors and musculoskeletal disorders. This review was developed to assess the risk factors, reliability and validity of pen-and-paper observational methods associated with computer work. Two evaluators independently carried out this review. Seven observational methods used to assess exposure to office risk factors for work-related musculoskeletal disorders were identified. The risk factors involved in current techniques of pen-and-paper-based observational tools were postures, office components, force and repetition. Of the seven methods, only five had been tested for reliability. They were proven to be reliable and were rated as moderate to good. For validity testing, only four of the seven methods were tested, and the results were moderate. Many observational tools already exist, but no single tool appears to cover all of the risk factors including working posture, office component, force, repetition and office environment at office workstations and computer work. Although the most important factor in developing a tool is proper validation of exposure assessment techniques, the existing observational method did not test reliability and validity. Furthermore, this review could provide researchers with ways to improve the pen-and-paper-based observational method for assessing ergonomic risk factors of computer work.
A Protocol for Advanced Psychometric Assessment of Surveys
Squires, Janet E.; Hayduk, Leslie; Hutchinson, Alison M.; Cranley, Lisa A.; Gierl, Mark; Cummings, Greta G.; Norton, Peter G.; Estabrooks, Carole A.
2013-01-01
Background and Purpose. In this paper, we present a protocol for advanced psychometric assessments of surveys based on the Standards for Educational and Psychological Testing. We use the Alberta Context Tool (ACT) as an exemplar survey to which this protocol can be applied. Methods. Data mapping, acceptability, reliability, and validity are addressed. Acceptability is assessed with missing data frequencies and the time required to complete the survey. Reliability is assessed with internal consistency coefficients and information functions. A unitary approach to validity consisting of accumulating evidence based on instrument content, response processes, internal structure, and relations to other variables is taken. We also address assessing performance of survey data when aggregated to higher levels (e.g., nursing unit). Discussion. In this paper we present a protocol for advanced psychometric assessment of survey data using the Alberta Context Tool (ACT) as an exemplar survey; application of the protocol to the ACT survey is underway. Psychometric assessment of any survey is essential to obtaining reliable and valid research findings. This protocol can be adapted for use with any nursing survey. PMID:23401759
La Padula, Simone; Hersant, Barbara; SidAhmed, Mounia; Niddam, Jeremy; Meningaud, Jean Paul
2016-07-01
Most patients requesting aesthetic rejuvenation treatment expect to look healthier and younger. Some scales for ageing assessment have been proposed, but none is focused on patient age prediction. The aim of this study was to develop and validate a new facial rating scale assessing facial ageing sign severity. One thousand Caucasian patients were included and assessed. The Rasch model was used as part of the validation process. A score was attributed to each patient, based on the scales we developed. The correlation between the real age and scores obtained, the inter-rater reliability and test-retest reliability were analysed. The objective was to develop a tool enabling the assigning of a patient to a specific age range based on the calculated score. All scales exceeded criteria for acceptability, reliability and validity. The real age strongly correlated with the total facial score in both sex groups. The test-retest reliability confirmed this strong correlation. We developed a facial ageing scale which could be a useful tool to assess patients before and after rejuvenation treatment and an important new metric to be used in facial rejuvenation and regenerative clinical research. Copyright © 2016 European Association for Cranio-Maxillo-Facial Surgery. Published by Elsevier Ltd. All rights reserved.
Reliability evaluation of microgrid considering incentive-based demand response
NASA Astrophysics Data System (ADS)
Huang, Ting-Cheng; Zhang, Yong-Jun
2017-07-01
Incentive-based demand response (IBDR) can guide customers to adjust their electricity consumption behaviour and actively curtail load. Meanwhile, distributed generation (DG) and energy storage systems (ESS) can provide time for the implementation of IBDR. This paper focuses on the reliability evaluation of microgrids considering IBDR. Firstly, the mechanism of IBDR and its impact on power supply reliability are analysed. Secondly, the IBDR dispatch model considering customer’s comprehensive assessment and the customer response model are developed. Thirdly, the reliability evaluation method considering IBDR based on Monte Carlo simulation is proposed. Finally, the validity of the above models and method is studied through numerical tests on a modified RBTS Bus6 test system. Simulation results demonstrate that IBDR can improve the reliability of the microgrid.
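To make the idea of a Monte Carlo reliability evaluation with curtailable load concrete, here is a deliberately simplified sketch. It is not the dispatch or customer response model from the paper: it assumes a non-sequential simulation, independent generating units with one shared forced outage rate, a fixed load, and a fixed amount of load that customers agree to curtail. The function name, parameters, and all numbers are hypothetical.

```python
import random


def loss_of_load_probability(unit_capacities, forced_outage_rate, load,
                             curtailable_load, trials, seed=0):
    """Estimate the probability that available capacity cannot meet load.

    Each trial samples every unit as available or on outage, then checks
    whether the remaining capacity covers the load left after curtailment.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    shortfalls = 0
    for _ in range(trials):
        available = sum(cap for cap in unit_capacities
                        if rng.random() >= forced_outage_rate)
        if available < load - curtailable_load:
            shortfalls += 1
    return shortfalls / trials


# Hypothetical microgrid: three 40 kW units, 10% outage rate, 90 kW load.
units = [40.0, 40.0, 40.0]
without_dr = loss_of_load_probability(units, 0.10, 90.0, 0.0, trials=20000)
with_dr = loss_of_load_probability(units, 0.10, 90.0, 15.0, trials=20000)
```

Because both calls use the same seed, they see identical outage samples, so the estimate with curtailment can never exceed the estimate without it. A fuller study would add time-varying load, storage, and a response model for how customers react to incentives.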
Training and Maintaining System-Wide Reliability in Outcome Management.
Barwick, Melanie A; Urajnik, Diana J; Moore, Julia E
2014-01-01
The Child and Adolescent Functional Assessment Scale (CAFAS) is widely used for outcome management, for providing real-time client- and program-level data, and for monitoring evidence-based practices. Methods of reliability training and the assessment of rater drift are critical for service decision-making within organizations and systems of care. We assessed two approaches for CAFAS training: external technical assistance and internal technical assistance. To this end, we sampled 315 practitioners trained by the external technical assistance approach from 2,344 Ontario practitioners who had achieved reliability on the CAFAS. To assess the internal technical assistance approach as a reliable alternative training method, 140 practitioners trained internally were selected from the same pool of certified raters. Reliabilities were high for practitioners trained by both the external and the internal technical assistance approaches (.909-.995 and .915-.997, respectively). One- and 3-year estimates showed some drift on several scales. High and consistent reliabilities over time and across training methods have implications for CAFAS training of behavioral health care practitioners, and for the maintenance of CAFAS as a global outcome management tool in systems of care.
Snow, Nicholas J; Peters, Sue; Borich, Michael R; Shirzad, Navid; Auriat, Angela M; Hayward, Kathryn S; Boyd, Lara A
2016-01-15
Diffusion-weighted magnetic resonance imaging (DW-MRI) is commonly used to assess white matter properties after stroke. Novel work is utilizing constrained spherical deconvolution (CSD) to estimate complex intra-voxel fiber architecture unaccounted for with tensor-based fiber tractography. However, the reliability of CSD-based tractography has not been established in people with chronic stroke. Establishing the reliability of CSD-based DW-MRI in chronic stroke. High-resolution DW-MRI was performed in ten adults with chronic stroke during two separate sessions. Deterministic region of interest-based fiber tractography using CSD was performed by two raters. Mean fractional anisotropy (FA), apparent diffusion coefficient (ADC), tract number, and tract volume were extracted from reconstructed fiber pathways in the corticospinal tract (CST) and superior longitudinal fasciculus (SLF). Callosal fiber pathways connecting the primary motor cortices were also evaluated. Inter-rater and test-retest reliability were determined by intra-class correlation coefficients (ICCs). ICCs revealed excellent reliability for FA and ADC in ipsilesional (0.86-1.00; p<0.05) and contralesional hemispheres (0.94-1.00; p<0.0001), for CST and SLF fibers; and excellent reliability for all metrics in callosal fibers (0.85-1.00; p<0.05). ICC ranged from poor to excellent for tract number and tract volume in ipsilesional (-0.11 to 0.92; p≤0.57) and contralesional hemispheres (-0.27 to 0.93; p≤0.64), for CST and SLF fibers. Like other select DW-MRI approaches, CSD-based tractography is a reliable approach to evaluate FA and ADC in major white matter pathways, in chronic stroke. Future work should address the reproducibility and utility of CSD-based metrics of tract number and tract volume. Copyright © 2015 Elsevier B.V. All rights reserved.
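The abstract above reports intra-class correlation coefficients but does not say which ICC form was used. As an illustration only, the sketch below assumes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1), computed from ANOVA mean squares. The function name and the example table are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1) from a table with one row per subject and one column per rater."""
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular table with >= 2 subjects and >= 2 raters")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares into subjects, raters, and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denominator


# Hypothetical data: the second rater scores every subject exactly 1 higher.
offset_icc = icc_2_1([[1, 2], [2, 3], [3, 4]])
```

The made-up table shows why the absolute-agreement form matters: the two raters rank subjects identically, yet the constant offset between them pulls the coefficient below 1.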
Human Reliability and the Cost of Doing Business
NASA Technical Reports Server (NTRS)
DeMott, Diana
2014-01-01
Most businesses recognize that people will make mistakes and assume errors are just part of the cost of doing business, but do they need to be? Companies with high risk, or major consequences, should consider the effect of human error. In a variety of industries, human errors have caused costly failures and workplace injuries. Airline mishaps, medical malpractice, errors in the administration of medication and major oil spills have all been blamed on human error. A technique to mitigate or even eliminate some of these costly human errors is the use of Human Reliability Analysis (HRA). Various methodologies are available to perform Human Reliability Assessments that range from identifying the most likely areas for concern to detailed assessments with human error failure probabilities calculated. Which methodology to use would be based on a variety of factors that would include: 1) how people react and act in different industries, and differing expectations based on industry standards, 2) factors that influence how the human errors could occur such as tasks, tools, environment, workplace, support, training and procedure, 3) type and availability of data and 4) how the industry views risk and reliability influences (types of emergencies, contingencies and routine tasks versus cost-based concerns). The Human Reliability Assessments should be the first step to reduce, mitigate or eliminate the costly mistakes or catastrophic failures. Using Human Reliability techniques to identify and classify human error risks allows a company more opportunities to mitigate or eliminate these risks and prevent costly failures.
Test-Retest Reliability of the Short-Form Survivor Unmet Needs Survey.
Taylor, Karen; Bulsara, Max; Monterosso, Leanne
2018-01-01
Reliable and valid needs assessment measures are important assessment tools in cancer survivorship care. A new 30-item short-form version of the Survivor Unmet Needs Survey (SF-SUNS) was developed and validated with cancer survivors, including hematology cancer survivors; however, test-retest reliability has not been established. The objective of this study was to assess the test-retest reliability of the SF-SUNS with a cohort of lymphoma survivors ( n = 40). Test-retest reliability of the SF-SUNS was conducted at two time points: baseline (time 1) and 5 days later (time 2). Test-retest data were collected from lymphoma cancer survivors ( n = 40) in a large tertiary cancer center in Western Australia. Intraclass correlation analyses compared data at time 1 (baseline) and time 2 (5 days later). Cronbach's alpha analyses were performed to assess the internal consistency at both time points. The majority (23/30, 77%) of items achieved test-retest reliability scores 0.45-0.74 (fair to good). A high degree of overall internal consistency was demonstrated (time 1 = 0.92, time 2 = 0.95), with scores 0.65-0.94 across subscales for both time points. Mixed test-retest reliability of the SF-SUNS was established. Our results indicate the SF-SUNS is responsive to the changing needs of lymphoma cancer survivors. Routine use of cancer survivorship specific needs-based assessments is required in oncology care today. Nurses are well placed to administer these assessments and provide tailored information and resources. Further assessment of test-retest reliability in hematology and other cancer cohorts is warranted.
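Internal consistency in the abstract above is summarised with Cronbach's alpha. A minimal sketch of the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), is given below. The function name and the tiny data set are hypothetical and unrelated to the SF-SUNS data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha from a list of items, each a list of respondent scores."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if n < 2 or any(len(item) != n for item in items):
        raise ValueError("every item needs the same number (>= 2) of respondents")

    # Total score per respondent, summed across items.
    totals = [sum(item[i] for item in items) for i in range(n)]
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance(totals)
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical two-item scale answered by three respondents.
alpha_example = cronbach_alpha([[1, 2, 3], [2, 4, 6]])
```

Alpha rises as items covary more strongly relative to their individual variances, so it reflects how consistently the items move together rather than stability over time, which is what the test-retest analysis addresses.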
Larsen, Camilla Marie; Juul-Kristensen, Birgit; Lund, Hans; Søgaard, Karen
2014-10-01
The aims were to compile a schematic overview of clinical scapular assessment methods and critically appraise the methodological quality of the involved studies. A systematic, computer-assisted literature search using Medline, CINAHL, SportDiscus and EMBASE was performed from inception to October 2013. Reference lists in articles were also screened for publications. From 50 articles, 54 method names were identified and categorized into three groups: (1) Static positioning assessment (n = 19); (2) Semi-dynamic (n = 13); and (3) Dynamic functional assessment (n = 22). Fifteen studies were excluded for evaluation due to no/few clinimetric results, leaving 35 studies for evaluation. Graded according to the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN checklist), the methodological quality in the reliability and validity domains was "fair" (57%) to "poor" (43%), with only one study rated as "good". The reliability domain was most often investigated. Few of the assessment methods in the included studies that had "fair" or "good" measurement property ratings demonstrated acceptable results for both reliability and validity. We found a substantially larger number of clinical scapular assessment methods than previously reported. Using the COSMIN checklist the methodological quality of the included measurement properties in the reliability and validity domains were in general "fair" to "poor". None were examined for all three domains: (1) reliability; (2) validity; and (3) responsiveness. Observational evaluation systems and assessment of scapular upward rotation seem suitably evidence-based for clinical use. Future studies should test and improve the clinimetric properties, and especially diagnostic accuracy and responsiveness, to increase utility for clinical practice.
Gasq, David; Labrunée, Marc; Amarantini, David; Dupui, Philippe; Montoya, Richard; Marque, Philippe
2014-03-21
Stroke patients have impaired postural balance that increases the risk of falls and impairs their mobility. Assessment of postural balance is commonly carried out by recording centre of pressure (CoP) displacements, but the lack of data concerning reliability of these measures compromises their interpretation. The purpose of this study was to investigate the between-day reliability of six CoP-based variables, in order to provide i) reliability data for monitoring postural sway and weight-bearing asymmetry of stroke patients in clinical practice and ii) a consistent method for assessing measurement error for applications in physical medicine and rehabilitation. Postural balance of 20 stroke patients was assessed in quiet standing on a force platform, in two sessions, 7 days apart. Six CoP-based variables were collected in eyes open and eyes closed conditions: postural sway was assessed with mean and standard deviation of CoP-velocity, CoP-velocity along the mediolateral and anteroposterior axes, and confidence ellipse area (CE(AREA)); weight-bearing asymmetry was assessed with mean CoP position along the mediolateral axis (CoP(ML)). The intraclass correlation coefficient (ICC) was used to determine the level of agreement between test-retest. Small real difference (SRD), corresponding to the smallest change that indicates a real improvement for a single individual, was used to determine the extent of measurement error. ICCs were satisfactory (>0.9) for all CoP-based variables, except for CE(AREA) in eyes open condition and CoP(ML) (<0.8). The SRDs (eyes open/closed conditions) were: 6.1/9.5 mm.s(-1) for mean velocity; 12.3/12.2 mm.s(-1) for standard deviation of CoP-velocity; 3.6/5.5 mm.s(-1) and 4.9/7.3 mm.s(-1) for CoP-velocity in mediolateral and anteroposterior axes, respectively; 17.4/21.4 mm for CoP(ML).
Because CE(AREA) showed heteroscedasticity of measurement error distribution, SRD (eyes open/closed conditions) was expressed as a percentage (121/75%) and a ratio (3.68/2.16) obtained after log-antilog procedure. In clinical practice, the CoP-based velocity variables should be preferred to CE(AREA) to assess and monitor postural sway over time in hemiplegic stroke patients. The poor reliability of CoP(ML) compromises its use to assess weight-bearing asymmetry. The procedure we used could be applied in reliability studies concerning other CoP-based variables or other biological variables in the field of physical medicine and rehabilitation.
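The abstract does not spell out how the small real difference was computed. A common convention, assumed here, derives it from the standard error of measurement: SEM = SD * sqrt(1 - ICC) and SRD = 1.96 * sqrt(2) * SEM. The sketch below follows that convention; the function names and input values are hypothetical and do not reproduce the study's data.

```python
import math

def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject SD and a test-retest ICC (assumed convention)."""
    return sd * math.sqrt(1.0 - icc)

def smallest_real_difference(sd, icc, z=1.96):
    """Smallest change exceeding measurement error for one individual (95% level)."""
    # sqrt(2) accounts for error being present in both the test and the retest.
    return z * math.sqrt(2.0) * standard_error_of_measurement(sd, icc)

# Hypothetical inputs: SD of 10 mm/s and ICC of 0.90.
srd = smallest_real_difference(sd=10.0, icc=0.90)
print(round(srd, 2))
```

Under this convention a lower ICC inflates the SRD, which is one way to read the abstract's preference for the velocity variables over the less reliable measures.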
Duff, Kevin
2012-01-01
Repeated assessments are a relatively common occurrence in clinical neuropsychology. The current paper will review some of the relevant concepts (e.g., reliability, practice effects, alternate forms) and methods (e.g., reliable change index, standardized regression-based formulas) that are used in repeated neuropsychological evaluations. The focus will be on the understanding and application of these concepts and methods in the evaluation of the individual patient through examples. Finally, some future directions for assessing change will be described. PMID:22382384
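To make the reliable change index concrete, the following sketch uses the widely cited Jacobson-Truax form, RCI = (retest - baseline) / SE_diff with SE_diff = sqrt(2) * SD * sqrt(1 - r), plus an optional subtraction of a mean practice effect. Whether the paper presents exactly this variant is an assumption; the function name, parameters, and example scores are hypothetical.

```python
import math

def reliable_change_index(baseline, retest, sd_baseline, retest_r, practice_effect=0.0):
    """RCI for one patient; |RCI| > 1.96 is conventionally read as reliable change."""
    sem = sd_baseline * math.sqrt(1.0 - retest_r)
    se_diff = math.sqrt(2.0) * sem
    # Removing the average practice effect is an optional adjustment (assumption).
    return (retest - baseline - practice_effect) / se_diff

# Hypothetical patient: 100 at baseline, 110 at retest, test SD 15, r = 0.90.
print(round(reliable_change_index(100, 110, 15, 0.90), 2))
print(round(reliable_change_index(100, 110, 15, 0.90, practice_effect=3), 2))
```

In this toy case neither value exceeds 1.96, so a 10-point gain would not count as reliable change under the stated assumptions.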
The effect of Web-based Braden Scale training on the reliability of Braden subscale ratings.
Magnan, Morris A; Maklebust, JoAnn
2009-01-01
The primary purpose of this study was to evaluate the effect of Web-based Braden Scale training on the reliability of Braden Scale subscale ratings made by nurses working in acute care hospitals. A secondary purpose was to describe the distribution of reliable Braden subscale ratings before and after Web-based Braden Scale training. Secondary analysis of data from a recently completed quasi-experimental, pretest-posttest, interrater reliability study. A convenience sample of RNs working at 3 Michigan medical centers voluntarily participated in the study. RN participants included nurses who used the Braden Scale regularly at their place of employment ("regular users") as well as nurses who did not use the Braden Scale at their place of employment ("new users"). Using a pretest-posttest, quasi-experimental design, pretest interrater reliability data were collected to identify the percentage of nurses making reliable Braden subscale assessments. Nurses then completed a Web-based Braden Scale training module after which posttest interrater reliability data were collected. The reliability of nurses' Braden subscale ratings was determined by examining the level of agreement/disagreement between ratings made by an RN and an "expert" rating the same patient. In total, 381 RN-to-expert dyads were available for analysis. During both the pretest and posttest periods, the percentage of reliable subscale ratings was highest for the activity subscale, lowest for the moisture subscale, and second lowest for the nutrition subscale. With Web-based Braden Scale training, the percentage of reliable Braden subscale ratings made by new users increased for all 6 subscales with statistically significant improvements in the percentage of reliable assessments made on 3 subscales: sensory-perception, moisture, and mobility. Training had virtually no effect on the percentage of reliable subscale ratings made by regular users of the Braden Scale. 
With Web-based Braden Scale training the percentage of nurses making reliable ratings increased for all 6 subscales, but this was true for new users only. Additional research is needed to identify educational approaches that effectively improve and sustain the reliability of subscale ratings among regular users of the Braden Scale. Moreover, special attention needs to be given to ensuring that all nurses working with the Braden Scale have a clear understanding of the intended meanings and correct approaches to rating moisture and nutrition subscales.
Valle, Susanne Collier; Støen, Ragnhild; Sæther, Rannei; Jensenius, Alexander Refsum; Adde, Lars
2015-10-01
A computer-based video analysis has recently been presented for quantitative assessment of general movements (GMs). This method's test-retest reliability, however, has not yet been evaluated. The aim of the current study was to evaluate the test-retest reliability of computer-based video analysis of GMs, and to explore the association between computer-based video analysis and the temporal organization of fidgety movements (FMs). Test-retest reliability study. 75 healthy, term-born infants were recorded twice the same day during the FMs period using a standardized video set-up. The computer-based movement variables "quantity of motion mean" (Qmean), "quantity of motion standard deviation" (QSD) and "centroid of motion standard deviation" (CSD) were analyzed, reflecting the amount of motion and the variability of the spatial center of motion of the infant, respectively. In addition, the association between the variable CSD and the temporal organization of FMs was explored. Intraclass correlation coefficients (ICC 1.1 and ICC 3.1) were calculated to assess test-retest reliability. The ICC values for the variables CSD, Qmean and QSD were 0.80, 0.80 and 0.86 for ICC (1.1), respectively; and 0.80, 0.86 and 0.90 for ICC (3.1), respectively. There were significantly lower CSD values in the recordings with continual FMs compared to the recordings with intermittent FMs (p<0.05). This study showed high test-retest reliability of computer-based video analysis of GMs, and a significant association between our computer-based video analysis and the temporal organization of FMs. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
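The two coefficients named above are read here as the Shrout and Fleiss forms ICC(1,1) and ICC(3,1), which is an interpretation of the abstract's "ICC 1.1" and "ICC 3.1" notation. The sketch below computes both from a subjects-by-sessions table using the usual analysis-of-variance mean squares; the function name and toy data are hypothetical.

```python
def icc_1_1_and_3_1(data):
    """Return (ICC(1,1), ICC(3,1)) for rows = subjects, columns = sessions."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(col) / n for col in zip(*data)]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between sessions
    ss_within = ss_total - ss_rows
    ss_error = ss_within - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    ms_error = ss_error / ((n - 1) * (k - 1))

    icc11 = (ms_rows - ms_within) / (ms_rows + (k - 1) * ms_within)
    icc31 = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    return icc11, icc31

# Toy data (assumption): every infant scores exactly 1 unit higher on retest.
toy = [[1, 2], [3, 4], [5, 6]]
icc11, icc31 = icc_1_1_and_3_1(toy)
print(round(icc11, 3), round(icc31, 3))
```

The toy data show why the two forms can differ: a systematic shift between recordings lowers ICC(1,1) but leaves ICC(3,1) untouched, which fits the pattern of ICC(3,1) values at or above the ICC(1,1) values in the abstract.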
TENI: A comprehensive battery for cognitive assessment based on games and technology.
Delgado, Marcela Tenorio; Uribe, Paulina Arango; Alonso, Andrés Aparicio; Díaz, Ricardo Rosas
2016-01-01
TENI (Test de Evaluación Neuropsicológica Infantil) is an instrument developed to assess cognitive abilities in children between 3 and 9 years of age. It is based on a model that incorporates games and technology as tools to improve the assessment of children's capacities. The test was standardized with two Chilean samples of 524 and 82 children living in urban zones. Evidence of reliability and validity based on current standards is presented. Data show good levels of reliability for all subtests. Some evidence of validity in terms of content, test structure, and association with other variables is presented. This instrument represents a novel approach and a new frontier in cognitive assessment. Further studies with clinical, rural, and cross-cultural populations are required.
The Reliability, Impact, and Cost-Effectiveness of Value-Added Teacher Assessment Methods
ERIC Educational Resources Information Center
Yeh, Stuart S.
2012-01-01
This article reviews evidence regarding the intertemporal reliability of teacher rankings based on value-added methods. Value-added methods exhibit low reliability, yet are broadly supported by prominent educational researchers and are increasingly being used to evaluate and fire teachers. The article then presents a cost-effectiveness analysis…
Web-Based Assessment of Mental Well-Being in Early Adolescence: A Reliability Study.
Hamann, Christoph; Schultze-Lutter, Frauke; Tarokh, Leila
2016-06-15
The ever-increasing use of the Internet among adolescents represents an emerging opportunity for researchers to gain access to larger samples, which can be queried over several years longitudinally. Among adolescents, young adolescents (ages 11 to 13 years) are of particular interest to clinicians as this is a transitional stage, during which depressive and anxiety symptoms often emerge. However, it remains unclear whether these youngest adolescents can accurately answer questions about their mental well-being using a Web-based platform. The aim of the study was to examine the accuracy of responses obtained from Web-based questionnaires by comparing Web-based with paper-and-pencil versions of depression and anxiety questionnaires. The primary outcome was the score on the depression and anxiety questionnaires under two conditions: (1) paper-and-pencil and (2) Web-based versions. Twenty-eight adolescents (aged 11-13 years, mean age 12.78 years and SD 0.78; 18 females, 64%) were randomly assigned to complete either the paper-and-pencil or the Web-based questionnaire first. Intraclass correlation coefficients (ICCs) were calculated to measure intrarater reliability. Intraclass correlation coefficients were calculated separately for depression (Children's Depression Inventory, CDI) and anxiety (Spence Children's Anxiety Scale, SCAS) questionnaires. On average, it took participants 17 minutes (SD 6) to answer 116 questions online. Intraclass correlation coefficient analysis revealed high intrarater reliability when comparing Web-based with paper-and-pencil responses for both CDI (ICC=.88; P<.001) and the SCAS (ICC=.95; P<.001). According to published criteria, both of these values are in the "almost perfect" category indicating the highest degree of reliability. The results of the study show an excellent reliability of Web-based assessment in 11- to 13-year-old children as compared with the standard paper-pencil assessment. 
Furthermore, we found that Web-based assessments with young adolescents are highly feasible, with all enrolled participants completing the Web-based form. As early adolescence is a time of remarkable social and behavioral changes, these findings open up new avenues for researchers from diverse fields who are interested in studying large samples of young adolescents over time.
Reliability of a smartphone-based goniometer for knee joint goniometry.
Ferriero, Giorgio; Vercelli, Stefano; Sartorio, Francesco; Muñoz Lasa, Susana; Ilieva, Elena; Brigatti, Elisa; Ruella, Carolina; Foti, Calogero
2013-06-01
The aim of this study was to assess the reliability of a smartphone-based application developed for photographic-based goniometry, DrGoniometer (DrG), by comparing its measurement of the knee joint angle with that made by a universal goniometer (UG). Joint goniometry is a common mode of clinical assessment used in many disciplines, in particular in rehabilitation. One validated method is photographic-based goniometry, but the procedure is usually complex: the image has to be downloaded from the camera to a computer and then edited using dedicated software. This disadvantage may be overcome by the new generation of mobile phones (smartphones) that have computer-like functionality and an integrated digital camera. This validation study was carried out under two different controlled conditions: (i) with the participant to measure in a fixed position and (ii) with a battery of pictures to assess. In the first part, four raters performed repeated measurements with DrG and UG at different knee joint angles. Then, 10 other raters measured the knee at different flexion angles ranging 20-145° on a battery of 35 pictures taken in a clinical setting. The results showed that inter-rater and intra-rater correlations were always more than 0.958. Agreement with the UG showed a width of 18.2° [95% limits of agreement (LoA)=-7.5/+10.7°] and 14.1° (LoA=-6.6/+7.5°). In conclusion, DrG seems to be a reliable method for measuring knee joint angle. This mHealth application can be an alternative/additional method of goniometry, easier to use than other photographic-based goniometric assessments. Further studies are required to assess its reliability for the measurement of other joints.
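The 95% limits of agreement quoted above follow the Bland-Altman pattern: mean difference between methods plus or minus 1.96 standard deviations of the differences, with the width being the upper limit minus the lower limit (for example 10.7 - (-7.5) = 18.2). A minimal sketch, assuming paired readings from two methods; the function name and the angle values are hypothetical.

```python
from statistics import mean, stdev

def limits_of_agreement(method_a, method_b, z=1.96):
    """Return (lower, upper) 95% limits of agreement for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)              # systematic difference between the methods
    spread = z * stdev(diffs)       # sample SD of the paired differences
    return bias - spread, bias + spread

# Hypothetical knee angles in degrees from an app and a universal goniometer.
app_angles = [11, 9, 12, 10, 8]
goniometer_angles = [10, 10, 10, 10, 10]
lower, upper = limits_of_agreement(app_angles, goniometer_angles)
print(round(lower, 2), round(upper, 2), round(upper - lower, 2))
```

A narrow interval centred near zero indicates that the two methods can be used interchangeably for practical purposes; how narrow is acceptable remains a clinical judgment.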
2013-01-01
Background The Parent-Infant Relationship Global Assessment Scale (PIR-GAS) signifies a conceptually relevant development in the multi-axial, developmentally sensitive classification system DC:0-3R for preschool children. However, information about the reliability and validity of the PIR-GAS is rare. A review of the available empirical studies suggests that in research, PIR-GAS ratings can be based on a ten-minute videotaped interaction sequence. The qualification of raters may be very heterogeneous across studies. Methods To test whether the use of the PIR-GAS still allows for a reliable assessment of the parent-infant relationship, our study compared PIR-GAS ratings based on a full-information procedure across multiple settings with ratings based on a ten-minute video by two doctoral candidates of medicine. For each mother-child dyad at a family day hospital (N = 48), we obtained two video ratings and one full-information rating at admission to therapy and at discharge. This pre-post design allowed for a replication of our findings across the two measurement points. We focused on the inter-rater reliability between the video coders, as well as between the video and full-information procedure, including mean differences and correlations between the raters. Additionally, we examined aspects of the validity of video and full-information ratings based on their correlation with measures of child and maternal psychopathology. Results Our results showed that ten-minute video and full-information PIR-GAS ratings were not interchangeable. Most results at admission could be replicated by the data obtained at discharge. We concluded that a higher degree of standardization of the assessment procedure should increase the reliability of the PIR-GAS, and a more thorough theoretical foundation of the manual should increase its validity. PMID:23705962
Moore, Amy Lawson; Miller, Terissa M
2018-01-01
The purpose of the current study is to evaluate the validity and reliability of the revised Gibson Test of Cognitive Skills, a computer-based battery of tests measuring short-term memory, long-term memory, processing speed, logic and reasoning, visual processing, as well as auditory processing and word attack skills. This study included 2,737 participants aged 5-85 years. A series of studies was conducted to examine the validity and reliability using the test performance of the entire norming group and several subgroups. The evaluation of the technical properties of the test battery included content validation by subject matter experts, item analysis and coefficient alpha, test-retest reliability, split-half reliability, and analysis of concurrent validity with the Woodcock Johnson III Tests of Cognitive Abilities and Tests of Achievement. Results indicated strong sources of evidence of validity and reliability for the test, including internal consistency reliability coefficients ranging from 0.87 to 0.98, test-retest reliability coefficients ranging from 0.69 to 0.91, split-half reliability coefficients ranging from 0.87 to 0.91, and concurrent validity coefficients ranging from 0.53 to 0.93. The Gibson Test of Cognitive Skills-2 is a reliable and valid tool for assessing cognition in the general population across the lifespan.
Validity and inter-observer reliability of subjective hand-arm vibration assessments.
Coenen, Pieter; Formanoy, Margriet; Douwes, Marjolein; Bosch, Tim; de Kraker, Heleen
2014-07-01
Exposure to mechanical vibrations at work (e.g., due to handling powered tools) is a potential occupational risk as it may cause upper extremity complaints. However, reliable and valid assessment methods for vibration exposure at work are lacking. Measuring hand-arm vibration objectively is often difficult and expensive, while the commonly used information provided by manufacturers lacks detail. Therefore, a subjective hand-arm vibration assessment method was tested for validity and inter-observer reliability. In an experimental protocol, sixteen tasks handling powered tools were executed by two workers. Hand-arm vibration was assessed subjectively by 16 observers according to the proposed subjective assessment method. As a gold standard reference, hand-arm vibration was measured objectively using a vibration measurement device. Weighted κ's were calculated to assess validity, and intra-class correlation coefficients (ICCs) were calculated to assess inter-observer reliability. Inter-observer reliability of the subjective assessments, reflecting the agreement among observers, can be expressed by an ICC of 0.708 (0.511-0.873). The validity of the subjective assessments as compared to the gold-standard reference can be expressed by a weighted κ of 0.535 (0.285-0.785). In addition, the percentage of exact agreement of the subjective assessment compared to the objective measurement was relatively low (i.e., 52% of all tasks). This study shows that subjectively assessed hand-arm vibrations are fairly reliable among observers and moderately valid. This assessment method is a first attempt to use subjective risk assessments of hand-arm vibration. Although this assessment method can benefit from some future improvement, it can be of use in future studies and in field-based ergonomic assessments. Copyright © 2014 Elsevier Ltd and The Ergonomics Society. All rights reserved.
Bridge reliability assessment based on the PDF of long-term monitored extreme strains
NASA Astrophysics Data System (ADS)
Jiao, Meiju; Sun, Limin
2011-04-01
Structural health monitoring (SHM) systems can provide valuable information for the evaluation of bridge performance. With the development and implementation of SHM technology in recent years, data mining and the use of monitoring data have received increasing attention and interest in civil engineering. Based on the principles of probability and statistics, a reliability approach provides a rational basis for analysis of the randomness in loads and their effects on structures. This paper presents a novel approach that combines SHM systems with a reliability method to evaluate the reliability of a cable-stayed bridge instrumented with SHM systems. In this study, the reliability of the steel girder of the cable-stayed bridge was denoted directly by failure probability instead of the commonly used reliability index. Under the assumption that the probability distribution of the resistance is independent of the structural responses, a formulation of failure probability was deduced. Then, as a main factor in the formulation, the probability density function (PDF) of the strain at sensor locations was evaluated and verified based on the monitoring data. The Donghai Bridge was then taken as an example for the application of the proposed approach. In the case study, 4 years of monitoring data collected since the SHM systems began operation were processed, and the reliability assessment results were discussed. Finally, the sensitivity and accuracy of the novel approach compared with FORM were discussed.
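The abstract does not give the failure-probability formulation itself. With independent resistance R and load effect S, the textbook form is P_f = P(R < S) = integral of F_R(s) * f_S(s) ds, and the sketch below evaluates it numerically. Normal distributions for both variables are purely an illustrative assumption (the paper fits the strain PDF from monitoring data), chosen because the result can be checked against the closed form P_f = Phi(-beta). All names and parameter values are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def failure_probability_numeric(mu_r, sigma_r, mu_s, sigma_s, steps=20000):
    """Trapezoidal estimate of P(R < S) = integral of F_R(s) * f_S(s) ds."""
    lo, hi = mu_s - 10.0 * sigma_s, mu_s + 10.0 * sigma_s
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        s = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * normal_cdf(s, mu_r, sigma_r) * normal_pdf(s, mu_s, sigma_s)
    return total * h

def failure_probability_closed_form(mu_r, sigma_r, mu_s, sigma_s):
    """Closed form for independent normal R and S via the reliability index beta."""
    beta = (mu_r - mu_s) / math.hypot(sigma_r, sigma_s)
    return normal_cdf(-beta, 0.0, 1.0)

# Hypothetical strain capacity and demand (arbitrary units); beta equals 3 here.
params = dict(mu_r=300.0, sigma_r=30.0, mu_s=150.0, sigma_s=40.0)
print(failure_probability_numeric(**params))
print(failure_probability_closed_form(**params))
```

The numerical route is the one that generalises: replacing `normal_pdf` for S with a density estimated from monitored extreme strains keeps the integral unchanged, whereas the closed form only holds under the normality assumption.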
Test-Retest Reliability of the Preschool Age Psychiatric Assessment (PAPA)
ERIC Educational Resources Information Center
Egger, Helen Link; Erkanli, Alaattin; Keeler, Gordon; Potts, Edward; Walter, Barbara Keith; Angold, Adrian
2006-01-01
Objective: To examine the test-retest reliability of a new interviewer-based psychiatric diagnostic measure (the Preschool Age Psychiatric Assessment) for use with parents of preschoolers 2 to 5 years old. Method: A total of 1,073 parents of children attending a large pediatric clinic completed the Child Behavior Checklist 1 1/2-5. For 18 months,…
Guo, Yiting Emily; Togher, Leanne; Power, Emma; Hutomo, Edwin; Yang, Yi-Fei; Tay, Arthur; Yen, Shih-Cheng; Koh, Gerald Choon-Huat
2017-04-01
Access2Aphasia™ is an iPad™-based aphasia assessment application that enables real-time audiovisual communication between people with aphasia (PWA) and speech-language pathologists (SLPs), and the use of supported conversation techniques. This study aimed to establish the reliability of aphasia assessment across the International Classification of Functioning, Disability and Health (ICF) using Access2Aphasia, and compare it with face-to-face (FTF) assessment. Consumer perspectives of Access2Aphasia were also examined. Thirty PWA were randomized into two conditions: online-led and FTF assessment. Participants in the online-led group were assessed remotely using Access2Aphasia™ in their own homes, while an FTF SLP scored silently simultaneously. Participants in the FTF group were assessed FTF using standard administration materials. Assessment included two subtests of the Psycholinguistic Assessment of Language Processing Activities (PALPA) and the Assessment of Living with Aphasia (ALA) to allow for outcomes to be captured across the ICF domains. Consumer perspectives on Access2Aphasia were obtained from both PWA and research SLPs in the online-led group. Kappa statistics indicated moderate to almost perfect agreement between online and FTF SLPs (k = 0.71-1.00). Intrarater and interrater reliability was excellent (ICC = 0.99-1.00) and equivalent for the online-led and FTF conditions. Both PWA and research SLPs in the online-led group reported being satisfied with the experience overall, with suggestions provided by research SLPs to improve Access2Aphasia. This study supports the provision of iPad-based aphasia assessments across the ICF in the online environment, with comparable reliability to FTF assessments. Future research is warranted to support the development of iPad-based aphasia assessment and treatment as an alternative mode of service delivery to PWA.
Development and evaluation of an instrument for assessing brief behavioral change interventions.
Strayer, Scott M; Martindale, James R; Pelletier, Sandra L; Rais, Salehin; Powell, Jon; Schorling, John B
2011-04-01
To develop an observational coding instrument for evaluating the fidelity and quality of brief behavioral change interventions based on the behavioral theories of the 5 A's, Stages of Change and Motivational Interviewing. Content and face validity were assessed prior to an intervention where psychometric properties were evaluated with a prospective cohort of 116 medical students. Properties assessed included the inter-rater reliability of the instrument, internal consistency of the full scale and sub-scales and descriptive statistics of the instrument. Construct validity was assessed based on student's scores. Inter-rater reliability for the instrument was 0.82 (intraclass correlation). Internal consistency for the full scale was 0.70 (KR20). Internal consistencies for the sub-scales were as follows: MI intervention component (KR20=.7); stage-appropriate MI-based intervention (KR20=.55); MI spirit (KR20=.5); appropriate assessment (KR20=.45) and appropriate assisting (KR20=.56). The instrument demonstrated good inter-rater reliability and moderate overall internal consistency when used to assess performing brief behavioral change interventions by medical students. This practical instrument can be used with minimal training and demonstrates promising psychometric properties when evaluated with medical students counseling standardized patients. Further testing is required to evaluate its usefulness in clinical settings. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
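The KR20 values above refer to the Kuder-Richardson Formula 20, the internal-consistency coefficient for dichotomously scored items: KR20 = k/(k-1) * (1 - sum of p_i * q_i / variance of total scores). The sketch below uses the population variance of total scores, which is one common convention and an assumption here; the function name and response matrix are hypothetical.

```python
def kr20(responses):
    """KR-20 for a list of respondents, each a list of k items scored 0 or 1."""
    n, k = len(responses), len(responses[0])
    # Proportion of respondents scoring 1 on each item.
    p = [sum(col) / n for col in zip(*responses)]
    sum_pq = sum(pi * (1.0 - pi) for pi in p)
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    # Population variance of total scores (assumed convention).
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    return k / (k - 1) * (1.0 - sum_pq / var_total)

# Hypothetical coding sheet: 4 observed encounters, 3 yes/no checklist items.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(toy))
```

Sub-scales with few items tend to yield lower coefficients under this formula, which may partly explain the lower sub-scale values relative to the full scale.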
Vinco, L J; Giacomelli, S; Campana, L; Chiari, M; Vitale, N; Lombardi, G; Veldkamp, T; Hocking, P M
2018-02-01
1. An experiment was conducted to compare 5 different methods for the evaluation of litter moisture. 2. For litter collection and assessment, 55 farms were selected, one shed from each farm was inspected and 9 points were identified within each shed. 3. For each device used for the evaluation of litter moisture, the mean and standard deviation of wetness measures per collection point were assessed. 4. The reliability and overall consistency between the 5 instruments used to measure wetness were high (α = 0.72). 5. Measurement of three out of the 9 collection points was sufficient to provide a reliable assessment of litter moisture throughout the shed. 6. Based on the direct correlation between litter moisture and footpad lesions, litter moisture measurement can be used as a resource-based on-farm animal welfare indicator. 7. Among the 5 methods analysed, visual scoring is the most simple and practical, and therefore the best candidate to be used on-farm for animal welfare assessment.
A reliability analysis of the revised competitiveness index.
Harris, Paul B; Houston, John M
2010-06-01
This study examined the reliability of the Revised Competitiveness Index by investigating the test-retest reliability, interitem reliability, and factor structure of the measure based on a sample of 280 undergraduates (200 women, 80 men) ranging in age from 18 to 28 years (M = 20.1, SD = 2.1). The findings indicate that the Revised Competitiveness Index has high test-retest reliability, high inter-item reliability, and a stable factor structure. The results support the assertion that the Revised Competitiveness Index assesses competitiveness as a stable trait rather than a dynamic state.
The risk of bias in systematic reviews tool showed fair reliability and good construct validity.
Bühn, Stefanie; Mathes, Tim; Prengel, Peggy; Wegewitz, Uta; Ostermann, Thomas; Robens, Sibylle; Pieper, Dawid
2017-11-01
There is a movement from generic quality checklists toward a more domain-based approach in critical appraisal tools. This study aimed to report on a first experience with the newly developed risk of bias in systematic reviews (ROBIS) tool and compare it with A Measurement Tool to Assess Systematic Reviews (AMSTAR), that is, the most commonly used tool to assess methodological quality of systematic reviews, while assessing validity, reliability, and applicability. Validation study with four reviewers based on 16 systematic reviews in the field of occupational health. Interrater reliability (IRR) of all four raters was highest for domain 2 (Fleiss' kappa κ = 0.56) and lowest for domain 4 (κ = 0.04). For ROBIS, median IRR was κ = 0.52 (range 0.13-0.88) for the experienced pair of raters compared to κ = 0.32 (range 0.12-0.76) for the less experienced pair of raters. The percentage of "yes" scores of each review of ROBIS ratings was strongly correlated with the AMSTAR ratings (r_s = 0.76; P = 0.01). ROBIS has fair reliability and good construct validity to assess the risk of bias in systematic reviews. More validation studies are needed to investigate reliability and applicability, in particular. Copyright © 2017 Elsevier Inc. All rights reserved.
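Fleiss' kappa, used above for agreement among all four raters, compares the observed proportion of agreeing rater pairs with the agreement expected by chance from the overall category frequencies. The following is a minimal sketch of the standard formula; the function name and the count tables are hypothetical and unrelated to the ROBIS data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of raters placing subject i in category j.

    Assumes every subject is rated by the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    total_ratings = n_subjects * n_raters

    # Overall proportion of ratings falling in each category.
    p_cat = [sum(col) / total_ratings for col in zip(*counts)]
    # Per-subject agreement: share of rater pairs that agree.
    p_subj = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_subj) / n_subjects
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1.0 - p_expected)

# Hypothetical ratings: 4 raters, 2 categories (e.g. low vs. high risk of bias).
unanimous = [[4, 0], [0, 4], [4, 0], [0, 4]]
split = [[2, 2], [2, 2]]
print(fleiss_kappa(unanimous), round(fleiss_kappa(split), 3))
```

Values near zero, such as the one reported for domain 4, mean agreement no better than chance, and negative values (as in the evenly split toy table) mean agreement below chance.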
Frost, Rachael; Levati, Sara; McClurg, Doreen; Brady, Marian; Williams, Brian
2017-06-01
To systematically review methods for measuring adherence used in home-based rehabilitation trials and to evaluate their validity, reliability, and acceptability. In phase 1 we searched the CENTRAL database, NHS Economic Evaluation Database, and Health Technology Assessment Database (January 2000 to April 2013) to identify adherence measures used in randomized controlled trials of allied health professional home-based rehabilitation interventions. In phase 2 we searched the databases of MEDLINE, Embase, CINAHL, Allied and Complementary Medicine Database, PsycINFO, CENTRAL, ProQuest Nursing and Allied Health, and Web of Science (inception to April 2015) for measurement property assessments for each measure. Studies assessing the validity, reliability, or acceptability of adherence measures. Two reviewers independently extracted data on participant and measure characteristics, measurement properties evaluated, evaluation methods, and outcome statistics and assessed study quality using the COnsensus-based Standards for the selection of health Measurement INstruments checklist. In phase 1 we included 8 adherence measures (56 trials). In phase 2, from the 222 measurement property assessments identified in 109 studies, 22 high-quality measurement property assessments were narratively synthesized. Low-quality studies were used as supporting data. StepWatch Activity Monitor validly and acceptably measured short-term step count adherence. The Problematic Experiences of Therapy Scale validly and reliably assessed adherence to vestibular rehabilitation exercises. Adherence diaries had moderately high validity and acceptability across limited populations. The Borg 6 to 20 scale, Bassett and Prapavessis scale, and Yamax CW series had insufficient validity. Low-quality evidence supported use of the Joint Protection Behaviour Assessment. Polar A1 series heart monitors were considered acceptable by 1 study. Current rehabilitation adherence measures are limited. 
Some possess promising validity and acceptability for certain parameters of adherence, situations, and populations and should be used in these situations. Rigorous evaluation of adherence measures in a broader range of populations is needed. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Robot-Assisted Arm Assessments in Spinal Cord Injured Patients: A Consideration of Concept Study
Albisser, Urs; Rudhe, Claudia; Curt, Armin; Riener, Robert; Klamroth-Marganska, Verena
2015-01-01
Robotic assistance is increasingly used in neurological rehabilitation for enhanced training. Furthermore, therapy robots have the potential for accurate assessment of motor function in order to diagnose the patient status, to measure therapy progress or to feedback the movement performance to the patient and therapist in real time. We investigated whether a set of robot-based assessments that encompasses kinematic, kinetic and timing metrics is applicable, safe, reliable and comparable to clinical metrics for measurement of arm motor function. Twenty-four healthy subjects and five patients after spinal cord injury underwent robot-based assessments using the exoskeleton robot ARMin. Five different tasks were performed with aid of a visual display. Ten kinematic, kinetic and timing assessment parameters were extracted on joint- and end-effector level (active and passive range of motion, cubic reaching volume, movement time, distance-path ratio, precision, smoothness, reaction time, joint torques and joint stiffness). For cubic volume, joint torques and the range of motion for most joints, good inter- and intra-rater reliability were found whereas precision, movement time, distance-path ratio and smoothness showed weak to moderate reliability. A comparison with clinical scores revealed good correlations between robot-based joint torques and the Manual Muscle Test. Reaction time and distance-path ratio showed good correlation with the “Graded and Redefined Assessment of Strength, Sensibility and Prehension” (GRASSP) and the Van Lieshout Test (VLT) for movements towards a predefined position in the center of the frontal plane. In conclusion, the therapy robot ARMin provides a comprehensive set of assessments that are applicable and safe. The first results with spinal cord injured patients and healthy subjects suggest that the measurements are widely reliable and comparable to clinical scales for arm motor function. 
The methods applied and results can serve as a basis for the future development of end-effector and exoskeleton-based robotic assessments. PMID:25996374
Human Reliability Assessments: Using the Past (Shuttle) to Predict the Future (Orion)
NASA Technical Reports Server (NTRS)
DeMott, Diana L.; Bigler, Mark A.
2017-01-01
NASA (National Aeronautics and Space Administration) Johnson Space Center (JSC) Safety and Mission Assurance (S&MA) uses two human reliability analysis (HRA) methodologies. The first is a simplified method which is based on how much time is available to complete the action, with consideration included for environmental and personal factors that could influence the human's reliability. This method is expected to provide a conservative value or placeholder as a preliminary estimate. This preliminary estimate or screening value is used to determine which placeholder needs a more detailed assessment. The second methodology is used to develop a more detailed human reliability assessment on the performance of critical human actions. This assessment needs to consider more than the time available, this would include factors such as: the importance of the action, the context, environmental factors, potential human stresses, previous experience, training, physical design interfaces, available procedures/checklists and internal human stresses. The more detailed assessment is expected to be more realistic than that based primarily on time available. When performing an HRA on a system or process that has an operational history, we have information specific to the task based on this history and experience. In the case of a Probabilistic Risk Assessment (PRA) that is based on a new design and has no operational history, providing a "reasonable" assessment of potential crew actions becomes more challenging. To determine what is expected of future operational parameters, the experience from individuals who had relevant experience and were familiar with the system and process previously implemented by NASA was used to provide the "best" available data. Personnel from Flight Operations, Flight Directors, Launch Test Directors, Control Room Console Operators, and Astronauts were all interviewed to provide a comprehensive picture of previous NASA operations. 
Verification of the assumptions and expectations expressed in the assessments will be needed when the procedures, flight rules, and operational requirements are developed and then finalized.
Human Reliability Assessments: Using the Past (Shuttle) to Predict the Future (Orion)
NASA Technical Reports Server (NTRS)
DeMott, Diana; Bigler, Mark
2016-01-01
NASA (National Aeronautics and Space Administration) Johnson Space Center (JSC) Safety and Mission Assurance (S&MA) uses two human reliability analysis (HRA) methodologies. The first is a simplified method which is based on how much time is available to complete the action, with consideration included for environmental and personal factors that could influence the human's reliability. This method is expected to provide a conservative value or placeholder as a preliminary estimate. This preliminary estimate or screening value is used to determine which placeholder needs a more detailed assessment. The second methodology is used to develop a more detailed human reliability assessment on the performance of critical human actions. This assessment needs to consider more than the time available, this would include factors such as: the importance of the action, the context, environmental factors, potential human stresses, previous experience, training, physical design interfaces, available procedures/checklists and internal human stresses. The more detailed assessment is expected to be more realistic than that based primarily on time available. When performing an HRA on a system or process that has an operational history, we have information specific to the task based on this history and experience. In the case of a Probabilistic Risk Assessment (PRA) that is based on a new design and has no operational history, providing a "reasonable" assessment of potential crew actions becomes more challenging. In order to determine what is expected of future operational parameters, the experience from individuals who had relevant experience and were familiar with the system and process previously implemented by NASA was used to provide the "best" available data. Personnel from Flight Operations, Flight Directors, Launch Test Directors, Control Room Console Operators and Astronauts were all interviewed to provide a comprehensive picture of previous NASA operations. 
Verification of the assumptions and expectations expressed in the assessments will be needed when the procedures, flight rules and operational requirements are developed and then finalized.
Reliability and Probabilistic Risk Assessment - How They Play Together
NASA Technical Reports Server (NTRS)
Safie, Fayssal; Stutts, Richard; Huang, Zhaofeng
2015-01-01
Since the Space Shuttle Challenger accident in 1986, NASA has extensively used probabilistic analysis methods to assess, understand, and communicate the risk of space launch vehicles. Probabilistic Risk Assessment (PRA), used in the nuclear industry, is one of the probabilistic analysis methods NASA utilizes to assess Loss of Mission (LOM) and Loss of Crew (LOC) risk for launch vehicles. PRA is a system scenario based risk assessment that uses a combination of fault trees, event trees, event sequence diagrams, and probability distributions to analyze the risk of a system, a process, or an activity. It is a process designed to answer three basic questions: 1) what can go wrong that would lead to loss or degraded performance (i.e., scenarios involving undesired consequences of interest), 2) how likely is it (probabilities), and 3) what is the severity of the degradation (consequences). Since the Challenger accident, PRA has been used in supporting decisions regarding safety upgrades for launch vehicles. Another area that was given a lot of emphasis at NASA after the Challenger accident is reliability engineering. Reliability engineering has been a critical design function at NASA since the early Apollo days. However, after the Challenger accident, quantitative reliability analysis and reliability predictions were given more scrutiny because of their importance in understanding failure mechanism and quantifying the probability of failure, which are key elements in resolving technical issues, performing design trades, and implementing design improvements. Although PRA and reliability are both probabilistic in nature and, in some cases, use the same tools, they are two different activities. Specifically, reliability engineering is a broad design discipline that deals with loss of function and helps understand failure mechanism and improve component and system design. 
PRA is a system scenario based risk assessment process intended to assess the risk scenarios that could lead to a major/top undesirable system event, and to identify those scenarios that are high-risk drivers. PRA output is critical to support risk informed decisions concerning system design. This paper describes the PRA process and the reliability engineering discipline in detail. It discusses their differences and similarities and how they work together as complementary analyses to support the design and risk assessment processes. Lessons learned, applications, and case studies in both areas are also discussed in the paper to demonstrate and explain these differences and similarities.
NASA Technical Reports Server (NTRS)
Kleinhammer, Roger K.; Graber, Robert R.; DeMott, D. L.
2016-01-01
Reliability practitioners advocate getting reliability involved early in a product development process. However, when assigned to estimate or assess the (potential) reliability of a product or system early in the design and development phase, they are faced with a lack of reasonable models or methods for useful reliability estimation. Developing specific data is costly and time consuming. Instead, analysts rely on available data to assess reliability. Finding data relevant to the specific use and environment for any project is difficult, if not impossible. Instead, analysts attempt to develop the "best" or composite analog data to support the assessments. Industries, consortia and vendors across many areas have spent decades collecting, analyzing and tabulating fielded item and component reliability performance in terms of observed failures and operational use. This data resource provides a huge compendium of information for potential use, but can also be compartmented by industry and difficult to find out about, access, or manipulate. One method used incorporates processes for reviewing these existing data sources and identifying the available information based on similar equipment, then using that generic data to derive an analog composite. Dissimilarities in equipment descriptions, environment of intended use, quality and even failure modes impact the "best" data incorporated in an analog composite. Once developed, this composite analog data provides a "better" representation of the reliability of the equipment or component. It can be used to support early risk or reliability trade studies, or analytical models to establish the predicted reliability data points. It also establishes a baseline prior that may be updated based on test data or observed operational constraints and failures, i.e., using Bayesian techniques.
This tutorial presents a descriptive compilation of historical data sources across numerous industries and disciplines, along with examples of contents and data characteristics. It then presents methods for combining failure information from different sources and mathematical use of this data in early reliability estimation and analyses.
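As a rough illustration of updating a baseline prior with test or operational data, the sketch below uses a conjugate gamma prior on a constant failure rate with Poisson-distributed failure counts. The abstract does not say which Bayesian model the tutorial uses, so the gamma-Poisson choice, the `GammaPrior` class and the `update` function are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaPrior:
    """Gamma distribution over a constant failure rate (failures per hour).

    Hypothetical helper for illustration; not taken from the tutorial.
    """
    shape: float  # acts like a pseudo-count of failures
    rate: float   # acts like pseudo-exposure time in hours

    @property
    def mean(self) -> float:
        # Mean of a gamma distribution in shape/rate form.
        return self.shape / self.rate


def update(prior: GammaPrior, failures: int, exposure_hours: float) -> GammaPrior:
    """Conjugate update: add observed failures and observed exposure time."""
    if failures < 0 or exposure_hours < 0:
        raise ValueError("failures and exposure_hours must be non-negative")
    return GammaPrior(prior.shape + failures, prior.rate + exposure_hours)


if __name__ == "__main__":
    # Assumed analog-composite prior: about 2 failures per 1000 hours.
    prior = GammaPrior(shape=2.0, rate=1000.0)
    # Assumed test data: 1 failure observed in 500 hours.
    posterior = update(prior, failures=1, exposure_hours=500.0)
    print(prior.mean, posterior.mean)
```

In this form the prior behaves like previously observed exposure, so a weak analog composite (small `rate`) is quickly dominated by test data while a strong one moves slowly.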
NASA Astrophysics Data System (ADS)
McPhee, J.; William, Y. W.
2005-12-01
This work presents a methodology for pumping test design based on the reliability requirements of a groundwater model. Reliability requirements take into consideration the application of the model results in groundwater management, expressed in this case as a multiobjective management model. The pumping test design is formulated as a mixed-integer nonlinear programming (MINLP) problem and solved using a combination of genetic algorithm (GA) and gradient-based optimization. Bayesian decision theory provides a formal framework for assessing the influence of parameter uncertainty on the reliability of the proposed pumping test. The proposed methodology is useful for selecting a robust design that will outperform all other candidate designs under most potential 'true' states of the system.
A Comparison of Laser and Video Techniques for Determining Displacement and Velocity during Running
ERIC Educational Resources Information Center
Harrison, Andrew J.; Jensen, Randall L.; Donoghue, Orna
2005-01-01
The reliability of a laser system was compared with the reliability of a video-based kinematic analysis in measuring displacement and velocity during running. Validity and reliability of the laser on static measures was also assessed at distances between 10 m and 70 m by evaluating the coefficient of variation and intraclass correlation…
NASA Astrophysics Data System (ADS)
Tamura, Yoshinobu; Yamada, Shigeru
Open source software (OSS) systems, which serve as key components of critical infrastructure in our social life, are still expanding. In particular, embedded OSS systems such as Android, BusyBox, and TRON have been gaining a lot of attention in the embedded system area. However, poor handling of quality problems and customer support hinders the progress of embedded OSS. Also, it is difficult for developers to assess the reliability and portability of embedded OSS on a single-board computer. In this paper, we propose a method of software reliability assessment based on flexible hazard rates for embedded OSS. We also analyze actual data on software failure-occurrence time intervals to show numerical examples of software reliability assessment for embedded OSS. Moreover, we compare the proposed hazard rate model for embedded OSS with typical conventional hazard rate models using goodness-of-fit comparison criteria. Furthermore, we discuss the optimal software release problem for the porting phase based on the total expected software maintenance cost.
Pennathur, Arunkumar; Magham, Rohini; Contreras, Luis Rene; Dowling, Winifred
2004-01-01
The objective of the work reported in this paper is to assess test-retest reliability of Yale Physical Activity Survey Total Time, Estimated Energy Expenditure, Activity Dimension Indices, and Activities Check-list in older Mexican American men and women. A healthy convenience sample of 49 older Mexican American adults (42 women and 7 men), aged 68 to 80 years and recruited from senior recreation centers, volunteered to participate in this pilot study. All 49 filled out the Yale Physical Activity Survey for this study. Fifteen (12 women and 3 men) of the 49 volunteers responded twice to the Yale Physical Activity Survey after a 2-week period, and helped assess the test-retest reliability of the Yale Physical Activity Survey. Results indicate that based on a 2-week test-retest administration, the Yale Physical Activity Survey was found to have moderate (rhoI = .424, p < .05) to good reliability (rs = .789, p < .01) for physical activity assessment in older Mexican American adults who responded.
Molander, Linda; Hanberg, Annika; Rudén, Christina; Ågerstrand, Marlene; Beronius, Anna
2017-03-01
Different tools have been developed that facilitate systematic and transparent evaluation and handling of toxicity data in the risk assessment process. The present paper sets out to explore the combined use of two web-based tools for study evaluation and identification of reliable data relevant to health risk assessment. For this purpose, a case study was performed using in vivo toxicity studies investigating low-dose effects of bisphenol A on mammary gland development. The reliability of the mammary gland studies was evaluated using the Science in Risk Assessment and Policy (SciRAP) criteria for toxicity studies. The Health Assessment Workspace Collaborative (HAWC) was used for characterizing and visualizing the mammary gland data in terms of type of effects investigated and reported, and the distribution of these effects within the dose interval. It was then investigated whether there was any relationship between study reliability and the type of effects reported and/or their distribution in the dose interval. The combination of the SciRAP and HAWC tools allowed for transparent evaluation and visualization of the studies investigating developmental effects of BPA on the mammary gland. The use of these tools showed that there were no apparent differences in the type of effects and their distribution in the dose interval between the five studies assessed as most reliable and the whole data set. Combining the SciRAP and HAWC tools was found to be a useful approach for evaluating in vivo toxicity studies and identifying reliable and sensitive information relevant to regulatory risk assessment of chemicals. Copyright © 2016 John Wiley & Sons, Ltd.
Interrater reliability of the mind map assessment rubric in a cohort of medical students.
D'Antoni, Anthony V; Zipp, Genevieve Pinto; Olson, Valerie G
2009-04-28
Learning strategies are thinking tools that students can use to actively acquire information. Examples of learning strategies include mnemonics, charts, and maps. One strategy that may help students master the tsunami of information presented in medical school is the mind map learning strategy. Currently, there is no valid and reliable rubric to grade mind maps and this may contribute to their underutilization in medicine. Because concept maps and mind maps engage learners similarly at a metacognitive level, a valid and reliable concept map assessment scoring system was adapted to form the mind map assessment rubric (MMAR). The MMAR can assess mind map depth based upon concept-links, cross-links, hierarchies, examples, pictures, and colors. The purpose of this study was to examine interrater reliability of the MMAR. This exploratory study was conducted at a US medical school as part of a larger investigation on learning strategies. Sixty-six (N = 66) first-year medical students were given a 394-word text passage followed by a 30-minute presentation on mind mapping. After the presentation, subjects were again given the text passage and instructed to create mind maps based upon the passage. The mind maps were collected and independently scored using the MMAR by 3 examiners. Interrater reliability was measured using the intraclass correlation coefficient (ICC) statistic. Statistics were calculated using SPSS version 12.0 (Chicago, IL). Analysis of the mind maps revealed the following: concept-links ICC = .05 (95% CI, -.42 to .38), cross-links ICC = .58 (95% CI, .37 to .73), hierarchies ICC = .23 (95% CI, -.15 to .50), examples ICC = .53 (95% CI, .29 to .69), pictures ICC = .86 (95% CI, .79 to .91), colors ICC = .73 (95% CI, .59 to .82), and total score ICC = .86 (95% CI, .79 to .91). The high ICC value for total mind map score indicates strong MMAR interrater reliability. Pictures and colors demonstrated moderate to strong interrater reliability. 
We conclude that the MMAR may be a valid and reliable tool to assess mind maps in medicine. However, further research on the validity and reliability of the MMAR is necessary.
Interrater reliability of the mind map assessment rubric in a cohort of medical students
D'Antoni, Anthony V; Zipp, Genevieve Pinto; Olson, Valerie G
2009-01-01
Background Learning strategies are thinking tools that students can use to actively acquire information. Examples of learning strategies include mnemonics, charts, and maps. One strategy that may help students master the tsunami of information presented in medical school is the mind map learning strategy. Currently, there is no valid and reliable rubric to grade mind maps and this may contribute to their underutilization in medicine. Because concept maps and mind maps engage learners similarly at a metacognitive level, a valid and reliable concept map assessment scoring system was adapted to form the mind map assessment rubric (MMAR). The MMAR can assess mind map depth based upon concept-links, cross-links, hierarchies, examples, pictures, and colors. The purpose of this study was to examine interrater reliability of the MMAR. Methods This exploratory study was conducted at a US medical school as part of a larger investigation on learning strategies. Sixty-six (N = 66) first-year medical students were given a 394-word text passage followed by a 30-minute presentation on mind mapping. After the presentation, subjects were again given the text passage and instructed to create mind maps based upon the passage. The mind maps were collected and independently scored using the MMAR by 3 examiners. Interrater reliability was measured using the intraclass correlation coefficient (ICC) statistic. Statistics were calculated using SPSS version 12.0 (Chicago, IL). Results Analysis of the mind maps revealed the following: concept-links ICC = .05 (95% CI, -.42 to .38), cross-links ICC = .58 (95% CI, .37 to .73), hierarchies ICC = .23 (95% CI, -.15 to .50), examples ICC = .53 (95% CI, .29 to .69), pictures ICC = .86 (95% CI, .79 to .91), colors ICC = .73 (95% CI, .59 to .82), and total score ICC = .86 (95% CI, .79 to .91). Conclusion The high ICC value for total mind map score indicates strong MMAR interrater reliability. 
Pictures and colors demonstrated moderate to strong interrater reliability. We conclude that the MMAR may be a valid and reliable tool to assess mind maps in medicine. However, further research on the validity and reliability of the MMAR is necessary. PMID:19400964
Assessing the Reliability of Curriculum-Based Measurement: An Application of Latent Growth Modeling
ERIC Educational Resources Information Center
Yeo, Seungsoo; Kim, Dong-Il; Branum-Martin, Lee; Wayman, Miya Miura; Espin, Christine A.
2012-01-01
The purpose of this study was to demonstrate the use of Latent Growth Modeling (LGM) as a method for estimating reliability of Curriculum-Based Measurement (CBM) progress-monitoring data. The LGM approach permits the error associated with each measure to differ at each time point, thus providing an alternative method for examining of the…
Two Prophecy Formulas for Assessing the Reliability of Item Response Theory-Based Ability Estimates
ERIC Educational Resources Information Center
Raju, Nambury S.; Oshima, T.C.
2005-01-01
Two new prophecy formulas for estimating item response theory (IRT)-based reliability of a shortened or lengthened test are proposed. Some of the relationships between the two formulas, one of which is identical to the well-known Spearman-Brown prophecy formula, are examined and illustrated. The major assumptions underlying these formulas are…
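For orientation, the classical Spearman-Brown prophecy formula that the abstract refers to can be written as a one-line function. This sketch shows only the well-known classical formula; it does not reproduce the two IRT-based formulas proposed in the paper, and the function name `spearman_brown` is chosen here for illustration.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after changing test length by length_factor.

    reliability:   reliability of the current test (between 0 and 1)
    length_factor: new length divided by old length (2.0 doubles the test,
                   0.5 halves it)
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)


if __name__ == "__main__":
    current = 0.70
    for factor in (0.5, 1.0, 2.0):
        print(factor, round(spearman_brown(current, factor), 3))
```

The formula assumes the added or removed items are parallel to the existing ones, which is the kind of assumption the abstract says the paper examines.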
Scharmanski, Sara; Renner, Ilona
2016-12-01
Health professionals in early childhood intervention and prevention make an important contribution by helping burdened families with young children cope with everyday life and child raising issues. A prerequisite for success is the health professionals' ability to tailor their services to the specific needs of families. The "Systematic Exploration and Process Inventory for health professionals in early childhood intervention services (SEVG)" can be used to identify each family's individual resources and needs, enabling a valid, reliable and objective assessment of the conditions and the process of counseling service. The present paper presents the statistical analyses that were used to confirm the reliability of the inventory. Based on the results of the reliability analysis and principal component analysis (PCA), the SEVG seems to be a reliable and objective inventory for assessing families' need for support. It also allows for calculation of average values of each scale. The development of valid and reliable assessments is essential to quality assurance and the professionalization of interventions in early childhood service. Copyright © 2016. Published by Elsevier GmbH.
de Witte, Annemarie M H; Hoozemans, Marco J M; Berger, Monique A M; van der Slikke, Rienk M A; van der Woude, Lucas H V; Veeger, Dirkjan H E J
2018-01-01
The aim of this study was to develop and describe a wheelchair mobility performance test in wheelchair basketball and to assess its construct validity and reliability. To mimic mobility performance of wheelchair basketball matches in a standardised manner, a test was designed based on observation of wheelchair basketball matches and expert judgement. Forty-six players performed the test to determine its validity and 23 players performed the test twice for reliability. Independent-samples t-tests were used to assess whether the times needed to complete the test were different for classifications, playing standards and sex. Intraclass correlation coefficients (ICC) were calculated to quantify reliability of performance times. Males performed better than females (P < 0.001, effect size [ES] = -1.26) and international men performed better than national men (P < 0.001, ES = -1.62). Performance time of low (≤2.5) and high (≥3.0) classification players was borderline not significant with a moderate ES (P = 0.06, ES = 0.58). The reliability was excellent for overall performance time (ICC = 0.95). These results show that the test can be used as a standardised mobility performance test to validly and reliably assess the capacity in mobility performance of elite wheelchair basketball athletes. Furthermore, the described methodology of development is recommended for use in other sports to develop sport-specific tests.
ASSESSING AND COMBINING RELIABILITY OF PROTEIN INTERACTION SOURCES
LEACH, SONIA; GABOW, AARON; HUNTER, LAWRENCE; GOLDBERG, DEBRA S.
2008-01-01
Integrating diverse sources of interaction information to create protein networks requires strategies sensitive to differences in accuracy and coverage of each source. Previous integration approaches calculate reliabilities of protein interaction information sources based on congruity to a designated ‘gold standard.’ In this paper, we provide a comparison of the two most popular existing approaches and propose a novel alternative for assessing reliabilities which does not require a gold standard. We identify a new method for combining the resultant reliabilities and compare it against an existing method. Further, we propose an extrinsic approach to evaluation of reliability estimates, considering their influence on the downstream tasks of inferring protein function and learning regulatory networks from expression data. Results using this evaluation method show 1) our method for reliability estimation is an attractive alternative to those requiring a gold standard and 2) the new method for combining reliabilities is less sensitive to noise in reliability assignments than the similar existing technique. PMID:17990508
A probability-based approach for assessment of roadway safety hardware.
DOT National Transportation Integrated Search
2017-03-14
This report presents a general probability-based approach for assessment of roadway safety hardware (RSH). It was achieved using a reliability analysis method and computational techniques. With the development of high-fidelity finite element (FE) m...
Asgari, Fatemeh; Haghdoost, Faraidoon; Masjedi, Samaneh Sadat; Manouchehri, Navid; Banihashemi, Mahboobeh; Ghorbani, Abbas; Najafi, Mohammad Reza; Saadatnia, Mohammad; Lipton, Richard B.
2014-01-01
Introduction. MIDAS is a valid and reliable short questionnaire for assessment of headache related disability. Linguistic validation of Persian MIDAS and assessment of psychometric properties between tension type headache (TTH) and migraine were the aims of this study. Methods. Patients with migraine or TTH were included. At the first visit, we administered a headache symptom questionnaire, MIDAS, and SF-36. Patients filled out MIDAS at the second and third visits, within three and eight weeks after the baseline visit. Internal consistency (Cronbach α) and test-retest reproducibility (Spearman correlation coefficient) were used to assess reliability. Convergent validity and MIDAS capability to differentiate between chronic and episodic headaches (migraine and TTH) were also assessed. Results. The 267 participants had episodic migraine (EM-64%), chronic migraine (CM-13.5%), episodic TTH (ETTH-13.5%), and chronic TTH (CTTH-9). Internal consistency reliability was 0.8 for the entire sample, 0.72 for TTH, and 0.82 for migraine. Test-retest reliability for all questions between visit 1 and visit 2 varied from 0.54 to 0.71. Convergent validity was assessed using SF-36 as an external referent. Patients with episodic headaches (EM and ETTH) had significantly lower MIDAS scores than patients with chronic headaches (CM and CTTH). Conclusion. Persian MIDAS is a valid and reliable questionnaire for migraine and TTH that can differentiate between episodic headache and chronic headache. PMID:24527462
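The internal-consistency statistic named in this abstract, Cronbach's α, can be sketched with the standard library alone. The function name `cronbach_alpha` and the toy scores are invented for illustration and are not data from the study; the study's own software and handling of missing answers are not described in the abstract.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score table.

    rows: one sequence per respondent, one score per questionnaire item.
    Uses sample variances (n - 1 denominator).
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need at least two items and equal-length rows")
    # Variance of each item across respondents.
    item_variances = [variance([r[j] for r in rows]) for j in range(k)]
    # Variance of each respondent's total score.
    total_variance = variance([sum(r) for r in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Made-up scores: 4 respondents, 3 items.
    scores = [[1, 2, 2], [2, 3, 3], [3, 3, 4], [4, 5, 5]]
    print(round(cronbach_alpha(scores), 3))
```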
Reliability-based structural optimization: A proposed analytical-experimental study
NASA Technical Reports Server (NTRS)
Stroud, W. Jefferson; Nikolaidis, Efstratios
1993-01-01
An analytical and experimental study for assessing the potential of reliability-based structural optimization is proposed and described. In the study, competing designs obtained by deterministic and reliability-based optimization are compared. The experimental portion of the study is practical because the structure selected is a modular, actively and passively controlled truss that consists of many identical members, and because the competing designs are compared in terms of their dynamic performance and are not destroyed if failure occurs. The analytical portion of this study is illustrated on a 10-bar truss example. In the illustrative example, it is shown that reliability-based optimization can yield a design that is superior to an alternative design obtained by deterministic optimization. These analytical results provide motivation for the proposed study, which is underway.
Choosing a reliability inspection plan for interval censored data
Lu, Lu; Anderson-Cook, Christine Michaela
2017-04-19
Reliability test plans are important for producing precise and accurate assessment of reliability characteristics. This paper explores different strategies for choosing between possible inspection plans for interval censored data given a fixed testing timeframe and budget. A new general cost structure is proposed for guiding precise quantification of total cost in inspection test plan. Multiple summaries of reliability are considered and compared as the criteria for choosing the best plans using an easily adapted method. Different cost structures and representative true underlying reliability curves demonstrate how to assess different strategies given the logistical constraints and nature of the problem. Results show several general patterns exist across a wide variety of scenarios. Given the fixed total cost, plans that inspect more units with less frequency based on equally spaced time points are favored due to the ease of implementation and consistent good performance across a large number of case study scenarios. Plans with inspection times chosen based on equally spaced probabilities offer improved reliability estimates for the shape of the distribution, mean lifetime, and failure time for a small fraction of population only for applications with high infant mortality rates. The paper uses a Monte Carlo simulation based approach in addition to the common evaluation based on the asymptotic variance and offers comparison and recommendation for different applications with different objectives. Additionally, the paper outlines a variety of different reliability metrics to use as criteria for optimization, presents a general method for evaluating different alternatives, as well as provides case study results for different common scenarios.
Choosing a reliability inspection plan for interval censored data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lu, Lu; Anderson-Cook, Christine Michaela
Reliability test plans are important for producing precise and accurate assessment of reliability characteristics. This paper explores different strategies for choosing between possible inspection plans for interval censored data given a fixed testing timeframe and budget. A new general cost structure is proposed for guiding precise quantification of total cost in inspection test plan. Multiple summaries of reliability are considered and compared as the criteria for choosing the best plans using an easily adapted method. Different cost structures and representative true underlying reliability curves demonstrate how to assess different strategies given the logistical constraints and nature of the problem. Results show several general patterns exist across a wide variety of scenarios. Given the fixed total cost, plans that inspect more units with less frequency based on equally spaced time points are favored due to the ease of implementation and consistent good performance across a large number of case study scenarios. Plans with inspection times chosen based on equally spaced probabilities offer improved reliability estimates for the shape of the distribution, mean lifetime, and failure time for a small fraction of population only for applications with high infant mortality rates. The paper uses a Monte Carlo simulation based approach in addition to the common evaluation based on the asymptotic variance and offers comparison and recommendation for different applications with different objectives. Additionally, the paper outlines a variety of different reliability metrics to use as criteria for optimization, presents a general method for evaluating different alternatives, as well as provides case study results for different common scenarios.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hamachi La Commare, Kristina
Metrics for reliability, such as the frequency and duration of power interruptions, have been reported by electric utilities for many years. This study examines current utility practices for collecting and reporting electricity reliability information and discusses challenges that arise in assessing reliability because of differences among these practices. The study is based on reliability information for year 2006 reported by 123 utilities in 37 states representing over 60 percent of total U.S. electricity sales. We quantify the effects that inconsistencies among current utility reporting practices have on comparisons of System Average Interruption Duration Index (SAIDI) and System Average Interruption Frequency Index (SAIFI) reported by utilities. We recommend immediate adoption of IEEE Std. 1366-2003 as a consistent method for measuring and reporting reliability statistics.
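For readers unfamiliar with the two indices named above, a minimal Python sketch of their usual definitions follows: SAIFI is the total number of customer interruptions divided by the number of customers served, and SAIDI is the total customer-minutes of interruption divided by the number of customers served. The `Interruption` record and the sample figures are hypothetical, and the sketch ignores the reporting differences (for example, how major events are treated) that the study examines.

```python
from dataclasses import dataclass


@dataclass
class Interruption:
    customers_affected: int   # customers who lost service in this event
    duration_minutes: float   # how long service was interrupted


def saifi(events, customers_served):
    """Average number of interruptions per customer served."""
    return sum(e.customers_affected for e in events) / customers_served


def saidi(events, customers_served):
    """Average interruption minutes per customer served."""
    customer_minutes = sum(
        e.customers_affected * e.duration_minutes for e in events
    )
    return customer_minutes / customers_served


# Hypothetical year of data for a utility serving 1,000 customers.
events = [Interruption(100, 60.0), Interruption(50, 30.0)]
print(saifi(events, 1000), saidi(events, 1000))
```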
The Future Value of Serious Games for Assessment: Where Do We Go Now?
ERIC Educational Resources Information Center
de Klerk, Sebastiaan; Kato, Pamela M.
2017-01-01
Game-based assessments will most likely be an increasing part of testing programs in future generations because they provide promising possibilities for more valid and reliable measurement of students' skills as compared to the traditional methods of assessment like paper-and-pencil tests or performance-based assessments. The current status of…
Paesani, Daniel A; Guarda-Nardini, Luca; Gelos, Carlota; Salmaso, Luigi; Manfredini, Daniele
2014-03-01
The aim was to answer the clinical research question: is incisal/occlusal tooth wear assessment on dental casts performed by five professionals with expertise in different fields of dentistry reliable? Five examiners with different fields of expertise in the dental profession assessed tooth wear on dental casts of 45 subjects, based on a six-degree rating of incisal/occlusal wear. After a calibration meeting, the examiners evaluated the casts individually and various issues concerning interexaminer agreement and reliability were assessed. A total of 872 teeth were evaluated. The five examiners agreed only for the rating of 6.6% of the teeth. The teeth with the highest percentage of agreement were the premolars. Pairwise comparison of the assessments of the examiners #1 (bruxism expert), #2 (orthodontist), #3 (temporomandibular disorders [TMD] and occlusion expert), #4 (dental nurse) showed fair to moderate agreement, with κ-values ranging from 0.306 to 0.577, whilst the examiner #5 (lab technician) achieved low interexaminer reliability values with all the other four examiners. The interexaminer reliability of tooth wear assessment on dental casts performed by five professionals with expertise in different fields of dentistry is highly variable. General practitioners should keep in mind that consensus decisions by the examiners and assessment by raters belonging to the same dental discipline are recommended strategies to increase the reliability of tooth wear evaluation in the clinical setting. This investigation adds to the literature suggesting that, in a clinical setting, a single examiner's assessment of tooth wear on dental casts does not have optimal reliability and that it may be a source of internal validity problems in the research setting.
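The pairwise κ-values reported above are chance-corrected agreement statistics. As a hedged sketch, the unweighted Cohen's kappa for two raters can be computed as below; the abstract does not state which kappa variant the authors used, and the function name and sample ratings are illustrative assumptions.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Observed proportion of items on which the raters agree.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_expected = sum(
        count_a[c] * count_b[c] for c in count_a.keys() | count_b.keys()
    ) / n ** 2
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical wear grades from two examiners on four teeth.
print(cohen_kappa([0, 0, 1, 1], [0, 0, 1, 0]))
```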
Dedy, Nicolas J; Szasz, Peter; Louridas, Marisa; Bonrath, Esther M; Husslein, Heinrich; Grantcharov, Teodor P
2015-06-01
Nontechnical skills are critical for patient safety in the operating room (OR). As a result, regulatory bodies for accreditation and certification have mandated the integration of these competencies into postgraduate education. A generally accepted approach to the in-training assessment of nontechnical skills, however, is lacking. The goal of the present study was to develop an evidence-based and reliable tool for the in-training assessment of residents' nontechnical performance in the OR. The Objective Structured Assessment of Nontechnical Skills tool was designed as a 5-point global rating scale with descriptive anchors for each item, based on existing evidence-based frameworks of nontechnical skills, as well as resident training requirements. The tool was piloted on scripted videos and refined in an iterative process. The final version was used to rate residents' performance in recorded OR crisis simulations and during live observations in the OR. A total of 37 simulations and 10 live procedures were rated. Interrater agreement was good for total mean scores, both in simulation and in the real OR, with intraclass correlation coefficients >0.90 in all settings for average and single measures. Internal consistency of the scale was high (Cronbach's alpha = 0.80). The Objective Structured Assessment of Nontechnical Skills global rating scale was developed as an evidence-based tool for the in-training assessment of residents' nontechnical performance in the OR. Unique descriptive anchors allow for a criterion-referenced assessment of performance. Good reliability was demonstrated in different settings, supporting applications in research and education. Copyright © 2015 Elsevier Inc. All rights reserved.
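Internal consistency in the abstract above is summarized with Cronbach's alpha. A minimal Python sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), is given here for orientation; the function name and the small score matrix are made up for illustration and are unrelated to the study's data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha; scores has one row per respondent, one column per item."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical ratings: three respondents, two items.
print(cronbach_alpha([[1, 1], [2, 3], [3, 2]]))
```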
Reliability modeling of fault-tolerant computer based systems
NASA Technical Reports Server (NTRS)
Bavuso, Salvatore J.
1987-01-01
Digital fault-tolerant computer-based systems have become commonplace in military and commercial avionics. These systems hold the promise of increased availability, reliability, and maintainability over conventional analog-based systems through the application of replicated digital computers arranged in fault-tolerant configurations. Three tightly coupled factors of paramount importance, ultimately determining the viability of these systems, are reliability, safety, and profitability. Reliability, the major driver, affects virtually every aspect of design, packaging, and field operations, and eventually produces profit for commercial applications or increased national security. However, the utilization of digital computer systems makes the task of producing a credible reliability assessment a formidable one for the reliability engineer. The root of the problem lies in the digital computer's unique adaptability to changing requirements, computational power, and ability to test itself efficiently. Addressed here are the nuances of modeling the reliability of systems with large state sizes, in the Markov sense, which result from systems based on replicated redundant hardware, as well as the modeling of factors which can reduce reliability without concomitant depletion of hardware. Advanced fault-handling models are described and methods of acquiring and measuring parameters for these models are delineated.
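To make "reliability in the Markov sense" concrete, here is a deliberately small Python sketch, not taken from the report: a triple modular redundant (TMR) system modeled as a three-state Markov chain (three modules good, two good, system failed), assuming identical modules with a constant failure rate, no repair, and perfect fault handling. Real avionics models of the kind the abstract describes have far larger state spaces and include imperfect fault handling, which this sketch ignores.

```python
import math


def tmr_reliability_markov(failure_rate, mission_time, steps=10000):
    """Integrate the 3-state Markov chain with forward Euler steps."""
    dt = mission_time / steps
    p3, p2 = 1.0, 0.0  # P(three modules good), P(exactly two good)
    for _ in range(steps):
        # State "3 good" is left at rate 3*lambda; "2 good" at rate 2*lambda.
        d3 = -3.0 * failure_rate * p3
        d2 = 3.0 * failure_rate * p3 - 2.0 * failure_rate * p2
        p3 += d3 * dt
        p2 += d2 * dt
    return p3 + p2  # the voter still works while at least two modules are good


def tmr_reliability_closed_form(failure_rate, mission_time):
    """Closed-form solution of the same chain: 3e^(-2x) - 2e^(-3x), x = lambda*t."""
    x = failure_rate * mission_time
    return 3.0 * math.exp(-2.0 * x) - 2.0 * math.exp(-3.0 * x)


# Hypothetical numbers: 1e-3 failures per hour over a 1000-hour mission.
print(tmr_reliability_markov(1e-3, 1000.0))
print(tmr_reliability_closed_form(1e-3, 1000.0))
```

Comparing the numerical and closed-form values is a quick sanity check on the transition rates before scaling the same approach up to larger state spaces.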
Bond, Mary Lou; Cason, Carolyn L
2014-01-01
To assess the content validity and internal consistency reliability of the Healthcare Professions Education Program Self-Assessment (PSA) and the Institutional Self-Assessment for Factors Supporting Hispanic Student Retention (ISA). Health disparities among vulnerable populations are among the top priorities demanding attention in the United States. Efforts to recruit and retain Hispanic nursing students are essential. Based on a sample of provosts, deans/directors, and an author of the Model of Institutional Support, participants commented on the perceived validity and usefulness of each item of the PSA and ISA. Internal consistency reliability was calculated by Cronbach's alpha using responses from nursing schools in states with large Hispanic populations. The ISA and PSA were found to be reliable and valid tools for assessing institutional friendliness. The instruments highlight strengths and identify potential areas of improvement at institutional and program levels.
Cordier, Reinie; Munro, Natalie; Wilkes-Gillan, Sarah; Speyer, Renée; Pearce, Wendy M
2014-07-01
There is a need for a reliable and valid assessment of childhood pragmatic language skills during peer-peer interactions. This study aimed to evaluate the psychometric properties of a newly developed pragmatic assessment, the Pragmatic Observational Measure (POM). The psychometric properties of the POM were investigated from observational data of two studies - study 1 involved 342 children aged 5-11 years (108 children with ADHD; 108 typically developing playmates; 126 children in the control group), and study 2 involved 9 children with ADHD who attended a 7-week play-based intervention. The psychometric properties of the POM were determined based on the COnsensus-based Standards for the selection of health status Measurement INstruments (COSMIN) taxonomy of psychometric properties and definitions for health-related outcomes; the Pragmatic Protocol was used as the reference tool against which the POM was evaluated. The POM demonstrated sound psychometric properties in all the reliability, validity and interpretability criteria against which it was assessed. The findings showed that the POM is a reliable and valid measure of pragmatic language skills of children with ADHD between the age of 5 and 11 years and has clinical utility in identifying children with pragmatic language difficulty. Copyright © 2014 The Authors. Published by Elsevier Ltd. All rights reserved.
Visconti, Luca; Martin, Conchita
2013-01-01
The aim of this study was to evaluate both intra- and interoperator reliability of a radiological three-dimensional classification system (KPG index) for the assessment of degree of difficulty for orthodontic treatment of maxillary canine impactions. Cone beam computed tomography (CBCT) scans of fifty impacted canines, obtained using three different scanners (NewTom, Kodak, and Planmeca), were classified using the KPG index by three independent orthodontists. Measurements were repeated one month later. Based on these two sessions, several recommendations on KPG Index scoring were elaborated. After a joint calibration session, these recommendations were explained to nine orthodontists and the two measurement sessions were repeated. There was a moderate intrarater agreement in the precalibration measurement sessions. After the calibration session, both intra- and interrater agreement were almost perfect. Indexes assessed with Kodak Dental Imaging 3D module software showed a better reliability in z-axis values, whereas indexes assessed with Planmeca Romexis software showed a better reliability in x- and y-axis values. No differences were found between the CBCT scanners used. Taken together, these findings indicate that the application of the instructions elaborated during this study improved KPG index reliability, which was nevertheless variously influenced by the use of different software for images evaluation. PMID:24235889
NASA Technical Reports Server (NTRS)
Bean, E. E.; Bloomquist, C. E.
1972-01-01
A summary of the KSC program for investigating the reliability aspects of the ground support activities is presented. An analysis of unsatisfactory condition reports (URC) and the generation of reliability assessments of components based on the URC are discussed, along with the design considerations for attaining reliable real time hardware/software configurations.
Deskovitz, Mark A; Weed, Nathan C; McLaughlan, Joseph K; Williams, John E
2016-04-01
The reliability of six Minnesota Multiphasic Personality Inventory-Second edition (MMPI-2) computer-based test interpretation (CBTI) programs was evaluated across a set of 20 commonly appearing MMPI-2 profile codetypes in clinical settings. Evaluation of CBTI reliability comprised examination of (a) interrater reliability, the degree to which raters arrive at similar inferences based on the same CBTI profile and (b) interprogram reliability, the level of agreement across different CBTI systems. Profile inferences drawn by four raters were operationalized using q-sort methodology. Results revealed no significant differences overall with regard to interrater and interprogram reliability. Some specific CBTI/profile combinations (e.g., the CBTI by Automated Assessment Associates on a within normal limits profile) and specific profiles (e.g., the 4/9 profile displayed greater interprogram reliability than the 2/4 profile) were interpreted with variable consensus (α range = .21-.95). In practice, users should consider that certain MMPI-2 profiles are interpreted more or less consensually and that some CBTIs show variable reliability depending on the profile. © The Author(s) 2015.
Risk assessment for construction projects of transport infrastructure objects
NASA Astrophysics Data System (ADS)
Titarenko, Boris
2017-10-01
The paper analyzes and compares different methods of risk assessment for construction projects of transport objects. Managing projects of this type demands special probabilistic methods because their implementation involves a high level of uncertainty. Risk management in such projects requires the use of probabilistic and statistical methods. The aim of the work is to develop a methodology that combines traditional methods with robust methods to obtain reliable risk assessments in projects. The robust approach is based on the principle of maximum likelihood and, in assessing risk, allows the researcher to obtain reliable results in situations of great uncertainty. Applying robust procedures makes it possible to quantify the main risk indicators of projects when managing innovation-investment projects. The damage from the onset of a risk event can be calculated by any competent specialist, whereas assessing the probability that a risk event occurs requires special probabilistic methods based on the proposed robust approaches. Practice shows the effectiveness and reliability of the results. The methodology developed in the article can be used to create information technologies and apply them in automated control systems for complex projects.
Støre-Valen, Jakob; Ryum, Truls; Pedersen, Geir A F; Pripp, Are H; Jose, Paul E; Karterud, Sigmund
2015-09-01
The Global Assessment of Functioning (GAF) Scale is used in routine clinical practice and research to estimate symptom and functional severity and longitudinal change. Concerns about poor interrater reliability have been raised, and the present study evaluated the effect of a Web-based GAF training program designed to improve interrater reliability in routine clinical practice. Clinicians rated up to 20 vignettes online, and received deviation scores as immediate feedback (i.e., own scores compared with expert raters) after each rating. Growth curves of absolute SD scores across the vignettes were modeled. A linear mixed effects model, using the clinician's deviation scores from expert raters as the dependent variable, indicated an improvement in reliability during training. Moderation by content of scale (symptoms; functioning), scale range (average; extreme), previous experience with GAF rating, profession, and postgraduate training were assessed. Training reduced deviation scores for inexperienced GAF raters, for individuals in clinical professions other than nursing and medicine, and for individuals with no postgraduate specialization. In addition, training was most beneficial for cases with average severity of symptoms compared with cases with extreme severity. The results support the use of Web-based training with feedback routines as a means to improve the reliability of GAF ratings performed by clinicians in mental health practice. These results especially pertain to clinicians in mental health practice who do not have a masters or doctoral degree. (c) 2015 APA, all rights reserved.
Developing a Danish version of the "Impact on Participation and Autonomy Questionnaire".
Ghaziani, Emma; Krogh, Anne Grethe; Lund, Hans
2013-05-01
To translate the "Impact on Participation and Autonomy Questionnaire" into Danish (IPAQ-DK), and estimate its internal consistency and test-retest reliability in order to promote participation-based interventions and research. Translation and two successive reliability assessments through test-retest. 137 adults with varying degrees of impairment; of these, 67 participated in the final reliability assessment. The translation followed guidelines set forth by the "European Group for Quality of Life Assessment and Health Measurement". Internal consistency for subscales was estimated by Cronbach's alpha. Weighted kappa coefficients and intraclass correlation coefficients were calculated to assess the test-retest reliability at item and subscale level, respectively. A preliminary reliability assessment revealed residual issues regarding the translation and cultural adaptation of the instrument. The revised version (IPAQ-DK) was subsequently subjected to a similar assessment demonstrating Cronbach's alpha values from 0.698 to 0.817. Weighted kappa ranged from 0.370 to 0.880; 78% of these values were higher than 0.600. The intraclass correlation coefficient covered values from 0.701 to 0.818. IPAQ-DK is a useful instrument for identifying person-perceived participation restrictions and satisfaction with participation. Further studies of IPAQ-DK's floor/ceiling effects and responsiveness to change are recommended, as is an examination of whether certain items need further linguistic improvement.
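Item-level test-retest agreement above is reported as weighted kappa, which gives partial credit when two ordinal ratings are close. The Python sketch below uses linear disagreement weights |i - j| / (k - 1); the abstract does not say whether linear or quadratic weights were used, so the weighting scheme, the function name, and the sample responses are assumptions.

```python
def weighted_kappa(ratings_a, ratings_b, n_categories):
    """Linearly weighted kappa for ordinal ratings coded 0..n_categories-1."""
    n = len(ratings_a)
    k = n_categories
    # Observed joint proportions of (first rating, second rating).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at the extremes
            weighted_observed += w * observed[i][j]
            weighted_expected += w * row[i] * col[j]
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical test and retest answers to one three-level item.
print(weighted_kappa([0, 1, 2, 2], [0, 1, 2, 1], 3))
```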
Reliability Assessment Approach for Stirling Convertors and Generators
NASA Technical Reports Server (NTRS)
Shah, Ashwin R.; Schreiber, Jeffrey G.; Zampino, Edward; Best, Timothy
2004-01-01
Stirling power conversion is being considered for use in a Radioisotope Power System for deep-space science missions because it offers a multifold increase in the conversion efficiency of heat to electric power. Quantifying the reliability of a Radioisotope Power System that utilizes Stirling power conversion technology is important in developing and demonstrating the capability for long-term success. A description of the Stirling power convertor is provided, along with a discussion about some of the key components. Ongoing efforts to understand component life, design variables at the component and system levels, related sources, and the nature of uncertainties are discussed. The requirement for reliability also is discussed, and some of the critical areas of concern are identified. A section on the objectives of the performance model development and a computation of reliability is included to highlight the goals of this effort. Also, a viable physics-based reliability plan to model the design-level variable uncertainties at the component and system levels is outlined, and potential benefits are elucidated. The plan involves the interaction of different disciplines, maintaining the physical and probabilistic correlations at all the levels, and a verification process based on rational short-term tests. In addition, both top-down and bottom-up coherency were maintained to follow the physics-based design process and mission requirements. The outlined reliability assessment approach provides guidelines to improve the design and identifies governing variables to achieve high reliability in the Stirling Radioisotope Generator design.
Sevdalis, Nick; Undre, Shabnam; Henry, Janet; Sydney, Elaine; Koutantji, Mary; Darzi, Ara; Vincent, Charles A
2009-09-01
The recent emergence of the Systems Approach to the safety and quality of surgical care has triggered individual and team skills training modules for surgeons and anaesthetists and relevant observational assessment tools have been developed. To develop an observational tool that captures operating room (OR) nurses' technical skill and can be used for assessment and training. The Imperial College Assessment of Technical Skills for Nurses (ICATS-N) assesses (i) gowning and gloving, (ii) setting up instrumentation, (iii) draping, and (iv) maintaining sterility. Three to five observable behaviours have been identified for each skill and are rated on 1-6 scales. Feasibility and aspects of reliability and validity were assessed in 20 simulation-based crisis management training modules for trainee nurses and doctors, carried out in a Simulated Operating Room. The tool was feasible to use in the context of simulation-based training. Satisfactory reliability (Cronbach alpha) was obtained across trainers' and trainees' scores (analysed jointly and separately). Moreover, trainer nurse's ratings of the four skills correlated positively, thus indicating adequate content validity. Trainer's and trainees' ratings did not correlate. Assessment of OR nurses' technical skill is becoming a training priority. The present evidence suggests that the ICATS-N could be considered for use as an assessment/training tool for junior OR nurses.
ERIC Educational Resources Information Center
Srsen, Katja Groleger; Vidmar, Gaj; Pikl, Masa; Vrecar, Irena; Burja, Cirila; Krusec, Klavdija
2012-01-01
The Halliwick concept is widely used in different settings to promote joyful movement in water and swimming. To assess the swimming skills and progression of an individual swimmer, a valid and reliable measure should be used. The Halliwick-concept-based Swimming with Independent Measure (SWIM) was introduced for this purpose. We aimed to determine…
Assessing I-Grid(TM) web-based monitoring for power quality and reliability benchmarking
DOE Office of Scientific and Technical Information (OSTI.GOV)
Divan, Deepak; Brumsickle, William; Eto, Joseph
2003-04-30
This paper presents preliminary findings from DOE's pilot program. The results show how a web-based monitoring system can form the basis for aggregation of data and correlation and benchmarking across broad geographical lines. A longer report describes additional findings from the pilot, including impacts of power quality and reliability on customers' operations [Divan, Brumsickle, Eto 2003].
Ceramic component reliability with the restructured NASA/CARES computer program
NASA Technical Reports Server (NTRS)
Powers, Lynn M.; Starlinger, Alois; Gyekenyesi, John P.
1992-01-01
The Ceramics Analysis and Reliability Evaluation of Structures (CARES) integrated design program on statistical fast fracture reliability and monolithic ceramic components is enhanced to include the use of a neutral data base, two-dimensional modeling, and variable problem size. The data base allows for the efficient transfer of element stresses, temperatures, and volumes/areas from the finite element output to the reliability analysis program. Elements are divided to insure a direct correspondence between the subelements and the Gaussian integration points. Two-dimensional modeling is accomplished by assessing the volume flaw reliability with shell elements. To demonstrate the improvements in the algorithm, example problems are selected from a round-robin conducted by WELFEP (WEakest Link failure probability prediction by Finite Element Postprocessors).
Test-retest reliability of the Military Pre-training Questionnaire.
Robinson, M; Stokes, K; Bilzon, J; Standage, M; Brown, P; Thompson, D
2010-09-01
Musculoskeletal injuries are a significant cause of morbidity during military training. A brief, inexpensive and user-friendly tool that demonstrates reliability and validity is warranted to effectively monitor the relationship between multiple predictor variables and injury incidence in military populations. To examine the test-retest reliability of the Military Pre-training Questionnaire (MPQ), designed specifically to assess risk factors for injury among military trainees across five domains (physical activity, injury history, diet, alcohol and smoking). Analyses were based on a convenience sample of 58 male British Army trainees. Kappa (kappa), weighted kappa (kappa(w)) and intraclass correlation coefficients (ICC) were used to evaluate the 2-week test-retest reliability of the MPQ. For index measures constituting the assessment of a given construct, internal consistency was assessed by Cronbach's alpha (alpha) coefficients. Reliability of individual items ranged from poor to almost perfect (kappa range = 0.45-0.86; kappa(w) range = 0.11-0.91; ICC range = 0.34-0.86) with most items demonstrating moderate reliability. Overall scores related to physical activity, diet, alcohol and smoking constructs were reliable between both administrations (ICC = 0.63-0.85). Support for the internal consistency of the incorporated alcohol (alpha = 0.78) and cigarette (alpha = 0.75) scales was also provided. The MPQ is a reliable self-report instrument for assessing multiple injury-related risk factors during initial military training. Further assessment of the psychometric properties of the MPQ (e.g. different types of validity) with military populations/samples will support its interpretation and use in future surveillance and epidemiological studies.
Reliability and Probabilistic Risk Assessment - How They Play Together
NASA Technical Reports Server (NTRS)
Safie, Fayssal M.; Stutts, Richard G.; Zhaofeng, Huang
2015-01-01
PRA methodology is one of the probabilistic analysis methods that NASA brought from the nuclear industry to assess the risk of LOM, LOV and LOC for launch vehicles. PRA is a system scenario based risk assessment that uses a combination of fault trees, event trees, event sequence diagrams, and probability and statistical data to analyze the risk of a system, a process, or an activity. It is a process designed to answer three basic questions: What can go wrong? How likely is it? What is the severity of the degradation? Since 1986, NASA, along with industry partners, has conducted a number of PRA studies to predict the overall launch vehicle risks. Planning Research Corporation conducted the first of these studies in 1988. In 1995, Science Applications International Corporation (SAIC) conducted a comprehensive PRA study. In July 1996, NASA conducted a two-year study (October 1996 - September 1998) to develop a model that provided the overall Space Shuttle risk and estimates of risk changes due to proposed Space Shuttle upgrades. After the Columbia accident, NASA conducted a PRA on the Shuttle External Tank (ET) foam. This study was the most focused and extensive risk assessment that NASA has conducted in recent years. It used a dynamic, physics-based, integrated system analysis approach to understand the integrated system risk due to ET foam loss in flight. Most recently, a PRA for the Ares I launch vehicle has been performed in support of the Constellation program. Reliability, on the other hand, addresses the loss of functions. In a broader sense, reliability engineering is a discipline that involves the application of engineering principles to the design and processing of products, both hardware and software, for meeting product reliability requirements or goals. It is a very broad design-support discipline. It has important interfaces with many other engineering disciplines. Reliability as a figure of merit (i.e. the metric) is the probability that an item will perform its intended function(s) for a specified mission profile. In general, the reliability metric can be calculated through analyses using reliability demonstration and reliability prediction methodologies. Reliability analysis is critical for understanding component failure mechanisms and for identifying reliability-critical design and process drivers. The following sections discuss the PRA process and reliability engineering in detail and provide an application where reliability analysis and PRA were jointly used in a complementary manner to support a Space Shuttle flight risk assessment.
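The definition of reliability as a probability over a mission profile can be illustrated with a very simple prediction model. The Python sketch below assumes a constant failure rate (exponential lifetimes) and independent components in series; neither assumption comes from the paper, and the component names and rates are invented for the example.

```python
import math


def mission_reliability(failure_rate, mission_time):
    """P(no failure during the mission) under a constant failure rate."""
    return math.exp(-failure_rate * mission_time)


def series_reliability(component_reliabilities):
    """A series system works only if every independent component works."""
    return math.prod(component_reliabilities)


# Hypothetical components: failure rates per hour over a 10-hour mission.
rates = {"pump": 1e-4, "valve": 5e-5, "controller": 2e-5}
system = series_reliability(
    mission_reliability(rate, 10.0) for rate in rates.values()
)
print(system)
```

Under these assumptions the series system is equivalent to a single item whose failure rate is the sum of the component rates, which is why the lowest-reliability components dominate the result.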
Interrater Reliability of the Power Mobility Road Test in the Virtual Reality-Based Simulator-2.
Kamaraj, Deepan C; Dicianno, Brad E; Mahajan, Harshal P; Buhari, Alhaji M; Cooper, Rory A
2016-07-01
To assess interrater reliability of the Power Mobility Road Test (PMRT) when administered through the Virtual Reality-based SIMulator-version 2 (VRSIM-2). Within-subjects repeated-measures design. Participants interacted with VRSIM-2 through 2 display options (desktop monitor vs immersive virtual reality screens) using 2 control interfaces (roller system vs conventional movement-sensing joystick), providing 4 different driving scenarios (driving conditions 1-4). Participants performed 3 virtual driving sessions for each of the 2 display screens and 1 session through a real-world driving course (driving condition 5). The virtual PMRT was conducted in a simulated indoor office space, and an equivalent course was charted in an open space for the real-world assessment. After every change in driving condition, participants completed a self-reported workload assessment questionnaire, the Task Load Index, developed by the National Aeronautics and Space Administration. A convenience sample of electric-powered wheelchair (EPW) athletes (N=21) recruited at the 31st National Veterans Wheelchair Games. Not applicable. Total composite PMRT score. The PMRT had high interrater reliability (intraclass correlation coefficient [ICC]>.75) between the 2 raters in all 5 driving conditions. Post hoc analyses revealed that the reliability analyses had >80% power to detect high ICCs in driving conditions 1 and 4. The PMRT has high interrater reliability in conditions 1 and 4 and could be used to assess EPW driving performance virtually in VRSIM-2. However, further psychometric assessment is necessary to assess the feasibility of administering the PMRT using the different interfaces of VRSIM-2. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Factor structure, validity and reliability of the Cambridge Worry Scale in a pregnant population.
Green, Josephine M; Kafetsios, Konstantinos; Statham, Helen E; Snowdon, Claire M
2003-11-01
This article presents the Cambridge Worry Scale (CWS), a content-based measure for assessing worries, and discusses its psychometric properties based on a longitudinal study of 1,207 pregnant women. Principal components analysis revealed a four-factor structure of women's concerns during pregnancy: socio-medical, own health, socio-economic and relational. The measure demonstrated good reliability and validity. Total CWS scores were strongly associated with state and trait anxiety (convergent validity) but also had significant and unique predictive value for mood outcomes (discriminant validity). The CWS discriminated better between women with different reproductive histories than measures of state and trait anxiety. We conclude that the CWS is a reliable and valid tool for assessing the extent and content of worries in specific situations.
The Development and Validation of a Golf Swing and Putt Skill Assessment for Children
Barnett, Lisa M.; Hardy, Louise L.; Brian, Ali S.; Robertson, Sam
2015-01-01
The aim was to describe development of a process-oriented instrument designed to assess the golf swing and putt stroke, and to assess the instrument’s discriminative validity in terms of age and reliability (intra-rater and re-test). A Delphi consultation (with golf industry professionals and researchers in movement skill assessment) was used to develop an assessment for each skill based on existing skill assessment protocols. Each skill had six components to be marked as present/absent. Individual scores were based on the number of performance components successfully demonstrated over two trials for each skill (potential score range 0 to 24). Children (n = 43) aged 6-10 years (M = 7.8 years, SD = 1.3) were assessed in both skills live in the field by one rater at Time 1(T1). A subset of children (n = 28) had consent for assessments to be videoed. Six weeks later 19 children were reassessed, five days apart (T2, T3). An ANOVA assessed discriminative validity i.e. whether skill competence at T1 differed by age (6 years, 7/8 years and 9/10 years). Intraclass correlations (ICC) assessed intra-rater reliability between the live and video assessment at T1 and test-retest reliability (between T2 and T3). Paired t-tests assessed any systematic differences between live and video assessments (T1) and between T2 and T3. Older children were more skilled (F (2, 40) = 11.18, p < 0.001). The live assessment reflected the video assessment (ICC = 0.79, 95% CI 0.59, 0.90) and scores did not differ between live and video assessments. Test retest reliability was acceptable (ICC = 0.60, 95% CI 0.23, 0.82), although the mean score was slightly higher at retest. This instrument could be used reliably by golf coaches and physical education teachers as part of systematic early player assessment and feedback. 
Key points Golf is becoming an increasingly popular sport among young children, however there is no standard protocol available to assess and identify skill deficits, mastery level, and talent identification in beginner young golf players. Process rather than product oriented outcomes better identify areas of skill deficit in young children. The proposed swing and putt instrument can reliably identify skill deficits in children of elementary school age who are new to golf and can be used by a range of stakeholders including golf coaches, generalist sport coaches and physical education teachers. PMID:25729302
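The intraclass correlations reported above can be illustrated with a minimal sketch. The abstract does not say which ICC form was used, so the two single-measure forms below (absolute agreement and consistency, both derived from two-way ANOVA mean squares) are assumptions, and the score table is illustrative data rather than data from the study.

```python
def _mean_squares(ratings):
    """Two-way ANOVA mean squares for an n-subjects x k-raters table."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters/occasions
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return n, k, msr, msc, mse


def icc_agreement(ratings):
    """Single-measure, two-way random effects, absolute agreement: ICC(2,1)."""
    n, k, msr, msc, mse = _mean_squares(ratings)
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def icc_consistency(ratings):
    """Single-measure, two-way mixed effects, consistency: ICC(3,1)."""
    _, k, msr, _, mse = _mean_squares(ratings)
    return (msr - mse) / (msr + (k - 1) * mse)


# Illustrative table: 6 subjects (rows) scored on 4 occasions or by 4 raters (columns).
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
agreement = icc_agreement(scores)      # about 0.29
consistency = icc_consistency(scores)  # about 0.71
```

The gap between the two values shows why the ICC form matters: systematic differences between columns (for example a higher mean score at retest) lower the agreement form but not the consistency form.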
Training less-experienced faculty improves reliability of skills assessment in cardiac surgery.
Lou, Xiaoying; Lee, Richard; Feins, Richard H; Enter, Daniel; Hicks, George L; Verrier, Edward D; Fann, James I
2014-12-01
Previous work has demonstrated high inter-rater reliability in the objective assessment of simulated anastomoses among experienced educators. We evaluated the inter-rater reliability of less-experienced educators and the impact of focused training with a video-embedded coronary anastomosis assessment tool. Nine less-experienced cardiothoracic surgery faculty members from different institutions evaluated 2 videos of simulated coronary anastomoses (1 by a medical student and 1 by a resident) at the Thoracic Surgery Directors Association Boot Camp. They then underwent a 30-minute training session using an assessment tool with embedded videos to anchor rating scores for 10 components of coronary artery anastomosis. Afterward, they evaluated 2 videos of a different student and resident performing the task. Components were scored on a 1 to 5 Likert scale, yielding an average composite score. Inter-rater reliabilities of component and composite scores were assessed using intraclass correlation coefficients (ICCs) and overall pass/fail ratings with kappa. All components of the assessment tool exhibited improvement in reliability, with 4 (bite, needle holder use, needle angles, and hand mechanics) improving the most from poor (ICC range, 0.09-0.48) to strong (ICC range, 0.80-0.90) agreement. After training, inter-rater reliabilities for composite scores improved from moderate (ICC, 0.76) to strong (ICC, 0.90) agreement, and for overall pass/fail ratings, from poor (kappa = 0.20) to moderate (kappa = 0.78) agreement. Focused, video-based anchor training facilitates greater inter-rater reliability in the objective assessment of simulated coronary anastomoses. Among raters with less teaching experience, such training may be needed before objective evaluation of technical skills. Published by Elsevier Inc.
Smith, Justin D.; Dishion, Thomas J.; Brown, Kimbree; Ramos, Karina; Knoble, Naomi B.; Shaw, Daniel S.; Wilson, Melvin N.
2015-01-01
The valid and reliable assessment of fidelity is critical at all stages of intervention research and is particularly germane to interpreting the results of efficacy and implementation trials. Ratings of protocol adherence typically are reliable, but ratings of therapist competence are plagued by low reliability. Because family context and case conceptualization guide the therapist's delivery of interventions, the reliability of fidelity ratings might be improved if the coder is privy to client context in the form of an ecological assessment. We conducted a randomized experiment to test this hypothesis. A subsample of 46 families with 5-year-old children from a multisite randomized trial who participated in the feedback session of the Family Check-Up (FCU) intervention were selected. We randomly assigned FCU feedback sessions to be rated for fidelity to the protocol using the COACH rating system either after the coder reviewed the results of a recent ecological assessment or had not. Inter-rater reliability estimates of fidelity ratings were meaningfully higher for the assessment information condition compared to the no-information condition. Importantly, the reliability of the COACH mean score was found to be statistically significantly higher in the information condition. These findings suggest that the reliability of observational ratings of fidelity, particularly when the competence or quality of delivery is considered, could be improved by providing assessment data to the coders. Our findings might be most applicable to assessment-driven interventions, where assessment data explicitly guides therapist's selection of intervention strategies tailored to the family's context and needs, but they could also apply to other intervention programs and observational coding of context-dependent therapy processes, such as the working alliance. PMID:26271300
The reliability of the Glasgow Coma Scale: a systematic review.
Reith, Florence C M; Van den Brande, Ruben; Synnot, Anneliese; Gruen, Russell; Maas, Andrew I R
2016-01-01
The Glasgow Coma Scale (GCS) provides a structured method for assessment of the level of consciousness. Its derived sum score is applied in research and adopted in intensive care unit scoring systems. Controversy exists on the reliability of the GCS. The aim of this systematic review was to summarize evidence on the reliability of the GCS. A literature search was undertaken in MEDLINE, EMBASE and CINAHL. Observational studies that assessed the reliability of the GCS, expressed by a statistical measure, were included. Methodological quality was evaluated with the consensus-based standards for the selection of health measurement instruments checklist and its influence on results considered. Reliability estimates were synthesized narratively. We identified 52 relevant studies that showed significant heterogeneity in the type of reliability estimates used, patients studied, setting and characteristics of observers. Methodological quality was good (n = 7), fair (n = 18) or poor (n = 27). In good quality studies, kappa values were ≥0.6 in 85%, and all intraclass correlation coefficients indicated excellent reliability. Poor quality studies showed lower reliability estimates. Reliability for the GCS components was higher than for the sum score. Factors that may influence reliability include education and training, the level of consciousness and type of stimuli used. Only 13% of studies were of good quality and inconsistency in reported reliability estimates was found. Although the reliability was adequate in good quality studies, further improvement is desirable. From a methodological perspective, the quality of reliability studies needs to be improved. From a clinical perspective, a renewed focus on training/education and standardization of assessment is required.
Pirkle, Catherine M; Dumont, Alexandre; Traore, Mamadou; Zunzunegui, Maria-Victoria
2012-10-29
In Mali and Senegal, over 1% of women die giving birth in hospital. At some hospitals, over a third of infants are stillborn. Many deaths are due to substandard medical practices. Criterion-based clinical audits (CBCA) are increasingly used to measure and improve obstetrical care in resource-limited settings, but their measurement properties have not been formally evaluated. In 2011, we published a systematic review of obstetrical CBCA highlighting insufficient considerations of validity and reliability. The objective of this study is to develop an obstetrical CBCA adapted to the West African context and assess its reliability and validity. This work was conducted as a sub-study within a cluster randomized trial known as QUARITE. Criteria were selected based on extensive literature review and expert opinion. Early 2010, two auditors applied the CBCA to identical samples at 8 sites in Mali and Senegal (n = 185) to evaluate inter-rater reliability. In 2010-11, we conducted CBCA at 32 hospitals to assess construct validity (n = 633 patients). We correlated hospital characteristics (resource availability, facility perinatal and maternal mortality) with mean hospital CBCA scores. We used generalized estimating equations to assess whether patient CBCA scores were associated with perinatal mortality. Results demonstrate substantial (ICC = 0.67, 95% CI 0.54; 0.76) to elevated inter-rater reliability (ICC = 0.84, 95% CI 0.77; 0.89) in Senegal and Mali, respectively. Resource availability positively correlated with mean hospital CBCA scores and maternal and perinatal mortality were inversely correlated with hospital CBCA scores. Poor CBCA scores, adjusted for hospital and patient characteristics, were significantly associated with perinatal mortality (OR 1.84, 95% CI 1.01-3.34). Our CBCA has substantial inter-rater reliability and there is compelling evidence of its validity as the tool performs according to theory. Current Controlled Trials ISRCTN46950658.
Faudeux, Camille; Tran, Antoine; Dupont, Audrey; Desmontils, Jonathan; Montaudié, Isabelle; Bréaud, Jean; Braun, Marc; Fournier, Jean-Paul; Bérard, Etienne; Berlengi, Noémie; Schweitzer, Cyril; Haas, Hervé; Caci, Hervé; Gatin, Amélie; Giovannini-Chami, Lisa
2017-09-01
To develop a reliable and validated tool to evaluate technical resuscitation skills in a pediatric simulation setting. Four Resuscitation and Emergency Simulation Checklist for Assessment in Pediatrics (RESCAPE) evaluation tools were created, following international guidelines: intraosseous needle insertion, bag mask ventilation, endotracheal intubation, and cardiac massage. We applied a modified Delphi methodology evaluation to binary rating items. Reliability was assessed comparing the ratings of 2 observers (1 in real time and 1 after a video-recorded review). The tools were assessed for content, construct, and criterion validity, and for sensitivity to change. Inter-rater reliability, evaluated with Cohen kappa coefficients, was perfect or near-perfect (>0.8) for 92.5% of items and each Cronbach alpha coefficient was ≥0.91. Principal component analyses showed that all 4 tools were unidimensional. Significant increases in median scores with increasing levels of medical expertise were demonstrated for RESCAPE-intraosseous needle insertion (P = .0002), RESCAPE-bag mask ventilation (P = .0002), RESCAPE-endotracheal intubation (P = .0001), and RESCAPE-cardiac massage (P = .0037). Significantly increased median scores over time were also demonstrated during a simulation-based educational program. RESCAPE tools are reliable and validated tools for the evaluation of technical resuscitation skills in pediatric settings during simulation-based educational programs. They might also be used for medical practice performance evaluations. Copyright © 2017 Elsevier Inc. All rights reserved.
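As a minimal sketch of the Cohen kappa statistic mentioned above, the function below compares two observers' ratings of binary checklist items. The function name and the example ratings are hypothetical and are not taken from the RESCAPE data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical data: 1 = item performed, 0 = item not performed.
live_observer = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
video_observer = [1, 1, 0, 1, 0, 1, 1, 1, 0, 0]
kappa = cohen_kappa(live_observer, video_observer)  # 8/10 observed, 0.52 by chance
```

Here the observers agree on 8 of 10 items, but because chance agreement is 0.52, kappa is about 0.58, well below the >0.8 threshold the abstract describes as near-perfect.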
Brady, Karen; Cracknell, Nina; Zulch, Helen; Mills, Daniel Simon
2018-01-01
Working dogs are selected based on predictions from tests that they will be able to perform specific tasks in often challenging environments. However, withdrawal from service in working dogs is still a big problem, bringing into question the reliability of the selection tests used to make these predictions. A systematic review was undertaken aimed at bringing together available information on the reliability and predictive validity of the assessment of behavioural characteristics used with working dogs to establish the quality of selection tests currently available for use to predict success in working dogs. The search procedures resulted in 16 papers meeting the criteria for inclusion. A large range of behaviour tests and parameters were used in the identified papers, and so behaviour tests and their underpinning constructs were grouped on the basis of their relationship with positive core affect (willingness to work, human-directed social behaviour, object-directed play tendencies) and negative core affect (human-directed aggression, approach withdrawal tendencies, sensitivity to aversives). We then examined the papers for reports of inter-rater reliability, within-session intra-rater reliability, test-retest validity and predictive validity. The review revealed a widespread lack of information relating to the reliability and validity of measures to assess behaviour and inconsistencies in terminologies, study parameters and indices of success. There is a need to standardise the reporting of these aspects of behavioural tests in order to improve the knowledge base of what characteristics are predictive of optimal performance in working dog roles, improving selection processes and reducing working dog redundancy. We suggest the use of a framework based on explaining the direct or indirect relationship of the test with core affect.
Reducing random measurement error in assessing postural load on the back in epidemiologic surveys.
Burdorf, A
1995-02-01
The goal of this study was to design strategies to assess postural load on the back in occupational epidemiology by taking into account the reliability of measurement methods and the variability of exposure among the workers under study. Intermethod reliability studies were evaluated to estimate the systematic bias (accuracy) and random measurement error (precision) of various methods to assess postural load on the back. Intramethod reliability studies were reviewed to estimate random variability of back load over time. Intermethod surveys have shown that questionnaires have a moderate reliability for gross activities such as sitting, whereas duration of trunk flexion and rotation should be assessed by observation methods or inclinometers. Intramethod surveys indicate that exposure variability can markedly affect the reliability of estimates of back load if the estimates are based upon a single measurement over a certain time period. Equations have been presented to evaluate various study designs according to the reliability of the measurement method, the optimum allocation of the number of repeated measurements per subject, and the number of subjects in the study. Prior to a large epidemiologic study, an exposure-oriented survey should be conducted to evaluate the performance of measurement instruments and to estimate sources of variability for back load. The strategy for assessing back load can be optimized by balancing the number of workers under study and the number of repeated measurements per worker.
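The trade-off between repeated measurements per worker and measurement reliability can be sketched with the standard variance-components relation for the mean of k repeats. This is not necessarily the set of equations presented in the paper; the function names and the variance ratio used in the example are assumptions for illustration.

```python
import math


def reliability_of_mean(variance_ratio, k):
    """Reliability of a worker's mean exposure over k repeated measurements.

    variance_ratio is within-worker variance divided by between-worker variance.
    """
    return 1.0 / (1.0 + variance_ratio / k)


def required_repeats(variance_ratio, target_reliability):
    """Smallest k for which the mean of k repeats reaches the target reliability."""
    if not 0.0 < target_reliability < 1.0:
        raise ValueError("target_reliability must lie strictly between 0 and 1")
    k = variance_ratio * target_reliability / (1.0 - target_reliability)
    # Subtract a tiny epsilon so floating-point noise does not round k up by one.
    return max(1, math.ceil(k - 1e-9))


# Hypothetical case: day-to-day variability twice as large as between-worker variability.
single = reliability_of_mean(2.0, 1)   # one measurement per worker
needed = required_repeats(2.0, 0.8)    # repeats needed for reliability 0.8
```

With a variance ratio of 2, a single measurement per worker has a reliability of only one third, which is the kind of attenuation the abstract warns about for estimates based on a single measurement.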
One-year test-retest reliability of intrinsic connectivity network fMRI in older adults
Guo, Cong C.; Kurth, Florian; Zhou, Juan; Mayer, Emeran A.; Eickhoff, Simon B; Kramer, Joel H.; Seeley, William W.
2014-01-01
“Resting-state” or task-free fMRI can assess intrinsic connectivity network (ICN) integrity in health and disease, suggesting a potential for use of these methods as disease-monitoring biomarkers. Numerous analytical options are available, including model-driven ROI-based correlation analysis and model-free, independent component analysis (ICA). High test-retest reliability will be a necessary feature of a successful ICN biomarker, yet available reliability data remains limited. Here, we examined ICN fMRI test-retest reliability in 24 healthy older subjects scanned roughly one year apart. We focused on the salience network, a disease-relevant ICN not previously subjected to reliability analysis. Most ICN analytical methods proved reliable (intraclass coefficients > 0.4) and could be further improved by wavelet analysis. Seed-based ROI correlation analysis showed high map-wise reliability, whereas graph theoretical measures and temporal concatenation group ICA produced the most reliable individual unit-wise outcomes. Including global signal regression in ROI-based correlation analyses reduced reliability. Our study provides a direct comparison between the most commonly used ICN fMRI methods and potential guidelines for measuring intrinsic connectivity in aging control and patient populations over time. PMID:22446491
Large-Scale Multiobjective Static Test Generation for Web-Based Testing with Integer Programming
ERIC Educational Resources Information Center
Nguyen, M. L.; Hui, Siu Cheung; Fong, A. C. M.
2013-01-01
Web-based testing has become a ubiquitous self-assessment method for online learning. One useful feature that is missing from today's web-based testing systems is the reliable capability to fulfill different assessment requirements of students based on a large-scale question data set. A promising approach for supporting large-scale web-based…
Mosmuller, David; Tan, Robin; Mulder, Frans; Bachour, Yara; de Vet, Henrica; Don Griot, Peter
2016-10-01
It is essential to have a reliable assessment method in order to compare the results of cleft lip and palate surgery. In this study the computer-based program SymNose, a method for quantitative assessment of the nose and lip, is assessed for usability and reliability. The symmetry of the nose and lip was measured twice in 50 six-year-old complete and incomplete unilateral cleft lip and palate patients by four observers. For the frontal view the asymmetry levels of the nose and upper lip were evaluated, and for the basal view the asymmetry levels of the nose and nostrils were evaluated. The mean inter-observer reliability when tracing each image once or twice was 0.70 and 0.75, respectively. Tracing the photographs with 2 observers and 4 observers gave a mean inter-observer score of 0.86 and 0.92, respectively. The mean intra-observer reliability varied between 0.80 and 0.84. SymNose is a practical and reliable tool for the retrospective assessment of large caseloads of 2D photographs of cleft patients for research purposes. Moderate to high single inter-observer reliability was found. For future research with SymNose, reliable outcomes can be achieved by using the average outcomes of single tracings of two observers. Copyright © 2016 European Association for Cranio-Maxillo-Facial Surgery. Published by Elsevier Ltd. All rights reserved.
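Why averaging several observers raises reliability can be sketched with the Spearman-Brown prediction formula. The abstract does not state how the multi-observer values were computed, so this is an illustration under that assumption and its predictions need not match the reported figures exactly.

```python
def spearman_brown(single_reliability, k):
    """Predicted reliability of the average of k parallel observers (or tracings)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    r = single_reliability
    return k * r / (1.0 + (k - 1) * r)


# Starting from the reported single-tracing, single-observer reliability of 0.70:
two_observers = spearman_brown(0.70, 2)   # about 0.82
four_observers = spearman_brown(0.70, 4)  # about 0.90
```

The predicted values of roughly 0.82 and 0.90 are in the same range as the 0.86 and 0.92 reported for two and four observers, which is consistent with the recommendation to average the tracings of two observers.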
Photograph-based ergonomic evaluations using the Rapid Office Strain Assessment (ROSA).
Liebregts, J; Sonne, M; Potvin, J R
2016-01-01
The Rapid Office Strain Assessment (ROSA) was developed to assess musculoskeletal disorder (MSD) risk factors for computer workstations. This study examined the validity and reliability of remotely conducted, photo-based assessments using ROSA. Twenty-three office workstations were assessed on-site by an ergonomist, and 5 photos were obtained. Photo-based assessments were conducted by three ergonomists. The sensitivity and specificity of the photo-based assessors' ability to correctly classify workstations was 79% and 55%, respectively. The moderate specificity associated with false positive errors committed by the assessors could lead to unnecessary costs to the employer. Error between on-site and photo-based final scores was a considerable ∼2 points on the 10-point ROSA scale (RMSE = 2.3), with a moderate relationship (ρ = 0.33). Interrater reliability ranged from fairly good to excellent (ICC = 0.667-0.856) and was comparable to previous results. Sources of error include the parallax effect, poor estimations of small joint (e.g. hand/wrist) angles, and boundary errors in postural binning. While this method demonstrated potential validity, further improvements should be made with respect to photo-collection and other protocols for remotely-based ROSA assessments. Copyright © 2015 Elsevier Ltd and The Ergonomics Society. All rights reserved.
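The validity measures reported above (sensitivity, specificity and RMSE) can be sketched as follows. The classification labels and scores are hypothetical, since the abstract does not give the underlying counts, and the on-site assessment is assumed to be the reference standard.

```python
import math


def sensitivity_specificity(reference, predicted):
    """Return (sensitivity, specificity) of predicted labels against reference labels.

    Labels are booleans: True means the workstation is classified as high risk.
    """
    tp = sum(1 for r, p in zip(reference, predicted) if r and p)
    fn = sum(1 for r, p in zip(reference, predicted) if r and not p)
    tn = sum(1 for r, p in zip(reference, predicted) if not r and not p)
    fp = sum(1 for r, p in zip(reference, predicted) if not r and p)
    return tp / (tp + fn), tn / (tn + fp)


def rmse(reference_scores, predicted_scores):
    """Root-mean-square error between paired scores."""
    squared = [(a - b) ** 2 for a, b in zip(reference_scores, predicted_scores)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical classifications: on-site (reference) versus photo-based.
on_site = [True] * 5 + [False] * 5
photo = [True, True, True, True, False, True, True, False, False, False]
sens, spec = sensitivity_specificity(on_site, photo)  # 4/5 and 3/5

# Hypothetical final scores on a 10-point scale.
error = rmse([5, 6, 3], [7, 4, 3])
```

A low specificity, as in this toy example, means many workstations that are acceptable on-site are flagged from photos, which is the false-positive cost the abstract highlights.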
Approach to developing reliable space reactor power systems
NASA Technical Reports Server (NTRS)
Mondt, Jack F.; Shinbrot, Charles H.
1991-01-01
During Phase II, the Engineering Development Phase, the SP-100 Project has defined and is pursuing a new approach to developing reliable power systems. The approach to developing such a system during the early technology phase is described along with some preliminary examples to help explain the approach. Developing reliable components to meet space reactor power system requirements is based on a top-down systems approach which includes a point design based on a detailed technical specification of a 100-kW power system. The SP-100 system requirements implicitly recognize the challenge of achieving a high system reliability for a ten-year lifetime, while at the same time using technologies that require very significant development efforts. A low-cost method for assessing reliability, based on an understanding of fundamental failure mechanisms and design margins for specific failure mechanisms, is being developed as part of the SP-100 Program.
A proposed method to investigate reliability throughout a questionnaire.
Wentzel-Larsen, Tore; Norekvål, Tone M; Ulvik, Bjørg; Nygård, Ottar; Pripp, Are H
2011-10-05
Questionnaires are used extensively in medical and health care research and depend on validity and reliability. However, participants may differ in interest and awareness throughout long questionnaires, which can affect the reliability of their answers. A method is proposed for "screening" of systematic change in random error, which could assess changed reliability of answers. A simulation study was conducted to explore whether systematic change in reliability, expressed as changed random error, could be assessed using unsupervised classification of subjects by cluster analysis (CA) and estimation of the intraclass correlation coefficient (ICC). The method was also applied to a clinical dataset from 753 cardiac patients using the Jalowiec Coping Scale. The simulation study showed a relationship between the systematic change in random error throughout a questionnaire and the slope between the estimated ICC for subjects classified by CA and successive items in a questionnaire. This slope was proposed as an awareness measure for assessing whether respondents provide only a random answer or one based on substantial cognitive effort. Scales from different factor structures of the Jalowiec Coping Scale had different effects on this awareness measure. Even though the assumptions in the simulation study might be limited compared to real datasets, the approach is promising for assessing systematic change in reliability throughout long questionnaires. Results from a clinical dataset indicated that the awareness measure differed between scales.
Constructing the "Best" Reliability Data for the Job
NASA Technical Reports Server (NTRS)
DeMott, D. L.; Kleinhammer, R. K.
2014-01-01
Modern business and technical decisions are based on the results of analyses. When considering assessments using "reliability data", the concern is how long a system will continue to operate as designed. Generally, the results are only as good as the data used. Ideally, a large set of pass/fail tests or observations to estimate the probability of failure of the item under test would produce the best data. However, this is a costly endeavor if used for every analysis and design. Developing specific data is costly and time consuming. Instead, analysts rely on available data to assess reliability. Finding data relevant to the specific use and environment for any project is difficult, if not impossible. Instead, we attempt to develop the "best" or composite analog data to support our assessments. One method used incorporates processes for reviewing existing data sources and identifying the available information based on similar equipment, then using that generic data to derive an analog composite. Dissimilarities in equipment descriptions, environment of intended use, quality and even failure modes impact the "best" data incorporated in an analog composite. Once developed, this composite analog data provides a "better" representation of the reliability of the equipment or component and can be used to support early risk or reliability trade studies, or analytical models to establish the predicted reliability data points. Data that is more representative of reality and more project specific would provide more accurate analysis, and hopefully a better final decision.
Constructing the Best Reliability Data for the Job
NASA Technical Reports Server (NTRS)
Kleinhammer, R. K.; Kahn, J. C.
2014-01-01
Modern business and technical decisions are based on the results of analyses. When considering assessments using "reliability data", the concern is how long a system will continue to operate as designed. Generally, the results are only as good as the data used. Ideally, a large set of pass/fail tests or observations to estimate the probability of failure of the item under test would produce the best data. However, this is a costly endeavor if used for every analysis and design. Developing specific data is costly and time consuming. Instead, analysts rely on available data to assess reliability. Finding data relevant to the specific use and environment for any project is difficult, if not impossible. Instead, we attempt to develop the "best" or composite analog data to support our assessments. One method used incorporates processes for reviewing existing data sources and identifying the available information based on similar equipment, then using that generic data to derive an analog composite. Dissimilarities in equipment descriptions, environment of intended use, quality and even failure modes impact the "best" data incorporated in an analog composite. Once developed, this composite analog data provides a "better" representation of the reliability of the equipment or component and can be used to support early risk or reliability trade studies, or analytical models to establish the predicted reliability data points. Data that is more representative of reality and more project specific would provide more accurate analysis, and hopefully a better final decision.
Development of self and peer performance assessment on iodometric titration experiment
NASA Astrophysics Data System (ADS)
Nahadi; Siswaningsih, W.; Kusumaningtyas, H.
2018-05-01
This study aims to describe the process of developing a reliable and valid assessment to measure students' performance on iodometric titration, and the effect of self and peer assessment on students' performance. The self and peer instrument provides valuable feedback for improving student performance. The developed assessment contains a rubric and a task to facilitate self and peer assessment. The participants were 24 second-grade students at a vocational high school in Bandung, divided into two groups. The first 12 students were involved in the validity test of the developed assessment, while the remaining 12 students participated in the reliability test. Content validity was evaluated based on expert judgment. The results show that the developed performance assessment instrument was categorized as valid on each task, with reliability classified as very good. Analysis of the impact of implementing self and peer assessment showed that the peer instrument supported the self assessment.
NASA Astrophysics Data System (ADS)
Gurov, V. V.
2017-01-01
Software tools for educational purposes, such as e-lessons and computer-based testing systems, have a number of distinctive features from the point of view of reliability. The main ones are the need to ensure a sufficiently high probability of faultless operation for a specified time, and the impossibility of rapid recovery by replacing the tool with a similar running program during classes. The article considers the peculiarities of evaluating the reliability of programs in contrast to assessments of hardware reliability. The basic requirements for the reliability of software used for conducting practical and laboratory classes in the form of computer-based training programs are given. A mathematical tool based on Markov chains is presented, which makes it possible to determine the degree of debugging of a training program for use in the educational process by applying the graph of interactions between software modules.
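One common way to turn a module interaction graph into a Markov-chain reliability estimate is sketched below. The abstract does not describe the article's actual model, so this formulation (each module has a per-visit reliability, control passes between modules with fixed probabilities, and a run succeeds if it exits without any module failing) is an assumption, as are all names and numbers.

```python
def system_reliability(transitions, module_reliability, exit_prob,
                       start=0, tol=1e-12, max_iter=10_000):
    """Probability that a run starting at `start` finishes without a module failure.

    transitions[i][j]     probability that control passes from module i to module j
    module_reliability[i] probability that module i executes correctly when visited
    exit_prob[i]          probability that the program terminates after module i
    Each row of transitions plus the matching exit_prob should sum to 1.
    """
    n = len(module_reliability)
    r = [0.0] * n  # r[i]: probability of a successful finish starting from module i
    for _ in range(max_iter):
        # Fixed-point update of r[i] = R_i * (exit_i + sum_j P_ij * r[j]).
        updated = [
            module_reliability[i]
            * (exit_prob[i] + sum(transitions[i][j] * r[j] for j in range(n)))
            for i in range(n)
        ]
        converged = max(abs(a - b) for a, b in zip(updated, r)) < tol
        r = updated
        if converged:
            break
    return r[start]


# Hypothetical training program: three modules executed in sequence.
chain = [[0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0],
         [0.0, 0.0, 0.0]]
sequential = system_reliability(chain, [0.99, 0.98, 0.97], [0.0, 0.0, 1.0])

# Hypothetical single module that is re-entered with probability 0.5 (for example a retry).
looping = system_reliability([[0.5]], [0.9], [0.5])
```

For the sequential case the result is simply the product of the module reliabilities, while the looping case shows how repeated visits to a module lower the overall figure below its per-visit reliability.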
DOT National Transportation Integrated Search
2016-09-01
Travel time and travel-time reliability have been used as performance measures to evaluate traffic system conditions and develop advanced traveler information and traffic management systems. The objectives of this research were to: - Quantify s...
Organizational readiness for implementing change: a psychometric assessment of a new measure.
Shea, Christopher M; Jacobs, Sara R; Esserman, Denise A; Bruce, Kerry; Weiner, Bryan J
2014-01-10
Organizational readiness for change in healthcare settings is an important factor in successful implementation of new policies, programs, and practices. However, research on the topic is hindered by the absence of a brief, reliable, and valid measure. Until such a measure is developed, we cannot advance scientific knowledge about readiness or provide evidence-based guidance to organizational leaders about how to increase readiness. This article presents results of a psychometric assessment of a new measure called Organizational Readiness for Implementing Change (ORIC), which we developed based on Weiner's theory of organizational readiness for change. We conducted four studies to assess the psychometric properties of ORIC. In study one, we assessed the content adequacy of the new measure using quantitative methods. In study two, we examined the measure's factor structure and reliability in a laboratory simulation. In study three, we assessed the reliability and validity of an organization-level measure of readiness based on aggregated individual-level data from study two. In study four, we conducted a small field study utilizing the same analytic methods as in study three. Content adequacy assessment indicated that the items developed to measure change commitment and change efficacy reflected the theoretical content of these two facets of organizational readiness and distinguished the facets from hypothesized determinants of readiness. Exploratory and confirmatory factor analysis in the lab and field studies revealed two correlated factors, as expected, with good model fit and high item loadings. Reliability analysis in the lab and field studies showed high inter-item consistency for the resulting individual-level scales for change commitment and change efficacy. Inter-rater reliability and inter-rater agreement statistics supported the aggregation of individual level readiness perceptions to the organizational level of analysis. 
This article provides evidence in support of the ORIC measure. We believe this measure will enable testing of theories about determinants and consequences of organizational readiness and, ultimately, assist healthcare leaders to reduce the number of health organization change efforts that do not achieve desired benefits. Although ORIC shows promise, further assessment is needed to test for convergent, discriminant, and predictive validity.
Organizational readiness for implementing change: a psychometric assessment of a new measure
2014-01-01
Background Organizational readiness for change in healthcare settings is an important factor in successful implementation of new policies, programs, and practices. However, research on the topic is hindered by the absence of a brief, reliable, and valid measure. Until such a measure is developed, we cannot advance scientific knowledge about readiness or provide evidence-based guidance to organizational leaders about how to increase readiness. This article presents results of a psychometric assessment of a new measure called Organizational Readiness for Implementing Change (ORIC), which we developed based on Weiner’s theory of organizational readiness for change. Methods We conducted four studies to assess the psychometric properties of ORIC. In study one, we assessed the content adequacy of the new measure using quantitative methods. In study two, we examined the measure’s factor structure and reliability in a laboratory simulation. In study three, we assessed the reliability and validity of an organization-level measure of readiness based on aggregated individual-level data from study two. In study four, we conducted a small field study utilizing the same analytic methods as in study three. Results Content adequacy assessment indicated that the items developed to measure change commitment and change efficacy reflected the theoretical content of these two facets of organizational readiness and distinguished the facets from hypothesized determinants of readiness. Exploratory and confirmatory factor analysis in the lab and field studies revealed two correlated factors, as expected, with good model fit and high item loadings. Reliability analysis in the lab and field studies showed high inter-item consistency for the resulting individual-level scales for change commitment and change efficacy. Inter-rater reliability and inter-rater agreement statistics supported the aggregation of individual level readiness perceptions to the organizational level of analysis. 
Conclusions This article provides evidence in support of the ORIC measure. We believe this measure will enable testing of theories about determinants and consequences of organizational readiness and, ultimately, assist healthcare leaders to reduce the number of health organization change efforts that do not achieve desired benefits. Although ORIC shows promise, further assessment is needed to test for convergent, discriminant, and predictive validity. PMID:24410955
Lee, Ya-Chen; Yu, Wan-Hui; Hsueh, I-Ping; Chen, Sheng-Shiung; Hsieh, Ching-Lin
2017-10-01
A lack of evidence on the test-retest reliability and responsiveness limits the utility of the BI-based Supplementary Scales (BI-SS) in both clinical and research settings. To examine the test-retest reliability and responsiveness of the BI-based Supplementary Scales (BI-SS) in patients with stroke. A repeated-assessments design (1 week apart) was used to examine the test-retest reliability of the BI-SS. For the responsiveness study, the participants were assessed with the BI-SS and BI (treated as an external criterion) at admission to and discharge from rehabilitation wards. Seven outpatient rehabilitation units and one inpatient rehabilitation unit. Outpatients with chronic stroke. Eighty-four outpatients with chronic stroke participated in the test-retest reliability study. Fifty-seven inpatients completed baseline and follow-up assessments in the responsiveness study. For the test-retest reliability study, the values of the intra-class correlation coefficient and the overall percentage of minimal detectable change for the Ability Scale and Self-perceived Difficulty Scale were 0.97, 12.8%, and 0.78, 35.8%, respectively. For the responsiveness study, the standardized effect size and standardized response mean (representing internal responsiveness) of the Ability Scale and Self-perceived Difficulty Scale were 1.17 and 1.56, and 0.78 and 0.89, respectively. Regarding external responsiveness, the change in score of the Ability Scale had significant and moderate association with that of the BI (r=0.61, P<0.001). The change in score of the Self-perceived Difficulty Scale had non-significant and weak association with that of the BI (r=0.23, P=0.080). The Ability Scale of the BI-SS has satisfactory test-retest reliability and sufficient responsiveness for patients with stroke. However, the Self-perceived Difficulty Scale of the BI-SS has substantial random measurement error and insufficient external responsiveness, which may affect its utility in clinical settings. 
The findings of this study provide empirical evidence of psychometric properties of the BI-SS for assessing ability and self-perceived difficulty of ADL in patients with stroke.
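The relationship between an intraclass correlation coefficient and a minimal detectable change can be illustrated with a short sketch. It assumes the commonly used formulas SEM = SD * sqrt(1 - ICC) and MDC95 = 1.96 * sqrt(2) * SEM; the abstract does not state which formula or denominator the authors used for the percentage, so the function names, the baseline standard deviation and the maximum score below are hypothetical.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject SD and the test-retest ICC."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sd, icc):
    """Minimal detectable change at the 95% confidence level."""
    return 1.96 * math.sqrt(2.0) * standard_error_of_measurement(sd, icc)


def mdc_percent(sd, icc, reference_score):
    """MDC95 as a percentage of a reference score (assumed: scale maximum)."""
    return 100.0 * mdc95(sd, icc) / reference_score


# Hypothetical inputs, not values taken from the study.
example_mdc = mdc95(sd=10.0, icc=0.97)
example_pct = mdc_percent(sd=10.0, icc=0.97, reference_score=40.0)
```

Under these assumptions a lower ICC inflates the SEM and therefore the MDC, which is consistent with the pattern the abstract reports for the two scales.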
Duracinsky, Martin; Lalanne, Christophe; Goujard, Cécile; Herrmann, Susan; Cheung-Lung, Christian; Brosseau, Jean-Paul; Schwartz, Yannick; Chassany, Olivier
2014-04-25
Electronic patient-reported outcomes (PRO) provide quick and usually reliable assessments of patients' health-related quality of life (HRQL). An electronic version of the Patient-Reported Outcomes Quality of Life-human immunodeficiency virus (PROQOL-HIV) questionnaire was developed, and its face validity and reliability were assessed using standard psychometric methods. A sample of 80 French outpatients (66% male, 52/79; mean age 46.7 years, SD 10.9) were recruited. Paper-based and electronic questionnaires were completed in a randomized crossover design (2-7 day interval). Biomedical data were collected. Questionnaire version and order effects were tested on full-scale scores in a 2-way ANOVA with patients as random effects. Test-retest reliability was evaluated using Pearson and intraclass correlation coefficients (ICC, with 95% confidence interval) for each dimension. Usability testing was carried out from patients' survey reports, specifically, general satisfaction, ease of completion, quality and clarity of user interface, and motivation to participate in follow-up PROQOL-HIV electronic assessments. Questionnaire version and administration order effects (N=59 complete cases) were not significant at the 5% level, and no interaction was found between these 2 factors (P=.94). Reliability indexes were acceptable, with Pearson correlations greater than .7 and ICCs ranging from .708 to .939; scores were not statistically different between the two versions. A total of 63 (79%) complete patients' survey reports were available, and 55% of patients (30/55) reported being satisfied and interested in electronic assessment of their HRQL in clinical follow-up. Individual ratings of PROQOL-HIV user interface (85%-100% of positive responses) confirmed user interface clarity and usability. 
The electronic PROQOL-HIV introduces minor modifications to the original paper-based version, following International Society for Pharmacoeconomics and Outcomes Research (ISPOR) ePRO Task Force guidelines, and shows good reliability and face validity. Patients can complete the computerized PROQOL-HIV questionnaire and the scores from the paper or electronic versions share comparable accuracy and interpretation.
Lalanne, Christophe; Goujard, Cécile; Herrmann, Susan; Cheung-Lung, Christian; Brosseau, Jean-Paul; Schwartz, Yannick; Chassany, Olivier
2014-01-01
Background Electronic patient-reported outcomes (PRO) provide quick and usually reliable assessments of patients’ health-related quality of life (HRQL). Objective An electronic version of the Patient-Reported Outcomes Quality of Life-human immunodeficiency virus (PROQOL-HIV) questionnaire was developed, and its face validity and reliability were assessed using standard psychometric methods. Methods A sample of 80 French outpatients (66% male, 52/79; mean age 46.7 years, SD 10.9) were recruited. Paper-based and electronic questionnaires were completed in a randomized crossover design (2-7 day interval). Biomedical data were collected. Questionnaire version and order effects were tested on full-scale scores in a 2-way ANOVA with patients as random effects. Test-retest reliability was evaluated using Pearson and intraclass correlation coefficients (ICC, with 95% confidence interval) for each dimension. Usability testing was carried out from patients’ survey reports, specifically, general satisfaction, ease of completion, quality and clarity of user interface, and motivation to participate in follow-up PROQOL-HIV electronic assessments. Results Questionnaire version and administration order effects (N=59 complete cases) were not significant at the 5% level, and no interaction was found between these 2 factors (P=.94). Reliability indexes were acceptable, with Pearson correlations greater than .7 and ICCs ranging from .708 to .939; scores were not statistically different between the two versions. A total of 63 (79%) complete patients’ survey reports were available, and 55% of patients (30/55) reported being satisfied and interested in electronic assessment of their HRQL in clinical follow-up. Individual ratings of PROQOL-HIV user interface (85%-100% of positive responses) confirmed user interface clarity and usability. 
Conclusions The electronic PROQOL-HIV introduces minor modifications to the original paper-based version, following International Society for Pharmacoeconomics and Outcomes Research (ISPOR) ePRO Task Force guidelines, and shows good reliability and face validity. Patients can complete the computerized PROQOL-HIV questionnaire and the scores from the paper or electronic versions share comparable accuracy and interpretation. PMID:24769643
Questionnaire-based assessment of executive functioning: Psychometrics.
Castellanos, Irina; Kronenberger, William G; Pisoni, David B
2018-01-01
The psychometric properties of the Learning, Executive, and Attention Functioning (LEAF) scale were investigated in an outpatient clinical pediatric sample. As a part of clinical testing, the LEAF scale, which broadly measures neuropsychological abilities related to executive functioning and learning, was administered to parents of 118 children and adolescents referred for psychological testing at a pediatric psychology clinic; 85 teachers also completed LEAF scales to assess reliability across different raters and settings. Scores on neuropsychological tests of executive functioning and academic achievement were abstracted from charts. Psychometric analyses of the LEAF scale demonstrated satisfactory internal consistency, parent-teacher inter-rater reliability in the small to large effect size range, and test-retest reliability in the large effect size range, similar to values for other executive functioning checklists. Correlations between corresponding subscales on the LEAF and other behavior checklists were large, while most correlations with neuropsychological tests of executive functioning and achievement were significant but in the small to medium range. Results support the utility of the LEAF as a reliable and valid questionnaire-based assessment of delays and disturbances in executive functioning and learning. Applications and advantages of the LEAF and other questionnaire measures of executive functioning in clinical neuropsychology settings are discussed.
Makdissi, Michael; Davis, Gavin
2016-10-01
The objective of this study was to determine the reliability and validity of identifying clinical signs of concussion using video analysis in Australian football. Prospective cohort study. All impacts and collisions potentially resulting in a concussion were identified during 2012 and 2013 Australian Football League seasons. Consensus definitions were developed for clinical signs associated with concussion. For intra- and inter-rater reliability analysis, two experienced clinicians independently assessed 102 randomly selected videos on two occasions. Sensitivity, specificity, positive and negative predictive values were calculated based on the diagnosis provided by team medical staff. 212 incidents resulting in possible concussion were identified in 414 Australian Football League games. The intra-rater reliability of the video-based identification of signs associated with concussion was good to excellent. Inter-rater reliability was good to excellent for impact seizure, slow to get up, motor incoordination, ragdoll appearance (2 of 4 analyses), clutching at head and facial injury. Inter-rater reliability for loss of responsiveness and blank and vacant look was only fair and did not reach statistical significance. The feature with the highest sensitivity was slow to get up (87%), but this sign had a low specificity (19%). Other video signs had a high specificity but low sensitivity. Blank and vacant look (100%) and motor incoordination (81%) had the highest positive predictive value. Video analysis may be a useful adjunct to the side-line assessment of a possible concussion. Video analysis however should not replace the need for a thorough multimodal clinical assessment. Copyright © 2016 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
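The four accuracy measures named in this abstract follow from a 2x2 table that compares a video sign against the reference diagnosis. The sketch below uses the standard definitions; the function name and the counts are hypothetical and are not data from the study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from 2x2 counts.

    tp: sign present, concussion diagnosed
    fp: sign present, no concussion diagnosed
    fn: sign absent, concussion diagnosed
    tn: sign absent, no concussion diagnosed
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
metrics = diagnostic_metrics(tp=20, fp=30, fn=5, tn=45)
```

A sign can combine high sensitivity with low specificity, as reported for "slow to get up", when it appears in most concussions but also in many incidents that are not concussions.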
Factors Influencing the Reliability of the Glasgow Coma Scale: A Systematic Review.
Reith, Florence Cm; Synnot, Anneliese; van den Brande, Ruben; Gruen, Russell L; Maas, Andrew Ir
2017-06-01
The Glasgow Coma Scale (GCS) characterizes patients with diminished consciousness. In a recent systematic review, we found overall adequate reliability across different clinical settings, but reliability estimates varied considerably between studies, and methodological quality of studies was overall poor. Identifying and understanding factors that can affect its reliability is important, in order to promote high standards for clinical use of the GCS. The aim of this systematic review was to identify factors that influence reliability and to provide an evidence base for promoting consistent and reliable application of the GCS. A comprehensive literature search was undertaken in MEDLINE, EMBASE, and CINAHL from 1974 to July 2016. Studies assessing the reliability of the GCS in adults or describing any factor that influences reliability were included. Two reviewers independently screened citations, selected full texts, and undertook data extraction and critical appraisal. Methodological quality of studies was evaluated with the consensus-based standards for the selection of health measurement instruments checklist. Data were synthesized narratively and presented in tables. Forty-one studies were included for analysis. Factors identified that may influence reliability are education and training, the level of consciousness, and type of stimuli used. Conflicting results were found for experience of the observer, the pathology causing the reduced consciousness, and intubation/sedation. No clear influence was found for the professional background of observers. Reliability of the GCS is influenced by multiple factors and as such is context dependent. This review points to the potential for improvement from training and education and standardization of assessment methods, for which recommendations are presented. Copyright © 2017 by the Congress of Neurological Surgeons.
Human Reliability Assessments: Using the Past (Shuttle) to Predict the Future (ORION)
NASA Technical Reports Server (NTRS)
Mott, Diana L.; Bigler, Mark A.
2017-01-01
NASA uses two human reliability assessment (HRA) methodologies. The first is a simplified method based on how much time is available to complete the action, with consideration given to environmental and personal factors that could influence the human's reliability. This method is expected to provide a conservative value or placeholder as a preliminary estimate. This preliminary estimate is used to determine which placeholder needs a more detailed assessment. The second methodology is used to develop a more detailed human reliability assessment of the performance of critical human actions. This assessment needs to consider more than the time available, including factors such as the importance of the action, the context, environmental factors, potential human stresses, previous experience, training, physical design interfaces, available procedures/checklists and internal human stresses. The more detailed assessment is still expected to be more realistic than that based primarily on time available. When performing an HRA on a system or process that has an operational history, we have information specific to the task based on this history and experience. In the case of a PRA model that is based on a new design and has no operational history, providing a "reasonable" assessment of potential crew actions becomes more problematic. In order to determine what is expected of future operational parameters, the experience of individuals who were familiar with the systems and processes previously implemented by NASA was used to provide the "best" available data. Personnel from Flight Operations, Flight Directors, Launch Test Directors, Control Room Console Operators and Astronauts were all interviewed to provide a comprehensive picture of previous NASA operations.
Verification of the assumptions and expectations expressed in the assessments will be needed when the procedures, flight rules and operational requirements are developed and then finalized.
ERIC Educational Resources Information Center
Lee, Ming; Wimmers, Paul F.
2016-01-01
Although problem-based learning (PBL) has been widely used in medical schools, few studies have attended to the assessment of PBL processes using validated instruments. This study examined reliability and validity for an instrument assessing PBL performance in four domains: Problem Solving, Use of Information, Group Process, and Professionalism.…
ERIC Educational Resources Information Center
Spencer, Trina D.; Goldstein, Howard; Kelley, Elizabeth Spencer; Sherman, Amber; McCune, Luke
2017-01-01
Despite research demonstrating the importance of language comprehension to later reading abilities, curriculum-based measures to assess language comprehension abilities in preschoolers remain lacking. The Assessment of Story Comprehension (ASC) features brief, child-relevant stories and a series of literal and inferential questions with a focus on…
Skog, Alexander; Peyre, Sarah E; Pozner, Charles N; Thorndike, Mary; Hicks, Gloria; Dellaripa, Paul F
2012-01-01
The situational leadership model suggests that an effective leader adapts leadership style depending on the followers' level of competency. We assessed the applicability and reliability of the situational leadership model when observing residents in simulated hospital floor-based scenarios. Resident teams engaged in clinical simulated scenarios. Video recordings were divided into clips based on Emergency Severity Index v4 acuity scores. Situational leadership styles were identified in clips by two physicians. Interrater reliability was determined through descriptive statistical data analysis. There were 114 participants recorded in 20 sessions, and 109 clips were reviewed and scored. There was a high level of interrater reliability (weighted kappa r = .81) supporting situational leadership model's applicability to medical teams. A suggestive correlation was found between frequency of changes in leadership style and the ability to effectively lead a medical team. The situational leadership model represents a unique tool to assess medical leadership performance in the context of acuity changes.
Prediction of Software Reliability using Bio Inspired Soft Computing Techniques.
Diwaker, Chander; Tomar, Pradeep; Poonia, Ramesh C; Singh, Vijander
2018-04-10
Many models have been developed for predicting software reliability. These reliability models are restricted to particular types of methodologies and a restricted number of parameters. There are a number of techniques and methodologies that may be used for reliability prediction. There is a need to focus on the choice of parameters when estimating reliability. The reliability of a system may increase or decrease depending on the parameters selected. Thus there is a need to identify the factors that heavily affect the reliability of the system. At present, reusability is widely used in various areas of research. Reusability is the basis of Component-Based Systems (CBS). Cost, time and human skill can be saved using Component-Based Software Engineering (CBSE) concepts. CBSE metrics may be used to assess which techniques are more suitable for estimating system reliability. Soft computing is used for small as well as large-scale problems where it is difficult to find accurate results due to uncertainty or randomness. Several possibilities are available for applying soft computing techniques to medicine-related problems. Clinical medical science makes significant use of fuzzy-logic and neural network methodology, while basic medical science most frequently and preferably uses neural networks-genetic algorithm methodology. Medical scientists have shown strong interest in using the various soft computing methodologies in the genetics, physiology, radiology, cardiology and neurology disciplines. CBSE encourages users to reuse past and existing software to make new products, providing quality while saving time, memory space, and money. This paper focuses on the assessment of commonly used soft computing techniques such as Genetic Algorithm (GA), Neural-Network (NN), Fuzzy Logic, Support Vector Machine (SVM), Ant Colony Optimization (ACO), Particle Swarm Optimization (PSO), and Artificial Bee Colony (ABC).
This paper presents the working of soft computing techniques and an assessment of these techniques for predicting reliability. The parameters considered when estimating and predicting reliability are also discussed. This study can be used in the estimation and prediction of the reliability of various instruments used in medical systems, software engineering, computer engineering and mechanical engineering. These concepts can be applied to both software and hardware to predict reliability using CBSE.
Examination of the Test-Retest Reliability of a Computerized Neurocognitive Test Battery.
Nakayama, Yusuke; Covassin, Tracey; Schatz, Philip; Nogle, Sally; Kovan, Jeff
2014-08-01
Test-retest reliability is a critical issue in the utility of computer-based neurocognitive assessment paradigms employing baseline and postconcussion tests. Researchers have reported low test-retest reliability for the Immediate Post Concussion Assessment and Cognitive Testing (ImPACT) across an interval of 45 and 50 days. To re-examine the test-retest reliability of the ImPACT between baseline, 45 days, and 50 days. Descriptive laboratory study. Eighty-five physically active college students (51 male, 34 female) volunteered for this study. Participants completed the ImPACT as well as a 15-item memory test at baseline, 45 days, and 50 days. Intraclass correlation coefficients (ICCs) were calculated for ImPACT composite scores, and change scores were calculated using reliable change indices (RCIs) and regression-based methods (RBMs) at 80% and 95% confidence intervals (CIs). The respective ICCs for baseline to day 45, day 45 to day 50, baseline to day 50, and overall were as follows: verbal memory (0.76, 0.69, 0.65, and 0.78), visual memory (0.72, 0.66, 0.60, and 0.74), visual motor (processing) speed (0.87, 0.88, 0.85, and 0.91), and reaction time (0.67, 0.81, 0.71, and 0.80). All ICCs exceeded the threshold value of 0.60 for acceptable test-retest reliability. All cases fell well within the 80% CI for both the RCI and RBM, while 1% to 5% of cases fell outside the 95% CI for the RCI and 1% for the RBM. Results suggest that the ImPACT is a reliable neurocognitive test battery at 45 and 50 days after the baseline assessment. The current findings agree with those of other reliability studies that have reported acceptable ICCs across 30-day to 1-year testing intervals, and they support the utility of the ImPACT for the multidisciplinary approach to concussion management. This study suggests that the computerized neurocognitive test battery, ImPACT, is a reliable test for postconcussion serial assessments. 
However, when managing concussed athletes, the ImPACT should not be used as a stand-alone measure. © 2014 The Author(s).
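A reliable change index of the kind mentioned in this abstract can be sketched as follows. The sketch assumes the widely used Jacobson-Truax form, in which the retest difference is divided by the standard error of the difference, and two-sided critical values of about 1.282 (80%) and 1.96 (95%). The abstract does not say which variant the authors applied, and all scores below are hypothetical.

```python
import math

# Two-sided standard normal critical values for the stated confidence levels.
Z_CRITICAL = {80: 1.282, 95: 1.960}


def reliable_change_index(baseline, retest, sd_baseline, reliability):
    """Jacobson-Truax style RCI: change divided by the SE of the difference."""
    sem = sd_baseline * math.sqrt(1.0 - reliability)
    se_difference = math.sqrt(2.0) * sem
    return (retest - baseline) / se_difference


def is_reliable_change(rci, confidence=95):
    """True when the change falls outside the chosen confidence interval."""
    return abs(rci) > Z_CRITICAL[confidence]


# Hypothetical composite scores, SD and test-retest coefficient.
rci = reliable_change_index(baseline=50.0, retest=65.0,
                            sd_baseline=10.0, reliability=0.75)
changed_95 = is_reliable_change(rci, confidence=95)
```

With a lower test-retest coefficient the standard error of the difference grows, so a larger score change is needed before it counts as reliable.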
Hobbelen, Johannes S M; Koopmans, Raymond T C M; Verhey, Frans R J; Habraken, Kitty M; de Bie, Rob A
2008-08-01
Paratonia is one of the associated movement disorders characteristic of dementia. The aim of this study was to develop an assessment tool (the Paratonia Assessment Instrument, PAI), based on the new consensus definition of paratonia. An additional aim was to investigate the reliability and validity of the PAI. A three-phase cross-sectional survey was conducted. In the first two phases, the PAI was developed and validated. In the third phase, the inter-observer reliability and feasibility of the instrument were tested. The original PAI consisted of five criteria that all needed to be met in order to make the diagnosis. On the basis of a qualitative analysis, one criterion was reformulated and another was removed. Following this, inter-observer reliability between the two assessors improved, with Cohen's kappa rising from 0.532 in the initial phase to 0.677 in the second phase. This improvement was substantiated in the third phase by two independent assessors, with Cohen's kappa ranging from 0.625 to 1. The PAI is a reliable and valid assessment tool for diagnosing paratonia in elderly people with dementia that can be applied easily in daily practice.
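Cohen's kappa, the agreement statistic reported in this abstract, corrects the observed agreement between two assessors for the agreement expected by chance. The sketch below implements the standard unweighted formula; the function name and the ratings are hypothetical and do not come from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same subjects."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must score the same non-empty set")
    # Proportion of subjects on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical diagnoses (1 = paratonia present, 0 = absent).
assessor_1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
assessor_2 = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
kappa = cohen_kappa(assessor_1, assessor_2)
```

In this made-up example the assessors agree on 8 of 10 patients, chance agreement is 0.5, and kappa is therefore lower than the raw agreement rate.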
Designing Computer-Based Assessments: Multidisciplinary Findings and Student Perspectives
ERIC Educational Resources Information Center
Dembitzer, Leah; Zelikovitz, Sarah; Kettler, Ryan J.
2017-01-01
A partnership was created between psychologists and computer programmers to develop a computer-based assessment program. Psychometric concerns of accessibility, reliability, and validity were juxtaposed with core development concepts of usability and user-centric design. Phases of development were iterative, with evaluation phases alternating with…
Stolarova, Margarita; Wolf, Corinna; Rinker, Tanja; Brielmann, Aenne
2014-01-01
This report has two main purposes. First, we combine well-known analytical approaches to conduct a comprehensive assessment of agreement and correlation of rating-pairs and to disentangle these often confused concepts, providing a best-practice example on concrete data and a tutorial for future reference. Second, we explore whether a screening questionnaire developed for use with parents can be reliably employed with daycare teachers when assessing early expressive vocabulary. A total of 53 vocabulary rating pairs (34 parent–teacher and 19 mother–father pairs) collected for two-year-old children (12 bilingual) are evaluated. First, inter-rater reliability both within and across subgroups is assessed using the intra-class correlation coefficient (ICC). Next, based on this analysis of reliability and on the test-retest reliability of the employed tool, inter-rater agreement is analyzed, and the magnitude and direction of rating differences are considered. Finally, Pearson correlation coefficients of standardized vocabulary scores are calculated and compared across subgroups. The results underline the necessity to distinguish between reliability measures, agreement and correlation. They also demonstrate the impact of the employed reliability on agreement evaluations. This study provides evidence that parent–teacher ratings of children's early vocabulary can achieve agreement and correlation comparable to those of mother–father ratings on the assessed vocabulary scale. Bilingualism of the evaluated child decreased the likelihood of raters' agreement. We conclude that future reports of agreement, correlation and reliability of ratings will benefit from better definition of terms and stricter methodological approaches. The methodological tutorial provided here holds the potential to increase comparability across empirical reports and can help improve research practices and knowledge transfer to educational and therapeutic settings. PMID:24994985
Storey, K E; McCargar, L J
2012-02-01
Web-based surveys are becoming increasingly popular. The present study aimed to assess the reliability and validity of the Web-Survey of Physical Activity and Nutrition (Web-SPAN) for self-report of height and weight, diet and physical activity by youth. School children aged 11-15 years (grades 7-9; n=459) participated in the school-based research (boys, n=225; girls, n=233; mean age, 12.8 years). Students completed Web-SPAN (self-administered) twice and participated in on-site school assessments [height, weight, 3-day food/pedometer record, Physical Activity Questionnaire for Older Children (PAQ-C), shuttle run]. Intraclass (ICC) and Pearson's correlation coefficients and paired samples t-tests were used to assess the test-retest reliability of Web-SPAN and to compare Web-SPAN with the on-site assessments. Test-retest reliability was high for height (ICC=0.90), weight (ICC=0.98) and the PAQ-C (ICC=0.79), whereas correlations for nutrients were not as strong (ICC=0.37-0.64). There were no differences between Web-SPAN times 1 and 2 for height and weight, although there were differences for the PAQ-C and most nutrients. Web-SPAN was strongly correlated with the on-site assessments, including height (ICC=0.88), weight (ICC=0.93) and the PAQ-C (ICC=0.70). Mean differences for height and the PAQ-C were not significant, whereas mean differences for weight were significant, resulting in an underestimation of overweight/obesity prevalence (84% agreement). Correlations for nutrients were in the range 0.24-0.40; mean differences were small but generally significantly different. Correlations were weak between the web-based PAQ-C and 3-day pedometer record (r=0.28) and 20-m shuttle run (r=0.28). Web-SPAN is a time- and cost-effective method that can be used to assess the diet and physical activity status of youth in large cross-sectional studies and to assess group trends (weight status). © 2011 The Authors. 
Journal of Human Nutrition and Dietetics © 2011 The British Dietetic Association Ltd.
Kobayashi, Sarah; Peduto, Anthony; Simic, Milena; Fransen, Marlene; Refshauge, Kathryn; Mah, Jean; Pappas, Evangelos
2018-04-01
This work aimed to assess inter-rater reliability and agreement of a magnetic resonance imaging (MRI)-based Kellgren and Lawrence (K&L) grading for patellofemoral joint osteoarthritis (OA) and to validate it against the MRI Osteoarthritis Knee Score (MOAKS). MRI scans from people aged 45 to 75 years with chronic knee pain participating in a randomised clinical trial evaluating dietary supplements were utilised. Fifty participants were randomly selected and scored using the MRI-based K&L grading using axial and sagittal MRI scans. Raters scored the scans for the inter-rater reliability analysis while blinded to clinical information, radiology reports and other rater results. Intra- and inter-rater reliability and agreement were evaluated using the intra-class correlation coefficient (ICC) and Cohen's weighted kappa. There was a 2-week interval between the first and second readings for intra-rater reliability. Validity was assessed using the MOAKS and evaluated using Spearman's correlation coefficient. Intra-rater reliability of the K&L system was excellent: ICC 0.91 (95% CI 0.82-0.95); weighted kappa ĸ = 0.69. Inter-rater reliability was high (ICC 0.88; 95% CI 0.79-0.93), while agreement between raters was moderate (ĸ = 0.49-0.57). Validity analysis demonstrated a strong correlation between the total MOAKS features score and the K&L grading system (ρ = 0.62-0.67) but weak correlations when compared with individual MOAKS features (ρ = 0.19-0.61). The high reliability and good agreement show consistency in grading the severity of patellofemoral OA with the MRI-based K&L score. Our validity results suggest that the scale may be useful, particularly in the clinical environment. Future research should validate this method against clinical findings.
Lee, Myungmo; Song, Changho; Lee, Kyoungjin; Shin, Doochul; Shin, Seungho
2014-07-14
Treadmill gait analysis was more advantageous than over-ground walking because it allowed continuous measurements of the gait parameters. The purpose of this study was to investigate the concurrent validity and the test-retest reliability of the OPTOGait photoelectric cell system against the treadmill-based gait analysis system by assessing spatio-temporal gait parameters. Twenty-six stroke patients and 18 healthy adults were asked to walk on the treadmill at their preferred speed. The concurrent validity was assessed by comparing data obtained from the 2 systems, and the test-retest reliability was determined by comparing data obtained from the 1st and the 2nd session of the OPTOGait system. The concurrent validity, identified by the intra-class correlation coefficients (ICC [2, 1]), coefficients of variation (CVME), and 95% limits of agreement (LOA) for the spatial-temporal gait parameters, were excellent but the temporal parameters expressed as a percentage of the gait cycle were poor. The test-retest reliability of the OPTOGait System, identified by ICC (3, 1), CVME, 95% LOA, standard error of measurement (SEM), and minimum detectable change (MDC95%) for the spatio-temporal gait parameters, was high. These findings indicated that the treadmill-based OPTOGait System had strong concurrent validity and test-retest reliability. This portable system could be useful for clinical assessments.
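The standard error of measurement (SEM) and minimum detectable change (MDC95%) named in this abstract are usually derived from the ICC and the between-subject standard deviation. The sketch below uses the commonly cited formulas SEM = SD * sqrt(1 - ICC) and MDC95 = 1.96 * sqrt(2) * SEM; the abstract does not state which formulas the authors applied, and the input values are invented for illustration.

```python
from math import sqrt

def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject SD and a reliability coefficient (assumed formula)."""
    return sd * sqrt(1.0 - icc)

def minimum_detectable_change_95(sem):
    """Smallest change exceeding measurement error at 95% confidence (assumed formula)."""
    return 1.96 * sqrt(2.0) * sem

# Hypothetical gait-speed example: SD of 10 cm/s across subjects, ICC of 0.91.
sem_value = standard_error_of_measurement(sd=10.0, icc=0.91)
mdc_value = minimum_detectable_change_95(sem_value)
print(f"SEM = {sem_value:.2f} cm/s, MDC95 = {mdc_value:.2f} cm/s")
```

Read this way, a change between two sessions smaller than the MDC95 cannot be distinguished from measurement noise, which is why test-retest studies often report it alongside the ICC.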
Just, Tino; Lankenau, Eva; Prall, Friedrich; Hüttmann, Gereon; Pau, Hans Wilhelm; Sommer, Konrad
2010-10-01
A newly developed microscope-based spectral-domain optical coherence tomography (SD-OCT) device and an endoscope-based time-domain OCT (TD-OCT) were used to assess the inter-rater reliability, sensitivity, specificity, and accuracy of OCT in evaluating benign and dysplastic laryngeal epithelial lesions. Prospective study. OCT during microlaryngoscopy was done on 35 patients with an endoscope-based TD-OCT, and on 26 patients by an SD-OCT system integrated into an operating microscope. Biopsies were taken from microscopically suspicious lesions allowing comparative study of OCT images and histology. Thickness of the epithelium was seen to be the main criterion for degree of dysplasia. The inter-rater reliability for two observers was found to be kappa = 0.74 (P < .001) for OCT. OCT provided test outcomes for differentiation between benign laryngeal lesions and dysplasia/CIS with sensitivity of 88%, specificity of 89%, PPV of 85%, NPV of 91%, and predictive accuracy of 88%. However, because of the limited penetration depth of the laser light primarily in hyperkeratotic lesions (thickness above 1.5 mm), the basal cell layer was no longer visible, precluding reliable assessment of such lesions. OCT allows for a fairly accurate assessment of benign and dysplastic laryngeal epithelial lesions and greatly facilitates the taking of precise biopsies. Laryngoscope, 2010.
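The diagnostic measures reported above (sensitivity, specificity, PPV, NPV, accuracy) all derive from a 2x2 table of test result against histology. A minimal sketch follows; the counts are hypothetical and do not reproduce the study's table, which the abstract does not give.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 diagnostic measures from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),   # diseased cases correctly flagged
        "specificity": tn / (tn + fp),   # non-diseased cases correctly cleared
        "ppv": tp / (tp + fp),           # positive results that are true
        "npv": tn / (tn + fn),           # negative results that are true
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical counts: index test versus histology as the reference standard.
metrics = diagnostic_metrics(tp=40, fp=5, fn=10, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity are properties of the test, whereas PPV and NPV also depend on how common dysplasia is in the sample, so the predictive values would shift in a population with a different prevalence.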
Reliability Assessment for Low-cost Unmanned Aerial Vehicles
NASA Astrophysics Data System (ADS)
Freeman, Paul Michael
Existing low-cost unmanned aerospace systems are unreliable, and engineers must blend reliability analysis with fault-tolerant control in novel ways. This dissertation introduces the University of Minnesota unmanned aerial vehicle flight research platform, a comprehensive simulation and flight test facility for reliability and fault-tolerance research. An industry-standard reliability assessment technique, the failure modes and effects analysis, is performed for an unmanned aircraft. Particular attention is afforded to the control surface and servo-actuation subsystem. Maintaining effector health is essential for safe flight; failures may lead to loss of control incidents. Failure likelihood, severity, and risk are qualitatively assessed for several effector failure modes. Design changes are recommended to improve aircraft reliability based on this analysis. Most notably, the control surfaces are split, providing independent actuation and dual-redundancy. The simulation models for control surface aerodynamic effects are updated to reflect the split surfaces using a first-principles geometric analysis. The failure modes and effects analysis is extended by using a high-fidelity nonlinear aircraft simulation. A trim state discovery is performed to identify the achievable steady, wings-level flight envelope of the healthy and damaged vehicle. Tolerance of elevator actuator failures is studied using familiar tools from linear systems analysis. This analysis reveals significant inherent performance limitations for candidate adaptive/reconfigurable control algorithms used for the vehicle. Moreover, it demonstrates how these tools can be applied in a design feedback loop to make safety-critical unmanned systems more reliable. Control surface impairments that do occur must be quickly and accurately detected. 
This dissertation also considers fault detection and identification for an unmanned aerial vehicle using model-based and model-free approaches and applies those algorithms to experimental faulted and unfaulted flight test data. Flight tests are conducted with actuator faults that affect the plant input and sensor faults that affect the vehicle state measurements. A model-based detection strategy is designed and uses robust linear filtering methods to reject exogenous disturbances, e.g. wind, while providing robustness to model variation. A data-driven algorithm is developed to operate exclusively on raw flight test data without physical model knowledge. The fault detection and identification performance of these complementary but different methods is compared. Together, enhanced reliability assessment and multi-pronged fault detection and identification techniques can help to bring about the next generation of reliable low-cost unmanned aircraft.
Llorens, Roberto; Latorre, Jorge; Noé, Enrique; Keshner, Emily A
2016-01-01
Posturography systems that incorporate force platforms are considered to assess balance and postural control with greater sensitivity and objectivity than conventional clinical tests. The Wii Balance Board (WBB) system has been shown to have similar performance characteristics as other force platforms, but with lower cost and size. To determine the validity and reliability of a freely available WBB-based posturography system that combined the WBB with several traditional balance assessments, and to assess the performance of a cohort of stroke individuals with respect to healthy individuals. Healthy subjects and individuals with stroke were recruited. Both groups were assessed using the WBB-based posturography system. Individuals with stroke were also assessed using a laboratory grade posturography system and a battery of clinical tests to determine the concurrent validity of the system. A group of subjects were assessed twice with the WBB-based system to determine its reliability. A total of 144 healthy individuals and 53 individuals with stroke participated in the study. Concurrent validity with another posturography system was moderate to high. Correlations with clinical scales were consistent with previous research. The reliability of the system was excellent in almost all measures. In addition, the system successfully characterized individuals with stroke with respect to the healthy population. The WBB-based posturography system exhibited excellent psychometric properties and sensitivity for identifying balance performance of individuals with stroke in comparison with healthy subjects, which supports feasibility of the system as a clinical tool. Copyright © 2015 Elsevier B.V. All rights reserved.
Nikjooy, Afsaneh; Jafari, Hassan; Saba, Maryam A; Ebrahimi, Naghmeh; Mirzaei, Rezvan
2018-05-01
The Patient Assessment of Constipation Quality of Life (PAC-QOL) questionnaire is the most validated and the most specific tool for measuring the quality of life of patients with constipation. Over 120 million people live in countries whose official language is Persian. There is no reported Persian version of the PAC-QOL questionnaire yet. The aim of this study was to translate and culturally adapt the PAC-QOL questionnaire and to assess its reliability and validity among Persian patients with chronic constipation. Following the translation and cultural adaptation of the PAC-QOL questionnaire to Persian, 100 patients (mean±SD age=40.51±13.67) with constipation were recruited for validity measurement and 20 patients were re-examined for reliability. Content validity was assessed based on the opinions of an expert committee and the floor/ceiling effect. Construct validity was evaluated according to the hypothesis test. The SF-36 questionnaire was used for concurrent criterion validity, intra-class correlation coefficient for reliability, and Cronbach's alpha for internal consistency. The content validity of the PAC-QOL questionnaire was proven, and there was no floor/ceiling effect. Construct validity also was confirmed based on the hypothesis test. The overall Cronbach's alpha of the PAC-QOL questionnaire was 0.92 (range=0.72-0.92), and the overall intra-class correlation coefficient of the questionnaire was 0.88 (range=0.69-0.87). The correlation between the SF-36 and PAC-QOL questionnaires was moderate. The Persian version of the PAC-QOL questionnaire demonstrated good validity and reliability properties in chronic constipation. Accordingly, Persian researchers and clinicians can benefit from this questionnaire in further research and assessment of treatment outcomes.
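Cronbach's alpha, used above for internal consistency, compares the sum of the item variances with the variance of the total score. The sketch below uses the standard formula with sample variances; the response matrix is invented for illustration and is unrelated to the PAC-QOL data.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha; `item_scores` has one row per respondent, one column per item."""
    k = len(item_scores[0])
    item_variances = [variance([row[j] for row in item_scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical two-item questionnaire answered by four respondents.
responses = [
    [1, 2],
    [2, 1],
    [3, 3],
    [4, 4],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Alpha rises when items covary strongly relative to their individual variances, so it reflects how consistently the items measure together rather than stability over time, which is what the intra-class correlation from the retest sample addresses.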
Assessing the competences associated with a nursing Bachelor thesis by means of rubrics.
Llaurado-Serra, M; Rodríguez, E; Gallart, A; Fuster, P; Monforte-Royo, C; De Juan, M Á
2018-07-01
Writing a Bachelor thesis is the last step in obtaining a university degree. The thesis may be job- or research-orientated, but it must demonstrate certain degree-level competences. Rubrics are a useful way of unifying the assessment criteria. To design a system of rubrics for assessing the competences associated with the Bachelor thesis of a nursing degree, to examine the system's reliability and validity and to analyse results in relation to the final thesis mark. Cross-sectional and psychometric study conducted between 2012 and 2014. Nursing degree at a Spanish university. Twelve tutors who designed the system of rubrics. Students (n = 76) who wrote their Bachelor thesis during the 2013-2014 academic year. After deciding which aspects would be assessed, who would assess them and when, the tutors developed seven rubrics (drafting process, assessment of the written thesis by the supervisor and by a panel, student self-assessment, peer assessment, tutor evaluation of the peer assessment and panel assessment of the viva). We analysed the reliability (inter-rater and internal consistency) and validity (convergent and discriminant) of the rubrics, and also the relationship between the competences assessed and the final thesis mark. All the rubrics had internal consistency coefficients >0.80. The rubric for oral communication skills (viva) yielded inter-rater reliability of 0.95. Factor analysis indicated a unidimensional structure for all but one of the rubrics, the exception being the rubric for peer assessment, which had a two-factor structure. The main competences associated with a good quality Bachelor thesis were written communication skills and the ability to work independently. The assessment system based on seven rubrics is shown to be valid and reliable. Writing a Bachelor thesis requires a range of degree-level competences and it offers nursing students the opportunity to develop their evidence-based practice skills. Copyright © 2018 Elsevier Ltd. 
All rights reserved.
Feasibility of a Semi-computerized Line Bisection Test for Unilateral Visual Neglect Assessment.
Jee, H; Kim, J; Kim, C; Kim, T; Park, J
2015-01-01
Commonly used paper-and-pencil based test modalities for assessing the degree of unilateral visual neglect (ULN) in patients with hemispheric cerebral lesions consume human resources and show significant inter- and intra-rater variability. To explore the feasibility of a semi-computerized electronic-pen based ULN assessment system (e-system) to improve assessment quality without altering the conventional user interface. Thirty cognitively healthy participants (HG) and 11 participants diagnosed with right-hemispheric lesion and unilateral visual neglect (NG) were recruited to evaluate the e-system. Line bisection tests (LBT) were conducted twice for the inter-rater and intra-rater (reliability) comparisons. The LBT results were assessed by the e-system and the gold standard method (manual rater assessment). The percent deviation (%), assessment duration (sec), and number of neglected lines (each) were evaluated. The inter-rater comparisons of the assessed deviation (%) variable showed excellent inter-rater reliabilities (CCCs) ranging from .84 (.59 to .95; p < .001) to .99 (.90 to .99; p < .001) for HG and NG. The Bland Altman mean difference (B-A) plots with bias (95% LOA (limits of agreement)) showed similar agreements between the e-system and the raters ranging from -.04 % (-2.10 to 1.97) to 1.30 % (-2.23 to 4.84) for HG and NG. The effect sizes (ES), which show similarities between the assessment methods, yielded smaller ranges from .01 to .30 for HG and NG. The reliability (test-retest) comparisons showed similar assessment results between the e-system, rater 1, and rater 2. The manual rater assessment time ranging from 5.85 to 6.00 minutes and inter- and intra-assessment variations were virtually eliminated with the e-system. The semi-computerized system with the conventional paper-and-pencil user interface showed valid and reliable assessment results. 
It may be a feasible replacement for the manual rater assessment modality even in a clinical setting.
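The bias and 95% limits of agreement quoted in this abstract come from a Bland-Altman analysis of paired measurements. A minimal sketch of that calculation follows, assuming the usual definition (mean difference plus or minus 1.96 sample standard deviations of the differences); the deviation values are invented and are not taken from the study.

```python
from statistics import mean, stdev

def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)   # half-width of the 95% limits of agreement
    return bias, bias - spread, bias + spread

# Hypothetical percent deviations scored by an automated system and by a human rater.
system_scores = [1.0, 2.0, 3.0, 4.0]
rater_scores = [1.5, 1.5, 3.5, 3.5]
bias, lower, upper = bland_altman(system_scores, rater_scores)
print(f"bias = {bias:.2f} %, 95% LOA = ({lower:.2f}, {upper:.2f})")
```

Unlike a correlation coefficient, the limits are expressed in the units of the measurement, so a reader can judge directly whether the disagreement between two methods is small enough to matter clinically.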
The development and testing of a qualitative instrument designed to assess critical thinking
NASA Astrophysics Data System (ADS)
Clauson, Cynthia Louisa
This study examined a qualitative approach to assess critical thinking. An instrument was developed that incorporates an assessment process based on Dewey's (1933) concepts of self-reflection and critical thinking as problem solving. The study was designed to pilot test the critical thinking assessment process with writing samples collected from a heterogeneous group of students. The pilot test included two phases. Phase 1 was designed to determine the validity and inter-rater reliability of the instrument using two experts in critical thinking, problem solving, and literacy development. Validity of the instrument was addressed by requesting both experts to respond to ten questions in an interview. The inter-rater reliability was assessed by analyzing the consistency of the two experts' scorings of the 20 writing samples to each other, as well as to my scoring of the same 20 writing samples. Statistical analyses included the Spearman Rho and the Kuder-Richardson (Formula 20). Phase 2 was designed to determine the validity and reliability of the critical thinking assessment process with seven science teachers. Validity was addressed by requesting the teachers to respond to ten questions in a survey and interview. Inter-rater reliability was addressed by comparing the seven teachers' scoring of five writing samples with my scoring of the same five writing samples. Again, the Spearman Rho and the Kuder-Richardson (Formula 20) were used to determine the inter-rater reliability. The validity results suggest that the instrument is helpful as a guide for instruction and provides a systematic method to teach and assess critical thinking while problem solving with students in the classroom. The reliability results show the critical thinking assessment instrument to possess fairly high reliability when used by the experts, but weak reliability when used by classroom teachers. 
A major conclusion was drawn that teachers, as well as students, would need to receive instruction in critical thinking and in how to use the assessment process in order to gain more consistent interpretations of the six problem-solving steps. Specific changes needing to be made in the instrument to improve the quality are included.
Thin-film reliability and engineering overview
NASA Technical Reports Server (NTRS)
Ross, R. G., Jr.
1984-01-01
The reliability and engineering technology base required for thin film solar energy conversions modules is discussed. The emphasis is on the integration of amorphous silicon cells into power modules. The effort is being coordinated with SERI's thin film cell research activities as part of DOE's Amorphous Silicon Program. Program concentration is on temperature humidity reliability research, glass breaking strength research, point defect system analysis, hot spot heating assessment, and electrical measurements technology.
Thin-film reliability and engineering overview
NASA Astrophysics Data System (ADS)
Ross, R. G., Jr.
1984-10-01
The reliability and engineering technology base required for thin film solar energy conversions modules is discussed. The emphasis is on the integration of amorphous silicon cells into power modules. The effort is being coordinated with SERI's thin film cell research activities as part of DOE's Amorphous Silicon Program. Program concentration is on temperature humidity reliability research, glass breaking strength research, point defect system analysis, hot spot heating assessment, and electrical measurements technology.
NASA Astrophysics Data System (ADS)
Huang, Yuxia; Mao, Mengchai; Zhang, Zong; Zhou, Hui; Zhao, Yang; Duan, Lian; Kreplin, Ute; Xiao, Xiang; Zhu, Chaozhe
2017-01-01
Functional near-infrared spectroscopy (fNIRS) is being increasingly applied to affective and social neuroscience research; however, the reliability of this method is still unclear. This study aimed to evaluate the test-retest reliability of the fNIRS-based prefrontal response to emotional stimuli. Twenty-six participants viewed unpleasant and neutral pictures, and were simultaneously scanned by fNIRS in two sessions three weeks apart. The reproducibility of the prefrontal activation map was evaluated at three spatial scales (mapwise, clusterwise, and channelwise) at both the group and individual levels. The influence of the time interval was also explored and comparisons were made between longer (intersession) and shorter (intrasession) time intervals. The reliabilities of the activation map at the group level for the mapwise (up to 0.88, the highest value appeared in the intersession assessment) and clusterwise scales (up to 0.91, the highest appeared in the intrasession assessment) were acceptable, indicating that fNIRS may be a reliable tool for emotion studies, especially for a group analysis and under larger spatial scales. However, it should be noted that the individual-level and the channelwise fNIRS prefrontal responses were not sufficiently stable. Future studies should investigate which factors influence reliability, as well as the validity of fNIRS used in emotion studies.
Anderson, Donald D; Segal, Neil A; Kern, Andrew M; Nevitt, Michael C; Torner, James C; Lynch, John A
2012-01-01
Recent findings suggest that contact stress is a potent predictor of subsequent symptomatic osteoarthritis development in the knee. However, much larger numbers of knees (likely on the order of hundreds, if not thousands) need to be reliably analyzed to achieve the statistical power necessary to clarify this relationship. This study assessed the reliability of new semiautomated computational methods for estimating contact stress in knees from large population-based cohorts. Ten knees of subjects from the Multicenter Osteoarthritis Study were included. Bone surfaces were manually segmented from sequential 1.0 Tesla magnetic resonance imaging slices by three individuals on two nonconsecutive days. Four individuals then registered the resulting bone surfaces to corresponding bone edges on weight-bearing radiographs, using a semi-automated algorithm. Discrete element analysis methods were used to estimate contact stress distributions for each knee. Segmentation and registration reliabilities (day-to-day and interrater) for peak and mean medial and lateral tibiofemoral contact stress were assessed with Shrout-Fleiss intraclass correlation coefficients (ICCs). The segmentation and registration steps of the modeling approach were found to have excellent day-to-day (ICC 0.93-0.99) and good inter-rater reliability (0.84-0.97). This approach for estimating compartment-specific tibiofemoral contact stress appears to be sufficiently reliable for use in large population-based cohorts.
Validity and reliability of a video questionnaire to assess physical function in older adults.
Balachandran, Anoop; N Verduin, Chelsea; Potiaumpai, Melanie; Ni, Meng; Signorile, Joseph F
2016-08-01
Self-report questionnaires are widely used to assess physical function in older adults. However, they often lack a clear frame of reference and hence interpreting and rating task difficulty levels can be problematic for the responder. Consequently, the usefulness of traditional self-report questionnaires for assessing higher-level functioning is limited. Video-based questionnaires can overcome some of these limitations by offering a clear and objective visual reference for the performance level against which the subject is to compare his or her perceived capacity. Hence the purpose of the study was to develop and validate a novel, video-based questionnaire to assess physical function in older adults independently living in the community. A total of 61 community-living adults, 60 years or older, were recruited. To examine validity, 35 of the subjects completed the video questionnaire and two types of physical performance tests: a test of instrumental activity of daily living (IADL) included in the Short Physical Functional Performance battery (PFP-10), and a composite of 3 performance tests (30 s chair stand, single-leg balance and usual gait speed). To ascertain reliability, two-week test-retest reliability was assessed in the remaining 26 subjects who did not participate in validity testing. The video questionnaire showed a moderate correlation with the IADLs (Spearman rho=0.64, p<0.001; 95% CI (0.4, 0.8)), and a lower correlation with the composite score of physical performance tests (Spearman rho=0.49, p<0.01; 95% CI (0.18, 0.7)). The test-retest assessment yielded an intra-class correlation (ICC) of 0.87 (p<0.001; 95% CI (0.70, 0.94)) and a Cronbach's alpha of 0.89, demonstrating good reliability and internal consistency. Our results show that the video questionnaire developed to evaluate physical function in community-living older adults is a valid and reliable assessment tool; however, further validation is needed for definitive conclusions. 
Copyright © 2016 Elsevier Inc. All rights reserved.
Tailoring a Human Reliability Analysis to Your Industry Needs
NASA Technical Reports Server (NTRS)
DeMott, D. L.
2016-01-01
Accidents caused by human error that can result in catastrophic consequences include airline industry mishaps, medical malpractice, medication mistakes, aerospace failures, major oil spills, transportation mishaps, power production failures and manufacturing facility incidents. Human Reliability Assessment (HRA) is used to analyze the inherent risk of human behavior or actions introducing errors into the operation of a system or process. These assessments can be used to identify where errors are most likely to arise and the potential risks involved if they do occur. Using the basic concepts of HRA, an evolving group of methodologies is used to meet various industry needs. Determining which methodology or combination of techniques will provide a quality human reliability assessment is a key element in developing effective strategies for understanding and dealing with risks caused by human errors. There are a number of concerns and difficulties in "tailoring" a Human Reliability Assessment (HRA) for different industries. Although a variety of HRA methodologies are available to analyze human error events, determining the most appropriate tools to provide the most useful results can depend on industry specific cultures and requirements. Methodology selection may be based on a variety of factors that include: 1) how people act and react in different industries, 2) expectations based on industry standards, 3) factors that influence how the human errors could occur such as tasks, tools, environment, workplace, support, training and procedure, 4) type and availability of data, 5) how the industry views risk & reliability, and 6) types of emergencies, contingencies and routine tasks. Other considerations for methodology selection should be based on what information is needed from the assessment. 
If the principal concern is determination of the primary risk factors contributing to the potential human error, a more detailed analysis method may be employed versus a requirement to provide a numerical value as part of a probabilistic risk assessment. Industries involved with humans operating large equipment or transport systems (ex. railroads or airlines) would have more need to address the man machine interface than medical workers administering medications. Human error occurs in every industry; in most cases the consequences are relatively benign and occasionally beneficial. In cases where the results can have disastrous consequences, the use of Human Reliability techniques to identify and classify the risk of human errors allows a company more opportunities to mitigate or eliminate these types of risks and prevent costly tragedies.
Reliability of pubertal maturation self-assessment in a school-based survey.
Jaruratanasirikul, Somchit; Kreetapirom, Piyawut; Tassanakijpanich, Nattaporn; Sriplung, Hutcha
2015-03-01
To assess the reliability of pubertal self-assessment of Thai adolescents. Some 927 girls and 997 boys, aged 8-18 years, from nine schools in Hat-Yai municipality. The adolescents evaluated their pubertal status after being shown a line drawing of the five Tanner stages with a short description. Girls assessed their breast and pubic hair development, and boys assessed their pubic hair development. The pubertal self-assessments were compared to pubertal assessments made by a pediatrician who examined the children after their self-assessment. Kappa coefficient and percent agreement were used for statistical analysis. The percent agreement of breast and pubic hair development between the girls' self-assessments and the assessments by the pediatrician were 60.8% and 78%, respectively. Kappa coefficient for breast assessment was 0.50 (95% confidence interval, CI 0.47-0.53) and for pubic hair 0.68 (95% CI 0.65-0.72). Nearly 30% of girls aged younger than 10 years overestimated their breast development status while 45% of girls aged over 14 years underestimated their breast development (p<0.001). For boys, the percent agreement of pubic hair development between the adolescents and the pediatrician was 76.4%, with a weighted kappa coefficient of 0.68 (95% CI 0.65-0.72). Pubertal self-assessment using line drawings with a short description can be used as a reliable method to assess pubic hair maturation in boys and girls, but can be used with less reliability to assess the breast maturation in young girls.
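Percent agreement and the kappa coefficient, the two statistics used in this abstract, can be computed from paired categorical ratings as sketched below. The stage assignments are invented and the linear weighting scheme is an assumption; the abstract does not say which weights were used for the weighted kappa.

```python
def percent_agreement(ratings_1, ratings_2):
    """Share of subjects given the same category by both raters."""
    return sum(a == b for a, b in zip(ratings_1, ratings_2)) / len(ratings_1)

def cohen_kappa(ratings_1, ratings_2, categories, weighted=False):
    """Cohen's kappa; with weighted=True, linear disagreement weights are applied."""
    n, k = len(ratings_1), len(categories)
    index = {c: i for i, c in enumerate(categories)}
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_1, ratings_2):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Disagreement weight: distance between ordered categories, or 0/1 if unweighted.
        if weighted:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    d_observed = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    d_expected = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - d_observed / d_expected

# Hypothetical stages (three ordered categories) from self-assessment and from an examiner.
self_rated = [1, 2, 3, 1, 2, 3]
examiner = [1, 2, 3, 2, 3, 3]
stages = [1, 2, 3]
agreement = percent_agreement(self_rated, examiner)
kappa = cohen_kappa(self_rated, examiner, stages)
kappa_w = cohen_kappa(self_rated, examiner, stages, weighted=True)
print(f"agreement = {agreement:.0%}, kappa = {kappa:.2f}, weighted kappa = {kappa_w:.2f}")
```

Kappa corrects the raw agreement for matches expected by chance, and the weighted variant gives partial credit for near misses, which suits ordered scales such as Tanner stages where being one stage off is a smaller error than being three stages off.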
Burt, Jenni; Abel, Gary; Elmore, Natasha; Campbell, John; Roland, Martin; Benson, John; Silverman, Jonathan
2014-03-06
To investigate initial reliability of the Global Consultation Rating Scale (GCRS: an instrument to assess the effectiveness of communication across an entire doctor-patient consultation, based on the Calgary-Cambridge guide to the medical interview), in simulated patient consultations. Multiple ratings of simulated general practitioner (GP)-patient consultations by trained GP evaluators. UK primary care. 21 GPs and six trained GP evaluators. GCRS score. 6 GP raters used GCRS to rate randomly assigned video recordings of GP consultations with simulated patients. Each of the 42 consultations was rated separately by four raters. We considered whether a fixed difference between scores had the same meaning at all levels of performance. We then examined the reliability of GCRS using mixed linear regression models. We augmented our regression model to also examine whether there were systematic biases between the scores given by different raters and to look for possible order effects. Assessing the communication quality of individual consultations, GCRS achieved a reliability of 0.73 (95% CI 0.44 to 0.79) for two raters, 0.80 (0.54 to 0.85) for three and 0.85 (0.61 to 0.88) for four. We found an average difference of 1.65 (on a 0-10 scale) in the scores given by the least and most generous raters: adjusting for this evaluator bias increased reliability to 0.78 (0.53 to 0.83) for two raters; 0.85 (0.63 to 0.88) for three and 0.88 (0.69 to 0.91) for four. There were considerable order effects, with later consultations (after 15-20 ratings) receiving, on average, scores more than one point higher on a 0-10 scale. GCRS shows good reliability with three raters assessing each consultation. We are currently developing the scale further by assessing a large sample of real-world consultations.
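The way reliability rises as more raters score each consultation can be approximated with the Spearman-Brown formula. This is a sketch of the general principle only: the authors estimated reliability with mixed linear regression models, so the formula is an assumption introduced here. Starting from the reported two-rater value of 0.73, it gives figures close to those reported for three and four raters.

```python
def single_rater_reliability(reliability_k, k):
    """Invert Spearman-Brown: reliability of one rater given the reliability of a k-rater mean."""
    return reliability_k / (k - (k - 1) * reliability_k)

def spearman_brown(reliability_1, k):
    """Reliability of the mean of k raters given single-rater reliability."""
    return k * reliability_1 / (1 + (k - 1) * reliability_1)

# Start from the two-rater reliability reported in the abstract (0.73).
r1 = single_rater_reliability(0.73, k=2)
projected = {k: spearman_brown(r1, k) for k in (2, 3, 4)}
for k, value in projected.items():
    print(f"{k} raters: projected reliability {value:.2f}")
```

Small discrepancies from the published values are expected, because the input is a rounded figure and the model-based estimates do not have to follow the Spearman-Brown curve exactly.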
Statistical methodology: II. Reliability and validity assessment in study design, Part B.
Karras, D J
1997-02-01
Validity measures the correspondence between a test and other purported measures of the same or similar qualities. When a reference standard exists, a criterion-based validity coefficient can be calculated. If no such standard is available, the concepts of content and construct validity may be used, but quantitative analysis may not be possible. The Pearson and Spearman tests of correlation are often used to assess the correspondence between tests, but do not account for measurement biases and may yield misleading results. Techniques that measure intertest differences may be more meaningful in validity assessment, and the kappa statistic is useful for analyzing categorical variables. Questionnaires often can be designed to allow quantitative assessment of reliability and validity, although this may be difficult. Inclusion of homogeneous questions is necessary to assess reliability. Analysis is enhanced by using Likert scales or similar techniques that yield ordinal data. Validity assessment of questionnaires requires careful definition of the scope of the test and comparison with previously validated tools.
Validity and Reliability of Baseline Testing in a Standardized Environment.
Higgins, Kathryn L; Caze, Todd; Maerlender, Arthur
2017-08-11
The Immediate Postconcussion Assessment and Cognitive Testing (ImPACT) is a computerized neuropsychological test battery commonly used to determine cognitive recovery from concussion based on comparing post-injury scores to baseline scores. This model is based on the premise that ImPACT baseline test scores are a valid and reliable measure of optimal cognitive function at baseline. Growing evidence suggests that this premise may not be accurate and that a large contributor to invalid and unreliable baseline test scores may be the protocol and environment in which baseline tests are administered. This study examined the effects of a standardized environment and administration protocol on the reliability and performance validity of athletes' baseline test scores on ImPACT by comparing scores obtained in two different group-testing settings. Three hundred sixty-one Division 1 cohort-matched collegiate athletes' baseline data were assessed using a variety of indicators of potential performance invalidity; internal reliability was also examined. Thirty-one to thirty-nine percent of the baseline cases had at least one indicator of low performance validity, but there were no significant differences in validity indicators based on the environment in which the testing was conducted. Internal consistency reliability scores were in the acceptable to good range, with no significant differences between administration conditions. These results suggest that athletes may be reliably performing at levels lower than their best effort would produce. © The Author 2017. Published by Oxford University Press.
Chang, Jasper O; Levy, Susan S; Seay, Seth W; Goble, Daniel J
2014-05-01
Recent guidelines advise sports medicine professionals to use balance tests to assess sensorimotor status in the management of concussions. The present study sought to determine whether a low-cost balance board could provide a valid, reliable, and objective means of performing this balance testing. Criterion validity testing relative to a gold standard and 7-day test-retest reliability. University biomechanics laboratory. Thirty healthy young adults. Balance ability was assessed on 2 days separated by 1 week using (1) a gold standard measure (ie, scientific grade force plate), (2) a low-cost Nintendo Wii Balance Board (WBB), and (3) the Balance Error Scoring System (BESS). Validity of the WBB center of pressure path length and BESS scores was determined relative to the force plate data. Test-retest reliability was established based on intraclass correlation coefficients. Composite scores for the WBB had excellent validity (r = 0.99) and test-retest reliability (R = 0.88). Both the validity (r = 0.10-0.52) and test-retest reliability (r = 0.61-0.78) were lower for the BESS. These findings demonstrate that a low-cost balance board can provide improved balance testing accuracy/reliability compared with the BESS. This approach provides a potentially more valid/reliable, yet affordable, means of assessing sports-related concussion compared with current methods.
Structural Validation of the Holistic Wellness Assessment
ERIC Educational Resources Information Center
Brown, Charlene; Applegate, E. Brooks; Yildiz, Mustafa
2015-01-01
The Holistic Wellness Assessment (HWA) is a relatively new assessment instrument based on an emergent transdisciplinary model of wellness. This study validated the factor structure identified via exploratory factor analysis (EFA), assessed test-retest reliability, and investigated concurrent validity of the HWA in three separate samples. The…
ERIC Educational Resources Information Center
Kim, Sooyeon; Livingston, Samuel A.
2017-01-01
The purpose of this simulation study was to assess the accuracy of a classical test theory (CTT)-based procedure for estimating the alternate-forms reliability of scores on a multistage test (MST) having 3 stages. We generated item difficulty and discrimination parameters for 10 parallel, nonoverlapping forms of the complete 3-stage test and…
Validation of the Evidence-Based Practice Process Assessment Scale
ERIC Educational Resources Information Center
Rubin, Allen; Parrish, Danielle E.
2011-01-01
Objective: This report describes the reliability, validity, and sensitivity of a scale that assesses practitioners' perceived familiarity with, attitudes of, and implementation of the evidence-based practice (EBP) process. Method: Social work practitioners and second-year master of social works (MSW) students (N = 511) were surveyed in four sites…
NASA Astrophysics Data System (ADS)
Goh, A. T. C.; Kulhawy, F. H.
2005-05-01
In urban environments, one major concern with deep excavations in soft clay is the potentially large ground deformations in and around the excavation. Excessive movements can damage adjacent buildings and utilities. There are many uncertainties associated with the calculation of the ultimate or serviceability performance of a braced excavation system. These include the variabilities of the loadings, geotechnical soil properties, and engineering and geometrical properties of the wall. A risk-based approach to serviceability performance failure is necessary to incorporate systematically the uncertainties associated with the various design parameters. This paper demonstrates the use of an integrated neural network-reliability method to assess the risk of serviceability failure through the calculation of the reliability index. By first performing a series of parametric studies using the finite element method and then approximating the non-linear limit state surface (the boundary separating the safe and failure domains) through a neural network model, the reliability index can be determined with the aid of a spreadsheet. Two illustrative examples are presented to show how the serviceability performance for braced excavation problems can be assessed using the reliability index.
ERIC Educational Resources Information Center
Nolan, Meaghan M.; Beran, Tanya; Hecker, Kent G.
2012-01-01
Students with positive attitudes toward statistics are likely to show strong academic performance in statistics courses. Multiple surveys measuring students' attitudes toward statistics exist; however, a comparison of the validity and reliability of interpretations based on their scores is needed. A systematic review of relevant electronic…
The Validation of a Food Label Literacy Questionnaire for Elementary School Children
ERIC Educational Resources Information Center
Reynolds, Jesse S.; Treu, Judith A.; Njike, Valentine; Walker, Jennifer; Smith, Erica; Katz, Catherine S.; Katz, David L.
2012-01-01
Objective: To determine the reliability and validity of a 10-item questionnaire, the Food Label Literacy for Applied Nutrition Knowledge questionnaire. Methods: Participants were elementary school children exposed to a 90-minute school-based nutrition program. Reliability was assessed via Cronbach alpha and intraclass correlation coefficient…
Competency-based assessment in surgeon-performed head and neck ultrasonography: A validity study.
Todsen, Tobias; Melchiors, Jacob; Charabi, Birgitte; Henriksen, Birthe; Ringsted, Charlotte; Konge, Lars; von Buchwald, Christian
2018-06-01
Head and neck ultrasonography (HNUS) is increasingly used as a point-of-care diagnostic tool by otolaryngologists. However, ultrasonography (US) is a very operator-dependent imaging modality. Hence, this study aimed to explore the diagnostic accuracy of surgeon-performed HNUS and to establish validity evidence for an objective structured assessment of ultrasound skills (OSAUS) used for competency-based assessment. A prospective experimental study. Six otolaryngologists and 11 US novices were included in a standardized test setup in which they had to perform focused HNUS of eight patients suspected of having different head and neck lesions. Their diagnostic accuracy was calculated based on the US reports, and two blinded raters assessed the video-recorded US performance using the OSAUS scale. The otolaryngologists obtained a high diagnostic accuracy of 88% (range 63%-100%) compared with 38% (range 0%-63%) for the US novices; P < 0.001. The OSAUS score demonstrated good inter-case reliability (0.85) and inter-rater reliability (0.76), and significant discrimination between otolaryngologists and US novices; P < 0.001. A strong correlation between the OSAUS score and the diagnostic accuracy was found (Spearman's ρ, 0.85; P < 0.001), and a pass/fail score was established at 2.8. Strong validity evidence supported the use of the OSAUS scale to assess HNUS competence with good reliability, significant discrimination between US competence levels, and a strong correlation of assessment score to diagnostic accuracy. An OSAUS pass/fail score was established and could be used for competence-based assessment in surgeon-performed HNUS. Laryngoscope, 128:1346-1352, 2018. © 2017 The American Laryngological, Rhinological and Otological Society, Inc.
Mathur, Vijay Prakash; Dhillon, Jatinder Kaur; Logani, Ajay; Agarwal, Ramesh
2014-01-01
The purpose of this study was to develop a reliable instrument [Oral Health related Early Childhood Quality of Life (OH-ECQOL) scale] for measuring oral health related quality of life (OHrQoL) in preschool children in the North Indian population. Four pediatric dentists evaluated a pool of 65 items from various QoL questionnaires to assess their relevance to the Indian population. These items were discussed with eight independent pediatric dentists and two community dentists who were not a part of this study to assess the relevance of these items to preschool age children based on their comprehensiveness and clarity. Based on their responses and feedback, a modified pool of items was developed and administered to a convenience sample of 20 parents who rated these items according to their relevance. The test-retest reliability was evaluated on another sample of 20 parents of 2-5 year old children. The final questionnaire comprised 16 items (12 child and 4 family). This was administered to 300 parents of 24-71 month old children divided on the basis of early childhood caries to assess its reliability and validity. OH-ECQOL scores were significantly associated with parental ratings of their child's general and oral health, and the presence of dental disease in the child. Cronbach's alpha was 0.862, and the ICC for test-retest reliability was 0.94. The OH-ECQOL proved to be a reliable and valid tool for assessing the impact of oral disorders on the quality of life of preschool children in Northern India.
Development and Exemplification of a Model for Teacher Assessment in Primary Science
ERIC Educational Resources Information Center
Davies, D. J.; Earle, S.; McMahon, K.; Howe, A.; Collier, C.
2017-01-01
The Teacher Assessment in Primary Science project is funded by the Primary Science Teaching Trust and based at Bath Spa University. The study aims to develop a whole-school model of valid, reliable and manageable teacher assessment to inform practice and make a positive impact on primary-aged children's learning in science. The model is based on a…
Reliability and validity of the Japanese Migraine Disability Assessment (MIDAS) Questionnaire.
Iigaya, Miho; Sakai, Fumihiko; Kolodner, Kenneth B; Lipton, Richard B; Stewart, Walter F
2003-04-01
This study was designed to assess the test-retest reliability, internal consistency, and validity of a Japanese translation of the Migraine Disability Assessment (MIDAS) Questionnaire in a sample of Japanese patients with headache. Previous studies have demonstrated that the English-language version of the MIDAS Questionnaire is a reliable and valid instrument for the assessment of migraine-related disability. Any translations of the MIDAS Questionnaire must also be assessed for reliability and validity. Study participants were recruited from the patient population attending either the Neurology Department of Kitasato University or an affiliated clinic. Participants were eligible for study entry if they had 6 or more primary headaches per year. For reliability testing, participants completed the MIDAS Questionnaire on 2 occasions, exactly 2 weeks apart. To assess validity, patients were also invited to participate in a 90-day daily diary study. Composite measures from the 90-day diaries were compared to equivalent MIDAS measures (ie, 5 questions on headache-related disability and 1 question each on average pain intensity and headache frequency in the last 3 months) and to the total MIDAS score obtained from a third MIDAS Questionnaire completed at the end of this 90-day period. One hundred one patients between the ages of 21 and 77 years were recruited (81 women and 20 men). Ninety-nine patients (80 women and 19 men) participated in the diary study. At baseline, 46.5% of patients were MIDAS grade I or II (minimal, mild, or infrequent disability), 22.2% were MIDAS grade III (moderate disability), and 31.3% were MIDAS grade IV (severe disability). Test-retest Spearman correlations for the 5 disability questions and the questions on average pain intensity and headache frequency ranged from 0.59 to 0.80 (P<.0001). The test-retest Spearman correlation coefficient for the total MIDAS score was 0.83 (P<.0001). 
The degree to which individual MIDAS questions correlated with the diary-based measures ranged from 0.36 to 0.88. The correlation between the total MIDAS score and the equivalent diary-based measure was 0.66. In general, the mean and median values for the MIDAS items and total MIDAS score were similar to the means and medians for the diary-based measures. However, the mean MIDAS scores for the number of days on which headache was experienced and the number of missed workdays were significantly different compared to the diary-based estimates for these items (P<.05). In addition, the mean MIDAS score for the number of days of missed housework was significantly higher than the corresponding diary-based estimate (P<.01). The results from this study show that the Japanese translation of the MIDAS Questionnaire is comparable with the English-language version in terms of reliability and validity.
Griew, Pippa; Hillsdon, Melvyn; Foster, Charlie; Coombes, Emma; Jones, Andy; Wilkinson, Paul
2013-08-23
Walking for physical activity is associated with substantial health benefits for adults. Increasingly research has focused on associations between walking behaviours and neighbourhood environments including street characteristics such as pavement availability and aesthetics. Nevertheless, objective assessment of street-level data is challenging. This research investigates the reliability of a new street characteristic audit tool designed for use with Google Street View, and assesses levels of agreement between computer-based and on-site auditing. The Forty Area STudy street VIEW (FASTVIEW) tool, a Google Street View based audit tool, was developed incorporating nine categories of street characteristics. Using the tool, desk-based audits were conducted by trained researchers across one large UK town during 2011. Both inter and intra-rater reliability were assessed. On-site street audits were also completed to test the criterion validity of the method. All reliability scores were assessed by percentage agreement and the kappa statistic. Within-rater agreement was high for each category of street characteristic (range: 66.7%-90.0%) and good to high between raters (range: 51.3%-89.1%). A high level of agreement was found between the Google Street View audits and those conducted in-person across the nine categories examined (range: 75.0%-96.7%). The audit tool was found to provide a reliable and valid measure of street characteristics. The use of Google Street View to capture street characteristic data is recommended as an efficient method that could substantially increase potential for large-scale objective data collection.
DeSmet, A; Bastiaensens, S; Van Cleemput, K; Poels, K; Vandebosch, H; Deboutte, G; Herrewijn, L; Malliet, S; Pabian, S; Van Broeckhoven, F; De Troyer, O; Deglorie, G; Van Hoecke, S; Samyn, K; De Bourdeaudhuij, I
2018-06-01
This paper describes the items, scale validity and scale reliability of a self-report questionnaire that measures bystander behavior in cyberbullying incidents among adolescents, and its behavioral determinants. Determinants included behavioral intention, behavioral attitudes, moral disengagement attitudes, outcome expectations, self-efficacy, subjective norm and social skills. Questions also assessed (cyber-)bullying involvement. Validity and reliability information is based on a sample of 238 adolescents (M age=13.52 years, SD=0.57). Construct validity was assessed using Confirmatory Factor Analysis (CFA) or Exploratory Factor Analysis (EFA) in Mplus7 software. Reliability (Cronbach Alpha, α) was assessed in SPSS, version 22. Data and questionnaire are included in this article. Further information can be found in DeSmet et al. (2018) [1].
Thylstrup, Birgitte; Simonsen, Sebastian; Nemery, Caroline; Simonsen, Erik; Noll, Jane Fjernestad; Myatt, Mikkel Wanting; Hesse, Morten
2016-08-25
The personality disorder categories in the Diagnostic and Statistical Manual of Mental Disorders IV have been extensively criticized, and there is a growing consensus that personality pathology should be represented dimensionally rather than categorically. The aim of this pilot study was to test the Clinical Assessment of the Level of Personality Functioning Scale, a semi-structured clinical interview, designed to assess the Level of Personality Functioning Scale of the DSM-5 (Section III) by applying strategies similar to what characterizes assessments in clinical practice. The inter-rater reliability of the assessment of the four domains and the total impairment in the Level of Personality Functioning Scale were measured in a patient sample that varied in terms of severity and type of pathology. Ratings were done independently by the interviewer and two experts who watched a videotaped Clinical Assessment of the Level of Personality Functioning Scale interview. Inter-rater reliability coefficients varied between domains and were not sufficient for clinical practice, but may support the use of the interview to assess the dimensions of personality functioning for research purposes. While designed to measure the Level of Personality Functioning Scale with a high degree of similarity to clinical practice, the Clinical Assessment of the Level of Personality Functioning Scale had weak reliabilities and a rating based on a single interview should not be considered a stand-alone assessment of areas of functioning for a given patient.
Damschroder, Laura J; Goodrich, David E; Kim, Hyungjin Myra; Holleman, Robert; Gillon, Leah; Kirsh, Susan; Richardson, Caroline R; Lutes, Lesley D
2016-09-01
Practical and valid instruments are needed to assess fidelity of coaching for weight loss. The purpose of this study was to develop and validate the ASPIRE Coaching Fidelity Checklist (ACFC). Classical test theory guided ACFC development. Principal component analyses were used to determine item groupings. Psychometric properties, internal consistency, and inter-rater reliability were evaluated for each subscale. Criterion validity was tested by predicting weight loss as a function of coaching fidelity. The final 19-item ACFC consists of two domains (session process and session structure) and five subscales (sets goals and monitors progress, assesses and personalizes self-regulatory content, manages the session, creates a supportive and empathetic climate, and stays on track). Four of five subscales showed high internal consistency (Cronbach alphas > 0.70) for group-based coaching; only two of five subscales had high internal reliability for phone-based coaching. All five subscales were positively and significantly associated with weight loss for group- but not for phone-based coaching. The ACFC is a reliable and valid instrument that can be used to assess fidelity and guide skill-building for weight management interventionists.
Comprehensive classification test of scapular dyskinesis: A reliability study.
Huang, Tsun-Shun; Huang, Han-Yi; Wang, Tyng-Guey; Tsai, Yung-Shen; Lin, Jiu-Jenq
2015-06-01
Assessment of scapular dyskinesis (SD) is of clinical interest, as SD is believed to be related to shoulder pathology. However, no clinical assessment with sufficient reliability to identify SD and provide treatment strategies is available. The purpose of this study was to investigate the reliability of the comprehensive SD classification method. Cross-sectional reliability study. Sixty subjects with unilateral shoulder pain were evaluated by two independent physiotherapists with a visual-based palpation method. SD was classified as single abnormal scapular pattern [inferior angle (pattern I), medial border (pattern II), superior border of scapula prominence or abnormal scapulohumeral rhythm (pattern III)], a mixture of the above abnormal scapular patterns, or normal pattern (pattern IV). The assessment of SD was evaluated as subjects performed bilateral arm raising/lowering movements with a weighted load in the scapular plane. Percentage of agreement and kappa coefficients were calculated to determine reliability. Agreement between the 2 independent physiotherapists was 83% (50/60, 6 subjects as pattern III and 44 subjects as pattern IV) in the raising phase and 68% (41/60, 5 subjects as pattern I, 12 subjects as pattern II, 12 subjects as pattern IV, 12 subjects as mixed patterns I and II) in the lowering phase. The kappa coefficients were 0.49-0.64. We concluded that the visual-based palpation classification method for SD had moderate to substantial inter-rater reliability. The appearance of different types of SD was more pronounced in the lowering phase than in the raising phase of arm movements. Copyright © 2014 Elsevier Ltd. All rights reserved.
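Percentage agreement and the kappa coefficient, the two statistics used above, can be computed from paired ratings as sketched below. The abstract reports only the agreement counts (for example 50/60, about 83%), not the full cross-tabulation, so the ratings in this example are hypothetical and serve only to show the calculation.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of subjects given the same category by both raters."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    # Note: undefined (division by zero) if expected agreement equals 1.
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of ten subjects (not data from the study).
rater_a = ["I", "I", "II", "II", "IV", "IV", "IV", "IV", "mixed", "mixed"]
rater_b = ["I", "II", "II", "II", "IV", "IV", "IV", "I", "mixed", "mixed"]

print("agreement:", percent_agreement(rater_a, rater_b))
print("kappa:", round(cohen_kappa(rater_a, rater_b), 2))
```

Kappa is lower than raw agreement because it discounts the matches two raters would produce by chance given how often each uses every category.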
A proposed method to investigate reliability throughout a questionnaire
2011-01-01
Background: Questionnaires are used extensively in medical and health care research and depend on validity and reliability. However, participants may differ in interest and awareness throughout long questionnaires, which can affect the reliability of their answers. A method is proposed for "screening" of systematic change in random error, which could assess changed reliability of answers. Methods: A simulation study was conducted to explore whether systematic change in reliability, expressed as changed random error, could be assessed using unsupervised classification of subjects by cluster analysis (CA) and estimation of the intraclass correlation coefficient (ICC). The method was also applied to a clinical dataset from 753 cardiac patients using the Jalowiec Coping Scale. Results: The simulation study showed a relationship between the systematic change in random error throughout a questionnaire and the slope between the estimated ICC for subjects classified by CA and successive items in a questionnaire. This slope was proposed as an awareness measure for assessing whether respondents provide only a random answer or one based on substantial cognitive effort. Scales from different factor structures of the Jalowiec Coping Scale had different effects on this awareness measure. Conclusions: Even though assumptions in the simulation study might be limited compared to real datasets, the approach is promising for assessing systematic change in reliability throughout long questionnaires. Results from a clinical dataset indicated that the awareness measure differed between scales. PMID:21974842
Item Response Theory for Peer Assessment
ERIC Educational Resources Information Center
Uto, Masaki; Ueno, Maomi
2016-01-01
As an assessment method based on a constructivist approach, peer assessment has become popular in recent years. However, in peer assessment, a problem remains that reliability depends on the rater characteristics. For this reason, some item response models that incorporate rater parameters have been proposed. Those models are expected to improve…
Irvine, Karen-Amanda; Ferguson, Adam R.; Mitchell, Kathleen D.; Beattie, Stephanie B.; Lin, Amity; Stuck, Ellen D.; Huie, J. Russell; Nielson, Jessica L.; Talbott, Jason F.; Inoue, Tomoo; Beattie, Michael S.; Bresnahan, Jacqueline C.
2014-01-01
The IBB scale is a recently developed forelimb scale for the assessment of fine control of the forelimb and digits after cervical spinal cord injury [SCI; (1)]. The present paper describes the assessment of inter-rater reliability and face, concurrent and construct validity of this scale following SCI. It demonstrates that the IBB is a reliable and valid scale that is sensitive to severity of SCI and to recovery over time. In addition, the IBB correlates with other outcome measures and is highly predictive of biological measures of tissue pathology. Multivariate analysis using principal component analysis (PCA) demonstrates that the IBB is highly predictive of the syndromic outcome after SCI (2), and is among the best predictors of bio-behavioral function, based on strong construct validity. Altogether, the data suggest that the IBB, especially in concert with other measures, is a reliable and valid tool for assessing neurological deficits in fine motor control of the distal forelimb, and represents a powerful addition to multivariate outcome batteries aimed at documenting recovery of function after cervical SCI in rats. PMID:25071704
The feasibility of sharing simulation-based evaluation scenarios in anesthesiology.
Berkenstadt, Haim; Kantor, Gareth S; Yusim, Yakov; Gafni, Naomi; Perel, Azriel; Ezri, Tiberiu; Ziv, Amitai
2005-10-01
We prospectively assessed the feasibility of international sharing of simulation-based evaluation tools despite differences in language, education, and anesthesia practice, in an Israeli study, using validated scenarios from a multi-institutional United States (US) study. Thirty-one Israeli junior anesthesia residents performed four simulation scenarios. Training sessions were videotaped and performance was assessed using two validated scoring systems (Long and Short Forms) by two independent raters. Long Form scores ranged from 37 to 95 (70 +/- 12) of 108 possible points, and Short Form scores ranged from 18 to 35 (28.2 +/- 4.5) of 40 possible points. Scores >70% of the maximal score were achieved by 61% of participants in comparison to only 5% in the original US study. The scenarios were rated as very realistic by 80% of the participants (grade 4 on a 1-4 scale). Reliability of the original assessment tools was demonstrated by internal consistencies of 0.66 for the Long and 0.75 for the Short Form (Cronbach alpha statistic). Values in the original study were 0.72-0.76 for the Long and 0.71-0.75 for the Short Form. The reliability did not change when a revised Israeli version of the scoring was used. Interrater reliability measured by Pearson correlation was 0.91 for the Long and 0.96 for the Short Form (P < 0.01). The high scores for plausibility given to the scenarios and the similar reliability of the original assessment tool support the feasibility of using simulation-based evaluation tools, developed in the US, in Israel. The higher scores achieved by Israeli residents may be related to the fact that most Israeli residents are immigrants with previous training in anesthesia. Simulation-based assessment tools developed in a multi-institutional study in the United States can be used in Israel despite the differences in language, education, and medical system.
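The internal consistency figures above are Cronbach alpha values. A minimal sketch of the standard alpha calculation follows; the score matrix is invented for illustration (the abstract gives no item-level data), and the function name is hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of subjects, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item across subjects.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each subject's total score.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores: five subjects rated on three checklist items.
scores = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 5, 4],
    [1, 2, 2],
    [3, 3, 4],
]
print("alpha =", round(cronbach_alpha(scores), 2))
```

Alpha approaches 1 when items rise and fall together across subjects, and falls when item scores vary independently of one another.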
Apollo experience report: Reliability and quality assurance
NASA Technical Reports Server (NTRS)
Sperber, K. P.
1973-01-01
The reliability of the Apollo spacecraft resulted from the application of proven reliability and quality techniques and from sound management, engineering, and manufacturing practices. Continual assessment of these techniques and practices was made during the program, and, when deficiencies were detected, adjustments were made and the deficiencies were effectively corrected. The most significant practices, deficiencies, adjustments, and experiences during the Apollo Program are described in this report. These experiences can be helpful in establishing an effective base on which to structure an efficient reliability and quality assurance effort for future space-flight programs.
Reliability, Validity and Treatment Sensitivity of the Schizophrenia Cognition Rating Scale
Keefe, Richard S.E.; Davis, Vicki G.; Spagnola, Nathan B.; Hilt, Dana; Dgetluck, Nancy; Ruse, Stacy; Patterson, Thomas L.; Narasimhan, Meera; Harvey, Philip D.
2014-01-01
Cognitive functioning can be assessed with performance-based assessments such as neuropsychological tests and with interview-based assessments. Both assessment methods have the potential to assess whether treatments for schizophrenia improve clinically relevant aspects of cognitive impairment. However, little is known about the reliability, validity and treatment responsiveness of interview-based measures, especially in the context of clinical trials. Data from two studies were utilized to assess these features of the Schizophrenia Cognition Rating Scale (SCoRS). One of the studies was a validation study involving 79 patients with schizophrenia assessed at 3 academic research centers in the US. The other study was a 32-site clinical trial conducted in the US and Europe comparing the effects of encenicline, an alpha-7 nicotine agonist, to placebo in 319 patients with schizophrenia. The SCoRS interviewer ratings demonstrated excellent test-retest reliability in several different circumstances, including those that did not involve treatment (ICC> 0.90), and during treatment (ICC>0.80). SCoRS interviewer ratings were related to cognitive performance as measured by the MCCB (r= −0.35), and demonstrated significant sensitivity to treatment with encenicline compared to placebo (P<.001). These data suggest that the SCoRS has potential as a clinically relevant measure in clinical trials aiming to improve cognition in schizophrenia, and may be useful for clinical practice. The weaknesses of the SCoRS include its reliance on informant information, which is not available for some patients, and reduced validity when patient self-report is the sole information source. PMID:25028065
Reliability, validity and treatment sensitivity of the Schizophrenia Cognition Rating Scale.
Keefe, Richard S E; Davis, Vicki G; Spagnola, Nathan B; Hilt, Dana; Dgetluck, Nancy; Ruse, Stacy; Patterson, Thomas D; Narasimhan, Meera; Harvey, Philip D
2015-02-01
Cognitive functioning can be assessed with performance-based assessments such as neuropsychological tests and with interview-based assessments. Both assessment methods have the potential to assess whether treatments for schizophrenia improve clinically relevant aspects of cognitive impairment. However, little is known about the reliability, validity and treatment responsiveness of interview-based measures, especially in the context of clinical trials. Data from two studies were utilized to assess these features of the Schizophrenia Cognition Rating Scale (SCoRS). One of the studies was a validation study involving 79 patients with schizophrenia assessed at 3 academic research centers in the US. The other study was a 32-site clinical trial conducted in the US and Europe comparing the effects of encenicline, an alpha-7 nicotine agonist, to placebo in 319 patients with schizophrenia. The SCoRS interviewer ratings demonstrated excellent test-retest reliability in several different circumstances, including those that did not involve treatment (ICC> 0.90), and during treatment (ICC>0.80). SCoRS interviewer ratings were related to cognitive performance as measured by the MCCB (r=-0.35), and demonstrated significant sensitivity to treatment with encenicline compared to placebo (P<.001). These data suggest that the SCoRS has potential as a clinically relevant measure in clinical trials aiming to improve cognition in schizophrenia, and may be useful for clinical practice. The weaknesses of the SCoRS include its reliance on informant information, which is not available for some patients, and reduced validity when patient's self-report is the sole information source. Copyright © 2014 Elsevier B.V. and ECNP. All rights reserved.
DeVeney, Shari L; Hoffman, Lesa; Cress, Cynthia J
2012-06-01
In this study, the authors compared a multiple-domain strategy for assessing developmental age of young children with developmental disabilities who were at risk for long-term reliance on augmentative and alternative communication (AAC) with a communication-based strategy composed of receptive language and communication indices that may be less affected by physically challenging tasks than traditional developmental age scores. Participants were 42 children (age 9-27 months) with developmental disabilities and who were at risk for long-term reliance on AAC. Children were assessed longitudinally in their homes at 3 occasions over 18 months using multiple-domain and communication-based measures. Confirmatory factor analysis examined dimensionality across the measures, and age-equivalence scores under each strategy were compared, where possible. The communication-based latent factor of developmental age demonstrated good reliability and was almost perfectly correlated with the multiple-domain latent factor. However, the mean age-equivalence score of the communication-based assessment significantly exceeded that of the multiple-domain assessment by 5.3 months across ages. Clinicians working with young children with developmental disabilities should consider a communication-based approach as an alternative developmental age assessment strategy for characterizing children's capabilities, identifying challenges, and developing interventions. A communication-based developmental age estimation is sufficiently reliable and may result in more valid inferences about developmental age for children whose developmental or cognitive age scores may otherwise be limited by their physical capabilities.
Wang, Bowen; Xiong, Haitao; Jiang, Chengrui
2014-01-01
As a hot topic in supply chain management, the fuzzy method has been widely used in logistics center location selection to improve the reliability and suitability of the selection with respect to the impacts of both qualitative and quantitative factors. However, it does not consider the consistency and the historical assessment accuracy of experts in predecisions. This paper therefore proposes a multicriteria decision making model based on the credibility of decision makers by introducing a priority of consistency and historical assessment accuracy mechanism into the fuzzy multicriteria decision making approach. In this way, only decision makers who pass the credibility check are qualified to perform the further assessment. Finally, a practical example is analyzed to illustrate how to use the model. The result shows that the fuzzy multicriteria decision making model based on the credibility mechanism can improve the reliability and suitability of site selection for the logistics center. PMID:25215319
Reliability Impacts in Life Support Architecture and Technology Selection
NASA Technical Reports Server (NTRS)
Lange, Kevin E.; Anderson, Molly S.
2012-01-01
Quantitative assessments of system reliability and equivalent system mass (ESM) were made for different life support architectures based primarily on International Space Station technologies. The analysis was applied to a one-year deep-space mission. System reliability was increased by adding redundancy and spares, which added to the ESM. Results were thus obtained allowing a comparison of the ESM for each architecture at equivalent levels of reliability. Although the analysis contains numerous simplifications and uncertainties, the results suggest that achieving necessary reliabilities for deep-space missions will add substantially to the life support ESM and could influence the optimal degree of life support closure. Approaches for reducing reliability impacts were investigated and are discussed.
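The trade-off described in this abstract, where added redundancy raises reliability at the cost of mass, can be sketched with a textbook parallel-redundancy model. This is an illustration only and not the authors' analysis: it assumes identical, independent units, and the unit reliability, unit mass, and target below are hypothetical values.

```python
def parallel_reliability(unit_reliability: float, n_units: int) -> float:
    """Reliability of n identical, independent units in parallel.

    The system works as long as at least one unit works.
    """
    return 1.0 - (1.0 - unit_reliability) ** n_units


def units_for_target(unit_reliability: float, target: float) -> int:
    """Smallest number of parallel units whose combined reliability meets the target."""
    if not 0.0 < unit_reliability < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    n = 1
    while parallel_reliability(unit_reliability, n) < target:
        n += 1
    return n


# Hypothetical numbers, for illustration only (not taken from the report).
UNIT_RELIABILITY = 0.90  # assumed one-year reliability of a single unit
UNIT_MASS_KG = 50.0      # assumed mass of a single unit
TARGET = 0.995           # assumed required system reliability

n_units = units_for_target(UNIT_RELIABILITY, TARGET)
print(n_units, n_units * UNIT_MASS_KG)
```

Comparing the total mass returned for two candidate designs at the same target is the simplest form of the "equal reliability" comparison the abstract describes.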
Vieira, A; Battini, M; Can, E; Mattiello, S; Stilwell, G
2018-01-08
This study was conducted within the context of the Animal Welfare Indicators (AWIN) project and the underlying scientific motivation for the development of the study was the scarcity of data regarding inter-observer reliability (IOR) of welfare indicators, particularly given the importance of reliability as a further step for developing on-farm welfare assessment protocols. The objective of this study is therefore to evaluate IOR of animal-based indicators (at group and individual-level) of the AWIN welfare assessment protocol (prototype) for dairy goats. In the design of the study, two pairs of observers, one in Portugal and another in Italy, visited 10 farms each and applied the AWIN prototype protocol. Farms in both countries were visited between January and March 2014, and all the observers received the same training before the farm visits were initiated. Data collected during farm visits, and analysed in this study, include group-level and individual-level observations. The results of our study allow us to conclude that most of the group-level indicators presented the highest IOR level ('substantial', 0.85 to 0.99) in both field studies, pointing to a usable set of animal-based welfare indicators that were therefore included in the first level of the final AWIN welfare assessment protocol for dairy goats. Inter-observer reliability of individual-level indicators was lower, but the majority of them still reached 'fair to good' (0.41 to 0.75) and 'excellent' (0.76 to 1) levels. In the paper we explore reasons for the differences found in IOR between the group and individual-level indicators, including how the number of individual-level indicators to be assessed on each animal and the restraining method may have affected the results. Furthermore, we discuss the differences found in the IOR of individual-level indicators in both countries: the Portuguese pair of observers reached a higher level of IOR, when compared with the Italian observers. 
We argue that these differences may stem from the restraining method applied, or from the different background and experience of the observers. Finally, the discussion of the results emphasizes the importance of considering that reliability is not an absolute attribute of an indicator, but derives from an interaction between the indicators, the observers and the situation in which the assessment is taking place. This highlights the importance of further considering the indicators' reliability while developing welfare assessment protocols.
Quiroz, Viviana; Reinero, Daniela; Hernández, Patricia; Contreras, Johanna; Vernal, Rolando; Carvajal, Paola
2017-01-01
This study aimed to develop and assess the content validity and reliability of a cognitively adapted self-report questionnaire designed for surveillance of gingivitis in adolescents. Ten predetermined self-report questions evaluating early signs and symptoms of gingivitis were preliminarily assessed by a panel of clinical experts. Eight questions were selected and cognitively tested in 20 adolescents aged 12 to 18 years from Santiago de Chile. The questionnaire was then administered to 178 Chilean adolescents. Internal consistency was measured using Cronbach's alpha and temporal stability was calculated using the Kappa index. A reliable final self-report questionnaire consisting of 5 questions was obtained, with a total Cronbach's alpha of 0.73 and a Kappa index ranging from 0.41 to 0.77 between the different questions. The proposed questionnaire is reliable, with an acceptable internal consistency and a temporal stability from moderate to substantial, and it is promising for estimating the prevalence of gingivitis in adolescents.
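For readers unfamiliar with the internal-consistency statistic reported here, the following is a minimal sketch of Cronbach's alpha. It assumes a complete respondents-by-items score matrix and uses sample variances; the function name and the example scores are hypothetical and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix (sample variances)."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Variance of each item (column) across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(n_items)]
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical answers from three respondents to two items.
example = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(example)
```

Alpha approaches 1 when items rise and fall together across respondents, and drops as the item variances account for more of the total-score variance.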
NASA Technical Reports Server (NTRS)
Wallace, Dolores R.
2003-01-01
In FY01 we learned that hardware reliability models need substantial changes to account for differences in software, thus making software reliability measurements more effective, accurate, and easier to apply. These reliability models are generally based on familiar distributions or parametric methods. An obvious question is "What new statistical and probability models can be developed using non-parametric and distribution-free methods instead of the traditional parametric method?" Two approaches to software reliability engineering appear somewhat promising. The first study, begun in FY01, is based in hardware reliability, a very well established science that has many aspects that can be applied to software. This research effort has investigated mathematical aspects of hardware reliability and has identified those applicable to software. Currently the research effort is applying and testing these approaches to software reliability measurement. These parametric models require much project data that may be difficult to apply and interpret. Projects at GSFC are often complex in both technology and schedules. Assessing and estimating reliability of the final system is extremely difficult when various subsystems are tested and completed long before others. Non-parametric and distribution-free techniques may offer a new and accurate way of modeling failure time and other project data to provide earlier and more accurate estimates of system reliability.
Singendonk, M M J; Smits, M J; Heijting, I E; van Wijk, M P; Nurko, S; Rosen, R; Weijenborg, P W; Abu-Assi, R; Hoekman, D R; Kuizenga-Wessel, S; Seiboth, G; Benninga, M A; Omari, T I; Kritas, S
2015-02-01
The Chicago Classification (CC) facilitates interpretation of high-resolution manometry (HRM) recordings. Application of this adult based algorithm to the pediatric population is unknown. We therefore assessed intra- and interrater reliability of software-based CC diagnosis in a pediatric cohort. Thirty pediatric solid state HRM recordings (13M; mean age 12.1 ± 5.1 years) assessing 10 liquid swallows per patient were analyzed twice by 11 raters (six experts, five non-experts). Software-placed anatomical landmarks required manual adjustment or removal. Integrated relaxation pressure (IRP4s), distal contractile integral (DCI), contractile front velocity (CFV), distal latency (DL) and break size (BS), and an overall CC diagnosis were software-generated. In addition, raters provided their subjective CC diagnosis. Reliability was calculated with Cohen's and Fleiss' kappa (κ) and intraclass correlation coefficient (ICC). Intra- and interrater reliability of software-generated CC diagnosis after manual adjustment of landmarks was substantial (mean κ = 0.69 and 0.77 respectively) and moderate-substantial for subjective CC diagnosis (mean κ = 0.70 and 0.58 respectively). Reliability of both software-generated and subjective diagnosis of normal motility was high (κ = 0.81 and κ = 0.79). Intra- and interrater reliability were excellent for IRP4s, DCI, and BS. Experts had higher interrater reliability than non-experts for DL (ICC = 0.65 vs ICC = 0.36 respectively) and the software-generated diagnosis diffuse esophageal spasm (DES, κ = 0.64 vs κ = 0.30). Among experts, the reliability for the subjective diagnosis of achalasia and esophagogastric junction outflow obstruction was moderate-substantial (κ = 0.45-0.82). Inter- and intrarater reliability of software-based CC diagnosis of pediatric HRM recordings was high overall. However, experience was a factor influencing the diagnosis of some motility disorders, particularly DES and achalasia. © 2014 John Wiley & Sons Ltd.
Self-Motion Perception: Assessment by Real-Time Computer Generated Animations
NASA Technical Reports Server (NTRS)
Parker, Donald E.
1999-01-01
Our overall goal is to develop materials and procedures for assessing vestibular contributions to spatial cognition. The specific objective of the research described in this paper is to evaluate computer-generated animations as potential tools for studying self-orientation and self-motion perception. Specific questions addressed in this study included the following. First, does a non-verbal perceptual reporting procedure using real-time animations improve assessment of spatial orientation? Are reports reliable? Second, do reports confirm expectations based on stimuli to vestibular apparatus? Third, can reliable reports be obtained when self-motion description vocabulary training is omitted?
Frame-of-reference training for simulation-based intraoperative communication assessment.
Gardner, Aimee K; Russo, Michael A; Jabbour, Ibrahim I; Kosemund, Matthew; Scott, Daniel J
2016-09-01
The purpose of this study was to examine the impact of frame-of-reference (FOR) training on assessments of intraoperative communication skills and identify areas of need to inform curricular efforts. Simulation instructors (M.D., Ph.D., Research Fellow, Simulation Technician) underwent a 2-hour FOR training session with the operating room communication instrument. They then independently rated communication skills of 19 PGY1s who participated in a team-based simulation. Residents completed self-assessments via video review of the scenario. Intraclass correlation coefficients were used to examine inter-rater reliability. Relationships between trained raters and resident scores were assessed with Pearson correlation coefficients and paired sample t tests. Inter-rater reliability after FOR training was .91. The correlation between trained rater scores and resident evaluations was nonsignificant. Residents significantly underestimated their intraoperative communication skills (P < .05). Use of names, closed loop communication, and sharing information with team members demonstrated consistently low ratings among all residents. These findings reveal that a number of individuals can be trained to reliably rate resident intraoperative communication performance and that residents tend to under-rate their communication skills. Copyright © 2016 Elsevier Inc. All rights reserved.
A 2-year study of Gram stain competency assessment in 40 clinical laboratories.
Goodyear, Nancy; Kim, Sara; Reeves, Mary; Astion, Michael L
2006-01-01
We used a computer-based competency assessment tool for Gram stain interpretation to assess the performance of 278 laboratory staff from 40 laboratories on 40 multiple-choice questions. We report test reliability, mean scores, median, item difficulty, discrimination, and analysis of the highest- and lowest-scoring questions. The questions were reliable (KR-20 coefficient, 0.80). Overall mean score was 88% (range, 63%-98%). When categorized by cell type, the means were host cells, 93%; other cells (eg, yeast), 92%; gram-positive, 90%; and gram-negative, 88%. When categorized by type of interpretation, the means were other (eg, underdecolorization), 92%; identify by structure (eg, bacterial morphologic features), 91%; and identify by name (eg, genus and species), 87%. Of the 6 highest-scoring questions (mean scores, ≥ 99%) 5 were identify by structure and 1 was identify by name. Of the 6 lowest-scoring questions (mean scores, < 75%) 5 were gram-negative and 1 was host cells. By type of interpretation, 2 were identify by structure and 4 were identify by name. Computer-based Gram stain competency assessment examinations are reliable. Our analysis helps laboratories identify areas for continuing education in Gram stain interpretation and will direct future revisions of the tests.
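The KR-20 coefficient and item difficulty mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming dichotomous (1 = correct, 0 = incorrect) scoring and the population variance of total scores (some texts use the sample variance instead); the function names and the tiny response matrix are hypothetical.

```python
def item_difficulty(responses: list[list[int]]) -> list[float]:
    """Proportion of examinees answering each item correctly (1 = correct, 0 = incorrect)."""
    n_examinees = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_examinees for i in range(n_items)]


def kr20(responses: list[list[int]]) -> float:
    """Kuder-Richardson formula 20, using the population variance of total scores."""
    n_examinees = len(responses)
    n_items = len(responses[0])
    p = item_difficulty(responses)
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_examinees
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_examinees
    if n_items < 2 or var_total == 0:
        raise ValueError("KR-20 needs at least two items and non-constant total scores")
    sum_pq = sum(pi * (1.0 - pi) for pi in p)
    return (n_items / (n_items - 1)) * (1.0 - sum_pq / var_total)


# Hypothetical responses: four examinees, three questions.
example = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
difficulty = item_difficulty(example)
reliability = kr20(example)
```

A high item difficulty value here means an easy question, which matches how the abstract reports its highest- and lowest-scoring questions.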
Development and Testing of the Church Environment Audit Tool.
Kaczynski, Andrew T; Jake-Schoffman, Danielle E; Peters, Nathan A; Dunn, Caroline G; Wilcox, Sara; Forthofer, Melinda
2018-05-01
In this paper, we describe development and reliability testing of a novel tool to evaluate the physical environment of faith-based settings pertaining to opportunities for physical activity (PA) and healthy eating (HE). Tool development was a multistage process including a review of similar tools, stakeholder review, expert feedback, and pilot testing. Final tool sections included indoor opportunities for PA, outdoor opportunities for PA, food preparation equipment, kitchen type, food for purchase, beverages for purchase, and media. Two independent audits were completed at 54 churches. Interrater reliability (IRR) was determined with Kappa and percent agreement. Of 218 items, 102 were assessed for IRR and 116 could not be assessed because they were not present at enough churches. Percent agreement for all 102 items was over 80%. For 42 items, the sample was too homogeneous to assess Kappa. Forty-six of the remaining items had Kappas greater than 0.60 (25 items 0.80-1.00; 21 items 0.60-0.79), indicating substantial to almost perfect agreement. The tool proved reliable and efficient for assessing church environments and identifying potential intervention points. Future work can focus on applications within faith-based partnerships to understand how church environments influence diverse health outcomes.
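The two agreement statistics used in this study, and the reason Kappa cannot be assessed for homogeneous items, can be sketched as below. This is a minimal two-rater illustration with hypothetical names and data, not the authors' procedure; it returns `None` in the extreme case where both auditors record the same single category for every church, because chance agreement is then 1 and Kappa is undefined.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Share of audited sites on which the two raters recorded the same code."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> Optional[float]:
    """Cohen's kappa for two raters; None when chance agreement equals 1."""
    n = len(a)
    observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return None  # no variability in the ratings, so kappa is undefined
    return (observed - expected) / (1.0 - expected)


# Hypothetical item: 1 = feature present, 0 = absent, at four sites.
auditor_1 = [1, 1, 0, 0]
auditor_2 = [1, 0, 0, 0]
kappa = cohen_kappa(auditor_1, auditor_2)
```

Percent agreement stays interpretable for homogeneous items, which is consistent with the abstract reporting it for all 102 items while reporting Kappa for fewer.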
Stefanatou, Pentagiotissa; Giannouli, Eleni; Konstantakopoulos, George; Vitoratou, Silia; Mavreas, Venetsanos
2014-11-01
Evaluation of mental health services based on patients' needs assessments has never taken place in Greece, although it is a crucial factor for the efficient use of their limited resources. To examine the inter-rater and test-retest reliability and the concurrent/convergent validity of the Greek research version of the Camberwell Assessment of Need-Research (CAN-R). A total of 53 schizophrenic patient-staff pairs were interviewed twice to test the inter-rater and test-retest reliability of the Greek version of the CAN-R. The World Health Organization Quality of Life-Brief Form (WHOQOL-BREF) and World Health Organization Disability Assessment Schedule-2.0 (WHODAS-2.0) were administered to the patients to examine concurrent validity. The inter-rater and test-retest reliability of patient and staff interviews for the 22 individual items and the eight summary scores of the instrument's four sections were good to excellent. Significant correlations emerged between CAN scores and the WHOQOL-BREF and WHODAS-2.0 domains for both patient and staff ratings, indicating good concurrent validity. Our results suggest that the Greek version of the CAN-R is a reliable instrument for assessing mental health patients' needs. Moreover, it is the first CAN-R validity study with satisfactory results using WHOQOL-BREF and WHODAS-2.0 as criterion variables. © The Author(s) 2013.
Comparing Methods for Assessing Reliability Uncertainty Based on Pass/Fail Data Collected Over Time
DOE Office of Scientific and Technical Information (OSTI.GOV)
Abes, Jeff I.; Hamada, Michael S.; Hills, Charles R.
2017-12-20
In this paper, we compare statistical methods for analyzing pass/fail data collected over time; some methods are traditional and one (the RADAR or Rationale for Assessing Degradation Arriving at Random) was recently developed. These methods are used to provide uncertainty bounds on reliability. We make observations about the methods' assumptions and properties. Finally, we illustrate the differences between two traditional methods, logistic regression and Weibull failure time analysis, and the RADAR method using a numerical example.
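To make "uncertainty bounds on reliability" from pass/fail data concrete, here is a sketch of a much simpler technique than any of the three compared in the paper: the exact one-sided lower confidence bound for the special case of n passes and zero failures, with no time dependence. The function name is hypothetical, and the bound assumes independent trials with a constant pass probability.

```python
def zero_failure_lower_bound(n_passes: int, confidence: float) -> float:
    """One-sided lower confidence bound on reliability after n passes and no failures.

    If the true reliability is R, the chance of n straight passes is R**n.
    The bound is the R at which that chance falls to 1 - confidence.
    """
    if n_passes < 1 or not 0.0 < confidence < 1.0:
        raise ValueError("need n_passes >= 1 and 0 < confidence < 1")
    return (1.0 - confidence) ** (1.0 / n_passes)


# Hypothetical test campaign: 22 units tested, all passed.
bound_90 = zero_failure_lower_bound(22, 0.90)
```

Methods such as logistic regression or Weibull analysis go further by letting reliability change with age, which this static bound cannot capture.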
Life prediction and reliability assessment of lithium secondary batteries
NASA Astrophysics Data System (ADS)
Eom, Seung-Wook; Kim, Min-Kyu; Kim, Ick-Jun; Moon, Seong-In; Sun, Yang-Kook; Kim, Hyun-Soo
This paper mainly considers the reliability assessment of lithium secondary batteries. The shape parameter (β) and scale parameter (η) were calculated from experimental data from cycle life tests. We also examined the safety characteristics of lithium secondary batteries. As proposed by IEC 62133 (2002), we performed all of the safety/abuse tests, such as 'mechanical abuse tests', 'environmental abuse tests' and 'electrical abuse tests'. This paper describes the cycle life of lithium secondary batteries, FMEA (failure modes and effects analysis) and the safety/abuse tests we performed.
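The shape (β) and scale (η) parameters are the conventional notation of the two-parameter Weibull distribution; assuming that is the model intended here (the abstract does not name it), the sketch below shows how such parameters translate into a survival probability and a B-life. The parameter values are hypothetical, not the paper's results.

```python
import math


def weibull_reliability(cycles: float, beta: float, eta: float) -> float:
    """Probability of surviving beyond `cycles` under a two-parameter Weibull model."""
    return math.exp(-((cycles / eta) ** beta))


def b_life(fraction_failed: float, beta: float, eta: float) -> float:
    """Cycle count by which the given fraction of cells is expected to have failed."""
    return eta * (-math.log(1.0 - fraction_failed)) ** (1.0 / beta)


# Hypothetical parameters, for illustration only.
BETA = 2.0   # shape: values above 1 indicate wear-out behaviour
ETA = 500.0  # scale: cycles by which about 63.2% of cells have failed

b10_cycles = b_life(0.10, BETA, ETA)
survival_at_eta = weibull_reliability(ETA, BETA, ETA)
```

Whatever the value of β, survival at exactly η cycles is exp(-1), which is why η is often called the characteristic life.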
de Jong, Lex D; van Meeteren, Annemiek; Emmelot, Cornelis H; Land, Nanne E; Dijkstra, Pieter U
2018-03-01
To determine reliability of the ABILHAND-Kids, explore sources of variation associated with these measurement results, and generate repeatability coefficients. A reliability study with a repeated measures design was performed in an ambulatory rehabilitation care department from a rehabilitation center, and a center for special education. A physician, an occupational therapist, and parents of 27 children with spastic cerebral palsy independently rated the children's manual capacity when performing 21 standardized tasks of the ABILHAND-Kids from video recordings twice with a three week time interval (27 first-, and 25 second video recordings available). Parents additionally rated their children's performance based on their own perception of their child's ability to perform manual activities in everyday life, resulting in eight ratings per child. ABILHAND-Kids ratings were systematically different between observers, sessions, and rating method. Participant × observer interaction (66%) and residual variance (20%) contributed the most to error variance (9%). Test-retest reliability was 0.92. Repeatability coefficients (between 0.81 and 1.82 logit points) were largest for the parents' performance-based ratings. ABILHAND-Kids scores can be reliably used as a performance- and capacity-based rating method across different raters. Parents' performance-based ratings are less reliable than their capacity-based ratings. Resulting repeatability coefficients can be used to interpret ABILHAND-Kids ratings with more confidence. Implications for Rehabilitation The ABILHAND-Kids is a valuable tool to assess a child's unimanual and bimanual upper limb activities. The reliability of the ABILHAND-Kids is good across different observers as a performance- and capacity-based rating method. Parents' performance-based ratings are less reliable than their capacity-based ones. This study has generated repeatability coefficients for clinical decision making.
Reliability tests and guidelines for B-mode ultrasound assessment of central adiposity.
Stoner, Lee; Chinn, Victoria; Cornwall, Jon; Meikle, Grant; Page, Rachel; Lambrick, Danielle; Faulkner, James
2015-11-01
Ultrasound represents a validated and relatively inexpensive diagnostic device for assessing central adiposity; however, widespread adoption has been impeded by the lack of reliable standard operating procedures. To examine the reliability of, and describe guidelines for, ultrasound-derived recording of intra-abdominal fat thickness (IAT) and maximal preperitoneal fat thickness (PFT). Ultrasound scans were obtained from 20 adults (50% female, 26 ± 7 years, 24·5 kg/m²) on three different mornings. IAT was assessed 2 cm above the umbilicus (transverse plane) measuring from linea alba to: (i) anterior aorta, (ii) posterior aorta and (iii) anterior aspect of the vertebral column. PFT was measured from linea alba to visceral peritoneum in (i) sagittal and (ii) transverse planes, immediately over and inferior to the xiphi-sternum, respectively. For IAT, the criterion intraclass correlation coefficient (ICC) of 0·75 was exceeded for measurements to anterior aorta (0·95), posterior aorta (0·94) and vertebra (0·96). The reliability coefficient expressed as a percentage of the mean (RC%) was lowest (best) for measurement to vertebrae (9·8%). For PFT, mean thickness was comparable for sagittal (1·74 cm) and transverse (1·76 cm) planes; ICC values were also comparable for both planes (0·98 vs. 0·98, respectively), as were RC% (7·5% vs. 7·1%, respectively). IAT assessments to the vertebra were marginally more reliable than those to other structures. While PFT assessments were equally reliable for both measurement planes, precise probe placement was easier for the sagittal plane. Based on these findings, guidelines for the reliable measurement of central adiposity using ultrasound are presented. © 2015 Stichting European Society for Clinical Investigation Journal Foundation.
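An intraclass correlation coefficient for repeated measurements, such as those compared against the 0·75 criterion above, can be sketched as follows. The abstract does not state which ICC form was used, so this example assumes the one-way random-effects, single-measurement form, often written ICC(1,1); the function name and the data are hypothetical.

```python
def icc_one_way(measurements: list[list[float]]) -> float:
    """ICC(1,1): one-way random-effects, single-measurement intraclass correlation.

    `measurements` has one row per participant and one column per repeated session.
    """
    n = len(measurements)
    k = len(measurements[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two participants and two sessions")
    grand_mean = sum(sum(row) for row in measurements) / (n * k)
    row_means = [sum(row) / k for row in measurements]
    # Between-participant and within-participant mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(measurements, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical fat-thickness readings (cm): three participants, two sessions.
example = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
icc = icc_one_way(example)
meets_criterion = icc > 0.75
```

The coefficient rises when differences between participants dominate the session-to-session differences within each participant.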
Metrics for Assessing the Reliability of a Telemedicine Remote Monitoring System
Fox, Mark; Papadopoulos, Amy; Crump, Cindy
2013-01-01
Abstract Objective: The goal of this study was to assess using new metrics the reliability of a real-time health monitoring system in homes of older adults. Materials and Methods: The “MobileCare Monitor” system was installed into the homes of nine older adults >75 years of age for a 2-week period. The system consisted of a wireless wristwatch-based monitoring system containing sensors for location, temperature, and impacts and a “panic” button that was connected through a mesh network to third-party wireless devices (blood pressure cuff, pulse oximeter, weight scale, and a survey-administering device). To assess system reliability, daily phone calls instructed participants to conduct system tests and reminded them to fill out surveys and daily diaries. Phone reports and participant diary entries were checked against data received at a secure server. Results: Reliability metrics assessed overall system reliability, data concurrence, study effectiveness, and system usability. Except for the pulse oximeter, system reliability metrics varied between 73% and 92%. Data concurrence for proximal and distal readings exceeded 88%. System usability following the pulse oximeter firmware update varied between 82% and 97%. An estimate of watch-wearing adherence within the home was quite high, about 80%, although given the inability to assess watch-wearing when a participant left the house, adherence likely exceeded the 10 h/day requested time. In total, 3,436 of 3,906 potential measurements were obtained, indicating a study effectiveness of 88%. Conclusions: The system was quite effective in providing accurate remote health data. The different system reliability measures identify important error sources in remote monitoring systems. PMID:23611640
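The study effectiveness figure can be reproduced from the two counts reported in the abstract. The sketch below assumes the metric is simply obtained measurements divided by potential measurements, which is consistent with the reported 88%; the function name is hypothetical.

```python
def study_effectiveness(obtained: int, potential: int) -> float:
    """Fraction of potential measurements that were actually obtained."""
    if potential <= 0 or not 0 <= obtained <= potential:
        raise ValueError("require potential > 0 and 0 <= obtained <= potential")
    return obtained / potential


# Counts reported in the abstract: 3,436 of 3,906 potential measurements.
effectiveness = study_effectiveness(3436, 3906)
effectiveness_percent = round(100 * effectiveness)
```

Rounded to the nearest whole percent, the result agrees with the value given in the abstract.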
Nutrition Environment Food Pantry Assessment Tool (NEFPAT): Development and Evaluation.
Nikolaus, Cassandra J; Laurent, Emily; Loehmer, Emily; An, Ruopeng; Khan, Naiman; McCaffrey, Jennifer
2018-04-24
To develop and evaluate a nutrition environment assessment tool to assess the consumer nutrition environment and use of recommended practices in food pantries. The Nutrition Environment Food Pantry Assessment Tool (NEFPAT) was developed based on a literature review and guidance from professionals working with food pantries. The tool was pilot-tested at 9 food pantries, an expert panel assessed content validity, and interrater reliability was evaluated by pairs in 3 pantries. After revisions, the NEFPAT was used in 27 pantries. Pilot tests indicated positive appraisal for the NEFPAT and recommendations were addressed. The NEFPAT's 6 objectives and the overall tool were rated as content valid by experts, with an average section rating of 3.85 ± 0.10. Intraclass correlation coefficients for interrater reliability were >0.90. The NEFPAT is content valid with high interrater reliability. It provides baseline data that could be valuable for interventions within the nutrition environment of food pantries. Published by Elsevier Inc.
Lindsley, Kristina; Li, Tianjing; Ssemanda, Elizabeth; Virgili, Gianni; Dickersin, Kay
2016-01-01
Topic Are existing systematic reviews of interventions for age-related macular degeneration incorporated into clinical practice guidelines? Clinical relevance High-quality systematic reviews should be used to underpin evidence-based clinical practice guidelines and clinical care. We have examined the reliability of systematic reviews of interventions for age-related macular degeneration (AMD) and described the main findings of reliable reviews in relation to clinical practice guidelines. Methods Eligible publications are systematic reviews of the effectiveness of treatment interventions for AMD. We searched a database of systematic reviews in eyes and vision and employed no language or date restrictions; the database is up-to-date as of May 6, 2014. Two authors independently screened records for eligibility and abstracted and assessed the characteristics and methods of each review. We classified reviews as “reliable” when they reported eligibility criteria, comprehensive searches, appraisal of methodological quality of included studies, appropriate statistical methods for meta-analysis, and conclusions based on results. We mapped treatment recommendations from the American Academy of Ophthalmology Preferred Practice Patterns (AAO PPP) for AMD to the identified systematic reviews and assessed whether any reliable systematic review was cited or could have been cited to support each treatment recommendation. Results Of 1,570 systematic reviews in our database, 47 met our inclusion criteria. Most of the systematic reviews targeted neovascular AMD and investigated anti-vascular endothelial growth factor (anti-VEGF) interventions, dietary supplements or photodynamic therapy. We classified over two-thirds (33/47) of the reports as reliable. The quality of reporting varied, with criteria for reliable reporting met more often for Cochrane reviews and for reviews whose authors disclosed conflicts of interest. 
Although most systematic reviews were reliable, anti-VEGF agents and photodynamic therapy were the only interventions identified as effective by reliable reviews. Of 35 treatment recommendations extracted from the AAO PPP, 15 could have been supported with reliable systematic reviews; however, only one recommendation had an accompanying intervention systematic review citation, which we assessed as a reliable systematic review. No reliable systematic review was identified for 20 treatment recommendations, highlighting areas of evidence gaps. Conclusions For AMD, reliable systematic reviews exist for many treatment recommendations in the AAO PPP and should be used to support these recommendations. We also identified areas where no high-level evidence exists. Mapping clinical practice guidelines to existing systematic reviews is one way to highlight areas where evidence generation or evidence synthesis is either available or needed. PMID:26804762
ERIC Educational Resources Information Center
Rubin, Allen; Parrish, Danielle E.
2010-01-01
Objective: This report describes the development and preliminary findings regarding the reliability, validity, and sensitivity of a scale that has been developed to assess practitioners' perceived familiarity with, attitudes about, and implementation of the phases of the evidence-based practice (EBP) process. Method: After a panel of national…
Adapting the Media and Technology Usage and Attitudes Scale to Turkish
ERIC Educational Resources Information Center
Özgür, Hasan
2016-01-01
Due to the requirement of a current, valid, and reliable assessment instrument for determining usage frequencies of technology-based media and the attitudes towards these, this study intends to determine the validity and reliability of the Media and Technology Usage and Attitudes Scale, developed by researchers from California State University,…
ERIC Educational Resources Information Center
Romer, Natalie; Merrell, Kenneth W.
2013-01-01
This study focused on evaluating the temporal stability of self-reported and teacher-reported perceptions of students' social and emotional skills and assets. We used a test-retest reliability procedure over repeated administrations of the child, adolescent, and teacher versions of the "Social-Emotional Assets and Resilience Scales".…
75 FR 66038 - Planning Resource Adequacy Assessment Reliability Standard
Federal Register 2010, 2011, 2012, 2013, 2014
2010-10-27
... (R1 and R2) are assigned a violation risk factor (VRF) and violation severity level (VSL). However... base penalty amount. To do so, RFC is to assign a VRF to each Requirement and sub-Requirement of a... each VRF assignment.\\59\\ \\59\\ See North American Electric Reliability Corp., 119 FERC ¶ 61,145, order...
Test Theories, Educational Priorities and Reliability of Public Examinations in England
ERIC Educational Resources Information Center
Baird, Jo-Anne; Black, Paul
2013-01-01
Much has already been written on the controversies surrounding the use of different test theories in educational assessment. Other authors have noted the prevalence of classical test theory over item response theory in practice. This Special Issue draws together articles based upon work conducted on the Reliability Programme for England's…
Rönspies, Jelena; Schmidt, Alexander F; Melnikova, Anna; Krumova, Rosina; Zolfagari, Asadeh; Banse, Rainer
2015-07-01
The present study was conducted to validate an adaptation of the Implicit Relational Assessment Procedure (IRAP) as an indirect latency-based measure of sexual orientation. Furthermore, reliability and criterion validity of the IRAP were compared to two established indirect measures of sexual orientation: a Choice Reaction Time task (CRT) and a Viewing Time (VT) task. A sample of 87 heterosexual and 35 gay men completed all three indirect measures in an online study. The IRAP and the VT predicted sexual orientation nearly perfectly. Both measures also showed a considerable amount of convergent validity. Reliabilities (internal consistencies) reached satisfactory levels. In contrast, the CRT did not tap into sexual orientation in the present study. In sum, the VT measure performed best, with the IRAP showing only slightly lower reliability and criterion validity, whereas the CRT did not yield any evidence of reliability or criterion validity in the present research. The results were discussed in the light of specific task properties of the indirect latency-based measures (task-relevance vs. task-irrelevance).
Reliability and validity of a school recess physical activity recall in Spanish youth.
Martínez-Gómez, David; Calabro, M Andres; Welk, Gregory J; Marcos, Ascension; Veiga, Oscar L
2010-05-01
Recess is a frequent target in school-based physical activity (PA) promotion research but there are challenges in assessing PA during this time period. The purpose of this study was to evaluate the reliability and validity of a recess PA recall (RPAR) instrument designed to assess total PA and time spent in moderate to vigorous PA (MVPA) during recess. One hundred twenty-five 7th and 8th-grade students (59 females), age 12-14 years, participated in the study. Activity levels were objectively monitored on Mondays using different activity monitors (Yamax Digiwalker, Biotrainer and ActiGraph). On Tuesdays, 2 RPAR self-reports were administered within 1 hr. Test-retest reliability showed ICC = 0.87 and 0.88 for total PA and time spent in MVPA, respectively. The RPAR was correlated against Yamax (r = .35), Biotrainer (r = .40 and 0.54) and ActiGraph (r = .42) to assess total PA during recess. The RPAR was also correlated against ActiGraph (r = .54) to assess time spent in MVPA during recess. The mean difference between the RPAR and ActiGraph in time spent in MVPA during recess was not significant (2.15 +/- 3.67 min, p = .313). The RPAR showed adequate reliability and reasonable validity for assessing PA during school recess in youth.
Siu, B W M; Au-Yeung, C C Y; Chan, A W L; Chan, L S Y; Yuen, K K; Leung, H W; Yan, C K; Ng, K K; Lai, A C H; Davies, S; Collins, M
Mapping forensic psychiatric services with the security needs of patients is a salient step in service planning, audit and review. A valid and reliable instrument for measuring the security needs of Chinese forensic psychiatric inpatients was not yet available. This study aimed to develop and validate the Chinese version of the Security Needs Assessment Profile for measuring the profiles of security needs of Chinese forensic psychiatric inpatients. The Security Needs Assessment Profile by Davis was translated into Chinese. Its face validity, content validity, construct validity and internal consistency reliability were assessed by measuring the security needs of 98 Chinese forensic psychiatric inpatients. Principal factor analysis for construct validity provided a six-factor security needs model explaining 68.7% of the variance. Based on the Cronbach's alpha coefficient, the internal consistency reliability was rated as acceptable for procedural security (0.73), and fair for both physical security (0.62) and relational security (0.58). A significant sex difference (p=0.002) in total security score was found. The Chinese version of the Security Needs Assessment Profile is a valid and reliable instrument for assessing the security needs of Chinese forensic psychiatric inpatients. Copyright © 2017 Elsevier Ltd. All rights reserved.
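For readers unfamiliar with the internal consistency statistic reported above, the following is a minimal sketch of how Cronbach's alpha is commonly computed from item-level scores. The function name and the scores are hypothetical illustrations, not data or code from the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of per-item score lists.

    items[i][j] is the score of respondent j on item i; every item
    list must have the same length. Sample variances are used.
    """
    k = len(items)
    # Total score of each respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1.0 - item_variance_sum / variance(totals))


# Hypothetical 3-item subscale answered by 5 respondents.
subscale = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
alpha = cronbach_alpha(subscale)
print(f"alpha = {alpha:.2f}")
```

Alpha rises as items covary more strongly relative to their individual variances, which is why a subscale with loosely related items can fall below conventional cut-offs.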
Read, Paul J; Oliver, Jon L; Croix, Mark Ba De Ste; Myer, Gregory D; Lloyd, Rhodri S
2016-12-01
Read, P, Oliver, JL, Croix, MD, Myer, GD, and Lloyd, RS. Consistency of field-based measures of neuromuscular control using force-plate diagnostics in elite male youth soccer players. J Strength Cond Res 30(12): 3304-3311, 2016-Deficits in neuromuscular control during movement patterns such as landing are suggested pathomechanics that underlie sport-related injury. A common mode of assessment is measurement of landing forces during jumping tasks; however, these measures have been used less frequently in male youth soccer players, and reliability data are sparse. The aim of this study was to examine the reliability of a field-based neuromuscular control screening battery using force-plate diagnostics in this cohort. Twenty-six pre-peak height velocity (PHV) and 25 post-PHV elite male youth soccer players completed a drop vertical jump (DVJ), single-leg 75% horizontal hop and stick (75%HOP), and single-leg countermovement jump (SLCMJ). Measures of peak landing vertical ground reaction force (pVGRF), time to stabilization, time to pVGRF, and pVGRF asymmetry were recorded. A test-retest design was used, and reliability statistics included change in mean, intraclass correlation coefficient, and coefficient of variation (CV). No significant differences in mean score were reported for any of the assessed variables between test sessions. In both groups, pVGRF and asymmetry during the 75%HOP and SLCMJ demonstrated largely acceptable reliability (CV ≤ 10%). Greater variability was evident in DVJ pVGRF and all other assessed variables, across the 3 protocols (CV range = 13.8-49.7%). Intraclass correlation coefficient values ranged from small to large and were generally higher in the post-PHV players. The results of this study suggest that pVGRF and asymmetry can be reliably assessed using a 75%HOP and SLCMJ in this cohort. These measures could be used to support a screening battery for elite male youth soccer players and for test-retest comparison.
Kramp, Kelvin H; van Det, Marc J; Veeger, Nic J G M; Pierie, Jean-Pierre E N
2016-06-01
There is no widely used method to evaluate procedure-specific laparoscopic skills. The first aim of this study was to develop a procedure-based assessment method. The second aim was to compare its validity, reliability and feasibility with currently available global rating scales (GRSs). An independence-scaled procedural assessment was created by linking the procedural key steps of the laparoscopic cholecystectomy to an independence scale. Subtitled and blinded videos of a novice, an intermediate and an almost competent trainee, were evaluated with GRSs (OSATS and GOALS) and the independence-scaled procedural assessment by seven surgeons, three senior trainees and six scrub nurses. Participants received a short introduction to the GRSs and independence-scaled procedural assessment before assessment. The validity was estimated with the Friedman and Wilcoxon test and the reliability with the intra-class correlation coefficient (ICC). A questionnaire was used to evaluate user opinion. Independence-scaled procedural assessment and GRS scores improved significantly with surgical experience (OSATS p = 0.001, GOALS p < 0.001, independence-scaled procedural assessment p < 0.001). The ICCs of the OSATS, GOALS and independence-scaled procedural assessment were 0.78, 0.74 and 0.84, respectively, among surgeons. The ICCs increased when the ratings of scrub nurses were added to those of the surgeons. The independence-scaled procedural assessment was not considered more of an administrative burden than the GRSs (p = 0.692). A procedural assessment created by combining procedural key steps to an independence scale is a valid, reliable and acceptable assessment instrument in surgery. In contrast to the GRSs, the reliability of the independence-scaled procedural assessment exceeded the threshold of 0.8, indicating that it can also be used for summative assessment. It furthermore seems that scrub nurses can assess the operative competence of surgical trainees.
Keller, Jürgen; Krimly, Amon; Bauer, Lisa; Schulenburg, Sarah; Böhm, Sarah; Aho-Özhan, Helena E A; Uttner, Ingo; Gorges, Martin; Kassubek, Jan; Pinkhardt, Elmar H; Abrahams, Sharon; Ludolph, Albert C; Lulé, Dorothée
2017-08-01
Reliable assessment of cognitive functions is a challenging task in amyotrophic lateral sclerosis (ALS) patients unable to speak and write. We therefore present an eye-tracking based neuropsychological screening tool based on the Edinburgh Cognitive and Behavioural ALS Screen (ECAS), a standard screening tool for cognitive deficits in ALS. In total, 46 ALS patients and 50 healthy controls matched for age, gender and education were tested with an oculomotor based and a standard paper-and-pencil version of the ECAS. Significant correlation between both versions was observed for ALS patients and healthy controls in the ECAS total score and in all of its ALS-specific domains (all r > 0.3; all p < 0.05). The eye-tracking version of the ECAS reliably distinguished between ALS patients and healthy controls in the ECAS total score (p < 0.05). Also, cognitively impaired and non-impaired patients could be reliably distinguished with a specificity of 95%. This study provides first evidence that the eye-tracking based ECAS version is a promising approach for assessing cognitive deficits in ALS patients who are unable to speak or write.
Development of the CarMen-Q Questionnaire for mental workload assessment.
Rubio-Valdehita, Susana; López-Núñez, María I; López-Higes, Ramón; Díaz-Ramiro, Eva M
2017-11-01
Mental workload has emerged as one of the most important occupational risk factors present in most psychological and physical diseases caused by work. In view of the lack of specific tools to assess mental workload, the objective of this research was to assess the construct validity and reliability of a new questionnaire for mental workload assessment (CarMen-Q). The sample was composed of 884 workers from several professional sectors, between 18 and 65 years old, 53.4% men and 46.6% women. To evaluate the validity based on relationships with other measures, the NASA-TLX scale was also administered. Confirmatory factor analysis showed an internal structure made up of four dimensions: cognitive, temporal and emotional demands and performance requirement. The results show satisfactory evidence of validity based on relationships with NASA-TLX and good reliability. The questionnaire has good psychometric properties and can be an easy, brief, useful tool for mental workload diagnosis and prevention.
Assessing local instrument reliability and validity: a field-based example from northern Uganda.
Betancourt, Theresa S; Bass, Judith; Borisova, Ivelina; Neugebauer, Richard; Speelman, Liesbeth; Onyango, Grace; Bolton, Paul
2009-08-01
This paper presents an approach for evaluating the reliability and validity of mental health measures in non-Western field settings. We describe this approach using the example of our development of the Acholi psychosocial assessment instrument (APAI), which is designed to assess depression-like (two tam, par and kumu), anxiety-like (ma lwor) and conduct problems (kwo maraco) among war-affected adolescents in northern Uganda. To examine the criterion validity of this measure in the absence of a traditional gold standard, we derived local syndrome terms from qualitative data and used self reports of these syndromes by indigenous people as a reference point for determining caseness. Reliability was examined using standard test-retest and inter-rater methods. Each of the subscale scores for the depression-like syndromes exhibited strong internal reliability ranging from alpha = 0.84-0.87. Internal reliability was good for anxiety (0.70), conduct problems (0.83), and the pro-social attitudes and behaviors (0.70) subscales. Combined inter-rater reliability and test-retest reliability were good for most subscales except for the conduct problem scale and prosocial scales. The pattern of significant mean differences in the corresponding APAI problem scale score between self-reported cases vs. noncases on local syndrome terms was confirmed in the data for all of the three depression-like syndromes, but not for the anxiety-like syndrome ma lwor or the conduct problem kwo maraco.
Crockford, Christopher; Newton, Judith; Lonergan, Katie; Madden, Caoifa; Mays, Iain; O'Sullivan, Meabhdh; Costello, Emmet; Pinto-Grau, Marta; Vajda, Alice; Heverin, Mark; Pender, Niall; Al-Chalabi, Ammar; Hardiman, Orla; Abrahams, Sharon
2018-02-01
Cognitive impairment affects approximately 50% of people with amyotrophic lateral sclerosis (ALS). Research has indicated that impairment may worsen with disease progression. The Edinburgh Cognitive and Behavioural ALS Screen (ECAS) was designed to measure neuropsychological functioning in ALS, with its alternate forms (ECAS-A, B, and C) allowing for serial assessment over time. The aim of the present study was to establish reliable change scores for the alternate forms of the ECAS, and to explore practice effects and test-retest reliability of the ECAS's alternate forms. Eighty healthy participants were recruited, with 57 completing two and 51 completing three assessments. Participants were administered alternate versions of the ECAS serially (A-B-C) at four-month intervals. Intra-class correlation analysis was employed to explore test-retest reliability, while analysis of variance was used to examine the presence of practice effects. Reliable change indices (RCI) and regression-based methods were utilized to establish change scores for the ECAS alternate forms. Test-retest reliability was excellent for ALS Specific, ALS Non-Specific, and ECAS Total scores of the combined ECAS A, B, and C (all > .90). No significant practice effects were observed over the three testing sessions. RCI and regression-based methods produced similar change scores. The alternate forms of the ECAS possess excellent test-retest reliability in a healthy control sample, with no significant practice effects. The use of conservative RCI scores is recommended. Therefore, a change of ≥8, ≥4, and ≥9 for ALS Specific, ALS Non-Specific, and ECAS Total score is required for reliable change.
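The reliable change index mentioned above is often computed with the classical Jacobson and Truax formulation, sketched below. This is an illustration under that assumption; the abstract does not state which RCI variant was used, and the standard deviation and reliability values here are hypothetical, not the study's.

```python
import math


def standard_error_of_difference(sd_baseline, test_retest_r):
    """SE of the difference between two scores on the same test."""
    sem = sd_baseline * math.sqrt(1.0 - test_retest_r)  # standard error of measurement
    return math.sqrt(2.0) * sem


def reliable_change_index(score_1, score_2, sd_baseline, test_retest_r):
    """Observed change expressed in units of the SE of the difference."""
    return (score_2 - score_1) / standard_error_of_difference(sd_baseline, test_retest_r)


def reliable_change_threshold(sd_baseline, test_retest_r, z=1.96):
    """Smallest absolute change treated as reliable at the chosen z level."""
    return z * standard_error_of_difference(sd_baseline, test_retest_r)


# Hypothetical values: baseline SD of 10 points, test-retest r of 0.91.
threshold = reliable_change_threshold(10.0, 0.91)
rci = reliable_change_index(100.0, 110.0, 10.0, 0.91)
print(f"threshold = {threshold:.2f} points, RCI = {rci:.2f}")
```

A change counts as reliable when the absolute RCI exceeds the chosen z value, which is equivalent to the raw change exceeding the threshold.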
Evaluation of Animal-Based Indicators to Be Used in a Welfare Assessment Protocol for Sheep
Richmond, Susan E.; Wemelsfelder, Francoise; de Heredia, Ina Beltran; Ruiz, Roberto; Canali, Elisabetta; Dwyer, Cathy M.
2017-01-01
Sheep are managed under a variety of different environments (continually outdoors, partially outdoors with seasonal or diurnal variation, continuously indoors) and for different purposes, which makes assessing welfare challenging. This diversity means that resource-based indicators are not particularly useful and, thus, a welfare assessment scheme for sheep, focusing on animal-based indicators, was developed. We focus specifically on ewes, as the most numerous group of sheep present on farm, although many of the indicators may also have relevance to adult male sheep. Using the Welfare Quality® framework of four Principles and 12 Criteria, we considered the validity, reliability, and feasibility of 46 putative animal-based indicators derived from the literature for these criteria. Where animal-based indicators were potentially unreliable or were not considered feasible, we also considered the resource-based indicators of access to water, stocking density, and floor slipperiness. With the exception of the criterion “Absence of prolonged thirst,” we suggest at least one animal-based indicator for each welfare criterion. As a minimum, face validity was available for all indicators; however, for many, we found evidence of convergent validity and discriminant validity (e.g., lameness as measured by gait score, body condition score). The reliability of most of the physical and health measures has been tested in the field and found to be appropriate for use in welfare assessment. However, for the majority of the proposed behavioral indicators (lying synchrony, social withdrawal, postures associated with pain, vocalizations, stereotypy, vigilance, response to surprise, and human approach test), this still needs to be tested. In conclusion, the comprehensive assessment of sheep welfare through largely animal-based measures is supported by the literature through the use of indicators focusing on specific aspects of sheep biology. 
Further work is required for some indicators to ensure that measures are reliable when used in commercial settings. PMID:29322048
Sensor Selection and Optimization for Health Assessment of Aerospace Systems
NASA Technical Reports Server (NTRS)
Maul, William A.; Kopasakis, George; Santi, Louis M.; Sowers, Thomas S.; Chicatelli, Amy
2007-01-01
Aerospace systems are developed similarly to other large-scale systems through a series of reviews, where designs are modified as system requirements are refined. For space-based systems few are built and placed into service. These research vehicles have limited historical experience to draw from and formidable reliability and safety requirements, due to the remote and severe environment of space. Aeronautical systems have similar reliability and safety requirements, and while these systems may have historical information to access, commercial and military systems require longevity under a range of operational conditions and applied loads. Historically, the design of aerospace systems, particularly the selection of sensors, is based on the requirements for control and performance rather than on health assessment needs. Furthermore, the safety and reliability requirements are met through sensor suite augmentation in an ad hoc, heuristic manner, rather than any systematic approach. A review of the current sensor selection practice within and outside of the aerospace community was conducted and a sensor selection architecture is proposed that will provide a justifiable, dependable sensor suite to address system health assessment requirements.
Sensor Selection and Optimization for Health Assessment of Aerospace Systems
NASA Technical Reports Server (NTRS)
Maul, William A.; Kopasakis, George; Santi, Louis M.; Sowers, Thomas S.; Chicatelli, Amy
2008-01-01
Aerospace systems are developed similarly to other large-scale systems through a series of reviews, where designs are modified as system requirements are refined. For space-based systems few are built and placed into service. These research vehicles have limited historical experience to draw from and formidable reliability and safety requirements, due to the remote and severe environment of space. Aeronautical systems have similar reliability and safety requirements, and while these systems may have historical information to access, commercial and military systems require longevity under a range of operational conditions and applied loads. Historically, the design of aerospace systems, particularly the selection of sensors, is based on the requirements for control and performance rather than on health assessment needs. Furthermore, the safety and reliability requirements are met through sensor suite augmentation in an ad hoc, heuristic manner, rather than any systematic approach. A review of the current sensor selection practice within and outside of the aerospace community was conducted and a sensor selection architecture is proposed that will provide a justifiable, defendable sensor suite to address system health assessment requirements.
Lo, Wing-Sze; Ho, Sai-Yin; Wong, Bonny Yee-Man; Mak, Kwok-Kei; Lam, Tai-Hing
2011-06-01
The reliability and validity of Stunkard's Figure Rating Scale (FRS) as a measure of current body size (CBS) was established in Western adolescent girls but not in non-Western populations. We examined the validity and test-retest reliability of Stunkard's FRS in assessing CBS among Chinese adolescents. Methods. In a school-based survey in Hong Kong, 5666 adolescents (boys: 45.1%; mean age 14.7 years) provided data on self-reported height and weight, CBS, perceived weight status, and health-related quality of life using the Medical Outcomes Study Short-Form version 2 (SF-12v2). Height and weight were also objectively measured. Spearman's correlation was used to assess construct validity, concurrent validity and test-retest reliability. Convergent and discriminant validity were good: CBS correlated strongly with weight and self-reported/measured BMI, but only weakly with SF-12v2. CBS correlated strongly with perceived weight status, showing concurrent validity. Spearman's correlation (r) for CBS was 0.78 for girls and 0.72 for boys, indicating good test-retest reliability. Validity and reliability results did not differ significantly between senior and junior grade adolescents. Our findings support the use of Stunkard's FRS to measure body size among Chinese adolescents.
The development of an instrument to match individuals with disabilities and service animals.
Zapf, S A; Rough, R B
There has been an increase in the use of service animals assisting persons with disabilities in the past decade. However, many of the service dog agencies do not utilize an assessment that is designed to match the person to the animal in the rehabilitation and psycho-social domains. The purpose of this study was to develop the Service Animal Adaptive Intervention Assessment (SAAIA) and to measure the content validity, inter-rater reliability and clinical utility of the assessment. Two subject groups were used. Subject group one had 43 subjects who measured the content validity and clinical utility of the SAAIA Survey. Subject group two had 12 subjects who measured the inter-rater reliability by completing the SAAIA using information obtained through a video-taped client case scenario. Content validity results indicated a good to high percentage of agreement and a fair percentage of agreement for clinical utility. Inter-rater reliability results indicate good to high agreement on six of the eight variables of the SAAIA. However, the Kappa score indicates low inter-rater reliability. Results indicate the SAAIA has good content validity and inter-rater reliability and fair clinical utility based on percent agreement. However, further research is needed on the reliability of the SAAIA.
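The apparent tension above between high percent agreement and a low kappa is a known property of the two statistics: kappa discounts the agreement expected by chance, which is large when one rating category dominates. A minimal sketch with hypothetical two-rater data (not the SAAIA ratings) illustrates this.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of cases on which the two raters give the same rating."""
    return sum(x == y for x, y in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(
        counts_a[c] * counts_b[c] for c in set(rater_a) | set(rater_b)
    ) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of 20 cases; "yes" dominates for both raters.
rater_1 = ["yes"] * 18 + ["no"] * 2
rater_2 = ["yes"] * 17 + ["no", "yes", "no"]

agreement = percent_agreement(rater_1, rater_2)
kappa = cohens_kappa(rater_1, rater_2)
print(f"percent agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

Here the raters agree on 90% of cases, yet chance alone would produce 82% agreement, so kappa is only about 0.44.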
System reliability of randomly vibrating structures: Computational modeling and laboratory testing
NASA Astrophysics Data System (ADS)
Sundar, V. S.; Ammanagi, S.; Manohar, C. S.
2015-09-01
The problem of determining the system reliability of randomly vibrating structures arises in many application areas of engineering. We discuss in this paper approaches based on Monte Carlo simulations and laboratory testing to tackle problems of time variant system reliability estimation. The strategy we adopt is based on the application of Girsanov's transformation to the governing stochastic differential equations, which enables estimation of the probability of failure with significantly fewer samples than are needed in a direct simulation study. Notably, we show that the ideas from Girsanov's transformation based Monte Carlo simulations can be extended to conduct laboratory testing to assess system reliability of engineering structures with a reduced number of samples and hence with reduced testing times. Illustrative examples include computational studies on a 10-degree of freedom nonlinear system model and laboratory/computational investigations on road load response of an automotive system tested on a four-post test rig.
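The sample savings described above come from a change of probability measure. The sketch below is not the authors' Girsanov-based scheme for stochastic differential equations; it is a simplified static analog, mean-shift importance sampling for a single Gaussian variable, with a hypothetical failure threshold chosen only for illustration.

```python
import math
import random


def exact_tail(threshold):
    """Exact P(X > threshold) for a standard normal X."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


def importance_sampling_tail(threshold, n_samples, rng):
    """Estimate P(X > threshold) by sampling from N(threshold, 1).

    Each failing sample is weighted by the likelihood ratio between
    the original density N(0, 1) and the shifted sampling density.
    """
    total = 0.0
    for _ in range(n_samples):
        y = rng.gauss(threshold, 1.0)
        if y > threshold:
            total += math.exp(-threshold * y + 0.5 * threshold * threshold)
    return total / n_samples


rng = random.Random(0)  # fixed seed so the run is repeatable
threshold = 4.0         # hypothetical failure level, in standard deviations
estimate = importance_sampling_tail(threshold, 20000, rng)
exact = exact_tail(threshold)
print(f"estimate = {estimate:.3e}, exact = {exact:.3e}")
```

With a failure probability near 3e-5, direct simulation with 20,000 samples would usually observe no failures at all, whereas about half of the shifted samples land in the failure region and contribute to the estimate.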
Tung, Li-Chen; Yu, Wan-Hui; Lin, Gong-Hong; Yu, Tzu-Ying; Wu, Chien-Te; Tsai, Chia-Yin; Chou, Willy; Chen, Mei-Hsiang; Hsieh, Ching-Lin
2016-09-01
To develop a Tablet-based Symbol Digit Modalities Test (T-SDMT) and to examine the test-retest reliability and concurrent validity of the T-SDMT in patients with stroke. The study had two phases. In the first phase, six experts, nine college students and five outpatients participated in the development and testing of the T-SDMT. In the second phase, 52 outpatients were evaluated twice (2 weeks apart) with the T-SDMT and SDMT to examine the test-retest reliability and concurrent validity of the T-SDMT. The T-SDMT was developed via expert input and college student/patient feedback. Regarding test-retest reliability, the practise effects of the T-SDMT and SDMT were both trivial (d=0.12) but significant (p≦0.015). The improvement in the T-SDMT (4.7%) was smaller than that in the SDMT (5.6%). The minimal detectable changes (MDC%) of the T-SDMT and SDMT were 6.7 (22.8%) and 10.3 (32.8%), respectively. The T-SDMT and SDMT were highly correlated with each other at the two time points (Pearson's r=0.90-0.91). The T-SDMT demonstrated good concurrent validity with the SDMT. Because the T-SDMT had a smaller practise effect and less random measurement error (superior test-retest reliability), it is recommended over the SDMT for assessing information processing speed in patients with stroke. Implications for Rehabilitation The Symbol Digit Modalities Test (SDMT), a common measure of information processing speed, showed a substantial practise effect and considerable random measurement error in patients with stroke. The Tablet-based SDMT (T-SDMT) has been developed to reduce the practise effect and random measurement error of the SDMT in patients with stroke. The T-SDMT had smaller practise effect and random measurement error than the SDMT, which can provide more reliable assessments of information processing speed.
De Keersmaecker, Wanda; Lhermitte, Stef; Honnay, Olivier; Farifteh, Jamshid; Somers, Ben; Coppin, Pol
2014-07-01
Increasing frequency of extreme climate events is likely to impose increased stress on ecosystems and to jeopardize the services that ecosystems provide. Therefore, it is of major importance to assess the effects of extreme climate events on the temporal stability (i.e., the resistance, the resilience, and the variance) of ecosystem properties. Most time series of ecosystem properties are, however, affected by varying data characteristics, uncertainties, and noise, which complicate the comparison of ecosystem stability metrics (ESMs) between locations. Therefore, there is a strong need for a more comprehensive understanding regarding the reliability of stability metrics and how they can be used to compare ecosystem stability globally. The objective of this study was to evaluate the performance of temporal ESMs based on time series of the Moderate Resolution Imaging Spectroradiometer derived Normalized Difference Vegetation Index of 15 global land-cover types. We provide a framework (i) to assess the reliability of ESMs as a function of data characteristics, uncertainties and noise and (ii) to integrate reliability estimates in future global ecosystem stability studies against climate disturbances. The performance of our framework was tested through (i) a global ecosystem comparison and (ii) a comparison of ecosystem stability in response to the 2003 drought. The results show the influence of data quality on the accuracy of ecosystem stability. White noise, biased noise, and trends have a stronger effect on the accuracy of stability metrics than the length of the time series, temporal resolution, or amount of missing values. Moreover, we demonstrate the importance of integrating reliability estimates to interpret stability metrics within confidence limits. 
Based on these confidence limits, other studies dealing with specific ecosystem types or locations can be put into context, and a more reliable assessment of ecosystem stability against environmental disturbances can be obtained. © 2013 John Wiley & Sons Ltd.
McCurdy, M; Bellows, A; Deng, D; Leppert, M; Mahone, E; Pritchard, A
2015-01-01
Reliable and valid screening and assessment tools are necessary to identify children at risk for neurodevelopmental disabilities who may require additional services. This study evaluated the test-retest reliability of the Capute Scales in a high-risk sample, hypothesizing adequate reliability across 6- and 12-month intervals. Capute Scales scores (N = 66) were collected via retrospective chart review from a NICU follow-up clinic within a large urban medical center spanning three age-ranges: 12-18, 19-24, and 25-36 months. On average, participants were classified as very low birth weight and premature. Reliability of the Capute Scales was evaluated with intraclass correlation coefficients across length of test-retest interval, age at testing, and degree of neonatal complications. The Capute Scales demonstrated high reliability, regardless of length of test-retest interval (ranging from 6 to 14 months) or age of participant, for all index scores, including overall Developmental Quotient (DQ), language-based skill index (CLAMS) and nonverbal reasoning index (CAT). Linear regressions revealed that greater neonatal risk was related to poorer test-retest reliability; however, reliability coefficients remained strong. The Capute Scales afford clinicians a reliable and valid means of screening and assessing for neurodevelopmental delay within high-risk infant populations.
Excellent reliability of the Hamilton Depression Rating Scale (HDRS-21) in Indonesia after training.
Istriana, Erita; Kurnia, Ade; Weijers, Annelies; Hidayat, Teddy; Pinxten, Lucas; de Jong, Cor; Schellekens, Arnt
2013-09-01
The Hamilton Depression Rating Scale (HDRS) is the most widely used depression rating scale worldwide. Reliability of HDRS has been reported mainly from Western countries. The current study tested the reliability of HDRS ratings among psychiatric residents in Indonesia, before and after HDRS training. The hypotheses were that: (i) prior to the training reliability of HDRS ratings is poor; and (ii) HDRS training can improve reliability of HDRS ratings to excellent levels. Furthermore, we explored cultural validity at item level. Videotaped HDRS interviews were rated by 30 psychiatric residents before and after 1 day of HDRS training. Based on a gold standard rating, percentage correct ratings and deviation from the standard were calculated. Correct ratings increased from 83% to 99% at item level and from 70% to 100% for the total rating. The average deviation from the gold standard rating improved from 0.07 to 0.02 at item level and from 2.97 to 0.46 for the total rating. HDRS assessment by psychiatric trainees in Indonesia without prior training is unreliable. A short, evidence-based HDRS training improves reliability to near perfect levels. The outlined training program could serve as a template for HDRS trainings. HDRS items that may be less valid for assessment of depression severity in Indonesia are discussed. Copyright © 2013 Wiley Publishing Asia Pty Ltd.
Meylan, Grégoire; Reck, Barbara K; Rechberger, Helmut; Graedel, Thomas E; Schwab, Oliver
2017-10-17
Decision-makers traditionally expect "hard facts" from scientific inquiry, an expectation that the results of material flow analyses (MFAs) can hardly meet. MFA limitations are attributable to incompleteness of flowcharts, limited data quality, and model assumptions. Moreover, MFA results are, for the most part, based less on empirical observation than on social knowledge construction processes. Developing, applying, and improving the means of evaluating and communicating the reliability of MFA results is imperative. We apply two recently proposed approaches for making quantitative statements on MFA reliability to national minor metals systems: rhenium, gallium, and germanium in the United States in 2012. We discuss the reliability of results in policy and management contexts. The first approach consists of assessing data quality based on systematic characterization of MFA data and the associated meta-information and quantifying the "information content" of MFAs. The second is a quantification of data inconsistencies indicated by the "degree of data reconciliation" between the data and the model. A high information content and a low degree of reconciliation indicate reliable or certain MFA results. This article contributes to reliability and uncertainty discourses in MFA, exemplifying the usefulness of the approaches in policy and management, and to raw material supply discussions by providing country-level information on three important minor metals often considered critical.
THE DYNAMIC LEAP AND BALANCE TEST (DLBT): A TEST-RETEST RELIABILITY STUDY
Newman, Thomas M.; Smith, Brent I.; John Miller, Sayers
2017-01-01
Background There is a need for new clinical assessment tools to test dynamic balance during typical functional movements. Common methods for assessing dynamic balance, such as the Star Excursion Balance Test, which requires controlled movement of body segments over an unchanged base of support, may not be an adequate measure for testing typical functional movements that involve controlled movement of body segments along with a change in base of support. Purpose/hypothesis The purpose of this study was to determine the reliability of the Dynamic Leap and Balance Test (DLBT) by assessing its test-retest reliability. It was hypothesized that there would be no statistically significant differences between testing days in time taken to complete the test. Study Design Reliability study Methods Thirty healthy college aged individuals participated in this study. Participants performed a series of leaps in a prescribed sequence, unique to the DLBT test. Time required by the participants to complete the 20-leap task was the dependent variable. Subjects leaped back and forth from peripheral to central targets alternating weight bearing from one leg to the other. Participants landed on the central target with the tested limb and were required to stabilize for two seconds before leaping to the next target. Stability was based upon qualitative measures similar to Balance Error Scoring System. Each assessment was comprised of three trials and performed on two days with a separation of at least six days. Results Two-way mixed ANOVA was used to analyze the differences in time to complete the sequence between the three trial averages of the two testing sessions. Intraclass Correlation Coefficient (ICC3,1) was used to establish between session test-retest reliability of the test trial averages. Significance was set a priori at p ≤ 0.05. No significant differences (p > 0.05) were detected between the two testing sessions. 
The ICC was 0.93 with a 95% confidence interval from 0.84 to 0.96. Conclusion This test is a cost-effective, easy to administer and clinically relevant novel measure for assessing dynamic balance that has excellent test-retest reliability. Clinical relevance As a new measure of dynamic balance, the DLBT has the potential to be a cost-effective, challenging and functional tool for clinicians. Level of Evidence 2b PMID:28900556
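The ICC(3,1) statistic named in this abstract can be illustrated with a small sketch. It assumes the standard two-way mixed, consistency, single-measurement form, ICC(3,1) = (MS_subjects - MS_error) / (MS_subjects + (k - 1) MS_error); the function name `icc_3_1` and the sample values are hypothetical and are not taken from the study.

```python
def icc_3_1(scores):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    scores: list of rows, one row per subject, each row holding the
    k session values for that subject (hypothetical layout).
    """
    n = len(scores)          # number of subjects
    k = len(scores[0])       # number of sessions
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)


# Made-up completion times (seconds) for four subjects on two days
example = [[10, 11], [14, 13], [20, 22], [8, 9]]
print(round(icc_3_1(example), 3))
```

Because this is the consistency form, a constant shift between sessions (for example, every subject being two seconds slower on day 2) does not lower the coefficient.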
Hawkins, Keith A; Jennings, Danna; Vincent, Andrea S; Gilliland, Kirby; West, Adrienne; Marek, Kenneth
2012-08-01
The automated neuropsychological assessment metrics battery-4 for PD offers the promise of a computerized approach to cognitive assessment. To assess its utility, the ANAM4-PD was administered to 72 PD patients and 24 controls along with a traditional battery. Reliability was assessed by retesting 26 patients. The cognitive efficiency score (CES; a global score) exhibited high reliability (r = 0.86). Constituent variables exhibited lower reliability. The CES correlated strongly with the traditional battery global score, but displayed weaker relationships to UPDRS scores than the traditional score. Multivariate analysis of variance revealed a significant difference between the patient and control groups in ANAM4-PD performance, with three ANAM4-PD tests, math, tower, and pursuit tracking, displaying sizeable differences. In discriminant analyses these variables were as effective as the total ANAM4-PD in classifying cases designated as impaired based on traditional variables. Principal components analyses uncovered fewer factors in the ANAM4-PD relative to the traditional battery. ANAM4-PD variables correlated at higher levels with traditional motor and processing speed variables than with untimed executive, intellectual or memory variables. The ANAM4-PD displays high global reliability, but variable subtest reliability. The battery assesses a narrower range of cognitive functions than traditional tests, and discriminates between patients and controls less effectively. Three ANAM4-PD tests, pursuit tracking, math, and tower performed as well as the total ANAM4-PD in classifying patients as cognitively impaired. These findings could guide the refinement of the ANAM4-PD as an efficient method of screening for mild to moderate cognitive deficits in PD patients. Copyright © 2012 Elsevier Ltd. All rights reserved.
Hayashi, Paul H.; Barnhart, Huiman X.; Fontana, Robert J.; Chalasani, Naga; Davern, Timothy J.; Talwalkar, Jayant A.; Reddy, K. Rajender; Stolz, Andrew A.; Hoofnagle, Jay H.; Rockey, Don C.
2014-01-01
Background Due to the lack of objective tests to diagnose drug induced liver injury (DILI), causality assessment is a matter of debate. Expert opinion is often used in research and industry but its test-retest reliability is unknown. Aims To determine the test-retest reliability of the expert opinion process used by the Drug-Induced Liver Injury Network (DILIN) Methods Three DILIN hepatologists adjudicate suspected hepatotoxicity cases to 1 of 5 categories representing levels of likelihood of DILI. Adjudication is based on retrospective assessment of gathered case data that includes prospective follow-up information. One hundred randomly selected DILIN cases were re-assessed using the same processes for initial assessment but by 3 different reviewers in 92% of cases. Results The median time between assessments was 938 days (range: 140–2352). Thirty-one cases involved >1 agent. Weighted kappa statistics for overall case and individual agent category agreement were 0.60 (95% CI: 0.50–0.71) and 0.60 (0.52–0.68), respectively. Overall case adjudications were within one category of each other 93% of the time, while 5% differed by 2 categories and 2% differed by 3 categories. Fourteen percent crossed the 50% threshold of likelihood due to competing diagnoses or atypical timing between drug exposure and injury. Conclusions The DILIN expert opinion causality assessment method has moderate inter-observer reliability but very good agreement within 1 category. A small but important proportion of cases could not be reliably diagnosed as ≥ 50% likely to be DILI. PMID:24661785
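A weighted kappa for ordered categories, as reported in this abstract, can be sketched as follows. The abstract does not state the weighting scheme, so linear disagreement weights are an assumption here (set `power=2` for quadratic weights); the function name `weighted_kappa` and the ratings are hypothetical.

```python
def weighted_kappa(r1, r2, n_cat, power=1):
    """Cohen's weighted kappa for two sets of ordinal ratings.

    r1, r2: category indices 0..n_cat-1 for the same cases.
    power:  1 = linear weights (assumed), 2 = quadratic weights.
    Undefined (division by zero) if every rating falls in one category.
    """
    n = len(r1)
    # Observed joint proportions
    obs = [[0.0] * n_cat for _ in range(n_cat)]
    for a, b in zip(r1, r2):
        obs[a][b] += 1 / n
    # Marginal proportions for each assessment
    p1 = [sum(obs[i]) for i in range(n_cat)]
    p2 = [sum(obs[i][j] for i in range(n_cat)) for j in range(n_cat)]

    observed = expected = 0.0
    for i in range(n_cat):
        for j in range(n_cat):
            w = (abs(i - j) / (n_cat - 1)) ** power  # disagreement weight
            observed += w * obs[i][j]
            expected += w * p1[i] * p2[j]
    return 1 - observed / expected


# Made-up first and second adjudications on a 5-category scale
first = [0, 1, 1, 2, 3, 4, 2, 0]
second = [0, 1, 2, 2, 3, 3, 2, 1]
print(round(weighted_kappa(first, second, n_cat=5), 3))
```

With weights of this kind, two adjudications that differ by one category are penalized less than two that differ by three, which is why a weighted statistic suits a graded likelihood scale.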
De Vet, Emely; De Ridder, Denise; Stok, Marijn; Brunso, Karen; Baban, Adriana; Gaspar, Tania
2014-09-02
Applying self-regulation strategies has proven important in eating behaviors, but it remains subject to investigation which strategies adolescents report using to ensure healthy eating, and adequate measures are lacking. Therefore, we developed and validated a self-regulation questionnaire applied to eating (TESQ-E) for adolescents. Study 1 reports a four-step approach to develop the TESQ-E questionnaire (n = 1097). Study 2 was a cross-sectional survey among adolescents from nine European countries (n = 11,392) that assessed the TESQ-E, eating-related behaviors, dietary intake and background characteristics. In study 3, the TESQ-E was administered twice within four weeks to evaluate test-retest reliability (n = 140). Study 4 was a cross-sectional survey (n = 93) that assessed the TESQ-E and related psychological constructs (e.g., motivation, autonomy, self-control). All participants were aged between 10 and 17 years. Study 1 resulted in a 24-item questionnaire assessing adolescent-reported use of six specific strategies for healthy eating that represent three general self-regulation approaches. Study 2 showed that the easy-to-administer theory-based TESQ-E has a clear factor structure and good subscale reliabilities. The questionnaire was related to eating-related behaviors and dietary intake, indicating predictive validity. Study 3 showed good test-retest reliabilities for the TESQ-E. Study 4 indicated that TESQ-E was related to but also distinguishable from general self-regulation and motivation measures. The TESQ-E provides a reliable and valid measure to assess six theory-based self-regulation strategies that adolescents may use to ensure their healthy eating.
Validity and Reliability of a General Nutrition Knowledge Questionnaire for Japanese Adults.
Matsumoto, Mai; Tanaka, Rie; Ikemoto, Shinji
2017-01-01
Nutrition knowledge is necessary for individuals to adopt appropriate dietary habits, and needs to be evaluated before nutrition education is provided. However, there is no tool to assess general nutrition knowledge of adults in Japan. Our aim was to determine the validity and reliability of a general nutrition knowledge questionnaire for Japanese adults. We developed the pilot version of the Japanese general nutrition knowledge questionnaire (JGNKQ) and administered it in a pilot study to 1,182 Japanese adults aged 18-64 y to assess content validity and internal reliability. The JGNKQ was further modified based on the pilot study and the final version consisted of 5 sections and 147 items. The JGNKQ was administered to female undergraduate Japanese students in their senior year twice in 2015 to assess construct validity and test-retest reliability. Ninety-six students majoring in nutrition and 44 students in other majors who studied at the same university completed the first questionnaire. Seventy-five students completed the questionnaire twice. The responses from the first questionnaire and both questionnaires were used to assess construct validity and test-retest reliability, respectively. The students majoring in nutrition had significantly higher scores than the students in other majors on all sections of the questionnaire (p=0.000); therefore, the questionnaire had good construct validity. The test-retest reliability correlation coefficients for the overall questionnaire and for each section except "The use of dietary information to make dietary choices" were 0.75, 0.67, 0.67, 0.68 and 0.61, respectively. We suggest that the JGNKQ is an effective tool to assess the nutrition knowledge level of Japanese adults.
READ, PAUL; OLIVER, JON L.; DE STE CROIX, MARK B.A.; MYER, GREGORY D.; LLOYD, RHODRI S.
2016-01-01
Deficits in neuromuscular control during movement patterns such as landing are suggested pathomechanics that underlie sport-related injury. A common mode of assessment is measurement of landing forces during jumping tasks; however, these measures have been used less frequently in male youth soccer players and reliability data is sparse. The aim of this study was to examine the reliability of a field-based neuromuscular control screening battery using force plate diagnostics in this cohort. Twenty six pre-peak height velocity (PHV) and twenty five post-PHV elite male youth soccer players completed a drop vertical jump (DVJ), single leg 75% horizontal hop and stick (75%HOP) and single leg countermovement jump (SLCMJ). Measures of peak landing vertical ground reaction force (pVGRF), time to stabilisation (TTS), time to pVGRF, and pVGRF asymmetry were recorded. A test, re-test design was used and reliability statistics included: change in mean, intraclass correlation coefficient (ICC) and coefficient of variation (CV). No significant differences in mean score were reported for any of the assessed variables between test sessions. In both groups, pVGRF and asymmetry during the 75%HOP and SLCMJ demonstrated largely acceptable reliability (CV ≤ 10%). Greater variability was evident in DVJ pVGRF and all other assessed variables, across the three protocols (CV range = 13.8 – 49.7%). ICC values ranged from small to large and were generally higher in the post-PHV players. The results of this study suggest that pVGRF and asymmetry can be reliably assessed using a 75%HOP and SLCMJ in this cohort. These measures could be utilized to support a screening battery for elite male youth soccer players and for test re-test comparison. PMID:27075641
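The coefficient of variation (CV) threshold quoted in this abstract (CV ≤ 10%) can be made concrete with a short sketch. The abstract does not say how CV was derived, so the version below assumes one common convention: the within-subject standard deviation across sessions divided by the subject's mean, expressed as a percentage and averaged over subjects. The function name `within_subject_cv` and the force values are hypothetical.

```python
from statistics import mean, stdev


def within_subject_cv(sessions):
    """Mean within-subject coefficient of variation, in percent.

    sessions: one list per subject holding that subject's score in
    each test session (assumed layout; at least two sessions each).
    """
    per_subject = [stdev(s) / mean(s) * 100 for s in sessions]
    return mean(per_subject)


# Made-up peak landing forces (body weights) for three players, two sessions
forces = [[2.9, 3.1], [3.4, 3.3], [2.5, 2.8]]
cv = within_subject_cv(forces)
print(f"CV = {cv:.1f}%  acceptable: {cv <= 10}")
```

Other definitions (for example, typical error expressed as a CV from log-transformed data) give somewhat different numbers, so a threshold such as 10% should be read together with the method used to compute it.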
Feraco, Angela M; Starmer, Amy J; Sectish, Theodore C; Spector, Nancy D; West, Daniel C; Landrigan, Christopher P
2016-08-01
1) To develop validity evidence for the use of the Verbal Handoff Assessment Tool (VHAT) and examine the reliability of VHAT scores, and 2) to determine whether implementation of a resident handoff bundle (RHB) was associated with improved verbal patient handoffs among pediatric resident physicians. In a pre-post design, prospectively audio recorded verbal patient handoffs conducted at Boston Children's Hospital before and after implementation of the RHB were rated using the VHAT, which was developed for this study (primary outcome). Using generalizability theory, we evaluated the reliability of VHAT scores. Overall, VHAT scores increased after RHB implementation (mean 142 vs 191, possible score 0-500; P < .0001). When accounting for clustering according to resident physician, hospital unit, unit census, and patient complexity, implementation of the RHB was associated with a 63-point increase in VHAT score. Using generalizability theory, we determined that a resident's mean VHAT score on the basis of a handoff of 15 patients assessed by a single observer was sufficiently reliable for relative ranking decisions (ie, norm-based; generalizability coefficient, 0.81), whereas a VHAT score on the basis of a handoff of 21 patients would be sufficiently reliable for high-stakes, standard-based decisions (Phi, 0.80). Verbal handoffs improved after implementation of a RHB, although gains were variable across the 2 clinical units. The VHAT shows promise as an assessment tool for resident handoff skills. If used for competency or entrustment decisions, a resident's mean VHAT score should be on the basis of observation of verbal handoff of ≥21 patients. Copyright © 2016 Academic Pediatric Association. Published by Elsevier Inc. All rights reserved.
Reliability and Validity of Assessing User Satisfaction With Web-Based Health Interventions
Lehr, Dirk; Reis, Dorota; Vis, Christiaan; Riper, Heleen; Berking, Matthias; Ebert, David Daniel
2016-01-01
Background The perspective of users should be taken into account in the evaluation of Web-based health interventions. Assessing the users’ satisfaction with the intervention they receive could enhance the evidence for the intervention effects. Thus, there is a need for valid and reliable measures to assess satisfaction with Web-based health interventions. Objective The objective of this study was to analyze the reliability, factorial structure, and construct validity of the Client Satisfaction Questionnaire adapted to Internet-based interventions (CSQ-I). Methods The psychometric quality of the CSQ-I was analyzed in user samples from 2 separate randomized controlled trials evaluating Web-based health interventions, one from a depression prevention intervention (sample 1, N=174) and the other from a stress management intervention (sample 2, N=111). At first, the underlying measurement model of the CSQ-I was analyzed to determine the internal consistency. The factorial structure of the scale and the measurement invariance across groups were tested by multigroup confirmatory factor analyses. Additionally, the construct validity of the scale was examined by comparing satisfaction scores with the primary clinical outcome. Results Multigroup confirmatory analyses on the scale yielded a one-factorial structure with a good fit (root-mean-square error of approximation =.09, comparative fit index =.96, standardized root-mean-square residual =.05) that showed partial strong invariance across the 2 samples. The scale showed very good reliability, indicated by McDonald omegas of .95 in sample 1 and .93 in sample 2. Significant correlations with change in depressive symptoms (r=−.35, P<.001) and perceived stress (r=−.48, P<.001) demonstrated the construct validity of the scale. 
Conclusions The proven internal consistency, factorial structure, and construct validity of the CSQ-I indicate a good overall psychometric quality of the measure to assess the user’s general satisfaction with Web-based interventions for depression and stress management. Multigroup analyses indicate its robustness across different samples. Thus, the CSQ-I seems to be a suitable measure to consider the user’s perspective in the overall evaluation of Web-based health interventions. PMID:27582341
Reliability and Validity of Assessing User Satisfaction With Web-Based Health Interventions.
Boß, Leif; Lehr, Dirk; Reis, Dorota; Vis, Christiaan; Riper, Heleen; Berking, Matthias; Ebert, David Daniel
2016-08-31
The perspective of users should be taken into account in the evaluation of Web-based health interventions. Assessing the users' satisfaction with the intervention they receive could enhance the evidence for the intervention effects. Thus, there is a need for valid and reliable measures to assess satisfaction with Web-based health interventions. The objective of this study was to analyze the reliability, factorial structure, and construct validity of the Client Satisfaction Questionnaire adapted to Internet-based interventions (CSQ-I). The psychometric quality of the CSQ-I was analyzed in user samples from 2 separate randomized controlled trials evaluating Web-based health interventions, one from a depression prevention intervention (sample 1, N=174) and the other from a stress management intervention (sample 2, N=111). At first, the underlying measurement model of the CSQ-I was analyzed to determine the internal consistency. The factorial structure of the scale and the measurement invariance across groups were tested by multigroup confirmatory factor analyses. Additionally, the construct validity of the scale was examined by comparing satisfaction scores with the primary clinical outcome. Multigroup confirmatory analyses on the scale yielded a one-factorial structure with a good fit (root-mean-square error of approximation =.09, comparative fit index =.96, standardized root-mean-square residual =.05) that showed partial strong invariance across the 2 samples. The scale showed very good reliability, indicated by McDonald omegas of .95 in sample 1 and .93 in sample 2. Significant correlations with change in depressive symptoms (r=-.35, P<.001) and perceived stress (r=-.48, P<.001) demonstrated the construct validity of the scale. 
The proven internal consistency, factorial structure, and construct validity of the CSQ-I indicate a good overall psychometric quality of the measure to assess the user's general satisfaction with Web-based interventions for depression and stress management. Multigroup analyses indicate its robustness across different samples. Thus, the CSQ-I seems to be a suitable measure to consider the user's perspective in the overall evaluation of Web-based health interventions.
Guerra, Ricardo Oliveira; Oliveira, Bruna Silva; Alvarado, Beatriz Eugenia; Curcio, Carmen Lucia; Rejeski, W Jack; Marsh, Anthony P; Ip, Edward H; Barnard, Ryan T; Guralnik, Jack M; Zunzunegui, Maria Victoria
2016-01-01
Aim To assess the reliability and the validity of Portuguese- and Spanish-translated versions of the video-based short-form Mobility Assessment Tool in assessing self-reported mobility, and to provide evidence for the applicability of these videos in elderly Latin American populations as a complement to physical performance measures. Methods The sample consisted of 300 elderly participants (150 from Brazil, 150 from Colombia) recruited at neighborhood social centers. Mobility was assessed with the Mobility Assessment Tool, and compared with the Short Physical Performance Battery score and self-reported functional limitations. Reliability was calculated using intraclass correlation coefficients. Multiple linear regression analyses were used to assess associations among mobility assessment tools and health, and sociodemographic variables. Results A significant gradient of increasing Mobility Assessment Tool score with better physical function was observed for both self-reported and objective measures, and in each city. Associations between self-reported mobility and health were strong, and significant. Mobility Assessment Tool scores were lower in women at both sites. Intraclass correlation coefficients of the Mobility Assessment Tool were 0.94 (95% confidence interval 0.90–0.97) in Brazil and 0.81 (95% confidence interval 0.66–0.91) in Colombia. Mobility Assessment Tool scores were lower in Manizales than in Natal after adjustment by Short Physical Performance Battery, self-rated health and sex. Conclusions These results provide evidence for high reliability and good validity of the Mobility Assessment Tool in its Spanish and Portuguese versions used in Latin American populations. In addition, the Mobility Assessment Tool can detect mobility differences related to environmental features that cannot be captured by objective performance measures. PMID:24666718
Guerra, Ricardo Oliveira; Oliveira, Bruna Silva; Alvarado, Beatriz Eugenia; Curcio, Carmen Lucia; Rejeski, W Jack; Marsh, Anthony P; Ip, Edward H; Barnard, Ryan T; Guralnik, Jack M; Zunzunegui, Maria Victoria
2014-10-01
To assess the reliability and the validity of Portuguese- and Spanish-translated versions of the video-based short-form Mobility Assessment Tool in assessing self-reported mobility, and to provide evidence for the applicability of these videos in elderly Latin American populations as a complement to physical performance measures. The sample consisted of 300 elderly participants (150 from Brazil, 150 from Colombia) recruited at neighborhood social centers. Mobility was assessed with the Mobility Assessment Tool, and compared with the Short Physical Performance Battery score and self-reported functional limitations. Reliability was calculated using intraclass correlation coefficients. Multiple linear regression analyses were used to assess associations among mobility assessment tools and health, and sociodemographic variables. A significant gradient of increasing Mobility Assessment Tool score with better physical function was observed for both self-reported and objective measures, and in each city. Associations between self-reported mobility and health were strong, and significant. Mobility Assessment Tool scores were lower in women at both sites. Intraclass correlation coefficients of the Mobility Assessment Tool were 0.94 (95% confidence interval 0.90-0.97) in Brazil and 0.81 (95% confidence interval 0.66-0.91) in Colombia. Mobility Assessment Tool scores were lower in Manizales than in Natal after adjustment by Short Physical Performance Battery, self-rated health and sex. These results provide evidence for high reliability and good validity of the Mobility Assessment Tool in its Spanish and Portuguese versions used in Latin American populations. In addition, the Mobility Assessment Tool can detect mobility differences related to environmental features that cannot be captured by objective performance measures. © 2013 Japan Geriatrics Society.
Development and application of basis database for materials life cycle assessment in China
NASA Astrophysics Data System (ADS)
Li, Xiaoqing; Gong, Xianzheng; Liu, Yu
2017-03-01
Because materials life cycle assessment (MLCA) is a data-intensive method, high-quality environmental burden data are an important prerequisite for carrying it out, and the reliability of the data directly influences the reliability of the assessment results and their application. Building a Chinese MLCA database therefore provides the basic data and technical support needed to carry out and improve LCA practice. First, recent progress on databases related to materials life cycle assessment research and development is introduced. Second, according to the requirements of the ISO 14040 series standards, the database framework and main datasets for materials life cycle assessment are studied. Third, an MLCA data platform based on big data is developed. Finally, future research work is proposed and discussed.
Development and validation of a toddler silhouette scale.
Hager, Erin R; McGill, Adrienne E; Black, Maureen M
2010-02-01
The purpose of this study is to develop and validate a toddler silhouette scale. A seven-point scale was developed by an artist based on photographs of 15 toddlers (6 males, 9 females) varying in race/ethnicity and body size, and a list of phenotypic descriptions of toddlers of varying body sizes. Content validity, age-appropriateness, and gender and race/ethnicity neutrality were assessed among 180 pediatric health professionals and 129 parents of toddlers. Inter- and intrarater reliability and concurrent validity were assessed by having 138 pediatric health professionals match the silhouettes with photographs of toddlers. Assessments of content validity revealed that most health professionals (74.6%) and parents of toddlers (63.6%) ordered all seven silhouettes correctly, and interobserver agreement for weight status classification was high (kappa = 0.710, r = 0.827, P < 0.001). Most respondents reported that the scale represented toddlers aged 12-36 months (89%) and was gender (68.5%) and race/ethnicity (77.3%) neutral. The inter-rater reliability, based on matching silhouettes with photographs, was 0.787 (Cronbach's alpha) and the intrarater reliability was 0.855 (P < 0.001). The concurrent validity, based on the correlation between silhouette choice and the weight-for-length percentile of each toddler's photograph, was 0.633 (P < 0.001). In conclusion, a valid and reliable toddler silhouette scale that may be used for male or female toddlers, aged 12-36 months, of varying race/ethnicity was developed and evaluated. This scale may be used clinically or in research settings to assess parents' perception of and satisfaction with their toddler's body size. Interventions can be targeted toward parents who have inaccurate perceptions of or are dissatisfied with their toddler's body size.
Development and validation of an instrument to assess perceived social influence on health behaviors
HOLT, CHERYL L.; CLARK, EDDIE M.; ROTH, DAVID L.; CROWTHER, MARTHA; KOHLER, CONNIE; FOUAD, MONA; FOUSHEE, RUSTY; LEE, PATRICIA A.; SOUTHWARD, PENNY L.
2012-01-01
Assessment of social influence on health behavior is often approached through a situational context. The current study adapted an existing, theory-based instrument from another content domain to assess Perceived Social Influence on Health Behavior (PSI-HB) among African Americans, using an individual difference approach. The adapted instrument was found to have high internal reliability (α = .81–.84) and acceptable test-retest reliability (r = .68–.85). A measurement model revealed a three-factor structure and supported the theoretical underpinnings. Scores were predictive of health behaviors, particularly among women. Future research using the new instrument may have applied value assessing social influence in the context of health interventions. PMID:20522506
Achieving Reliable Communication in Dynamic Emergency Responses
Chipara, Octav; Plymoth, Anders N.; Liu, Fang; Huang, Ricky; Evans, Brian; Johansson, Per; Rao, Ramesh; Griswold, William G.
2011-01-01
Emergency responses require the coordination of first responders to assess the condition of victims, stabilize their condition, and transport them to hospitals based on the severity of their injuries. WIISARD is a system designed to facilitate the collection of medical information and its reliable dissemination during emergency responses. A key challenge in WIISARD is to deliver data with high reliability as first responders move and operate in a dynamic radio environment fraught with frequent network disconnections. The initial WIISARD system employed a client-server architecture and an ad-hoc routing protocol was used to exchange data. The system had low reliability when deployed during emergency drills. In this paper, we identify the underlying causes of unreliability and propose a novel peer-to-peer architecture that in combination with a gossip-based communication protocol achieves high reliability. Empirical studies show that compared to the initial WIISARD system, the redesigned system improves reliability by as much as 37% while reducing the number of transmitted packets by 23%. PMID:22195075
Advanced Reactor Passive System Reliability Demonstration Analysis for an External Event
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bucknor, Matthew D.; Grabaskas, David; Brunett, Acacia J.
2016-01-01
Many advanced reactor designs rely on passive systems to fulfill safety functions during accident sequences. These systems depend heavily on boundary conditions to induce a motive force, meaning the system can fail to operate as intended due to deviations in boundary conditions, rather than as the result of physical failures. Furthermore, passive systems may operate in intermediate or degraded modes. These factors make passive system operation difficult to characterize within a traditional probabilistic framework that only recognizes discrete operating modes and does not allow for the explicit consideration of time-dependent boundary conditions. Argonne National Laboratory has been examining various methodologies for assessing passive system reliability within a probabilistic risk assessment for a station blackout event at an advanced small modular reactor. This paper provides an overview of a passive system reliability demonstration analysis for an external event. Centering on an earthquake with the possibility of site flooding, the analysis focuses on the behavior of the passive reactor cavity cooling system following potential physical damage and system flooding. The assessment approach seeks to combine mechanistic and simulation-based methods to leverage the benefits of the simulation-based approach without the need to substantially deviate from conventional probabilistic risk assessment techniques. While this study is presented as only an example analysis, the results appear to demonstrate a high level of reliability for the reactor cavity cooling system (and the reactor system in general) to the postulated transient event.
QUIROZ, Viviana; REINERO, Daniela; HERNÁNDEZ, Patricia; CONTRERAS, Johanna; VERNAL, Rolando; CARVAJAL, Paola
2017-01-01
Abstract Periodontal diseases are among the major infectious diseases in Chile, with a combined prevalence of up to 90% of the population. Thus, the population-based surveillance of periodontal diseases plays a central role in assessing their prevalence and in planning, implementing, and evaluating preventive and control programs. Self-report questionnaires have been proposed for the surveillance of periodontal diseases in adult populations worldwide. Objective This study aimed to develop and assess the content validity and reliability of a cognitively adapted self-report questionnaire designed for surveillance of gingivitis in adolescents. Material and Methods Ten predetermined self-report questions evaluating early signs and symptoms of gingivitis were preliminarily assessed by a panel of clinical experts. Eight questions were selected and cognitively tested in 20 adolescents aged 12 to 18 years from Santiago de Chile. The questionnaire was then administered to and answered by 178 Chilean adolescents. Internal consistency was measured using Cronbach’s alpha and temporal stability was calculated using the Kappa-index. Results A reliable final self-report questionnaire consisting of 5 questions was obtained, with a total Cronbach’s alpha of 0.73 and a Kappa-index ranging from 0.41 to 0.77 between the different questions. Conclusions The proposed questionnaire is reliable, with an acceptable internal consistency and a temporal stability from moderate to substantial, and it is promising for estimating the prevalence of gingivitis in adolescents. PMID:28877279
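Internal consistency of the kind reported above is usually summarized with Cronbach's alpha. The following is a minimal sketch of the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The function name and the yes/no answers are hypothetical and are not taken from the study.

```python
from statistics import variance  # sample variance (n - 1 denominator)


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Variance of each item (column) across respondents.
    item_vars = [variance(col) for col in zip(*scores)]
    # Variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical yes/no (1/0) answers of 6 respondents to 5 questions.
answers = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 1, 1, 1],
    [0, 0, 0, 0, 1],
]
print(round(cronbach_alpha(answers), 2))
```

Alpha approaches 1 when respondents answer all items in the same direction and falls when items vary independently of one another.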
Advanced Reactor Passive System Reliability Demonstration Analysis for an External Event
Bucknor, Matthew; Grabaskas, David; Brunett, Acacia J.; ...
2017-01-24
We report that many advanced reactor designs rely on passive systems to fulfill safety functions during accident sequences. These systems depend heavily on boundary conditions to induce a motive force, meaning the system can fail to operate as intended because of deviations in boundary conditions, rather than as the result of physical failures. Furthermore, passive systems may operate in intermediate or degraded modes. These factors make passive system operation difficult to characterize within a traditional probabilistic framework that only recognizes discrete operating modes and does not allow for the explicit consideration of time-dependent boundary conditions. Argonne National Laboratory has been examining various methodologies for assessing passive system reliability within a probabilistic risk assessment for a station blackout event at an advanced small modular reactor. This paper provides an overview of a passive system reliability demonstration analysis for an external event. Considering an earthquake with the possibility of site flooding, the analysis focuses on the behavior of the passive Reactor Cavity Cooling System following potential physical damage and system flooding. The assessment approach seeks to combine mechanistic and simulation-based methods to leverage the benefits of the simulation-based approach without the need to substantially deviate from conventional probabilistic risk assessment techniques. Lastly, although this study is presented as only an example analysis, the results appear to demonstrate a high level of reliability of the Reactor Cavity Cooling System (and the reactor system in general) for the postulated transient event.
Measuring competence in endoscopic sinus surgery.
Syme-Grant, J; White, P S; McAleer, J P G
2008-02-01
Competence based education is currently being introduced into higher surgical training in the UK. Valid and reliable performance assessment tools are essential to ensure competencies are achieved. No such tools have yet been reported in the UK literature. We sought to develop and pilot test an Endoscopic Sinus Surgery Competence Assessment Tool (ESSCAT). The ESSCAT was designed for in-theatre assessment of higher surgical trainees in the UK. The ESSCAT rating matrix was developed through task analysis of ESS procedures. All otolaryngology consultants and specialist registrars in Scotland were given the opportunity to contribute to its refinement. Two cycles of in-theatre testing were used to ensure utility and gather quantitative data on validity and reliability. Videos of trainees performing surgery were used in establishing inter-rater reliability. National consultation, the consensus derived minimum standard of performance, Cronbach's alpha = 0.89 and demonstration of trainee learning (p = 0.027) during the in vivo application of the ESSCAT suggest a high level of validity. Inter-rater reliability was moderate for competence decisions (Cohen's Kappa = 0.5) and good for total scores (Intra-Class Correlation Co-efficient = 0.63). Intra-rater reliability was good for both competence decisions (Kappa = 0.67) and total scores (Kendall's Tau-b = 0.73). The ESSCAT generates a valid and reliable assessment of trainees' in-theatre performance of endoscopic sinus surgery. In conjunction with ongoing evaluation of the instrument we recommend the use of the ESSCAT in higher specialist training in otolaryngology in the UK.
Health Service Quality Scale: Brazilian Portuguese translation, reliability and validity.
Rocha, Luiz Roberto Martins; Veiga, Daniela Francescato; e Oliveira, Paulo Rocha; Song, Elaine Horibe; Ferreira, Lydia Masako
2013-01-17
The Health Service Quality Scale is a multidimensional hierarchical scale that is based on an interdisciplinary approach. This instrument was specifically created for measuring health service quality based on marketing and health care concepts. The aim of this study was to translate and culturally adapt the Health Service Quality Scale into Brazilian Portuguese and to assess the validity and reliability of the Brazilian Portuguese version of the instrument. We conducted a cross-sectional, observational study with public health system patients in a Brazilian university hospital. Validity was assessed using Pearson's correlation coefficient to measure the strength of the association between the Brazilian Portuguese version of the instrument and the SERVQUAL scale. Internal consistency was evaluated using Cronbach's alpha coefficient; the intraclass (ICC) and Pearson's correlation coefficients were used for test-retest reliability. One hundred and sixteen consecutive postoperative patients completed the questionnaire. Pearson's correlation coefficient for validity was 0.20. Cronbach's alpha values for the first and second administrations of the final version of the instrument were 0.982 and 0.986, respectively. For test-retest reliability, Pearson's correlation coefficient was 0.89 and ICC was 0.90. The culturally adapted, Brazilian Portuguese version of the Health Service Quality Scale is a valid and reliable instrument to measure health service quality.
Offermans, Nadine S M; Vermeulen, Roel; Burdorf, Alex; Peters, Susan; Goldbohm, R Alexandra; Koeman, Tom; van Tongeren, Martie; Kauppinen, T; Kant, Ijmert; Kromhout, Hans; van den Brandt, Piet A
2012-10-01
Reliable retrospective exposure assessment continues to be a challenge in most population-based studies. Several methodologies exist for estimating exposures retrospectively, of which case-by-case expert assessment and job-exposure matrices (JEMs) are commonly used. This study evaluated the reliability of exposure estimates for selected carcinogens obtained through three JEMs by comparing the estimates with case-by-case expert assessment within the Netherlands Cohort Study (NLCS). The NLCS includes 58,279 men aged 55-69 years at enrolment in 1986. For a subcohort of these men (n=1630), expert assessment is available for exposure to asbestos, polycyclic aromatic hydrocarbons (PAHs) and welding fumes. Reliability of the different JEMs (DOMJEM (asbestos, PAHs), FINJEM (asbestos, PAHs and welding fumes) and Asbestos JEM (asbestos)) was determined by assessing the agreement between these JEMs and the expert assessment. Expert assessment revealed the lowest prevalence of exposure for all three exposures (asbestos 9.3%; PAHs 5.3%; welding fumes 11.7%). The DOMJEM showed the highest level of agreement with the expert assessment for asbestos and PAHs (κs=0.29 and 0.42, respectively), closely followed by the FINJEM. For welding fumes, concordance between the expert assessment and FINJEM was high (κ=0.70). The Asbestos JEM showed poor agreement with the expert asbestos assessment (κ=0.10). This study shows case-by-case expert assessment to result in the lowest prevalence of occupational exposure in the NLCS. Furthermore, the DOMJEM and FINJEM proved to be rather similar in agreement when compared with the expert assessment. The Asbestos JEM appeared to be less appropriate for use in the NLCS.
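The κ values above measure agreement between two classifications after correcting for agreement expected by chance. A minimal sketch of unweighted Cohen's kappa follows; the function name and the exposed/unexposed labels are hypothetical illustrations, not NLCS data.

```python
def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equally long lists of category labels."""
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    # Observed proportion of identical classifications.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from the marginal frequencies.
    p_expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n) for c in categories
    )
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical exposure calls (1 = exposed, 0 = unexposed) for 10 workers.
expert = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
jem = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(round(cohen_kappa(expert, jem), 2))
```

Because the chance term depends on prevalence, two methods can agree on most subjects and still show a low kappa when the exposure is rare.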
ERIC Educational Resources Information Center
Cook, David A.; Zendejas, Benjamin; Hamstra, Stanley J.; Hatala, Rose; Brydges, Ryan
2014-01-01
Ongoing transformations in health professions education underscore the need for valid and reliable assessment. The current standard for assessment validation requires evidence from five sources: content, response process, internal structure, relations with other variables, and consequences. However, researchers remain uncertain regarding the types…
Toward a Linguistically Realistic Assessment of Language Vitality: The Case of Jejueo
ERIC Educational Resources Information Center
Yang, Changyong; O'Grady, William; Yang, Sejung
2017-01-01
The assessment of language endangerment requires accurate estimates of speaker populations, including information about the proficiency of different groups within those populations. Typically, this information is based on self-assessments, a methodology whose reliability is open to question. We outline an approach that seeks to improve the…
Assessor Training: Its Effects on Criterion-Based Assessment in a Medical Context
ERIC Educational Resources Information Center
Pell, Godfrey; Homer, Matthew S.; Roberts, Trudie E.
2008-01-01
Increasingly, academic institutions are being required to improve the validity of the assessment process; unfortunately, often this is at the expense of reliability. In medical schools (such as Leeds), standardized tests of clinical skills, such as "Objective Structured Clinical Examinations" (OSCEs) are widely used to assess clinical…
Assessment of physical server reliability in multi cloud computing system
NASA Astrophysics Data System (ADS)
Kalyani, B. J. D.; Rao, Kolasani Ramchand H.
2018-04-01
Business organizations nowadays operate with more than one cloud provider. Spreading cloud deployment across multiple service providers creates room for competitive prices that minimize the burden on an enterprise's spending budget. To assess the software reliability of a multi-cloud application, a layered software reliability assessment paradigm is considered with three levels of abstraction: application layer, virtualization layer, and server layer. The reliability of each layer is assessed separately, and the results are combined to obtain the reliability of the multi-cloud computing application. In this paper, we focus on how to assess the reliability of the server layer with the required algorithms and explore the steps in the assessment of server reliability.
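The abstract does not say how the per-layer reliabilities are combined. One common assumption is that independent layers act in series (all must work) while redundant servers within a layer act in parallel (at least one must work). The sketch below illustrates that assumption only; the function names and figures are hypothetical.

```python
from math import prod


def series(reliabilities):
    """Reliability of independent components that must all work."""
    return prod(reliabilities)


def parallel(reliabilities):
    """Reliability of independent redundant components (at least one works)."""
    return 1 - prod(1 - r for r in reliabilities)


# Hypothetical figures: two redundant physical servers form the server layer.
server_layer = parallel([0.95, 0.95])
application = series([0.99, 0.98, server_layer])  # app, virtualization, server
print(round(application, 4))
```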
McCaffrey, Ruth; Bishop, Mary; Adonis-Rizzo, Marie; Williamson, Ellen; McPherson, Melanie; Cruikshank, Alice; Carrier, Vicki Jo; Sands, Simone; Pigano, Diane; Girard, Patricia; Lauzon, Cathy
2007-01-01
Hospital-acquired deep vein thrombosis (DVT) and pulmonary embolisms (PE) are preventable problems that can increase mortality. Early assessment and recognition of risk as well as initiating appropriate prevention measures can prevent DVT or PE. The purpose of this research project was to develop a DVT risk assessment tool and test the tool for validity and reliability. Three phases were undertaken in developing and testing the JFK Medical Center DVT risk assessment tool. First, risk and predisposing factors for DVT were identified from the literature, expert nursing knowledge, and medical staff input. Second, item development and weighting were undertaken. Third, parametric testing for content validity measured the differences in mean assessment tool scores between a group of patients who developed DVT in the hospital and a demographically similar group who did not develop DVT. Interrater reliability was measured by having three different nurses score each patient and compare the differences in scores among the three. The DVT group had significantly higher scores on the JFK DVT assessment scale than did those who did not experience DVT. Interrater reliability showed a strong correlation among the scores of the three nurses (.98). Providing a valid and reliable tool for measuring the risk for DVT or PE in hospitalized patients will enable nurses to intervene early in patients at risk. Basing DVT risk assessment on the evidence provided in this study will assist nurses in becoming more confident in recognizing the necessity for interventions in hospitalized patients and decreasing risk. Nurses can now evaluate patients at risk for DVT or PE using the JFK Medical Center's risk assessment tool.
The reliability of vertical jump tests between the Vertec and My Jump phone application.
Yingling, Vanessa R; Castro, Dimitri A; Duong, Justin T; Malpartida, Fiorella J; Usher, Justin R; O, Jenny
2018-01-01
The vertical jump is used to estimate sports performance capabilities and physical fitness in children, elderly, non-athletic and injured individuals. Different jump techniques and measurement tools are available to assess vertical jump height and peak power; however, their use is limited by access to laboratory settings, excessive cost and/or time constraints thus making these tools oftentimes unsuitable for field assessment. A popular field test uses the Vertec and the Sargent vertical jump with countermovement; however, new low cost, easy to use tools are becoming available, including the My Jump iOS mobile application (app). The purpose of this study was to assess the reliability of the My Jump relative to values obtained by the Vertec for the Sargent stand and reach vertical jump (VJ) test. One hundred and thirty-five healthy participants aged 18-39 years (94 males, 41 females) completed three maximal Sargent VJ with countermovement that were simultaneously measured using the Vertec and the My Jump. Jump heights were quantified for each jump and peak power was calculated using the Sayers equation. Four separate ICC estimates and their 95% confidence intervals were used to assess reliability. Two analyses (with jump height and calculated peak power as the dependent variables, respectively) were based on a single rater, consistency, two-way mixed-effects model, while two others (with jump height and calculated peak power as the dependent variables, respectively) were based on a single rater, absolute agreement, two-way mixed-effects model. Moderate to excellent reliability relative to the degree of consistency between the Vertec and My Jump values was found for jump height (ICC = 0.813; 95% CI [0.747-0.863]) and calculated peak power (ICC = 0.926; 95% CI [0.897-0.947]).
However, poor to good reliability relative to absolute agreement for VJ height (ICC = 0.665; 95% CI [0.050-0.859]) and poor to excellent reliability relative to absolute agreement for peak power (ICC = 0.851; 95% CI [0.272-0.946]) between the Vertec and My Jump values were found; Vertec VJ height, and thus, Vertec calculated peak power values, were significantly higher than those calculated from My Jump values (p < 0.0001). The My Jump app may provide a reliable measure of vertical jump height and calculated peak power in multiple field and laboratory settings without the need of costly equipment such as force plates or Vertec. The reliability relative to degree of consistency between the Vertec and My Jump app was moderate to excellent. However, the reliability relative to absolute agreement between Vertec and My Jump values contained significant variation (based on CI values), thus, it is recommended that either the My Jump or the Vertec be used to assess VJ height in repeated measures within subjects' designs; these measurement tools should not be considered interchangeable within subjects or in group measurement designs.
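The gap between the consistency and absolute-agreement ICCs can be illustrated with a small sketch. It assumes the usual two-way ANOVA mean-square formulas for single-measure ICCs and uses hypothetical jump heights in which one device reads a constant 5 cm higher than the other; none of the numbers come from the study.

```python
def icc_two_way(data):
    """Return (consistency, absolute_agreement) single-measure ICCs.

    data: one row per subject, one column per device (two-way layout).
    """
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(col) / n for col in zip(*data)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between devices
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    consistency = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    # Absolute agreement also penalizes systematic device differences.
    agreement = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k / n * (ms_cols - ms_err)
    )
    return consistency, agreement


# Hypothetical heights in cm: the second device reads 5 cm higher every time.
heights = [[30.0, 35.0], [40.0, 45.0], [50.0, 55.0], [60.0, 65.0]]
icc_c, icc_a = icc_two_way(heights)
print(round(icc_c, 3), round(icc_a, 3))
```

A constant offset leaves the ranking of subjects intact, so the consistency ICC stays at its maximum while the absolute-agreement ICC drops, which matches the pattern described for the Vertec and My Jump values.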
The reliability of vertical jump tests between the Vertec and My Jump phone application
Castro, Dimitri A.; Duong, Justin T.; Malpartida, Fiorella J.; Usher, Justin R.; O, Jenny
2018-01-01
Background The vertical jump is used to estimate sports performance capabilities and physical fitness in children, elderly, non-athletic and injured individuals. Different jump techniques and measurement tools are available to assess vertical jump height and peak power; however, their use is limited by access to laboratory settings, excessive cost and/or time constraints thus making these tools oftentimes unsuitable for field assessment. A popular field test uses the Vertec and the Sargent vertical jump with countermovement; however, new low cost, easy to use tools are becoming available, including the My Jump iOS mobile application (app). The purpose of this study was to assess the reliability of the My Jump relative to values obtained by the Vertec for the Sargent stand and reach vertical jump (VJ) test. Methods One hundred and thirty-five healthy participants aged 18–39 years (94 males, 41 females) completed three maximal Sargent VJ with countermovement that were simultaneously measured using the Vertec and the My Jump. Jump heights were quantified for each jump and peak power was calculated using the Sayers equation. Four separate ICC estimates and their 95% confidence intervals were used to assess reliability. Two analyses (with jump height and calculated peak power as the dependent variables, respectively) were based on a single rater, consistency, two-way mixed-effects model, while two others (with jump height and calculated peak power as the dependent variables, respectively) were based on a single rater, absolute agreement, two-way mixed-effects model. Results Moderate to excellent reliability relative to the degree of consistency between the Vertec and My Jump values was found for jump height (ICC = 0.813; 95% CI [0.747–0.863]) and calculated peak power (ICC = 0.926; 95% CI [0.897–0.947]). 
However, poor to good reliability relative to absolute agreement for VJ height (ICC = 0.665; 95% CI [0.050–0.859]) and poor to excellent reliability relative to absolute agreement for peak power (ICC = 0.851; 95% CI [0.272–0.946]) between the Vertec and My Jump values were found; Vertec VJ height, and thus, Vertec calculated peak power values, were significantly higher than those calculated from My Jump values (p < 0.0001). Discussion The My Jump app may provide a reliable measure of vertical jump height and calculated peak power in multiple field and laboratory settings without the need of costly equipment such as force plates or Vertec. The reliability relative to degree of consistency between the Vertec and My Jump app was moderate to excellent. However, the reliability relative to absolute agreement between Vertec and My Jump values contained significant variation (based on CI values), thus, it is recommended that either the My Jump or the Vertec be used to assess VJ height in repeated measures within subjects’ designs; these measurement tools should not be considered interchangeable within subjects or in group measurement designs. PMID:29692955
Survey of critical failure events in on-chip interconnect by fault tree analysis
NASA Astrophysics Data System (ADS)
Yokogawa, Shinji; Kunii, Kyousuke
2018-07-01
In this paper, a framework based on reliability physics is proposed for applying fault tree analysis (FTA) to the on-chip interconnect system of a semiconductor. By integrating expert knowledge and experience regarding the possibilities of failure on basic events, critical issues of on-chip interconnect reliability are evaluated by FTA. In particular, FTA is used to identify the minimal cut sets with high risk priority. Critical events affecting the on-chip interconnect reliability are identified and discussed from the viewpoint of long-term reliability assessment. The moisture impact is evaluated as an external event.
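A minimal cut set is a smallest combination of basic events that is sufficient to cause the top event. The sketch below shows one simple way to enumerate them for a small AND/OR tree. The tree structure and event names are hypothetical illustrations and are not taken from the paper.

```python
from itertools import product


def cut_sets(node):
    """Expand a fault tree into cut sets.

    A node is either a basic-event name (str) or a (gate, children) tuple,
    where gate is "AND" or "OR".
    """
    if isinstance(node, str):
        return [frozenset([node])]
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":  # any single child cut set triggers the gate
        return [cs for sets in child_sets for cs in sets]
    if gate == "AND":  # one cut set from every child must occur together
        return [frozenset().union(*combo) for combo in product(*child_sets)]
    raise ValueError(f"unknown gate: {gate}")


def minimal_cut_sets(node):
    """Keep only cut sets that contain no smaller cut set."""
    sets = set(cut_sets(node))
    return [s for s in sets if not any(other < s for other in sets)]


# Hypothetical interconnect failure tree.
tree = ("OR", [
    "void_growth",
    ("AND", ["moisture_ingress", "barrier_defect"]),
    ("AND", ["moisture_ingress", "barrier_defect", "high_current"]),
])
for cs in minimal_cut_sets(tree):
    print(sorted(cs))
```

Ranking the resulting sets by risk priority would need event probabilities or expert scores, which this sketch leaves out.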
Rosales, Roberto S; Martin-Hidalgo, Yolanda; Reboso-Morales, Luis; Atroshi, Isam
2016-03-03
The purpose of this study was to assess the reliability and construct validity of the Spanish version of the 6-item carpal tunnel syndrome (CTS) symptoms scale (CTS-6). In this cross-sectional study, 40 patients diagnosed with CTS based on clinical and neurophysiologic criteria completed the standard Spanish versions of the CTS-6 and the disabilities of the arm, shoulder and hand (QuickDASH) scales on two occasions with a 1-week interval. Internal-consistency reliability was assessed with the Cronbach alpha coefficient and test-retest reliability with the intraclass correlation coefficient, two-way random effect model and absolute agreement definition (ICC2,1). Cross-sectional precision was analyzed with the Standard Error of the Measurement (SEM). Longitudinal precision for the test-retest reliability coefficient was assessed with the Standard Error of the Measurement difference (SEMdiff) and the Minimal Detectable Change at 95 % confidence level (MDC95). For assessing construct validity it was hypothesized that the CTS-6 would have a strong positive correlation with the QuickDASH, analyzed with the Pearson correlation coefficient (r). The standard Spanish version of the CTS-6 presented a Cronbach alpha of 0.81 with a SEM of 0.3. Test-retest reliability showed an ICC of 0.85 with a SEMdiff of 0.36 and a MDC95 of 0.7. The correlation between CTS-6 and the QuickDASH was concordant with the a priori formulated construct hypothesis (r = 0.69). Conclusions: The standard Spanish version of the 6-item CTS symptoms scale showed good internal consistency, test-retest reliability and construct validity for outcomes assessment in CTS. The CTS-6 will be useful to clinicians and researchers in Spanish speaking parts of the world. The use of standardized outcome measures across countries also will facilitate comparison of research results in carpal tunnel syndrome.
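The abstract does not spell out how the precision statistics were computed. The sketch below uses the commonly cited definitions, which are assumed here: SEM = SD × sqrt(1 − reliability), SEMdiff = SEM × sqrt(2), and MDC95 = 1.96 × SEMdiff. The function names are hypothetical.

```python
from math import sqrt


def sem(sd, reliability):
    """Standard error of measurement from a score SD and a reliability coefficient."""
    return sd * sqrt(1 - reliability)


def sem_diff(sem_value):
    """Standard error of the difference between two measurements."""
    return sem_value * sqrt(2)


def mdc95(sem_diff_value):
    """Minimal detectable change at the 95% confidence level."""
    return 1.96 * sem_diff_value


# Using the reported SEMdiff of 0.36 as input.
print(round(mdc95(0.36), 1))
```

Under these assumed definitions, the reported SEMdiff of 0.36 gives 1.96 × 0.36 ≈ 0.71, in line with the reported MDC95 of 0.7.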
Marking Essays on Screen: An Investigation into the Reliability of Marking Extended Subjective Texts
ERIC Educational Resources Information Center
Johnson, Martin; Nadas, Rita; Bell, John F.
2010-01-01
There is a growing body of research literature that considers how the mode of assessment, either computer-based or paper-based, might affect candidates' performances. Despite this, there is a fairly narrow literature that shifts the focus of attention to those making assessment judgements and which considers issues of assessor consistency when…
NASA Astrophysics Data System (ADS)
Liu, Haixing; Savić, Dragan; Kapelan, Zoran; Zhao, Ming; Yuan, Yixing; Zhao, Hongbin
2014-07-01
Flow entropy is a measure of uniformity of pipe flows in water distribution systems (WDSs). By maximizing flow entropy one can identify reliable layouts or connectivity in networks. In order to overcome the disadvantage of the common definition of flow entropy, which does not consider the impact of pipe diameter on reliability, an extended definition of flow entropy, termed diameter-sensitive flow entropy, is proposed. This new methodology is then assessed by using other reliability methods, including Monte Carlo Simulation, a pipe failure probability model, and a surrogate measure (resilience index) integrated with water demand and pipe failure uncertainty. The reliability assessment is based on a sample of WDS designs derived from an optimization process for each of the two benchmark networks. Correlation analysis is used to evaluate quantitatively the relationship between entropy and reliability. To ensure reliability, a comparative analysis between the flow entropy and the new method is conducted. The results demonstrate that the diameter-sensitive flow entropy shows consistently much stronger correlation with the three reliability measures than simple flow entropy. Therefore, the new flow entropy method can be taken as a better surrogate measure for reliability and could potentially be integrated into the optimal design problem of WDSs. Sensitivity analysis results show that the velocity parameters used in the new flow entropy have no significant impact on the relationship between diameter-sensitive flow entropy and reliability.
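The idea that entropy rewards uniform flows can be illustrated with a deliberately simplified sketch: the Shannon entropy of the flow shares carried by a set of pipes. This is an assumption made for illustration. It is neither the full network flow entropy formulation nor the diameter-sensitive definition proposed in the paper, and the function name and flow values are hypothetical.

```python
from math import log


def flow_share_entropy(flows):
    """Shannon entropy of the share of total flow carried by each pipe."""
    total = sum(flows)
    shares = [q / total for q in flows if q > 0]  # zero flows contribute nothing
    return -sum(p * log(p) for p in shares)


# Hypothetical flows (L/s) on two parallel supply paths.
balanced = flow_share_entropy([5.0, 5.0])
skewed = flow_share_entropy([9.0, 1.0])
print(balanced > skewed)
```

Evenly split flows give the highest entropy, which is the intuition behind treating entropy as a surrogate for how well a network tolerates the loss of one path.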
ERIC Educational Resources Information Center
Inozu, Bahadir; Ayyub, Bilal A.
1999-01-01
Examines the current status of existing curricula, accreditation requirements, and new developments in Naval Architecture and Marine Engineering education in the United States. Discusses the emerging needs of the maritime industry in light of advances in information technology and movement toward risk-based, reliability-centered rule making in the…
An Evaluation of the Reliability of the Food Label Literacy Questionnaire in Russian
ERIC Educational Resources Information Center
Gurevich, Konstantin G.; Reynolds, Jesse; Bifulco, Lauren; Doughty, Kimberly; Njike, Valentine; Katz, David L.
2016-01-01
Objective: School-based nutrition education can promote the development of skills, such as food label reading, that can contribute to making healthier food choices. The purpose of this study was to assess the reliability of a Russian language version of the previously validated Food Label Literacy for Applied Nutrition Knowledge (FLLANK)…
ERIC Educational Resources Information Center
Byars, Alvin Gregg
The objectives of this investigation are to develop, describe, assess, and demonstrate procedures for constructing mastery tests to minimize errors of classification and to maximize decision reliability. The guidelines are based on conditions where item exchangeability is a reasonable assumption and the test constructor can control the number of…
Environmental education curriculum evaluation questionnaire: A reliability and validity study
NASA Astrophysics Data System (ADS)
Minner, Daphne Diane
The intention of this research project was to bridge the gap between social science research and application to the environmental domain through the development of a theoretically derived instrument designed to give educators a template by which to evaluate environmental education curricula. The theoretical base for instrument development was provided by several developmental theories such as Piaget's theory of cognitive development, Developmental Systems Theory, Life-span Perspective, as well as curriculum research within the area of environmental education. This theoretical base fueled the generation of a list of components which were then translated into a questionnaire with specific questions relevant to the environmental education domain. The specific research question for this project is: Can a valid assessment instrument based largely on human development and education theory be developed that reliably discriminates high, moderate, and low quality in environmental education curricula? The types of analyses conducted to answer this question were interrater reliability (percent agreement, Cohen's Kappa coefficient, Pearson's Product-Moment correlation coefficient), test-retest reliability (percent agreement, correlation), and criterion-related validity (correlation). Face validity and content validity were also assessed through thorough reviews. Overall results indicate that 29% of the questions on the questionnaire demonstrated a high level of interrater reliability and 43% of the questions demonstrated a moderate level of interrater reliability. Seventy-one percent of the questions demonstrated a high test-retest reliability and 5% a moderate level. Fifty-five percent of the questions on the questionnaire were reliable (high or moderate) both across time and raters. Only eight questions (8%) did not show either interrater or test-retest reliability. 
The global overall rating of high, medium, or low quality was reliable across both coders and time, indicating that the questionnaire can discriminate differences in quality of environmental education curricula. Of the 35 curricula evaluated, 6 were high quality, 14 were medium quality and 15 were low quality. The criterion-related validity of the instrument is at current time unable to be established due to the lack of comparable measures or a concretely usable set of multidisciplinary standards. Face and content validity were sufficiently demonstrated.
Saloheimo, T; González, S A; Erkkola, M; Milauskas, D M; Meisel, J D; Champagne, C M; Tudor-Locke, C; Sarmiento, O; Katzmarzyk, P T; Fogelholm, M
2015-01-01
Objective: The main aim of this study was to assess the reliability and validity of a food frequency questionnaire with 23 food groups (I-FFQ) among a sample of 9–11-year-old children from three different countries that differ in economic development and income distribution, and to assess differences between country sites. Furthermore, we assessed factors associated with I-FFQ's performance. Methods: This was an ancillary study of the International Study of Childhood Obesity, Lifestyle and the Environment. Reliability (n=321) and validity (n=282) components of this study had the same participants. Participation rates were 95% and 70%, respectively. Participants completed two I-FFQs with a mean interval of 4.9 weeks to assess reliability. A 3-day pre-coded food diary (PFD) was used as the reference method in the validity analyses. Wilcoxon signed-rank tests, intraclass correlation coefficients and cross-classifications were used to assess the reliability of I-FFQ. Spearman correlation coefficients, percentage difference and cross-classifications were used to assess the validity of I-FFQ. A logistic regression model was used to assess the relation of selected variables with the estimate of validity. Analyses based on information in the PFDs were performed to assess how participants interpreted food groups. Results: Reliability correlation coefficients ranged from 0.37 to 0.78 and gross misclassification for all food groups was <5%. Validity correlation coefficients were below 0.5 for 22/23 food groups, and they differed among country sites. For validity, gross misclassification was <5% for 22/23 food groups. Over- or underestimation did not appear for 19/23 food groups. Logistic regression showed that country of participation and parental education were associated (P⩽0.05) with the validity of I-FFQ. Analyses of children's interpretation of food groups suggested that the meaning of most food groups was understood by the children.
Conclusion: I-FFQ is a moderately reliable method and its validity ranged from low to moderate, depending on food group and country site. PMID:27152180
DOE Office of Scientific and Technical Information (OSTI.GOV)
Paret, Paul
The National Renewable Energy Laboratory (NREL) will conduct thermal and reliability modeling on three sets of power modules for the development of a next generation inverter for electric traction drive vehicles. These modules will be chosen by General Motors (GM) to represent three distinct technological approaches to inverter power module packaging. Likely failure mechanisms will be identified in each package and a physics-of-failure-based reliability assessment will be conducted.
Reliability analysis of a sensitive and independent stabilometry parameter set
Nagymáté, Gergely; Orlovits, Zsanett; Kiss, Rita M
2018-01-01
Recent studies have suggested reduced independent and sensitive parameter sets for stabilometry measurements based on correlation and variance analyses. However, the reliability of these recommended parameter sets has not been studied in the literature or not in every stance type used in stabilometry assessments, for example, single leg stances. The goal of this study is to evaluate the test-retest reliability of different time-based and frequency-based parameters that are calculated from the center of pressure (CoP) during bipedal and single leg stance for 30- and 60-second measurement intervals. Thirty healthy subjects performed repeated standing trials in a bipedal stance with eyes open and eyes closed conditions and in a single leg stance with eyes open for 60 seconds. A force distribution measuring plate was used to record the CoP. The reliability of the CoP parameters was characterized by using the intraclass correlation coefficient (ICC), standard error of measurement (SEM), minimal detectable change (MDC), coefficient of variation (CV) and CV compliance rate (CVCR). Based on the ICC, SEM and MDC results, many parameters yielded fair to good reliability values, while the CoP path length yielded the highest reliability (smallest ICC > 0.67 (0.54–0.79), largest SEM% = 19.2%). Usually, frequency type parameters and extreme value parameters yielded poor reliability values. There were differences in the reliability of the maximum CoP velocity (better with 30 seconds) and mean power frequency (better with 60 seconds) parameters between the different sampling intervals. PMID:29664938
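The abstract above reports ICC, SEM and MDC without stating how they were derived. The following is a minimal sketch, assuming the commonly used definitions SEM = SD × sqrt(1 − ICC) and MDC95 = 1.96 × sqrt(2) × SEM; the function names and the input values are hypothetical and are not taken from the study.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement, assuming SEM = SD * sqrt(1 - ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie in [0, 1]")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem: float) -> float:
    """Minimal detectable change at 95% confidence: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * sem


# Hypothetical inputs for illustration only (not values from the abstract).
sd_path_length_mm = 100.0  # between-subject SD of CoP path length
icc_value = 0.75           # test-retest ICC

sem_value = sem_from_icc(sd_path_length_mm, icc_value)
print(f"SEM = {sem_value:.1f} mm, MDC95 = {mdc95(sem_value):.1f} mm")
```

Under these definitions, a lower ICC inflates both SEM and MDC, which fits the abstract reporting the three quantities together.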
The Utrecht questionnaire (U-CEP) measuring knowledge on clinical epidemiology proved to be valid.
Kortekaas, Marlous F; Bartelink, Marie-Louise E L; de Groot, Esther; Korving, Helen; de Wit, Niek J; Grobbee, Diederick E; Hoes, Arno W
2017-02-01
Knowledge on clinical epidemiology is crucial to practice evidence-based medicine. We describe the development and validation of the Utrecht questionnaire on knowledge on Clinical epidemiology for Evidence-based Practice (U-CEP); an assessment tool to be used in the training of clinicians. The U-CEP was developed in two formats: two sets of 25 questions and a combined set of 50. The validation was performed among postgraduate general practice (GP) trainees, hospital trainees, GP supervisors, and experts. Internal consistency, internal reliability (item-total correlation), item discrimination index, item difficulty, content validity, construct validity, responsiveness, test-retest reliability, and feasibility were assessed. The questionnaire was externally validated. Internal consistency was good with a Cronbach alpha of 0.8. The median item-total correlation and mean item discrimination index were satisfactory. Both sets were perceived as relevant to clinical practice. Construct validity was good. Both sets were responsive but failed on test-retest reliability. One set took 24 minutes and the other 33 minutes to complete, on average. External GP trainees had comparable results. The U-CEP is a valid questionnaire to assess knowledge on clinical epidemiology, which is a prerequisite for practicing evidence-based medicine in daily clinical practice. Copyright © 2016 Elsevier Inc. All rights reserved.
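Internal consistency is summarised above by a Cronbach alpha of 0.8. As a rough illustration of what that statistic measures, here is a sketch of the standard formula, alpha = k/(k − 1) × (1 − Σ item variances / variance of total scores). The function name and the response data are hypothetical, and nothing here reproduces the authors' analysis.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents x 4 items, each scored 0 or 1.
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
print(f"alpha = {cronbach_alpha(responses):.2f}")
```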
Elfering, Achim; Cronenberg, Sonja; Grebner, Simone; Tamcan, Oezguer; Müller, Urs
2017-12-01
A newly developed questionnaire assessing limitations in activity of daily living (LADL-Q) that should improve assessment of LADL is tested in a large population-based validation study. This survey was paper-based. Overall, 16,634 individuals who were representative of the working population in the German-speaking part of Switzerland participated in the study. Item analysis was used to reduce the final version of the LADL-Q to four items per subscale that correspond to potential problems in three body regions (back and neck, upper extremities, lower extremities). Analysis included tests for reliability, internal consistency, dimensionality and convergent validity. Test-retest reliability coefficients after 2 weeks ranged from 0.82 to 0.99 (Mdn = 0.87), with no item having a coefficient below 0.60. The median item-total coefficients ranged between moderate and good. Correlation coefficients between LADL-Q subscales and three validated clinical instruments (Western Ontario and McMaster Universities osteoarthritis index, shoulder pain disability index, Oswestry) ranged from 0.63 to 0.81. In structural equation modeling the three subscales were significantly related with two important outcomes in occupational rehabilitation: self-reported general health and daily task performance. The new LADL-Q is a brief, reliable and valid tool for assessment of LADL in studies on musculoskeletal health.
Applying Resource Utilization Groups (RUG-III) in Hong Kong nursing homes.
Chou, Kee-Lee; Chi, Iris; Leung, Joe C B
2008-01-01
Resource Utilization Groups III (RUG-III) is a case-mix system developed in the United States for categorization of nursing home residents and the financing of residential care services. In Hong Kong, RUG-III is based on several broad groups of residents. The aim of this study was to examine the reliability and validity of the RUG-III in Hong Kong nursing homes. A cross-sectional survey was conducted in seven residential facilities operated by one agency. Residents (N = 1,127) were assessed by the Minimum Data Set (MDS) and nursing as well as auxiliary staff care times were recorded within 2 weeks before or after the completion of MDS assessment. Forty-five out of 1,127 residents were re-interviewed by an independent assessor to assess the inter-rater reliability. The inter-rater reliability of MDS assessment was excellent (kappa = 0.76) and the original RUG-III accounted for about 30 per cent of nursing staff time. Results provide preliminary evidence to support that RUG-III is a reliable and valid case-mix system for Hong Kong nursing homes, but future studies must be explored to reduce the variance of resource use explained by this case-mix system.
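Inter-rater reliability is reported above as a kappa statistic. The abstract does not say which variant was used, so the following is only a sketch of unweighted Cohen's kappa for two raters, (observed agreement − chance agreement) / (1 − chance agreement). The function name and the category labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same subjects."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of subjects")
    n = len(rater_a)
    # Proportion of subjects on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical category assignments for eight residents.
first_assessor = ["rehab", "rehab", "clinical", "clinical", "reduced", "reduced", "rehab", "clinical"]
second_assessor = ["rehab", "rehab", "clinical", "reduced", "reduced", "reduced", "rehab", "clinical"]
print(f"kappa = {cohen_kappa(first_assessor, second_assessor):.2f}")
```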
Beard, J D; Marriott, J; Purdie, H; Crossley, J
2011-01-01
To compare user satisfaction and acceptability, reliability and validity of three different methods of assessing the surgical skills of trainees by direct observation in the operating theatre across a range of different surgical specialties and index procedures. A 2-year prospective, observational study in the operating theatres of three teaching hospitals in Sheffield. The assessment methods were procedure-based assessment (PBA), Objective Structured Assessment of Technical Skills (OSATS) and Non-technical Skills for Surgeons (NOTSS). The specialties were obstetrics and gynaecology (O&G) and upper gastrointestinal, colorectal, cardiac, vascular and orthopaedic surgery. Two to four typical index procedures were selected from each specialty. Surgical trainees were directly observed performing typical index procedures and assessed using a combination of two of the three methods (OSATS or PBA and NOTSS for O&G, PBA and NOTSS for the other specialties) by the consultant clinical supervisor for the case and the anaesthetist and/or scrub nurse, as well as one or more independent assessors from the research team. Information on user satisfaction and acceptability of each assessment method from both assessor and trainee perspectives was obtained from structured questionnaires. The reliability of each method was measured using generalisability theory. Aspects of validity included the internal structure of each tool and correlation between tools, construct validity, predictive validity, interprocedural differences, the effect of assessor designation and the effect of assessment on performance. Of the 558 patients who were consented, a total of 437 (78%) cases were included in the study: 51 consultant clinical supervisors, 56 anaesthetists, 39 nurses, 2 surgical care practitioners and 4 independent assessors provided 1635 assessments on 85 trainees undertaking the 437 cases. A total of 749 PBAs, 695 NOTSS and 191 OSATSs were performed. 
Non-O&G clinical supervisors and trainees provided mixed, but predominantly positive, responses about a range of applications of PBA. Most felt that PBA was important in surgical education, and would use it again in the future and did not feel that it added time to the operating list. The overall satisfaction of O&G clinical supervisors and trainees with OSATS was not as high, and a majority of those who used both preferred PBA. A majority of anaesthetists and nurses felt that NOTSS allowed them to rate interpersonal skills (communication, teamwork and leadership) more easily than cognitive skills (situation awareness and decision-making), that it had formative value and that it was a valuable adjunct to the assessment of technical skills. PBA demonstrated high reliability (G > 0.8 for only three assessor judgements on the same index procedure). OSATS had lower reliability (G > 0.8 for five assessor judgements on the same index procedure). Both were less reliable on a mix of procedures because of strong procedure-specific factors. A direct comparison of PBA between O&G and non-O&G cases showed a striking difference in reliability. Within O&G, a good level of reliability (G > 0.8) could not be obtained using a feasible number of assessments. Conversely, the reliability within non-O&G cases was exceptionally high, with only two assessor judgements being required. The reasons for this difference probably include the more summative purpose of assessment in O&G and the much higher proportion of O&G trainees in this study with training concerns (42% vs 4%). The reliability of NOTSS was lower than that for PBA. Reliability for the same procedure (G > 0.8) required six assessor judgements. However, as procedure-specific factors exerted a lesser influence on NOTSS, reliability on a mix of procedures could be achieved using only eight assessor judgements. NOTSS also demonstrated a valid internal structure. 
The strongest correlations between NOTSS and PBA or OSATS were in the 'decision-making' domain. PBA and NOTSS showed better construct validity than OSATS, the year of training and the number of recent index procedures performed being significant independent predictors of performance. There was little variation in scoring between different procedures or different designations of assessor. The results suggest that PBA is a reliable and acceptable method of assessing surgical skills, with good construct validity. Specialties that use OSATS may wish to consider changing the design or switching to PBA. Whatever workplace-based assessment method is used, the purpose, timing and frequency of assessment require detailed guidance. NOTSS is a promising tool for the assessment of non-technical skills, and surgical specialties may wish to consider its inclusion in their assessment framework. Further research is required into the use of health-care professionals other than consultant surgeons to assess trainees, the relationship between performance and experience, the educational impact of assessment and the additional value of video recording.
Assessing adherence to the evidence base in the management of poststroke dysphagia.
Burton, Christopher; Pennington, Lindsay; Roddam, Hazel; Russell, Ian; Russell, Daphne; Krawczyk, Karen; Smith, Hilary A
2006-01-01
To evaluate the reliability and responsiveness to change of an audit tool to assess adherence to evidence of effectiveness in the speech and language therapy (SLT) management of poststroke dysphagia. The tool was used to review SLT practice as part of a randomized study of different education strategies. Medical records were audited before and after delivery of the trial intervention. Seventeen SLT departments in the north-west of England participated in the study. The assessment tool was used to assess the medical records of 753 patients before and 717 patients after delivery of the trial intervention across the 17 departments. A target of 10 records per department per month was sought, using systematic sampling with a random start. Inter- and intra-rater reliability were explored, together with the tool's internal consistency and responsiveness to change. The assessment tool had high face validity, although internal consistency was low (α = 0.37). Composite scores on the tool were however responsive to differences between SLT departments. Both inter- and intra-rater reliability ranged from 'substantial' to 'near perfect' across all items. The audit tool has high face validity and measurement reliability. The use of a composite adherence score should, however, proceed with caution as internal consistency is low.
Burt, Jenni; Abel, Gary; Elmore, Natasha; Campbell, John; Roland, Martin; Benson, John; Silverman, Jonathan
2014-01-01
Objectives To investigate initial reliability of the Global Consultation Rating Scale (GCRS: an instrument to assess the effectiveness of communication across an entire doctor–patient consultation, based on the Calgary-Cambridge guide to the medical interview), in simulated patient consultations. Design Multiple ratings of simulated general practitioner (GP)–patient consultations by trained GP evaluators. Setting UK primary care. Participants 21 GPs and six trained GP evaluators. Outcome measures GCRS score. Methods 6 GP raters used GCRS to rate randomly assigned video recordings of GP consultations with simulated patients. Each of the 42 consultations was rated separately by four raters. We considered whether a fixed difference between scores had the same meaning at all levels of performance. We then examined the reliability of GCRS using mixed linear regression models. We augmented our regression model to also examine whether there were systematic biases between the scores given by different raters and to look for possible order effects. Results Assessing the communication quality of individual consultations, GCRS achieved a reliability of 0.73 (95% CI 0.44 to 0.79) for two raters, 0.80 (0.54 to 0.85) for three and 0.85 (0.61 to 0.88) for four. We found an average difference of 1.65 (on a 0–10 scale) in the scores given by the least and most generous raters: adjusting for this evaluator bias increased reliability to 0.78 (0.53 to 0.83) for two raters; 0.85 (0.63 to 0.88) for three and 0.88 (0.69 to 0.91) for four. There were considerable order effects, with later consultations (after 15–20 ratings) receiving, on average, scores more than one point higher on a 0–10 scale. Conclusions GCRS shows good reliability with three raters assessing each consultation. We are currently developing the scale further by assessing a large sample of real-world consultations. PMID:24604483
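The way reliability rises with the number of raters in the results above can be approximated with the Spearman-Brown prophecy formula. This is a swapped-in illustration, not the authors' method: the abstract states that reliability came from mixed linear regression models. Function names are hypothetical; the only input taken from the abstract is the two-rater reliability of 0.73.

```python
def single_rater_reliability(r_k: float, k: int) -> float:
    """Invert Spearman-Brown: reliability of one rater given reliability r_k of k raters."""
    return r_k / (k - (k - 1) * r_k)


def spearman_brown(r_1: float, k: int) -> float:
    """Reliability of the mean of k raters given single-rater reliability r_1."""
    return k * r_1 / (1 + (k - 1) * r_1)


# Start from the reported two-rater reliability and project to more raters.
r_one = single_rater_reliability(0.73, 2)
for raters in (2, 3, 4):
    print(raters, round(spearman_brown(r_one, raters), 2))
```

The projections for three and four raters come out at roughly 0.80 and 0.84, close to the reported 0.80 and 0.85. The small gap is consistent with rounding of the published figures and with the different estimation method.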
Decision-theoretic methodology for reliability and risk allocation in nuclear power plants
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cho, N.Z.; Papazoglou, I.A.; Bari, R.A.
1985-01-01
This paper describes a methodology for allocating reliability and risk to various reactor systems, subsystems, components, operations, and structures in a consistent manner, based on a set of global safety criteria which are not rigid. The problem is formulated as a multiattribute decision analysis paradigm; the multiobjective optimization, which is performed on a PRA model and reliability cost functions, serves as the guiding principle for reliability and risk allocation. The concept of noninferiority is used in the multiobjective optimization problem. Finding the noninferior solution set is the main theme of the current approach. The assessment of the decision maker's preferences could then be performed more easily on the noninferior solution set. Some results of the methodology applications to a nontrivial risk model are provided and several outstanding issues such as generic allocation and preference assessment are discussed.
Spector, Aimee; Hebditch, Molly; Stoner, Charlotte R; Gibbor, Luke
2016-09-01
The ability to identify biological, social, and psychological issues for people with dementia is an important skill for healthcare professionals. Therefore, valid and reliable measures are needed to assess this ability. This study involves the development of a vignette style measure to capture the extent to which health professionals use "Biopsychosocial" thinking in dementia care (VIG-Dem), based on the framework of the model developed by Spector and Orrell (2010). The development process consisted of Phase 1: Developing and refining the vignettes; Phase 2: Field testing (N = 9), and Phase 3: A pilot study to assess reliability and validity (N = 131). The VIG-Dem, consisting of two vignettes with open-ended questions and a standardized scoring scheme, was developed. Evidence of good inter-rater reliability, convergent validity, and test-retest reliability was established. The VIG-Dem has good psychometric properties and may provide a useful tool in dementia care research and practice.
ERIC Educational Resources Information Center
Cross, Vinette; Hicks, Carolyn; Barwell, Fred
2001-01-01
Using videos of physiotherapy students, compared two assessment forms for validity and reliability (the first currently used by an academic program and the second developed from practitioners' perceptions of competence). Also investigated effects of training on assessment decisions. Found wide differences in individual ability to assess students…
Assessing the Clinical Skills of Dental Students: A Review of the Literature
ERIC Educational Resources Information Center
Taylor, Carly L.; Grey, Nick; Satterthwaite, Julian D.
2013-01-01
Education, from a student perspective, is largely driven by assessment. An effective assessment tool should be both valid and reliable, yet this is often not achieved. The aim of this literature review is to identify and appraise the evidence base for assessment tools used primarily in evaluating clinical skills of dental students. Methods:…
A CONSISTENT APPROACH FOR THE APPLICATION OF PHARMACOKINETIC MODELING IN CANCER RISK ASSESSMENT
Physiologically based pharmacokinetic (PBPK) modeling provides important capabilities for improving the reliability of the extrapolations across dose, species, and exposure route that are generally required in chemical risk assessment regardless of the toxic endpoint being consid...
Reliability and validity of the symptoms of major depressive illness.
Mazure, C; Nelson, J C; Price, L H
1986-05-01
In two consecutive studies, we examined the interrater reliability and then the concurrent validity of interview ratings for individual symptoms of major depressive illness. The concurrent validity of symptoms was determined by assessing the degree to which symptoms observed or reported during an interview were observed in daily behavior. Results indicated that most signs and symptoms of major depression and melancholia can be reliably rated by clinicians during a semistructured interview. Ratings of observable symptoms (signs) assessed during the interview were valid indicators of dysfunction observed in daily behavior. Several but not all ratings based on patient report of symptoms were at variance with observation. These discordant patient-reported symptoms may have value as subjective reports but were not accurate descriptions of observed dysfunction.
New methods for analyzing semantic graph based assessments in science education
NASA Astrophysics Data System (ADS)
Vikaros, Lance Steven
This research investigated how the scoring of semantic graphs (known by many as concept maps) could be improved and automated in order to address issues of inter-rater reliability and scalability. As part of the NSF funded SENSE-IT project to introduce secondary school science students to sensor networks (NSF Grant No. 0833440), semantic graphs illustrating how temperature change affects water ecology were collected from 221 students across 16 schools. The graphing task did not constrain students' use of terms, as is often done with semantic graph based assessment due to coding and scoring concerns. The graphing software used provided real-time feedback to help students learn how to construct graphs, stay on topic and effectively communicate ideas. The collected graphs were scored by human raters using assessment methods expected to boost reliability, which included adaptations of traditional holistic and propositional scoring methods, use of expert raters, topical rubrics, and criterion graphs. High levels of inter-rater reliability were achieved, demonstrating that vocabulary constraints may not be necessary after all. To investigate a new approach to automating the scoring of graphs, thirty-two different graph features characterizing graphs' structure, semantics, configuration and process of construction were then used to predict human raters' scoring of graphs in order to identify feature patterns correlated to raters' evaluations of graphs' topical accuracy and complexity. Results led to the development of a regression model able to predict raters' scoring with 77% accuracy, with 46% accuracy expected when used to score new sets of graphs, as estimated via cross-validation tests. Although such performance is comparable to other graph and essay based scoring systems, cross-context testing of the model and methods used to develop it would be needed before it could be recommended for widespread use. 
Still, the findings suggest techniques for improving the reliability and scalability of semantic graph based assessments without requiring constraint of how ideas are expressed.
Robinson, Lauren M; Skiver Thompson, Rebekah; Ha, James C
2016-01-01
Puppy assessments for companion dogs have shown mixed long-term reliability. Temperament is cited among the reasons for surrendering dogs to shelters. A puppy temperament test that reliably predicts adult behavior is one potential way to lower the number of dogs given to shelters. This study used a longitudinal design to assess temperament in puppies from 8 different breeds at 7 weeks old (n = 52) and 6 years old (n = 34) using modified temperament tests, physiological measures, and a follow-up questionnaire. For 7-week-old puppies, results revealed (a) puppy breed was predictable using 3 variables, (b) 4 American Kennel Club breed groups had some validity based on temperament, (c) temperament was variable within litters of puppies, and (d) certain measures of temperament were related to physiological measures (heart rate). Finally, puppy temperament assessments were reliable in predicting the scores of 2 of the 8 adult dog temperament measures. However, overall, the puppy temperament scores were unreliable in predicting adult temperament.
Hofmann, Elisabeth; Robold, Matthias; Proff, Peter; Kirschneck, Christian
2017-03-01
The method published in 1973 by Demirjian et al. to assess age based on the mineralisation stage of permanent teeth is standard practice in forensic and orthodontic diagnostics. From age 14 onwards, however, this method is only applicable to third molars. No current epidemiological data on third molar mineralisation are available for Caucasian Central-Europeans. Thus, a method for assessing age in this population based on third molar mineralisation is presented, taking into account possible topographic and gender-specific differences. The study included 486 Caucasian Central-European orthodontic patients (9-24 years) with unaffected dental development. In an anonymized, randomized, and blinded manner, one orthopantomogram of each patient at either start, mid or end of treatment was visually analysed regarding the mineralisation stage of the third molars according to the method by Demirjian et al. Corresponding topographic and gender-specific point scores were determined and added to form a dental maturity score. Prediction equations for age assessment were derived by linear regression analysis with chronological age and checked for reliability within the study population. Mineralisation of the lower third molars was slower than mineralisation of the upper third molars, whereas no jaw-side-specific differences were detected. Gender-specific differences were relatively small, but girls reached mineralisation stage C earlier than boys, whereas boys showed an accelerated mineralisation between the ages of 15 and 16. The global equation generated by regression analysis (age = -1.103 + 0.268 × dental maturity score 18 + 28 + 38 + 48) is sufficiently accurate and reliable for clinical use. Age assessment only based on either maxilla or mandible also shows good prognostic reliability.
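The global prediction equation quoted above can be applied as a one-line calculation. The sketch below assumes that "dental maturity score 18 + 28 + 38 + 48" means the sum of the point scores assigned to third molars 18, 28, 38 and 48. The function name and the example point scores are hypothetical; the actual point-score tables are not given in the abstract.

```python
INTERCEPT = -1.103  # from the regression equation in the abstract
SLOPE = 0.268       # from the regression equation in the abstract


def estimate_age(score_18: float, score_28: float, score_38: float, score_48: float) -> float:
    """Estimated age in years from the summed third-molar point scores (assumed reading)."""
    dental_maturity_score = score_18 + score_28 + score_38 + score_48
    return INTERCEPT + SLOPE * dental_maturity_score


# Hypothetical point scores, chosen only to show the arithmetic.
print(f"estimated age: {estimate_age(15, 15, 15, 15):.1f} years")
```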
O'Connor, S; McCaffrey, N; Whyte, E; Moran, K
2016-07-01
To adapt the trunk stability test to facilitate further sub-classification of higher levels of core stability in athletes for use as a screening tool. To establish the inter-tester and intra-tester reliability of this adapted core stability test. Reliability study. Collegiate athletic therapy facilities. Fifteen physically active male subjects (19.46 ± 0.63) free from any orthopaedic or neurological disorders were recruited from a convenience sample of collegiate students. The intraclass correlation coefficients (ICC) and 95% confidence intervals (CI) were computed to establish inter-tester and intra-tester reliability. Excellent ICC values were observed in the adapted core stability test for inter-tester reliability (0.97) and good to excellent values for intra-tester reliability (0.73-0.90). While the 95% CIs were narrow for inter-tester reliability, the intra-tester 95% CIs for Testers A and C were wide compared with those for Tester B. The adapted core stability test developed in this study is a quick and simple field-based test to administer that can further subdivide athletes with high levels of core stability. The test demonstrated high inter-tester and intra-tester reliability. Copyright © 2015 Elsevier Ltd. All rights reserved.
Reliability of 3D laser-based anthropometry and comparison with classical anthropometry.
Kuehnapfel, Andreas; Ahnert, Peter; Loeffler, Markus; Broda, Anja; Scholz, Markus
2016-05-26
Anthropometric quantities are widely used in epidemiologic research as possible confounders, risk factors, or outcomes. 3D laser-based body scans (BS) allow evaluation of dozens of quantities in short time with minimal physical contact between observers and probands. The aim of this study was to compare BS with classical manual anthropometric (CA) assessments with respect to feasibility, reliability, and validity. We performed a study on 108 individuals with multiple measurements of BS and CA to estimate intra- and inter-rater reliabilities for both. We suggested BS equivalents of CA measurements and determined validity of BS considering CA the gold standard. Throughout the study, the overall concordance correlation coefficient (OCCC) was chosen as indicator of agreement. BS was slightly more time consuming but better accepted than CA. For CA, OCCCs for intra- and inter-rater reliability were greater than 0.8 for all nine quantities studied. For BS, 9 of 154 quantities showed reliabilities below 0.7. BS proxies for CA measurements showed good agreement (minimum OCCC > 0.77) after offset correction. Thigh length showed higher reliability in BS while upper arm length showed higher reliability in CA. Except for these issues, reliabilities of CA measurements and their BS equivalents were comparable.
Safety, reliability, and validity of a physiologic definition of bronchopulmonary dysplasia.
Walsh, Michele C; Wilson-Costello, Deanna; Zadell, Arlene; Newman, Nancy; Fanaroff, Avroy
2003-09-01
Bronchopulmonary dysplasia (BPD) is the focus of many intervention trials, yet the outcome measure when based solely on oxygen administration may be confounded by differing criteria for oxygen administration between physicians. Thus, we wished to define BPD by a standardized oxygen saturation monitoring at 36 weeks corrected age, and compare this physiologic definition with the standard clinical definition of BPD based solely on oxygen administration. A total of 199 consecutive very low birthweight infants (VLBW, 501 to 1500 g birthweight) were assessed prospectively at 36 ± 1 weeks corrected age. Neonates on positive pressure support or receiving >30% supplemental oxygen were assigned the outcome BPD. Those receiving ≤30% oxygen underwent a stepwise 2% reduction in supplemental oxygen to room air while under continuous observation and oxygen saturation monitoring. Outcomes of the test were "no BPD" (saturations ≥88% for 60 minutes) or "BPD" (saturation <88%). At the conclusion of the test, all infants were returned to their baseline oxygen. Safety (apnea, bradycardia, increased oxygen use), inter-rater reliability, test-retest reliability, and validity of the physiologic definition vs the clinical definition were assessed. A total of 199 VLBW were assessed, of whom 45 (36%) were diagnosed with BPD by the clinical definition of oxygen use at 36 weeks corrected age. The physiologic definition identified 15 infants treated with oxygen who successfully passed the saturation monitoring test in room air. The physiologic definition diagnosed BPD in 30 (24%) of the cohort. All infants were safely studied. The test was highly reliable (inter-rater reliability, kappa=1.0; test-retest reliability, kappa=0.83) and highly correlated with discharge home in oxygen, length of hospital stay, and hospital readmissions in the first year of life. The physiologic definition of BPD is safe, feasible, reliable, and valid and improves the precision of the diagnosis of BPD.
This may be of benefit in future multicenter clinical trials.
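The inter-rater and test-retest figures in this abstract are kappa values. As a minimal sketch of how Cohen's kappa is usually computed for two raters (the function name and the ratings below are hypothetical illustrations, not data or code from the study):

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same subjects."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of subjects on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical ratings, invented for illustration only.
first = ["BPD", "no BPD", "no BPD", "BPD", "no BPD", "no BPD"]
second = ["BPD", "no BPD", "BPD", "BPD", "no BPD", "no BPD"]
print(round(cohen_kappa(first, second), 3))
```

A kappa of 1.0, as reported for inter-rater reliability, corresponds to complete agreement after the chance correction.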
Langarika-Rocafort, Argia; Emparanza, José Ignacio; Aramendi, José F; Castellano, Julen; Calleja-González, Julio
2017-01-01
To examine the intra-observer reliability and agreement between five methods of measurement for dorsiflexion during Weight Bearing Dorsiflexion Lunge Test and to assess the degree of agreement between three methods in female athletes. Repeated measurements study design. Volleyball club. Twenty-five volleyball players. Dorsiflexion was evaluated using five methods: heel-wall distance, first toe-wall distance, inclinometer at tibia, inclinometer at Achilles tendon and the dorsiflexion angle obtained by a simple trigonometric function. For the statistical analysis, agreement was studied using the Bland-Altman method, the Standard Error of Measurement and the Minimum Detectable Change. Reliability analysis was performed using the Intraclass Correlation Coefficient. Measurement methods using the inclinometer had more than 6° of measurement error. The angle calculated by trigonometric function had 3.28° error. The inclinometer-based methods had ICC values < 0.90. The distance-based methods and the trigonometric angle measurement had ICC values > 0.90. Concerning the agreement between methods, bias ranged from 1.93° to 14.42° and random error from 4.24° to 7.96°. To assess DF angle in WBLT, the angle calculated by a trigonometric function is the most repeatable method. The methods of measurement cannot be used interchangeably. Copyright © 2016 Elsevier Ltd. All rights reserved.
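The agreement statistics named in this abstract follow widely used textbook formulas. The sketch below assumes the common conventions SEM = SD·√(1 − ICC), MDC95 = 1.96·√2·SEM and Bland-Altman limits of bias ± 1.96·SD of the differences; the abstract does not state which variants the authors used, and all function names are hypothetical.

```python
import math
import statistics

def sem_from_icc(sd, icc):
    """Standard Error of Measurement from a sample SD and an ICC."""
    return sd * math.sqrt(1.0 - icc)

def mdc95(sem):
    """Minimum Detectable Change at the 95% confidence level."""
    return 1.96 * math.sqrt(2.0) * sem

def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    spread = 1.96 * statistics.stdev(diffs)
    return bias, bias - spread, bias + spread

# Hypothetical values, invented for illustration only.
sem = sem_from_icc(sd=5.0, icc=0.91)
print(round(sem, 2), round(mdc95(sem), 2))
print(bland_altman([10.0, 12.0, 14.0], [9.0, 10.0, 11.0]))
```

Under these conventions a high ICC alone does not guarantee a small measurement error, because the SEM also scales with the spread of the sample.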
Wilson, Annabelle; Magarey, Anthea; Mastersson, Nadia
2013-01-01
Childhood overweight and obesity are a growing concern globally, and environments, including the home and school, can contribute to this epidemic. This paper assesses the reliability of two questionnaires (parent and teacher) used in the evaluation of a community-based childhood obesity prevention intervention, the eat well be active (ewba) Community Programs. Parents and teachers were recruited from two primary schools and they completed the same questionnaire twice in 2008 and 2009. Data from both questionnaires were classified into outcomes relevant to healthy eating and activity, and target outcomes, based on the goals of the ewba Community Programs, were identified. Fourteen and 12 outcomes were developed from the parent and teacher questionnaires, respectively. Sixty parents and 28 teachers participated in the reliability study. Intraclass correlation coefficients for outcomes ranged from 0.37 to 0.92 (parent) (P < 0.05) and from 0.42 to 0.86 (teacher) (P < 0.05). Internal consistency, measured by Cronbach's alpha, ranged from 0.11 to 0.91 for scores from the teacher questionnaire and from 0.13 to 0.78 for scores from the parent questionnaire. The parent and teacher questionnaires are moderately reliable tools for simultaneously assessing child intakes, environments, attitudes, and knowledge associated with healthy eating and physical activity in the home and school and may be useful for evaluation of similar programs.
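Internal consistency here is reported as Cronbach's alpha. A minimal sketch of the standard formula, alpha = k/(k − 1) · (1 − Σ item variances / variance of total scores), is given below; the function name and the scores are hypothetical and are not taken from the questionnaires.

```python
import statistics

def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` is a list of respondents,
    each a list holding that respondent's k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_variance = sum(statistics.variance(col) for col in zip(*responses))
    # Sample variance of each respondent's total score.
    total_variance = statistics.variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1.0 - item_variance / total_variance)

# Hypothetical two-item scores for three respondents.
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))
```

Values as low as 0.11 or 0.13 would arise when the items within an outcome vary almost independently of one another.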
Improving the Validity and Reliability of a Health Promotion Survey for Physical Therapists
Stephens, Jaca L.; Lowman, John D.; Graham, Cecilia L.; Morris, David M.; Kohler, Connie L.; Waugh, Jonathan B.
2013-01-01
Purpose Physical therapists (PTs) have a unique opportunity to intervene in the area of health promotion. However, no instrument has been validated to measure PTs’ views on health promotion in physical therapy practice. The purpose of this study was to evaluate the content validity and test-retest reliability of a health promotion survey designed for PTs. Methods An expert panel of PTs assessed the content validity of “The Role of Health Promotion in Physical Therapy Survey” and provided suggestions for revision. Item content validity was assessed using the content validity ratio (CVR) as well as the modified kappa statistic. Therapists then participated in the test-retest reliability assessment of the revised health promotion survey, which was assessed using a weighted kappa statistic. Results Based on feedback from the expert panelists, significant revisions were made to the original survey. The expert panel reached at least a majority consensus agreement for all items in the revised survey and the survey-CVR improved from 0.44 to 0.66. Only one item on the revised survey had substantial test-retest agreement, with 55% of the items having moderate agreement and 43% poor agreement. Conclusions All items on the revised health promotion survey demonstrated at least fair validity, but few items had reasonable test-retest reliability. Further modifications should be made to strengthen the validity and improve the reliability of this survey. PMID:23754935
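The content validity ratio mentioned above is usually computed with Lawshe's formula, CVR = (n_e − N/2) / (N/2), where n_e is the number of panelists rating an item essential and N is the panel size. The sketch below assumes that formula and assumes a survey-level value taken as the mean of item CVRs; the abstract confirms neither detail, and the panel counts are hypothetical.

```python
import statistics

def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's CVR for one item; ranges from -1 to 1."""
    if not 0 <= n_essential <= n_panelists or n_panelists == 0:
        raise ValueError("need 0 <= n_essential <= n_panelists, n_panelists > 0")
    half = n_panelists / 2
    return (n_essential - half) / half

def survey_cvr(essential_counts, n_panelists):
    """Mean item CVR across a survey (an assumed aggregation rule)."""
    return statistics.mean(
        content_validity_ratio(n, n_panelists) for n in essential_counts
    )

# Hypothetical panel of 10 experts rating three items.
print(content_validity_ratio(9, 10))
print(survey_cvr([9, 8, 10], 10))
```

A CVR of 0 means exactly half the panel rated the item essential, so positive values indicate majority agreement.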
Empirical Recommendations for Improving the Stability of the Dot-Probe Task in Clinical Research
Price, Rebecca B.; Kuckertz, Jennie M.; Siegle, Greg J.; Ladouceur, Cecile D.; Silk, Jennifer S.; Ryan, Neal D.; Dahl, Ronald E.; Amir, Nader
2014-01-01
The dot-probe task has been widely used in research to produce an index of biased attention based on reaction times (RTs). Despite its popularity, very few published studies have examined psychometric properties of the task, including test-retest reliability, and no previous study has examined reliability in clinically anxious samples or systematically explored the effects of task design and analysis decisions on reliability. In the current analysis, we utilized dot-probe data from three studies where attention bias towards threat-related faces was assessed at multiple (≥5) timepoints. Two of the studies were similar (adults with Social Anxiety Disorder, similar design features) while one was much more disparate (pediatric healthy volunteers, distinct task design). We explored the effects of analysis choices (e.g., bias score calculation formula, methods for outlier handling) on reliability and searched for convergence of findings across the three studies. We found that, when considering the three studies concurrently, the most reliable RT bias index utilized data from dot-bottom trials, comparing congruent to incongruent trials, with rescaled outliers, particularly after averaging across more than one assessment point. Although reliability of RT bias indices was moderate to low under most circumstances, within-session variability in bias (attention bias variability; ABV), a recently proposed RT index, was more reliable across sessions. Several eyetracking-based indices of attention bias (available in the pediatric healthy sample only) showed reliability that matched the optimal RT index (ABV). On the basis of these findings, we make specific recommendations to researchers using the dot probe, particularly those wishing to investigate individual differences and/or single-patient applications. PMID:25419646
Maximizing Statistical Power When Verifying Probabilistic Forecasts of Hydrometeorological Events
NASA Astrophysics Data System (ADS)
DeChant, C. M.; Moradkhani, H.
2014-12-01
Hydrometeorological events (i.e. floods, droughts, precipitation) are increasingly being forecasted probabilistically, owing to the uncertainties in the underlying causes of the phenomenon. In these forecasts, the probability of the event, over some lead time, is estimated based on some model simulations or predictive indicators. By issuing probabilistic forecasts, agencies may communicate the uncertainty in the event occurring. Assuming that the assigned probability of the event is correct, which is referred to as a reliable forecast, the end user may perform some risk management based on the potential damages resulting from the event. Alternatively, an unreliable forecast may give false impressions of the actual risk, leading to improper decision making when protecting resources from extreme events. Due to this requisite for reliable forecasts to perform effective risk management, this study takes a renewed look at reliability assessment in event forecasts. Illustrative experiments will be presented, showing deficiencies in the commonly available approaches (Brier Score, Reliability Diagram). Overall, it is shown that the conventional reliability assessment techniques do not maximize the ability to distinguish between a reliable and unreliable forecast. In this regard, a theoretical formulation of the probabilistic event forecast verification framework will be presented. From this analysis, hypothesis testing with the Poisson-Binomial distribution is the most exact model available for the verification framework, and therefore maximizes one's ability to distinguish between a reliable and unreliable forecast. Application of this verification system was also examined within a real forecasting case study, highlighting the additional statistical power provided with the use of the Poisson-Binomial distribution.
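If the forecast probabilities are correct and the events are independent, the number of events that occur follows a Poisson-Binomial distribution. The sketch below illustrates that idea only: the function names are hypothetical, and the two-sided p-value convention (summing outcomes no more likely than the observed count) is an assumption rather than the authors' stated procedure.

```python
def poisson_binomial_pmf(probs):
    """PMF of the number of successes among independent Bernoulli
    trials with success probabilities `probs` (built by convolution)."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this event does not occur
            nxt[k + 1] += mass * p       # this event occurs
        pmf = nxt
    return pmf

def reliability_p_value(probs, n_observed):
    """Two-sided exact p-value for the observed event count under the
    null hypothesis that the forecast probabilities are correct."""
    pmf = poisson_binomial_pmf(probs)
    if not 0 <= n_observed < len(pmf):
        raise ValueError("n_observed must lie between 0 and len(probs)")
    threshold = pmf[n_observed] * (1.0 + 1e-9)  # tolerance for float ties
    return min(1.0, sum(m for m in pmf if m <= threshold))

# Hypothetical forecasts: ten events each issued with probability 0.1.
print(reliability_p_value([0.1] * 10, n_observed=5))
```

A small p-value suggests the issued probabilities are inconsistent with the observed frequency of events, which is the sense of "unreliable" used in the abstract.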
Parhar, Harman S; Thamboo, Andrew; Habib, Al-Rahim; Chang, Brent; Gan, Eng Cern; Javer, Amin R
2014-04-01
The Philpott-Javer postoperative endoscopic mucosal staging system for allergic fungal rhinosinusitis has previously demonstrated acceptable interrater reliability among rhinologists. There are, however, numerous learners involved in patient care at tertiary centers. This study aims to analyze the interrater and intrarater reliability of this system among learners in otolaryngology at different stages in training. A prospective analysis of retrospectively collected endoscopic photographs. A tertiary care teaching hospital (January 2013). Fifty patients undergoing routine follow-up. Three photographs from each of 50 patients undergoing routine postsurgical nasoendoscopy were reviewed. Images were played twice, 1 week apart, in 2 differently randomized cycles and scored according to Philpott-Javer criteria by a rhinologist, a rhinology fellow, a senior otolaryngology resident, a junior otolaryngology resident, and a medical student. Interobserver reliability was assessed using the intraclass correlation coefficient, while intrarater reliability was assessed by Shrout-Fleiss κ values. Agreement between each learner and the rhinologist was also assessed using κ values. The intraclass correlation among the 5 raters was 0.7600 (95% confidence interval, 0.6917-0.8161) for the Philpott-Javer scoring system, suggesting substantial reliability. Intrarater data showed substantial to almost-perfect reliability (κ values between 0.668 and 0.815) among all raters using this system. There was also moderate to substantial agreement between the learners and the rhinologist (κ values between 0.534 and 0.710). Results suggest that the Philpott-Javer staging system has acceptable intrarater and interrater reliability among learners of differing levels of clinical experience and is suitable for evaluating progress following surgery.
Li, Yingshuang; Ding, Chunge
2017-01-01
The Adult Carer Quality of Life questionnaire (AC-QoL) is a reliable and valid instrument used to assess the quality of life (QoL) of adult family caregivers. We explored the psychometric properties of a Chinese version of the AC-QoL, testing its reliability and validity in 409 Chinese stroke caregivers. We used item-total correlation and extreme group comparison for item analysis. To evaluate its reliability, we used a test-retest approach with the intraclass correlation coefficient (ICC), together with Cronbach’s alpha and a model-based internal consistency index; to evaluate its validity, we used scale content validity, confirmatory factor analysis (CFA) and exploratory factor analysis (EFA) via principal component analysis with varimax rotation. We found that the CFA did not in fact confirm the original factor model and our EFA yielded a 31-item measure with a five-factor model. In conclusion, although some items performed differently between the original English language version and our Chinese language version, our translated AC-QoL is a reliable and valid tool which can be used to assess the quality of life of stroke caregivers in mainland China. The Chinese version of the AC-QoL is a comprehensive measure for understanding caregivers and has the potential to be a screening tool for assessing caregivers' QoL. PMID:29131845
Laibhen-Parkes, Natasha; Kimble, Laura P; Melnyk, Bernadette Mazurek; Sudia, Tanya; Codone, Susan
2018-06-01
Instruments used to assess evidence-based practice (EBP) competence in nurses have been subjective, unreliable, or invalid. The Fresno test was identified as the only instrument to measure all the steps of EBP with supportive reliability and validity data. However, the items and psychometric properties of the original Fresno test are only relevant to measure EBP with medical residents. Therefore, the purpose of this paper is to describe the development of the adapted Fresno test for pediatric nurses, and provide preliminary validity and reliability data for its use with Bachelor of Science in Nursing-prepared pediatric bedside nurses. General adaptations were made to the original instrument's case studies, item content, wording, and format to meet the needs of a pediatric nursing sample. The scoring rubric was also modified to complement changes made to the instrument. Content and face validity, and intrarater reliability of the adapted Fresno test were assessed during a mixed-methods pilot study conducted from October to December 2013 with 29 Bachelor of Science in Nursing-prepared pediatric nurses. Validity data provided evidence for good content and face validity. Intrarater reliability estimates were high. The adapted Fresno test presented here appears to be a valid and reliable assessment of EBP competence in Bachelor of Science in Nursing-prepared pediatric nurses. However, further testing of this instrument is warranted using a larger sample of pediatric nurses in diverse settings. This instrument can be a starting point for evaluating the impact of EBP competence on patient outcomes. © 2018 Sigma Theta Tau International.
Miciak, Jeremy; Taylor, Pat; Denton, Carolyn A.; Fletcher, Jack M.
2014-01-01
Purpose Few empirical investigations have evaluated learning disabilities (LD) identification methods based on a pattern of cognitive strengths and weaknesses (PSW). This study investigated the reliability of LD classification decisions of the concordance/discordance method (C/DM) across different psychoeducational assessment batteries. Methods C/DM criteria were applied to assessment data from 177 second grade students based on two psychoeducational assessment batteries. The achievement tests were different, but were highly correlated and measured the same latent construct. Resulting LD identifications were then evaluated for agreement across batteries on LD status and the academic domain of eligibility. Results The two batteries identified a similar number of participants as having LD (80 and 74). However, indices of agreement for classification decisions were low (kappa = .29), especially for percent positive agreement (62%). The two batteries demonstrated agreement on the academic domain of eligibility for only 25 participants. Conclusions Cognitive discrepancy frameworks for LD identification are inherently unstable because of imperfect reliability and validity at the observed level. Methods premised on identifying a PSW profile may never achieve high reliability because of these underlying psychometric factors. An alternative is to directly assess academic skills to identify students in need of intervention. PMID:25243467
Spanager, Lene; Beier-Holgersen, Randi; Dieckmann, Peter; Konge, Lars; Rosenberg, Jacob; Oestergaard, Doris
2013-11-01
Nontechnical skills are essential for safe and efficient surgery. The aim of this study was to evaluate the reliability of an assessment tool for surgeons' nontechnical skills, Non-Technical Skills for Surgeons dk (NOTSSdk), and the effect of rater training. A 1-day course was conducted for 15 general surgeons in which they rated surgeons' nontechnical skills in 9 video recordings of scenarios simulating real intraoperative situations. Data were gathered from 2 sessions separated by a 4-hour training session. Interrater reliability was high for both pretraining ratings (Cronbach's α = .97) and posttraining ratings (Cronbach's α = .98). There was no statistically significant development in assessment skills. The D study showed that 2 untrained raters or 1 trained rater was needed to obtain generalizability coefficients >.80. The high pretraining interrater reliability indicates that videos were easy to rate and Non-Technical Skills for Surgeons dk easy to use. This implies that Non-Technical Skills for Surgeons dk (NOTSSdk) could be an important tool in surgical training, potentially improving safety and quality for surgical patients. Copyright © 2013 Elsevier Inc. All rights reserved.
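The D study result (how many raters are needed for a generalizability coefficient above .80) can be illustrated with the single-facet formula Eρ² = σ²_person / (σ²_person + σ²_error / n_raters). This is a simplified sketch: the study's actual design may include more facets, and the variance components used below are hypothetical, not taken from the paper.

```python
def g_coefficient(var_person, var_error, n_raters):
    """Generalizability coefficient when scores are averaged over
    n_raters in a simple persons-by-raters design."""
    return var_person / (var_person + var_error / n_raters)

def raters_needed(var_person, var_error, target=0.80, max_raters=50):
    """Smallest number of raters whose averaged score exceeds `target`,
    or None if no number up to max_raters is enough."""
    for n in range(1, max_raters + 1):
        if g_coefficient(var_person, var_error, n) > target:
            return n
    return None

# Hypothetical variance components for untrained and trained raters.
print(raters_needed(var_person=3.0, var_error=1.0))  # noisier ratings
print(raters_needed(var_person=9.0, var_error=1.0))  # less rater error
```

Reducing rater-related error variance, for example through training, lowers the number of raters needed to reach the same coefficient.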
Pisarnturakit, Pagaporn P; Shaw, Bret R; Tanasukarn, Chanuantong; Vatanasomboon, Paranee
2012-09-01
Primary caregivers' child oral health care beliefs and practices are major factors in the prevention of Early Childhood Caries (ECC). This study assessed the validity and reliability of a newly developed scale, the Early Childhood Caries Perceptions Scale (ECCPS), used to measure beliefs regarding ECC preventive practices among primary caregivers of young children. The ECCPS was developed based on the Health Belief Model. The construct validity and reliability of the ECCPS were examined among 254 low-socioeconomic status primary caregivers with children under five years old, recruited from 4 Bangkok Metropolitan Administration Health Centers and a kindergarten school. Exploratory factor analysis (EFA) revealed a four-factor structure. The four factors were labeled as Perceived Susceptibility, Perceived Severity, Perceived Benefits and Perceived Barriers. Internal consistency, measured by Cronbach's coefficient alpha, was 0.897, 0.971, 0.975 and 0.789 for those four factors, respectively. The ECCPS demonstrated satisfactory levels of reliability and validity for assessing the health beliefs related to ECC prevention among low-socioeconomic primary caregivers.
Fino, Edita; Mazzetti, Michela
2018-04-23
Smartphone applications are considered the prime candidate for large-scale, low-cost and long-term sleep monitoring. How reliable and scientifically grounded smartphone-based assessment of healthy and disturbed sleep is remains a key issue. Here we offer a review of validation studies of sleep applications with the aim of providing some guidance on their reliability in assessing sleep in healthy and clinical populations, and of stimulating further examination of their potential for clinical use and improved sleep hygiene. An electronic literature review was conducted on PubMed. Eleven validation studies published since 2012 were identified, evaluating smartphone applications' performance compared to standard methods of sleep assessment in healthy and clinical samples. Studies with healthy populations show that most sleep applications meet or exceed accuracy levels of wrist-based actigraphy in sleep-wake cycle discrimination, whereas performance levels drop in individuals with low sleep efficiency (SE) and in clinical populations, mirroring actigraphy results. Poor correlation with polysomnography (PSG) sleep sub-stages is reported by most accelerometer-based apps. However, multiple parameter-based applications (i.e., EarlySense, SleepAp) showed good capability in detection of sleep-wake stages and sleep-related breathing disorders (SRBD) like obstructive sleep apnea (OSA), respectively, with values similar to PSG. While the reviewed evidence suggests a potential role of smartphone sleep applications in pre-screening of SRBD, more experimental studies are warranted to assess their reliability, particularly in sleep-wake detection. The apps' utility in post-treatment follow-up at home or as an adjunct to the sleep diary in clinical settings is also stressed.
Implementing the undergraduate mini-CEX: a tailored approach at Southampton University.
Hill, Faith; Kendall, Kathleen; Galbraith, Kevin; Crossley, Jim
2009-04-01
The mini-clinical evaluation exercise (mini-CEX) is widely used in the UK to assess clinical competence, but there is little evidence regarding its implementation in the undergraduate setting. This study aimed to estimate the validity and reliability of the undergraduate mini-CEX and discuss the challenges involved in its implementation. A total of 3499 mini-CEX forms were completed. Validity was assessed by estimating associations between mini-CEX score and a number of external variables, examining the internal structure of the instrument, checking competency domain response rates and profiles against expectations, and by qualitative evaluation of stakeholder interviews. Reliability was evaluated by overall reliability coefficient (R), estimation of the standard error of measurement (SEM), and from stakeholders' perceptions. Variance component analysis examined the contribution of relevant factors to students' scores. Validity was threatened by various confounding variables, including: examiner status; case complexity; attachment specialty; patient gender, and case focus. Factor analysis suggested that competency domains reflect a single latent variable. Maximum reliability can be achieved by aggregating scores over 15 encounters (R = 0.73; 95% confidence interval [CI] +/- 0.28 based on a 6-point assessment scale). Examiner stringency contributed 29% of score variation and student attachment aptitude 13%. Stakeholder interviews revealed staff development needs but the majority perceived the mini-CEX as more reliable and valid than the previous long case. The mini-CEX has good overall utility for assessing aspects of the clinical encounter in an undergraduate setting. Strengths include fidelity, wide sampling, perceived validity, and formative observation and feedback. Reliability is limited by variable examiner stringency, and validity by confounding variables, but these should be viewed within the context of overall assessment strategies.
Animal-Based Measures to Assess the Welfare of Extensively Managed Ewes
Hemsworth, Paul; Doyle, Rebecca
2017-01-01
Simple Summary The aim of this study was to assess the reliability and practicality of 10 animal-based welfare measures for extensively managed ewes, which were derived from the scientific literature, previous welfare protocols and through consultation with veterinarians and animal welfare scientists. Measures were examined on 100 Merino ewes, which were individually identified and repeatedly examined at mid-pregnancy, mid-lactation and weaning. Body condition score, fleece condition, skin lesions, tail length, dag score and lameness are proposed for on-farm use in welfare assessments of extensive sheep production systems. These six welfare measures, which address the main welfare concerns for extensively managed ewes, can be reliably and feasibly measured in the field. Abstract The reliability and feasibility of 10 animal-based measures of ewe welfare were examined for use in extensive sheep production systems. Measures were: Body condition score (BCS), rumen fill, fleece cleanliness, fleece condition, skin lesions, tail length, dag score, foot-wall integrity, hoof overgrowth and lameness, and all were examined on 100 Merino ewes (aged 2–4 years) during mid-pregnancy, mid-lactation and weaning by a pool of nine trained observers. The measures of BCS, fleece condition, skin lesions, tail length, dag score and lameness were deemed to be reliable and feasible. All had good observer agreement, as determined by the percentage of agreement, Kendall’s coefficient of concordance (W) and Kappa (k) values. When combined, these nutritional and health measures provide a snapshot of the current welfare status of ewes, as well as evidencing previous or potential welfare issues. PMID:29295551
RELIABILITY CONCERNS IN THE REPEATED COMPUTERIZED ASSESSMENT OF ATTENTION IN CHILDREN
Zabel, T. Andrew; von Thomsen, Christian; Cole, Carolyn; Martin, Rebecca; Mahone, E. Mark
2010-01-01
Assessment of attentional processes via computerized assessment is frequently used to quantify intra-individual cognitive improvement or decline in response to treatment. However, assessment of intra-individual change is highly dependent on sufficient test reliability. We examined the test–retest reliability of selected variables from one popular computerized continuous performance test (CPT)—i.e., the Conners’ CPT – Second Edition (CPT-II). Participants were 39 healthy children (20 girls) ages 6–18 without intellectual impairment (mean PPVT-III SS = 102.6), LD, or psychiatric disorders (DICA-IV). Test–retest reliability over the 3–8 month interval (mean = 6 months) was acceptable (Intraclass Correlations [ICC] = .82 to .92) on comparison measures (Beery Test of Visual Perception, WISC-IV Block Design, PPVT-III). In contrast, test–retest reliability was only modest for CPT-II raw scores (ICCs ranging from .62 to .82) and T-scores (ICCs ranging from .33 to .65) for variables of interest (Omissions, Commissions, Variability, Hit Reaction Time, and Attentiveness). Using test–retest reliability information published in the CPT-II manual, 90% confidence intervals based on reliable change index (RCI) methodology were constructed to examine the significance of test–retest difference/change scores. Of the participants in this sample of typically developing youth, 30% generated intra-individual changes in T-scores on the Omissions and Attentiveness variables that exceeded the 90% confidence intervals and qualified as “statistically rare” changes in score. These results suggest a considerable degree of normal variability in CPT-II test scores over extended test–retest intervals, and suggest a need for caution when interpreting test score changes in neurologically unstable clinical populations. PMID:19452302
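The reliable change index (RCI) intervals described above are commonly built from the test's standard deviation and its test-retest reliability, with SE_diff = SD·√2·√(1 − r) and a 90% interval of ±1.645·SE_diff. The sketch below assumes that widely used formulation; the abstract does not spell out the exact variant, and the reliability value and scores are hypothetical.

```python
import math

def rci_half_width(sd, reliability, z=1.645):
    """Half-width of the RCI confidence interval (z=1.645 gives 90%)."""
    sem = sd * math.sqrt(1.0 - reliability)   # standard error of measurement
    se_diff = math.sqrt(2.0) * sem            # standard error of a difference
    return z * se_diff

def is_reliable_change(baseline, retest, sd, reliability, z=1.645):
    """True if the retest-baseline difference falls outside the interval."""
    return abs(retest - baseline) > rci_half_width(sd, reliability, z)

# T-scores have SD = 10 by construction; r = 0.5 is a hypothetical value.
print(round(rci_half_width(sd=10.0, reliability=0.5), 2))
print(is_reliable_change(50, 70, sd=10.0, reliability=0.5))
```

The lower the reliability, the wider the interval, so larger score changes are required before a change can be called statistically rare.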
Singh, Amika S; Chinapaw, Mai J M; Uijtdewilligen, Léonie; Vik, Froydis N; van Lippevelde, Wendy; Fernández-Alvira, Juan M; Stomfai, Sarolta; Manios, Yannis; van der Sluijs, Maria; Terwee, Caroline; Brug, Johannes
2012-08-13
Insight into parental energy balance-related behaviours, their determinants and parenting practices is important to inform childhood obesity prevention. Therefore, reliable and valid tools to measure these variables in large-scale population research are needed. The objective of the current study was to examine the test-retest reliability and construct validity of the parent questionnaire used in the ENERGY-project, assessing parental energy balance-related behaviours, their determinants, and parenting practices among parents of 10-12 year old children. We collected data among parents (n = 316 in the test-retest reliability study; n = 109 in the construct validity study) of 10-12 year-old children in six European countries, i.e. Belgium, Greece, Hungary, the Netherlands, Norway, and Spain. Test-retest reliability was assessed using the intra-class correlation coefficient (ICC) and percentage agreement comparing scores from two measurements, administered one week apart. To assess construct validity, the agreement between questionnaire responses and a subsequent interview was assessed using ICC and percentage agreement. All but one item showed good to excellent test-retest reliability as indicated by ICCs > .60 or percentage agreement ≥ 75%. Construct validity appeared to be good to excellent for 92 out of 121 items, as indicated by ICCs > .60 or percentage agreement ≥ 75%. Of the other 29 items, construct validity was moderate for 24 and poor for 5 items. The reliability and construct validity of the items of the ENERGY-parent questionnaire on multiple energy balance-related behaviours, their potential determinants, and parenting practices appear to be good. Based on the results of the validity study, we strongly recommend adapting parts of the ENERGY-parent questionnaire if used in future research.
Reliability assessments in qualitative health promotion research.
Cook, Kay E
2012-03-01
This article contributes to the debate about the use of reliability assessments in qualitative research in general, and health promotion research in particular. In this article, I examine the use of reliability assessments in qualitative health promotion research in response to health promotion researchers' commonly held misconception that reliability assessments improve the rigor of qualitative research. All qualitative articles published in the journal Health Promotion International from 2003 to 2009 employing reliability assessments were examined. In total, 31.3% (20/64) of articles employed some form of reliability assessment. The use of reliability assessments increased over the study period, ranging from <20% in 2003/2004 to 50% and above in 2008/2009, while at the same time the total number of qualitative articles decreased. The articles were then classified into four types of reliability assessments: the verification of thematic codes, the use of inter-rater reliability statistics, congruence in team coding and congruence in coding across sites. The merits of each type were discussed, with the subsequent discussion focusing on the deductive nature of reliable thematic coding, the limited depth of immediately verifiable data and the usefulness of such studies to health promotion and the advancement of the qualitative paradigm.
Busch, Robyn M; Lineweaver, Tara T; Ferguson, Lisa; Haut, Jennifer S
2015-06-01
Reliable change indices (RCIs) and standardized regression-based (SRB) change score norms permit evaluation of meaningful changes in test scores following treatment interventions, like epilepsy surgery, while accounting for test-retest reliability, practice effects, score fluctuations due to error, and relevant clinical and demographic factors. Although these methods are frequently used to assess cognitive change after epilepsy surgery in adults, they have not been widely applied to examine cognitive change in children with epilepsy. The goal of the current study was to develop RCIs and SRB change score norms for use in children with epilepsy. Sixty-three children with epilepsy (age range: 6-16; M=10.19, SD=2.58) underwent comprehensive neuropsychological evaluations at two time points an average of 12 months apart. Practice effect-adjusted RCIs and SRB change score norms were calculated for all cognitive measures in the battery. Practice effects were quite variable across the neuropsychological measures, with the greatest differences observed among older children, particularly on the Children's Memory Scale and Wisconsin Card Sorting Test. There was also notable variability in test-retest reliabilities across measures in the battery, with coefficients ranging from 0.14 to 0.92. Reliable change indices and SRB change score norms for use in assessing meaningful cognitive change in children following epilepsy surgery are provided for measures with reliability coefficients above 0.50. This is the first study to provide RCIs and SRB change score norms for a comprehensive neuropsychological battery based on a large sample of children with epilepsy. Tables to aid in evaluating cognitive changes in children who have undergone epilepsy surgery are provided for clinical use. An Excel sheet to perform all relevant calculations is also available to interested clinicians or researchers. Copyright © 2015 Elsevier Inc. All rights reserved.
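A standardized regression-based (SRB) change score compares a patient's observed retest score with the score predicted from baseline, scaled by the regression's standard error of estimate. The sketch below is a simplified single-predictor version with hypothetical function names and invented reference data; the norms described in the abstract also account for clinical and demographic factors, which this sketch omits.

```python
import math
import statistics

def fit_srb(baseline, retest):
    """Least-squares fit of retest on baseline in a reference sample.
    Returns (slope, intercept, standard error of estimate)."""
    mean_x, mean_y = statistics.mean(baseline), statistics.mean(retest)
    sxx = sum((x - mean_x) ** 2 for x in baseline)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(baseline, retest))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(baseline, retest)
    )
    see = math.sqrt(residual_ss / (len(baseline) - 2))
    return slope, intercept, see

def srb_z(baseline_score, retest_score, slope, intercept, see):
    """Standardized difference between observed and predicted retest."""
    predicted = intercept + slope * baseline_score
    return (retest_score - predicted) / see

# Hypothetical reference sample, invented for illustration only.
slope, intercept, see = fit_srb([1, 2, 3, 4], [2, 3, 5, 6])
print(round(srb_z(3, 6.0, slope, intercept, see), 2))
```

Because the prediction is fitted on retest data, average practice effects are absorbed into the intercept and slope rather than being counted as change.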
de Vries, Nienke M; Staal, J Bart; Olde Rikkert, Marcel G M; Nijhuis-van der Sanden, Maria W G
2013-04-01
Physical activity is assumed to be important in the prevention and treatment of frailty. It is unclear, however, to what extent frailty can be influenced because instruments designed to assess frailty have not been validated as evaluative outcome instruments in clinical practice. The aims of this study were: (1) to develop a frailty index (i.e., the evaluative frailty index for physical activity [EFIP]) based on the method of deficit accumulation and (2) to test the clinimetric properties of the EFIP. The content of the EFIP was determined using a written Delphi procedure. Intrarater reliability, interrater reliability, and construct validity were determined in an observational study (n=24). Intrarater reliability and interrater reliability were calculated using Cohen kappa and intraclass correlation coefficients (ICCs). Construct validity was determined by correlating the score on the EFIP with those on the timed "up & go" test (TUG), the performance-oriented mobility assessment (POMA), and the Cumulative Illness Rating Scale for Geriatrics (CIRS-G). Fifty items were included in the EFIP. Interrater reliability (Cohen kappa=0.72, ICC=.96) and intrarater reliability (Cohen kappa=0.77 and 0.80, ICC=.93 and .98) were good. As expected, a fair to moderate correlation with the TUG, POMA, and CIRS-G was found (.61, -.70, and .66, respectively). Reliability and validity of the EFIP have been tested in a small sample. These and other clinimetric properties, such as responsiveness, will be assessed or reassessed in a larger study population. The EFIP is a reliable and valid instrument to evaluate the effect of physical activity on frailty in research and in clinical practice.
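A frailty index built by deficit accumulation is generally the sum of a person's deficit scores divided by the number of deficits considered. The sketch below assumes each item is scored between 0 (absent) and 1 (fully present) and that unassessed items are excluded from the denominator; the item names are hypothetical and are not the EFIP's actual 50 items.

```python
def frailty_index(deficits):
    """Deficit-accumulation index; `deficits` maps item name to a score
    in [0, 1], or to None when the item was not assessed."""
    scored = [v for v in deficits.values() if v is not None]
    if not scored:
        raise ValueError("no items were assessed")
    if any(not 0.0 <= v <= 1.0 for v in scored):
        raise ValueError("deficit scores must lie between 0 and 1")
    return sum(scored) / len(scored)

# Hypothetical assessment, invented for illustration only.
example = {
    "needs help walking": 1.0,
    "falls in past year": 0.0,
    "low mood": 0.5,
    "grip strength": None,   # not assessed, so left out of the denominator
}
print(frailty_index(example))
```

The index therefore ranges from 0 (no deficits) to 1 (all considered deficits fully present), which allows change to be tracked across repeated assessments.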
School Psychologists and the Assessment of Culturally and Linguistically Diverse Students
ERIC Educational Resources Information Center
Vega, Desireé; Lasser, Jon; Afifi, Amanda F. M.
2016-01-01
In recent years, school psychologists have increasingly recognized the importance of using valid and reliable methods to assess culturally and linguistically diverse (CLD) students for special education eligibility. However, little is known about their assessment practices or preparation in this area. To address these questions, a Web-based survey…
ERIC Educational Resources Information Center
Pena, Elizabeth D.; Gillam, Ronald B.; Malek, Melynn; Ruiz-Felter, Roxanna; Resendiz, Maria; Fiestas, Christine; Sabel, Tracy
2006-01-01
Two experiments examined reliability and classification accuracy of a narration-based dynamic assessment task. Purpose: The first experiment evaluated whether parallel results were obtained from stories created in response to 2 different wordless picture books. If so, the tasks and measures would be appropriate for assessing pretest and posttest…
Evaluating the Use of Criteria for Assessing Profession-Specific Communication Skills in Pharmacy
ERIC Educational Resources Information Center
Hyvarinen, Marja-Leena; Tanskanen, Paavo; Katajavuori, Nina; Isotalus, Pekka
2012-01-01
One central task in higher education is to provide students with interpersonal communication competence in their profession. To achieve this, specialised training, based on an understanding of disciplinary communication practices and appropriate assessment methods, is needed. However, there is a lack of reliable assessment instruments which are…
Tomassen, P; Newson, R B; Hoffmans, R; Lötvall, J; Cardell, L O; Gunnbjörnsdóttir, M; Thilsing, T; Matricardi, P; Krämer, U; Makowska, J S; Brozek, G; Gjomarkaj, M; Howarth, P; Loureiro, C; Toskala, E; Fokkens, W; Bachert, C; Burney, P; Jarvis, D
2011-04-01
The European Position Paper on Rhinosinusitis and Nasal Polyps (EP3OS) incorporates symptomatic, endoscopic, and radiologic criteria in the clinical diagnosis of chronic rhinosinusitis (CRS), while in epidemiological studies, the definition is based on symptoms only. We aimed to assess the reliability and validity of a symptom-based definition of CRS using data from the GA²LEN European survey. On two separate occasions, 1700 subjects from 11 centers provided information on symptoms of CRS, allergic rhinitis, and asthma. CRS was defined by the epidemiological EP3OS symptom criteria. The difference in prevalence of CRS between two study points, the standardized absolute repeatability, and the chance-corrected repeatability (kappa) were determined. In two centers, 342 participants underwent nasal endoscopy. The association of symptom-based CRS with endoscopy and self-reported doctor-diagnosed CRS was assessed. There was a decrease in prevalence of CRS between the two study phases, and this was consistent across all centers (-3.0%, 95% CI: -5.0 to -1.0%, I² = 0). There was fair to moderate agreement between the two occasions (kappa = 39.6). Symptom-based CRS was significantly associated with positive endoscopy in nonallergic subjects, and with self-reported doctor-diagnosed CRS in all subjects, irrespective of the presence of allergic rhinitis. Our findings suggest that a symptom-based definition of CRS, according to the epidemiological part of the EP3OS criteria, has a moderate reliability over time, is stable between study centers, is not influenced by the presence of allergic rhinitis, and is suitable for the assessment of geographic variation in prevalence of CRS. © 2010 John Wiley & Sons A/S.
ERIC Educational Resources Information Center
Vu, Nu Viet; And Others
1992-01-01
The use of a performance-based assessment of senior medical students' clinical skills utilizing standardized patients was evaluated, with 6,804 student-patient encounters involving 405 students over 6 years. Results provide evidence for test security, content validity, construct validity, reliability, and test ability to discriminate a wide range…
Sfendla, Anis; Laita, Meriame; Nejjar, Basma; Souirti, Zouhayr; Touhami, Ahami Ahmed Omar; Senhaji, Meftaha
2018-05-01
The extensive accessibility to smartphones in the last decade raises concerns about addictive behavior patterns toward these technologies worldwide and in developing countries, and Arabic ones in particular. In an area of stigmatized behavior such as Internet and smartphone addiction, the hypothesis extends to whether there is a reliable instrument that can assess smartphone addiction. To our knowledge, no scale in the Arabic language is available to assess maladaptive behavior associated with smartphone use. This study aims to assess the factorial validity and internal reliability of the Arabic Smartphone Addiction Scale (SAS) and Smartphone Addiction Scale-Short Version (SAS-SV) in a Moroccan surveyed population. Participants (N = 440 and N = 310) completed an online survey, including SAS, SAS-SV, and questions about sociodemographic status. Factor analysis results showed six factors with factor loadings ranging from 0.25 to 0.99 for SAS. Reliability, based on Cronbach's alpha, was excellent (α = 0.94) for this instrument. The SAS-SV showed one factor (unidimensional construct), and internal reliability was in the good range (α = 0.87). The prevalence of excessive users was 55.8 percent with highest symptom prevalence reported for tolerance and preoccupation. This study proved factor validity of the Arabic SAS and SAS-SV instruments and confirmed their internal reliability.
Reliability and agreement on embryo assessment: 5 years of an external quality control programme.
Martínez-Granados, Luis; Serrano, María; González-Utor, Antonio; Ortiz, Nereyda; Badajoz, Vicente; López-Regalado, María Luisa; Boada, Montserrat; Castilla, Jose A
2018-03-01
An external quality-control programme for morphology-based embryo quality assessment, incorporating a standardized embryo grading scheme, was evaluated over a period of 5 years to determine levels of inter-observer reliability and agreement between practising clinical embryologists at IVF centres and the opinions of a panel of experts. Following Guidelines for Reporting Reliability and Agreement Studies, the Gwet index and proportion of positive (Ppos) and negative agreement were calculated. For embryo morphology assessment, a substantial degree of reliability was measured between the centres and the panel of experts (Gwet index: 0.76; 95% CI 0.70 to 0.84). The agreement was higher for good- versus poor-quality embryos. When multinucleation or vacuoles were observed, low levels of reliability were obtained (Ppos: 0.56 and 0.43, respectively). In blastocysts, the characteristic that presented the largest discrepancy was that related to the inner cell mass. In decisions about the final disposition of the embryo, reliability between centre and the panel of experts was moderate (Gwet index: 0.51; 95% CI 0.41 to 0.60). In conclusion, the ability of clinical embryologists to evaluate the presence of multinucleation and vacuoles in the early cleavage embryo, and to determine the category of the inner cell mass in blastocysts, needs to be improved. Copyright © 2017 Reproductive Healthcare Ltd. All rights reserved.
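The agreement statistics named above can be sketched for the simplest case of two raters and a binary judgement (for example, multinucleation present or absent). This is an illustrative simplification: the programme compared many centres with an expert panel, and its grading scheme may have more than two categories. The cell counts below are invented, and the Gwet index is taken here to be the two-rater AC1 coefficient.

```python
def agreement_2x2(a, b, c, d):
    """Positive/negative agreement and Gwet's AC1 from a 2x2 table.

    a : both raters positive      b : rater 1 positive, rater 2 negative
    c : rater 1 negative, rater 2 positive      d : both raters negative
    """
    n = a + b + c + d
    p_pos = 2 * a / (2 * a + b + c)          # proportion of positive agreement
    p_neg = 2 * d / (2 * d + b + c)          # proportion of negative agreement
    p_a = (a + d) / n                        # observed agreement
    pi = ((a + b) + (a + c)) / (2 * n)       # mean propensity to rate "positive"
    p_e = 2 * pi * (1 - pi)                  # AC1 chance-agreement term
    ac1 = (p_a - p_e) / (1 - p_e)
    return p_pos, p_neg, ac1


# Hypothetical counts for one embryo feature.
p_pos, p_neg, ac1 = agreement_2x2(a=40, b=5, c=5, d=50)
print(round(p_pos, 3), round(p_neg, 3), round(ac1, 3))
```

Reporting positive and negative agreement separately is useful when a feature is rare, because overall agreement can look high even when raters seldom agree on the positive cases.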
The Role of Reliability, Vulnerability and Resilience in the Management of Water Quality Systems
NASA Astrophysics Data System (ADS)
Lence, B. J.; Maier, H. R.
2001-05-01
The risk based performance indicators reliability, vulnerability and resilience provide measures of the frequency, magnitude and duration of the failure of water resources systems, respectively. They have been applied primarily to water supply problems, including the assessment of the performance of reservoirs and water distribution systems. Applications to water quality case studies have been limited, although the need to consider the length and magnitude of violations of a particular water quality standard has been recognized for some time. In this research, the role of reliability, vulnerability and resilience in water quality management applications is investigated by examining their significance as performance measures for water quality systems and assessing their potential for assisting in decision making processes. The importance of each performance indicator is discussed and a framework for classifying such systems, based on the relative significance of each of these indicators, is introduced and illustrated qualitatively with various case studies. Quantitative examples drawn from both lake and river water quality modeling exercises are then provided.
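The three indicators can be illustrated on a simple time series of a water quality variable compared against a standard. The definitions below are one common set of assumptions, not necessarily those of the authors: reliability as the fraction of time steps in compliance, resilience as the inverse of the mean length of a violation event, and vulnerability as the mean of the worst exceedance in each event. The concentrations and the standard are invented.

```python
def reliability_resilience_vulnerability(values, standard):
    """Return (reliability, resilience, vulnerability) for a series where
    a value above `standard` counts as a failure (violation)."""
    if not values:
        raise ValueError("values must be non-empty")
    events = []                 # one (duration, worst exceedance) per violation event
    duration, worst = 0, 0.0
    n_fail = 0
    for v in values:
        if v > standard:
            n_fail += 1
            duration += 1
            worst = max(worst, v - standard)
        elif duration:
            events.append((duration, worst))
            duration, worst = 0, 0.0
    if duration:                # series ends while still in violation
        events.append((duration, worst))

    reliability = 1.0 - n_fail / len(values)
    if not events:
        return reliability, 1.0, 0.0
    mean_duration = sum(d for d, _ in events) / len(events)
    resilience = 1.0 / mean_duration
    vulnerability = sum(w for _, w in events) / len(events)
    return reliability, resilience, vulnerability


# Hypothetical concentrations (mg/L) against a hypothetical standard of 5 mg/L.
print(reliability_resilience_vulnerability([4, 6, 7, 4, 3, 8, 4, 4], standard=5))
```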
The Shutdown Dissociation Scale (Shut-D)
Schalinski, Inga; Schauer, Maggie; Elbert, Thomas
2015-01-01
The evolutionary model of the defense cascade by Schauer and Elbert (2010) provides a theoretical frame for a short interview to assess problems underlying and leading to the dissociative subtype of posttraumatic stress disorder. Based on known characteristics of the defense stages “fright,” “flag,” and “faint,” we designed a structured interview to assess the vulnerability for the respective types of dissociation. Most of the scales that assess dissociative phenomena are designed as self-report questionnaires. Their items are usually selected based on more heuristic considerations rather than a theoretical model and thus include anything from minor dissociative experiences to major pathological dissociation. The shutdown dissociation scale (Shut-D) was applied in several studies in patients with a history of multiple traumatic events and different disorders that have been shown previously to be prone to symptoms of dissociation. The goal of the present investigation was to obtain psychometric characteristics of the Shut-D (including factor structure, internal consistency, retest reliability, predictive, convergent and criterion-related concurrent validity). A total population of 225 patients and 68 healthy controls were accessed. Shut-D appears to have sufficient internal reliability, excellent retest reliability, high convergent validity, and satisfactory predictive validity, while the summed score of the scale reliably separates patients with exposure to trauma (in different diagnostic groups) from healthy controls. The Shut-D is a brief structured interview for assessing the vulnerability to dissociate as a consequence of exposure to traumatic stressors. The scale demonstrates high-quality psychometric properties and may be useful for researchers and clinicians in assessing shutdown dissociation as well as in predicting the risk of dissociative responding. PMID:25976478
Papinutto, N.; Schlaeger, R.; Panara, V.; Caverzasi, E.; Ahn, S.; Johnson, K.J.; Zhu, A.H.; Stern, W.A.; Laub, G.; Hauser, S.L.; Henry, R.G.
2018-01-01
PURPOSE In-vivo assessment of spinal cord gray matter (GM) and white matter (WM) could become pivotal to study various neurological diseases, but it is challenging because of insufficient GM/WM contrast provided by conventional MRI. Here we present and assess a procedure for measurement of spinal cord total cross-sectional area (TCA) and GM areas based on phase sensitive inversion recovery imaging (PSIR). MATERIALS AND METHODS We acquired 2D PSIR images at 3T at each disc level of the spinal axis on 10 healthy subjects and measured TCA, cord diameters, WM and GM area, and GM area/TCA ratio. We secondly investigated 32 healthy subjects at 4 selected levels (C2–C3, C3–C4, T8–T9, T9–T10, total acquisition time <8 minutes) and generated normative reference values of TCA and GM areas. We assessed test-retest, intra- and inter-operator reliability of the acquisition strategy and measurement steps. RESULTS The measurement procedure based on 2D PSIR imaging allowed TCA and GM area assessments along the entire spinal cord axis. The tests we performed revealed high test-retest/intra-operator reliability (mean coefficient of variation (COV) at C2–C3: TCA=0.41%, GM area=2.75%) and inter-operator reliability of the measurements (mean COV on the 4 levels: TCA=0.44%, GM area= 4.20%; mean intra-class correlation coefficient: TCA=0.998, GM area=0.906). CONCLUSION 2D PSIR allows reliable in-vivo assessment of spinal cord TCA, GM and WM areas in clinically feasible acquisition times. The area measurements presented here are in agreement with previous MRI and post-mortem studies. PMID:25483607
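The mean coefficient of variation reported above can be sketched as follows, assuming it is computed per subject from repeated measurements (sample SD divided by mean, as a percentage) and then averaged across subjects. The abstract does not spell out the exact procedure, and the area values below are invented.

```python
from statistics import mean, stdev


def mean_cov_percent(repeated_measurements):
    """Mean coefficient of variation (%) across subjects.

    repeated_measurements: one sequence of repeated values per subject,
    each with at least two measurements.
    """
    covs = [100.0 * stdev(values) / mean(values) for values in repeated_measurements]
    return mean(covs)


# Hypothetical test-retest cross-sectional areas (mm^2) for two subjects.
scans = [[80.0, 80.8], [75.0, 75.0]]
print(round(mean_cov_percent(scans), 3))
```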
El Miedany, Yasser; El Gaafary, Maha; Youssef, Sally; Ahmed, Ihab
2016-01-01
Objectives. To assess the validity, reliability, and responsiveness to change of a patient self-reported questionnaire combining the Widespread Pain Index and the Symptom Severity Score as well as construct outcome measures and comorbidities assessment in fibromyalgia patients. Methods. The PROMs-FM was conceptualized based on frameworks used by the WHO Quality of Life tool and the PROMIS. Initially, cognitive interviews were conducted to identify an item pool of questions. Item selection and reduction were achieved based on patients as well as an interdisciplinary group of specialists. Rasch and internal consistency reliability analyses were implemented. The questionnaire included the modified ACR criteria main items (Symptom Severity Score and Widespread Pain Index), in addition to assessment of functional disability, quality of life (QoL), review of the systems, and comorbidities. Every patient completed HAQ and EQ-5D questionnaires. Results. A total of 146 fibromyalgia patients completed the questionnaire. The PROMs-FM questionnaire was reliable as demonstrated by a high standardized alpha (0.886–0.982). Content construct assessment of the functional disability and QoL revealed significant correlation (p < 0.01) with both HAQ and EQ-5D. Changes in functional disability and QoL showed significant (p < 0.01) variation with disease activity status in response to therapy. There was higher prevalence of autonomic symptoms, CVS risk, sexual dysfunction, and falling. Conclusions. The developed PROMs-FM questionnaire is a reliable and valid instrument for assessment of fibromyalgia patients. A phased treatment regimen depending on the severity of FMS as well as preferences and comorbidities of the patient is the best approach to tailored patient management. PMID:27190648
Sorsdahl, Anne Brit; Moe-Nilssen, Rolf; Strand, Liv Inger
2008-02-01
The aim of this study was to examine observer reliability of the Gross Motor Performance Measure (GMPM) and the Quality of Upper Extremity Skills Test (QUEST) based on video clips. The tests were administered to 26 children with cerebral palsy (CP; 14 males, 12 females; range 2-13y, mean 7y 6mo), 24 with spastic CP, and two with dyskinesia. Respectively, five, six, five, four, and six children were classified in Gross Motor Function Classification System Levels I to V; and four, nine, five, five, and three children were classified in Manual Ability Classification System levels I to V. The children's performances were recorded and edited. Two experienced paediatric physical therapists assessed the children from watching the video clips. Intraobserver and interobserver reliability values of the total scores were mostly high, intraclass correlation coefficient (ICC)(1,1) varying from 0.69 to 0.97 with only one coefficient below 0.89. The ICCs of subscores varied from 0.36 to 0.95, finding 'Alignment' and 'Weight shift' in GMPM and 'Protective extension' in QUEST highly reliable. The subscores 'Dissociated movements' in GMPM and QUEST, and 'Grasp' in QUEST were the least reliable, and recommendations are made to increase reliability of these subscores. Video scoring was time consuming, but was found to offer many advantages: the possibility to review performance, the use of specially trained observers for scoring, and less demanding assessment for the children.
Gale, T C E; Roberts, M J; Sice, P J; Langton, J A; Patterson, F C; Carr, A S; Anderson, I R; Lam, W H; Davies, P R F
2010-11-01
Assessment centres are an accepted method of recruitment in industry and are gaining popularity within medicine. We describe the development and validation of a selection centre for recruitment to speciality training in anaesthesia based on an assessment centre model incorporating the rating of candidates' non-technical skills. Expert consensus identified non-technical skills suitable for assessment at the point of selection. Four stations (structured interview, portfolio review, presentation, and simulation) were developed, the latter two being realistic scenarios of work-related tasks. Evaluation of the selection centre focused on applicant and assessor feedback ratings, inter-rater agreement, and internal consistency reliability coefficients. Predictive validity was sought via correlations of selection centre scores with subsequent workplace-based ratings of appointed trainees. Two hundred and twenty-four candidates were assessed over two consecutive annual recruitment rounds; 68 were appointed and followed up during training. Candidates and assessors demonstrated strong approval of the selection centre with more than 70% of ratings 'good' or 'excellent'. Mean inter-rater agreement coefficients ranged from 0.62 to 0.77 and internal consistency reliability of the selection centre score was high (Cronbach's α=0.88-0.91). The overall selection centre score was a good predictor of workplace performance during the first year of appointment. An assessment centre model based on the rating of non-technical skills can produce a reliable and valid selection tool for recruitment to speciality training in anaesthesia. Early results on predictive validity are encouraging and justify further development and evaluation.
NASA Astrophysics Data System (ADS)
Flanigan, Katherine A.; Johnson, Nephi R.; Hou, Rui; Ettouney, Mohammed; Lynch, Jerome P.
2017-04-01
The ability to quantitatively assess the condition of railroad bridges facilitates objective evaluation of their robustness in the face of hazard events. Of particular importance is the need to assess the condition of railroad bridges in networks that are exposed to multiple hazards. Data collected from structural health monitoring (SHM) can be used to better maintain a structure by prompting preventative (rather than reactive) maintenance strategies and supplying quantitative information to aid in recovery. To that end, a wireless monitoring system is validated and installed on the Harahan Bridge which is a hundred-year-old long-span railroad truss bridge that crosses the Mississippi River near Memphis, TN. This bridge is exposed to multiple hazards including scour, vehicle/barge impact, seismic activity, and aging. The instrumented sensing system targets non-redundant structural components and areas of the truss and floor system that bridge managers are most concerned about based on previous inspections and structural analysis. This paper details the monitoring system and the analytical method for the assessment of bridge condition based on automated data-driven analyses. Two primary objectives of monitoring the system performance are discussed: 1) monitoring fatigue accumulation in critical tensile truss elements; and 2) monitoring the reliability index values associated with sub-system limit states of these members. Moreover, since the reliability index is a scalar indicator of the safety of components, quantifiable condition assessment can be used as an objective metric so that bridge owners can make informed damage mitigation strategies and optimize resource management on single bridge or network levels.
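The reliability index mentioned above is a scalar measure of safety. A minimal sketch follows, assuming the textbook case of a linear limit state g = R - S with independent, normally distributed capacity R and demand S; the paper's actual limit-state formulation for the truss members is not given in the abstract, and the numbers are invented.

```python
import math
from statistics import NormalDist


def reliability_index(mu_r, sd_r, mu_s, sd_s):
    """Reliability index beta and failure probability for g = R - S,
    with R (capacity) and S (demand) independent and normally distributed."""
    beta = (mu_r - mu_s) / math.sqrt(sd_r ** 2 + sd_s ** 2)
    p_fail = NormalDist().cdf(-beta)      # P(g < 0) under the normal assumption
    return beta, p_fail


# Hypothetical member capacity and monitored demand, in consistent units.
beta, p_fail = reliability_index(mu_r=300.0, sd_r=30.0, mu_s=200.0, sd_s=40.0)
print(beta, round(p_fail, 5))
```

In a monitoring setting, the demand statistics would be re-estimated from sensor data over time, so that a falling beta flags a component whose safety margin is shrinking.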
Gorlin, Eugenia I; Dalrymple, Kristy; Chelminski, Iwona; Zimmerman, Mark
2016-08-30
Despite growing recognition that the symptoms and functional impairments of Attention Deficit/Hyperactivity Disorder (ADHD) persist into adulthood, only a few psychometrically sound diagnostic measures have been developed for the assessment of ADHD in adults, and none have been validated for use in a broad treatment-seeking psychiatric sample. The current study presents the reliability and validity of a semi-structured DSM-based diagnostic interview module for ADHD, which was administered to 1194 adults presenting to an outpatient psychiatric practice. The module showed excellent internal consistency and interrater reliability, good convergent and discriminant validity (as indexed by relatively high correlations with self-report measures of ADHD and ADHD-related constructs and little or no correlation with other, non-ADHD symptom domains), and good construct validity (as indexed by significantly higher rates of psychosocial impairment and self-reported family history of ADHD in individuals who meet criteria for an ADHD diagnosis). This instrument is thus a reliable and valid diagnostic tool for the detection of ADHD in adults presenting for psychiatric evaluation and treatment. Published by Elsevier Ireland Ltd.
Grant, Jon E; Kim, Suck Won; McCabe, James S
2006-06-01
Kleptomania presents difficulties in diagnosis for clinicians. This study aimed to develop and test a DSM-IV-based diagnostic instrument for kleptomania. To assess for current kleptomania the Structured Clinical Interview for Kleptomania (SCI-K) was administered to 112 consecutive subjects requesting psychiatric outpatient treatment for a variety of disorders. Reliability and validity were determined. Classification accuracy was examined using the longitudinal course of illness. The SCI-K demonstrated excellent test-retest (Phi coefficient = 0.956 (95% CI = 0.937, 0.970)) and inter-rater reliability (phi coefficient = 0.718 (95% CI = 0.506, 0.848)) in the diagnosis of kleptomania. Concurrent validity was observed with a self-report measure using DSM-IV kleptomania criteria (phi coefficient = 0.769 (95% CI = 0.653, 0.850)). Discriminant validity was observed with a measure of depression (point biserial coefficient = -0.020 (95% CI = -0.205, 0.166)). The SCI-K demonstrated both high sensitivity and specificity based on longitudinal assessment. The SCI-K demonstrated excellent reliability and validity in diagnosing kleptomania in subjects presenting with various psychiatric problems. These findings require replication in larger groups, including non-psychiatric populations, to examine their generalizability. Copyright (c) 2006 John Wiley & Sons, Ltd.
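The phi coefficient used above for test-retest, inter-rater, and concurrent agreement on a binary diagnosis can be computed from a 2x2 table. The sketch below uses the standard formula with invented counts; it does not reproduce the study's data.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi coefficient for a 2x2 table.

    a : diagnosis present on both occasions   b : present first, absent second
    c : absent first, present second          d : absent on both occasions
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denom


# Hypothetical counts for two administrations of a diagnostic interview.
print(round(phi_coefficient(a=10, b=2, c=3, d=85), 3))
```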
Carvalho, Teresa; Cunha, Marina; Pinto-Gouveia, José; Duarte, Joana
2015-03-30
The PTSD Checklist-Military Version (PCL-M) is a brief self-report instrument widely used to assess Post-traumatic Stress Disorder (PTSD) symptomatology in war Veterans, according to DSM-IV. This study sought to explore the factor structure and reliability of the Portuguese version of the PCL-M. A sample of 660 Portuguese Colonial War Veterans completed the PCL-M. Several Confirmatory Factor Analyses were conducted to test different structures for PCL-M PTSD symptoms. Although the respecified first-order four-factor model based on King et al.'s model showed the best fit to the data, the respecified first and second-order models based on the DSM-IV symptom clusters also presented an acceptable fit. In addition, the PCL-M showed adequate reliability. The Portuguese version of the PCL-M is thus a valid and reliable measure to assess the severity of PTSD symptoms as described in DSM-IV. Its use with Portuguese Colonial War Veterans may ease screening of possible PTSD cases, promote more suitable treatment planning, and enable monitoring of therapeutic outcomes. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Hopmans, Cornelis J; den Hoed, Pieter T; Wallenburg, Iris; van der Laan, Lijkckle; van der Harst, Erwin; van der Elst, Maarten; Mannaerts, Guido H H; Dawson, Imro; van Lanschot, Jan J B; Ijzermans, Jan N M
2013-01-01
Currently, most surgical training programs are focused on the development and evaluation of professional competencies. Also in the Netherlands, competency-based training and assessment programs were introduced to restructure postgraduate medical training. The current surgical residency program is based on the Canadian Medical Education Directives for Specialists (CanMEDS) competencies and uses assessment tools to evaluate residents' competence progression. In this study, we examined the attitude of surgical residents and attending surgeons toward a competency-based training and assessment program used to restructure general surgical training in the Netherlands in 2009. In 2011, all residents (n = 51) and attending surgeons (n = 108) in 1 training region, consisting of 7 hospitals, were surveyed. Participants were asked to rate the importance of the CanMEDS competencies and the suitability of the adopted assessment tools. Items were rated on a 5-point Likert scale and considered relevant when at least 80% of the respondents rated an item with a score of 4 or 5 (indicating a positive attitude). Reliability was evaluated by calculating the Cronbach's α, and the Mann-Whitney test was applied to assess differences between groups. The response rate was 88% (n = 140). The CanMEDS framework demonstrated good reliability (Cronbach's α = 0.87). However, the importance of the competencies 'Manager' (78%) and 'Health Advocate' (70%) was undervalued. The assessment tools failed to achieve an acceptable reliability (Cronbach's α = 0.55), and individual tools were predominantly considered unsuitable for assessment. Exceptions were the tools 'in-training evaluation report' (91%) and 'objective structured assessment of technical skill' (82%). No significant differences were found between the residents and the attending surgeons. 
This study has demonstrated that, 2 years after the reform of the general surgical residency program, residents and attending surgeons in a large training region in the Netherlands do not acknowledge the importance of all CanMEDS competencies and consider the assessment tools generally unsuitable for competence evaluation. Copyright © 2013 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.
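Cronbach's α, used above to evaluate the reliability of the competency ratings and the assessment tools, can be sketched with the standard formula: k/(k-1) times one minus the ratio of summed item variances to the variance of the total score. The Likert responses below are invented for illustration.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical 5-point Likert ratings: four respondents, two items.
ratings = [[2, 3], [4, 4], [3, 5], [5, 6]]
print(round(cronbach_alpha(ratings), 3))
```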
Kortüm, K; Reznicek, L; Leicht, S; Ulbig, M; Wolf, A
2013-07-01
The importance and complexity of clinical trials are continuously increasing, especially in innovative specialties like ophthalmology. Therefore an efficient clinical trial site organisational structure is essential. In the internet era, this can be accomplished by web-based applications. In total, 3 software applications (Vibe on Prem, Sharepoint and open source software) were evaluated in a clinical trial site in ophthalmology. The assessment criteria were reliability, ease of administration, usability, scheduling, task list, knowledge management, operating costs and worldwide availability. Vibe on Prem customised by the local university met the assessment criteria best. Other applications were not as strong. By introducing a web-based application for administrating and organising an ophthalmological trial site, studies can be conducted in a more efficient and reliable manner. Georg Thieme Verlag KG Stuttgart · New York.
Frame-of-Reference Training: Establishing Reliable Assessment of Teaching Effectiveness.
Newman, Lori R; Brodsky, Dara; Jones, Richard N; Schwartzstein, Richard M; Atkins, Katharyn Meredith; Roberts, David H
2016-01-01
Frame-of-reference (FOR) training has been used successfully to teach faculty how to produce accurate and reliable workplace-based ratings when assessing a performance. We engaged 21 Harvard Medical School faculty members in our pilot and implementation studies to determine the effectiveness of using FOR training to assess health professionals' teaching performances. All faculty were novices at rating their peers' teaching effectiveness. Before FOR training, we asked participants to evaluate a recorded lecture using a criterion-based peer assessment of medical lecturing instrument. At the start of training, we discussed the instrument and emphasized its precise behavioral standards. During training, participants practiced rating lectures and received immediate feedback on how well they categorized and scored performances as compared with expert-derived scores of the same lectures. At the conclusion of the training, we asked participants to rate a post-training recorded lecture to determine agreement with the experts' scores. Participants and experts had greater rating agreement for the post-training lecture compared with the pretraining lecture. Through this investigation, we determined that FOR training is a feasible method to teach faculty how to accurately and reliably assess medical lectures. Medical school instructors and continuing education presenters should have the opportunity to be observed and receive feedback from trained peer observers. Our results show that it is possible to use FOR rater training to teach peer observers how to accurately rate medical lectures. The process is time efficient and offers the prospect for assessment and feedback beyond traditional learner evaluation of instruction.
Cigarette dependence questionnaire: development and psychometric testing with male smokers.
Huang, Chih-Ling; Lin, Hsi-Hui; Wang, Hsiu-Hung
2010-10-01
This paper is a report of a study conducted to develop and test a theoretically derived Cigarette Dependence Questionnaire for adult male smokers. Fagerstrom questionnaires have been used worldwide to assess cigarette dependence. However, these assessments lack any theoretical perspective. A theory-based approach is needed to ensure valid assessment. In 2007, an initial pool of 103 Cigarette Dependence Questionnaire items was distributed to 109 adult smokers in Taiwan. Item analysis was conducted to select items for inclusion in the refined scale. The psychometric properties of the Cigarette Dependence Questionnaire were further evaluated in 2007-08, when it was administered to 256 respondents and their saliva was collected and analysed for cotinine levels. Criterion validity was established through the Pearson correlation between the scale and saliva cotinine levels. Exploratory factor analysis was used to test construct validity. Reliability was determined with Cronbach's alpha coefficient and a 2-week test-retest coefficient. The selection of 30 items for seven perspectives was based on item analysis. One factor accounting for 44.9% of the variance emerged from the factor analysis. The factor was named cigarette dependence. Cigarette Dependence Questionnaire scores were statistically significantly correlated with saliva cotinine levels (r = 0.21, P = 0.01). Cronbach's alpha was 0.95 and test-retest reliability using an intra-class correlation was 0.92. The Cigarette Dependence Questionnaire showed sound reliability and validity and could be used by nurses to set up smoking cessation interventions based on assessment of cigarette dependence. © 2010 Blackwell Publishing Ltd.
NASA Astrophysics Data System (ADS)
Martowicz, Adam; Uhl, Tadeusz
2012-10-01
The paper discusses the applicability of a reliability- and performance-based multi-criteria robust design optimization technique for micro-electromechanical systems, considering their technological uncertainties. Nowadays, micro-devices are commonly applied systems, especially in the automotive industry, taking advantage of utilizing both the mechanical structure and electronic control circuit on one board. Their frequent use motivates the elaboration of virtual prototyping tools that can be applied in design optimization with the introduction of technological uncertainties and reliability. The authors present a procedure for the optimization of micro-devices, which is based on the theory of reliability-based robust design optimization. This takes into consideration the performance of a micro-device and its reliability assessed by means of uncertainty analysis. The procedure assumes that, for each checked design configuration, the assessment of uncertainty propagation is performed with the meta-modeling technique. The described procedure is illustrated with an example of the optimization carried out for a finite element model of a micro-mirror. The multi-physics approach allowed the introduction of several physical phenomena to correctly model the electrostatic actuation and the squeezing effect present between electrodes. The optimization was preceded by sensitivity analysis to establish the design and uncertain domains. The genetic algorithms fulfilled the defined optimization task effectively. The best discovered individuals are characterized by a minimized value of the multi-criteria objective function, simultaneously satisfying the constraint on material strength. The restriction of the maximum equivalent stresses was introduced with the conditionally formulated objective function with a penalty component. The yielded results were successfully verified with a global uniform search through the input design domain.
Lee, Robert H; Bott, Marjorie J; Forbes, Sarah; Redford, Linda; Swagerty, Daniel L; Taunton, Roma Lee
2003-01-01
Understanding how quality improvement affects costs is important. Unfortunately, low-cost, reliable ways of measuring direct costs are scarce. This article builds on the principles of process improvement to develop a costing strategy that meets both criteria. Process-based costing has 4 steps: developing a flowchart, estimating resource use, valuing resources, and calculating direct costs. To illustrate the technique, this article uses it to cost the care planning process in 3 long-term care facilities. We conclude that process-based costing is easy to implement; generates reliable, valid data; and allows nursing managers to assess the costs of new or modified processes.
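The last three of the four steps above (estimating resource use, valuing resources, calculating direct costs) amount to a simple sum over the flowchart steps. The sketch below assumes that staff time is the only resource; the step names, minutes, and hourly wages are hypothetical and are not data from the article.

```python
def direct_cost(process_steps, hourly_wages):
    """Sum staff-time cost over every step of a flowcharted process."""
    total = 0.0
    for step in process_steps:
        hours = step["minutes"] / 60.0               # estimated resource use
        total += hours * hourly_wages[step["staff"]]  # value the resource
    return total


# Hypothetical care planning process with three flowchart steps
care_planning = [
    {"name": "assess resident", "staff": "RN", "minutes": 30},
    {"name": "draft care plan", "staff": "RN", "minutes": 45},
    {"name": "team review", "staff": "aide", "minutes": 15},
]
wages = {"RN": 40.0, "aide": 20.0}  # hypothetical hourly wages
cost = direct_cost(care_planning, wages)
```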
Suthar, Jalpa Vashishth; Patel, Varsha J
2014-01-01
To determine the quality of prescribing in hypertension in primary and secondary health care settings using the Prescription Quality Index (PQI) tool and to assess the reliability of this tool. An observational cross-sectional study was carried out for 6 months in order to assess the quality of prescribing of antihypertensive drugs using the Prescription Quality Index (PQI) at four primary (PHC) and two secondary (SHC) health care facilities. Patients attending these facilities for at least 3 months were included. Complete medical history and prescriptions received were noted. Total and criterion-wise PQI scores were derived for each prescription. Prescriptions were categorized as poor (score of ≤31), medium (score 32-33) and high quality (score 34-43) based on PQI total score. Psychometric analysis using factor analysis was carried out to assess reliability and validity. A total of 73 hypertensive patients were included. Mean age was 61.2 ± 11 years, with 35 (48%) patients above 65 years of age. Total PQI score was 26 ± 11. There was a significant difference in PQI score between PHC and SHC (P < 0.05). Of the 73 prescriptions, 43 (59%) were of poor quality with PQI score <31. The value of Cronbach's α for the entire 22 criteria of PQI was 0.71, suggesting good reliability of the PQI tool in our setting. Based on PQI scores, quality of prescribing in hypertensive patients was poor, and somewhat better in primary than in secondary health care facilities. PQI is reliable for measuring prescribing quality in hypertension in the Indian setting.
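For readers unfamiliar with the Cronbach's α quoted above, the following is a minimal sketch of the standard formula, α = k/(k−1) · (1 − Σ item variances / variance of total scores). The function name and the input layout are assumptions for illustration; the study's own computation and data are not reproduced here.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k items scored on the same n subjects.

    items -- list of k lists; items[i][j] is the score of subject j on item i.
    Uses sample variances (n - 1 in the denominator).
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per subject
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))
```

The function needs at least two items and at least two subjects, and the total scores must not all be equal, otherwise the variance in the denominator is zero.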
Hales, M; Biros, E; Reznik, J E
2015-01-01
Since 1982, the International Standards for Neurological Classification of Spinal Cord Injury (ISNCSCI) has been used to classify sensation of spinal cord injury (SCI) through pinprick and light touch scores. The absence of proprioception, pain, and temperature within this scale creates questions about its validity and accuracy. To assess whether the sensory component of the ISNCSCI represents a reliable and valid measure of classification of SCI. A systematic review of studies examining the reliability and validity of the sensory component of the ISNCSCI published between 1982 and February 2013 was conducted. The electronic databases MEDLINE via Ovid, CINAHL, PEDro, and Scopus were searched for relevant articles. A secondary search of reference lists was also completed. Chosen articles were assessed according to the Oxford Centre for Evidence-Based Medicine hierarchy of evidence and critically appraised using the McMasters Critical Review Form. A statistical analysis was conducted to investigate the variability of the results given by reliability studies. Twelve studies were identified: 9 reviewed reliability and 3 reviewed validity. All studies demonstrated low levels of evidence and moderate critical appraisal scores. The majority of the articles (~67%; 6/9) assessing the reliability suggested that training was positively associated with better posttest results. The results of the 3 studies that assessed the validity of the ISNCSCI scale were confounding. Due to the low to moderate quality of the current literature, the sensory component of the ISNCSCI requires further revision and investigation if it is to be a useful tool in clinical trials.
Hales, M.; Biros, E.
2015-01-01
Background: Since 1982, the International Standards for Neurological Classification of Spinal Cord Injury (ISNCSCI) has been used to classify sensation of spinal cord injury (SCI) through pinprick and light touch scores. The absence of proprioception, pain, and temperature within this scale creates questions about its validity and accuracy. Objectives: To assess whether the sensory component of the ISNCSCI represents a reliable and valid measure of classification of SCI. Methods: A systematic review of studies examining the reliability and validity of the sensory component of the ISNCSCI published between 1982 and February 2013 was conducted. The electronic databases MEDLINE via Ovid, CINAHL, PEDro, and Scopus were searched for relevant articles. A secondary search of reference lists was also completed. Chosen articles were assessed according to the Oxford Centre for Evidence-Based Medicine hierarchy of evidence and critically appraised using the McMasters Critical Review Form. A statistical analysis was conducted to investigate the variability of the results given by reliability studies. Results: Twelve studies were identified: 9 reviewed reliability and 3 reviewed validity. All studies demonstrated low levels of evidence and moderate critical appraisal scores. The majority of the articles (~67%; 6/9) assessing the reliability suggested that training was positively associated with better posttest results. The results of the 3 studies that assessed the validity of the ISNCSCI scale were confounding. Conclusions: Due to the low to moderate quality of the current literature, the sensory component of the ISNCSCI requires further revision and investigation if it is to be a useful tool in clinical trials. PMID:26363591
Reliability of the Adult Myopathy Assessment Tool in Individuals with Myositis
Harris-Love, Michael O.; Joe, Galen; Davenport, Todd E.; Koziol, Deloris; Rose, Kristen Abbett; Shrader, Joseph A.; Vasconcelos, Olavo M.; McElroy, Beverly; Dalakas, Marinos C.
2015-01-01
Objective The Adult Myopathy Assessment Tool (AMAT) is a 13-item performance-based battery developed to assess functional status and muscle endurance. The purpose of this study was to determine the intrarater and interrater reliability of the AMAT in adults with myositis. Methods Nineteen raters (13 physical therapists and 6 physicians) scored videotaped recordings of patients with myositis performing the AMAT for a total of 114 tests and 1,482 item observations per session. Raters rescored the AMAT test and item observations during a follow up session (19 ±6 days between scoring sessions). All raters completed a single, self-directed, electronic training module prior to the initial scoring session. Results Intrarater and interrater reliability correlation coefficients were .94 or greater for the AMAT Functional Subscale, Endurance Subscale, and Total score (all p < 0.02 for H0: ρ ≤ 0.75). All AMAT items had satisfactory intrarater agreement (Kappa statistics with Fleiss-Cohen weights, Kw = .57-1.00). Interrater agreement was acceptable for each AMAT item (K = .56-.89) except the sit up (K = .16). The standard error of measurement and 95% confidence interval range for the AMAT Total scores did not exceed 2 points across all observations (AMAT Total score range = 0-45). Conclusions The AMAT is a reliable, domain-specific assessment of functional status and muscle endurance for adult subjects with myositis. Results of this study suggest that physicians and physical therapists may reliably score the AMAT following a single training session. The AMAT Functional Subscale, Endurance Subscale, and Total score exhibit interrater and intrarater reliability suitable for clinical and research use. PMID:25201624
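The kappa statistic with Fleiss-Cohen (quadratic) weights mentioned above can be sketched as below for two sets of ordinal ratings. This is a generic illustration, assuming categories coded 0 to `n_categories - 1`; it is not the authors' analysis code and uses no study data.

```python
def weighted_kappa(ratings_a, ratings_b, n_categories):
    """Cohen's kappa with quadratic (Fleiss-Cohen) disagreement weights."""
    n = len(ratings_a)
    k = n_categories
    # Observed cross-classification of the two sets of ratings
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # 0 on the diagonal, 1 at the extremes
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    # Undefined (division by zero) if all ratings fall in a single category
    return 1.0 - weighted_observed / weighted_expected
```

Quadratic weights penalize a two-category disagreement four times as heavily as a one-category disagreement, which suits ordinal item scores.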
Development of confidence limits by pivotal functions for estimating software reliability
NASA Technical Reports Server (NTRS)
Dotson, Kelly J.
1987-01-01
The utility of pivotal functions is established for assessing software reliability. Based on the Moranda geometric de-eutrophication model of reliability growth, confidence limits for attained reliability and prediction limits for the time to the next failure are derived using a pivotal function approach. Asymptotic approximations to the confidence and prediction limits are considered and are shown to be inadequate in cases where only a few bugs are found in the software. Departures from the assumed exponentially distributed interfailure times in the model are also investigated. The effect of these departures is discussed relative to restricting the use of the Moranda model.
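As background, the Moranda geometric de-eutrophication model is commonly stated with a hazard rate that drops geometrically after each fix, λ_i = D·k^(i−1) with 0 < k < 1, and exponentially distributed interfailure times. The sketch below illustrates only that model; the parameter values in the usage lines are hypothetical, and the pivotal-function confidence and prediction limits derived in the report are not reproduced.

```python
import math


def hazard_rate(i, D, k):
    """Hazard rate during the i-th interfailure interval (i = 1, 2, ...)."""
    return D * k ** (i - 1)


def expected_time_to_failure(i, D, k):
    """Mean of the exponentially distributed i-th interfailure time."""
    return 1.0 / hazard_rate(i, D, k)


def reliability(t, n_found, D, k):
    """Probability of surviving a further time t after n_found bugs were fixed."""
    return math.exp(-hazard_rate(n_found + 1, D, k) * t)


# Hypothetical parameters: initial hazard D = 0.5 per hour, ratio k = 0.8
mean_first = expected_time_to_failure(1, 0.5, 0.8)
mean_fourth = expected_time_to_failure(4, 0.5, 0.8)
```

Because k < 1, each fix lengthens the expected time to the next failure, which is the reliability growth the model is meant to capture.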
da Costa, Bruno R; Beckett, Brooke; Diaz, Alison; Resta, Nina M; Johnston, Bradley C; Egger, Matthias; Jüni, Peter; Armijo-Olivo, Susan
2017-03-03
The Cochrane risk of bias tool is commonly criticized for having a low reliability. We aimed to investigate whether training of raters, with objective and standardized instructions on how to assess risk of bias, can improve the reliability of the Cochrane risk of bias tool. In this pilot study, four raters inexperienced in risk of bias assessment were randomly allocated to minimal or intensive standardized training for risk of bias assessment of randomized trials of physical therapy treatments for patients with knee osteoarthritis pain. Two raters were experienced risk of bias assessors who served as reference. The primary outcome of our study was between-group reliability, defined as the agreement of the risk of bias assessments of inexperienced raters with the reference assessments of experienced raters. Consensus-based assessments were used for this purpose. The secondary outcome was within-group reliability, defined as the agreement of assessments within pairs of inexperienced raters. We calculated the chance-corrected weighted Kappa to quantify agreement within and between groups of raters for each of the domains of the risk of bias tool. A total of 56 trials were included in our analysis. The Kappa for the agreement of inexperienced raters with reference across items of the risk of bias tool ranged from 0.10 to 0.81 for the minimal training group and from 0.41 to 0.90 for the standardized training group. The Kappa values for the agreement within pairs of inexperienced raters across the items of the risk of bias tool ranged from 0 to 0.38 for the minimal training group and from 0.93 to 1 for the standardized training group. Between-group differences in Kappa for the agreement of inexperienced raters with reference always favored the standardized training group and were most pronounced for incomplete outcome data (difference in Kappa 0.52, p < 0.001) and allocation concealment (difference in Kappa 0.30, p = 0.004). Intensive, standardized training on risk of bias assessment may significantly improve the reliability of the Cochrane risk of bias tool.
Reliability of System Identification Techniques to Assess Standing Balance in Healthy Elderly
Maier, Andrea B.; Aarts, Ronald G. K. M.; van Gerven, Joop M. A.; Arendzen, J. Hans; Schouten, Alfred C.; Meskers, Carel G. M.; van der Kooij, Herman
2016-01-01
Objectives System identification techniques have the potential to assess the contribution of the underlying systems involved in standing balance by applying well-known disturbances. We investigated the reliability of standing balance parameters obtained with multivariate closed loop system identification techniques. Methods In twelve healthy elderly participants, balance tests were performed twice a day on three days. Body sway was measured during two minutes of standing with eyes closed and the Balance test Room (BalRoom) was used to apply four disturbances simultaneously: two sensory disturbances, to the proprioceptive and the visual system, and two mechanical disturbances applied at the leg and trunk segment. Using system identification techniques, sensitivity functions of the sensory disturbances and the neuromuscular controller were estimated. Based on the generalizability theory (G theory), systematic errors and sources of variability were assessed using linear mixed models and reliability was assessed by computing indexes of dependability (ID), standard error of measurement (SEM) and minimal detectable change (MDC). Results A systematic error was found between the first and second trial in the sensitivity functions. No systematic error was found in the neuromuscular controller and body sway. The reliability of 15 of 25 parameters and body sway was moderate to excellent when the results of two trials on three days were averaged. To reach an excellent reliability on one day in 7 out of 25 parameters, it was predicted that at least seven trials must be averaged. Conclusion This study shows that system identification techniques are a promising method to assess the underlying systems involved in standing balance in the elderly. However, most of the parameters do not appear to be reliable unless a large number of trials are collected across multiple days. To reach an excellent reliability in one third of the parameters, a training session for participants is needed and at least seven trials of two minutes must be performed on one day. PMID:26953694
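The standard error of measurement and minimal detectable change named in this abstract are often computed with the classical formulas SEM = SD·√(1 − reliability) and MDC95 = 1.96·√2·SEM. The sketch below uses those textbook formulas; treating an index of dependability as the reliability coefficient is an assumption here, and the study itself derived its values from G-theory variance components, which are not reproduced.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical SEM from the between-subject SD and a reliability coefficient in [0, 1]."""
    return sd * math.sqrt(1.0 - reliability)


def minimal_detectable_change(sem, z=1.96):
    """Smallest change between two measurements that exceeds measurement error (95% level by default)."""
    return z * math.sqrt(2.0) * sem
```

For a hypothetical parameter with SD = 10 and reliability 0.75, the SEM is 5 and the MDC95 is about 13.9 in the same units.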
Cheng, Shu-Fen; Rose, Susan
2009-01-01
This study investigated the technical adequacy of curriculum-based measures of written expression (CBM-W) in terms of writing prompts and scoring methods for deaf and hard-of-hearing students. Twenty-two students at the secondary school-level completed 3-min essays within two weeks, which were scored for nine existing and alternative curriculum-based measurement (CBM) scoring methods. The technical features of the nine scoring methods were examined for interrater reliability, alternate-form reliability, and criterion-related validity. The existing CBM scoring method--number of correct minus incorrect word sequences--yielded the highest reliability and validity coefficients. The findings from this study support the use of the CBM-W as a reliable and valid tool for assessing general writing proficiency with secondary students who are deaf or hard of hearing. The CBM alternative scoring methods that may serve as additional indicators of written expression include correct subject-verb agreements, correct clauses, and correct morphemes.
Lima, Maria José Barbosa de; Portela, Margareth Crisóstomo
2010-08-01
This study presents an instrument, the health-related quality of life (HRQOL) profile for independent elderly, to measure the health-related quality of life of the functionally independent elderly assisted in the outpatient setting, based on the adaptation of four validated scales: Short-Form Health Survey (SF-36), Duke-UNC Health Profile (DUHP), Sickness Impact Profile (SIP), and Nottingham Health Profile (NHP). The study also evaluates the instrument's reliability based on its use by two different observers with a 15-day interval. The instrument includes five dimensions (health perception, symptoms, physical function, psychological function, and social function) and 45 items. Reliability evaluation of the QUASI instrument was based on interviews with 142 elderly outpatients in the city of Rio de Janeiro, Brazil. Prevalence-adjusted kappa statistic was used to assess all 45 items. Correlation was also calculated between overall scores and scores on individual dimensions. In the reliability evaluation, 39 of the 45 items showed prevalence-adjusted kappa greater than 0.60.
Hou, Xianlong; Hodges, Ben R; Feng, Dongyu; Liu, Qixiao
2017-03-15
As oil transport increases in the Texas bays, the greater risk of ship collisions will become a challenge, with oil spill accidents as a consequence. To minimize ecological damage and optimize rapid response, emergency managers need to be informed of how fast and where oil will spread as soon as possible after a spill. The state-of-the-art operational oil spill forecast modeling system brings oil spill response to a new stage. However, uncertainty due to predicted data inputs often compromises the reliability of the forecast result, leading to misdirection in contingency planning. Thus, understanding forecast uncertainty and reliability becomes significant. In this paper, Monte Carlo simulation is implemented to provide parameters to generate forecast probability maps. The oil spill forecast uncertainty is thus quantified by comparing the forecast probability map and the associated hindcast simulation. A HyosPy-based simple statistical model is developed to assess the reliability of an oil spill forecast in terms of belief degree. The technologies developed in this study create a prototype for uncertainty and reliability analysis in numerical oil spill forecast modeling systems, helping emergency managers improve the capability of real-time operational oil spill response and impact assessment. Copyright © 2017 Elsevier Ltd. All rights reserved.
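The idea of a Monte Carlo forecast probability map can be illustrated with a toy model: each run samples an uncertain drift, advects a single particle, and the map records the fraction of runs in which each grid cell was visited. Everything below (the drift distributions, grid, and function name) is a hypothetical stand-in; it does not use or imitate the HyosPy interface or the paper's hydrodynamic inputs.

```python
import math
import random


def forecast_probability_map(n_runs=500, n_steps=20, grid=20, start=(5.0, 10.0), seed=0):
    """Fraction of Monte Carlo runs in which the particle visits each grid cell."""
    rng = random.Random(seed)
    hits = [[0] * grid for _ in range(grid)]
    for _ in range(n_runs):
        # Sample the uncertain forcing (assumed Gaussian drift) once per run
        u = rng.gauss(0.5, 0.2)
        v = rng.gauss(0.0, 0.2)
        x, y = start
        visited = set()
        for _ in range(n_steps):
            # Advect with the sampled drift plus small random diffusion
            x += u + rng.gauss(0.0, 0.1)
            y += v + rng.gauss(0.0, 0.1)
            i, j = math.floor(x), math.floor(y)
            if 0 <= i < grid and 0 <= j < grid:
                visited.add((i, j))
        for i, j in visited:
            hits[i][j] += 1
    return [[h / n_runs for h in row] for row in hits]
```

Comparing such a map with the cells reached in a hindcast run is one simple way to see how well the forecast spread covers what actually happened.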
The Reliability of Randomly Generated Math Curriculum-Based Measurements
ERIC Educational Resources Information Center
Strait, Gerald G.; Smith, Bradley H.; Pender, Carolyn; Malone, Patrick S.; Roberts, Jarod; Hall, John D.
2015-01-01
"Curriculum-Based Measurement" (CBM) is a direct method of academic assessment used to screen and evaluate students' skills and monitor their responses to academic instruction and intervention. Interventioncentral.org offers a math worksheet generator at no cost that creates randomly generated "math curriculum-based measures"…
Validation of the Physical Activity Questionnaire for Older Children (PAQ-C) among Chinese Children.
Wang, Jing Jing; Baranowski, Tom; Lau, Wc Patrick; Chen, Tzu An; Pitkethly, Amanda Jane
2016-03-01
This study initially validates the Chinese version of the Physical Activity Questionnaire for Older Children (PAQ-C), which has been identified as a potentially valid instrument to assess moderate-to-vigorous physical activity (MVPA) in children among diverse racial groups. The psychometric properties of the PAQ-C with 742 Hong Kong Chinese children were assessed with the scale's internal consistency, reliability, test-retest reliability, confirmatory factor analysis (CFA) in the overall sample, and multistep invariance tests across gender groups as well as convergent validity with body mass index (BMI), and an accelerometry-based MVPA. The Cronbach alpha coefficient (α=0.79), composite reliability value (ρ=0.81), and the intraclass correlation coefficient (α=0.82) indicate the satisfactory reliability of the PAQ-C score. The CFA indicated data fit a single factor model, suggesting that the PAQ-C measures only one construct, on MVPA over the previous 7 days. The multiple-group CFAs suggested that the factor loadings and variances and covariances of the PAQ-C measurement model were invariant across gender groups. The PAQ-C score was related to accelerometry-based MVPA (r=0.33) and inversely related to BMI (r=-0.18). This study demonstrates the reliability and validity of the PAQ-C in Chinese children. Copyright © 2016 The Editorial Board of Biomedical and Environmental Sciences. Published by China CDC. All rights reserved.
A Reliable Method to Measure Lip Height Using Photogrammetry in Unilateral Cleft Lip Patients.
van der Zeeuw, Frederique; Murabit, Amera; Volcano, Johnny; Torensma, Bart; Patel, Brijesh; Hay, Norman; Thorburn, Guy; Morris, Paul; Sommerlad, Brian; Gnarra, Maria; van der Horst, Chantal; Kangesu, Loshan
2015-09-01
There is still no reliable tool to determine the outcome of the repaired unilateral cleft lip (UCL). The aim of this study was therefore to develop an accurate, reliable tool to measure vertical lip height from photographs. The authors measured the vertical height of the cutaneous and vermilion parts of the lip in 72 anterior-posterior view photographs of 17 patients with repairs to a UCL. Points on the lip's white roll and vermilion were marked on both the cleft and the noncleft sides on each image. Two new concepts were tested. First, photographs were standardized using the horizontal (medial to lateral) eye fissure width (EFW) for calibration. Second, the authors tested the interpupillary line (IPL) and the alar base line (ABL) for their reliability as horizontal lines of reference. Measurements were taken by 2 independent researchers, at 2 different time points each. Overall 2304 data points were obtained and analyzed. Results showed that the method was very effective in comparing the height of the lip on the cleft side with that on the noncleft side. With the IPL, inter- and intra-rater reliability was 0.99 to 1.0; with the ABL it varied from 0.91 to 0.99 with one exception at 0.84. The IPL was easier to define, because in some subjects the overhanging nasal tip obscured the alar base, and it gave more consistent measurements, possibly because the reconstructed alar base was sometimes indistinct. However, measurements from the IPL can only give the percentage difference between the left and right sides of the lip, whereas those from the ABL can also give exact measurements. Patient examples were given that show how the measurements correlate with clinical assessment. The authors propose this method of photogrammetry with the innovative use of the IPL as a reliable horizontal plane and use of the EFW for calibration as a useful and reliable tool to assess the outcome of UCL repair.
Reliability and validity of current physical examination techniques of the foot and ankle.
Wrobel, James S; Armstrong, David G
2008-01-01
This literature review was undertaken to evaluate the reliability and validity of the orthopedic, neurologic, and vascular examination of the foot and ankle. We searched PubMed (the US National Library of Medicine's database of biomedical citations and abstracts) for relevant publications from 1966 to 2006. We also searched the bibliographies of the retrieved articles. We identified 35 articles to review. For discussion purposes, we used reliability interpretation guidelines proposed by others. For the kappa statistic that calculates reliability for dichotomous (eg, yes or no) measures, reliability was defined as moderate (0.4-0.6), substantial (0.6-0.8), and outstanding (> 0.8). For the intraclass correlation coefficient that calculates reliability for continuous (eg, degrees of motion) measures, reliability was defined as good (> 0.75), moderate (0.5-0.75), and poor (< 0.5). Intraclass correlations, based on the various examinations performed, varied widely. The range was from 0.08 to 0.98, depending on the examination performed. Concurrent and predictive validity ranged from poor to good. Although hundreds of articles exist describing various methods of lower-extremity assessment, few rigorously assess the measurement properties. This information can be used both by the discerning clinician in the art of clinical examination and by the scientist in the measurement properties of reproducibility and validity.
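The interpretation guidelines quoted in this abstract translate directly into two lookup functions. The sketch below follows the stated cut-offs; how to treat values that fall exactly on a boundary (for example, a kappa of 0.6) and what to call a kappa below 0.4 are not specified in the abstract, so those choices are assumptions.

```python
def interpret_kappa(kappa):
    """Label a kappa value using the cut-offs quoted in the abstract."""
    if kappa > 0.8:
        return "outstanding"
    if kappa >= 0.6:  # assumption: boundary value 0.6 is counted as substantial
        return "substantial"
    if kappa >= 0.4:
        return "moderate"
    return "below moderate (no label given in the abstract)"


def interpret_icc(icc):
    """Label an intraclass correlation coefficient using the quoted cut-offs."""
    if icc > 0.75:
        return "good"
    if icc >= 0.5:
        return "moderate"
    return "poor"
```

Applied to the extremes of the reported range, `interpret_icc(0.08)` gives "poor" and `interpret_icc(0.98)` gives "good".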
Reliability-Based Life Assessment of Stirling Convertor Heater Head
NASA Technical Reports Server (NTRS)
Shah, Ashwin R.; Halford, Gary R.; Korovaichuk, Igor
2004-01-01
Onboard radioisotope power systems being developed and planned for NASA's deep-space missions require reliable design lifetimes of up to 14 yr. The structurally critical heater head of the high-efficiency Stirling power convertor has undergone extensive computational analysis of operating temperatures, stresses, and creep resistance of the thin-walled Inconel 718 bill of material. A preliminary assessment of the effect of uncertainties in the material behavior was also performed. Creep failure resistance of the thin-walled heater head could show variation due to small deviations in the manufactured thickness and in uncertainties in operating temperature and pressure. Durability prediction and reliability of the heater head are affected by these deviations from nominal design conditions. Therefore, it is important to include the effects of these uncertainties in predicting the probability of survival of the heater head under mission loads. Furthermore, it may be possible for the heater head to experience rare incidences of small temperature excursions of short duration. These rare incidences would affect the creep strain rate and, therefore, the life. This paper addresses the effects of such rare incidences on the reliability. In addition, the sensitivities of variables affecting the reliability are quantified, and guidelines developed to improve the reliability are outlined. Heater head reliability is being quantified with data from NASA Glenn Research Center's accelerated benchmark testing program.
A Passive System Reliability Analysis for a Station Blackout
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brunett, Acacia; Bucknor, Matthew; Grabaskas, David
2015-05-03
The latest iterations of advanced reactor designs have included increased reliance on passive safety systems to maintain plant integrity during unplanned sequences. While these systems are advantageous in reducing the reliance on human intervention and availability of power, the phenomenological foundations on which these systems are built require a novel approach to a reliability assessment. Passive systems possess the unique ability to fail functionally without failing physically, a result of their explicit dependency on existing boundary conditions that drive their operating mode and capacity. Argonne National Laboratory is performing ongoing analyses that demonstrate various methodologies for the characterization of passive system reliability within a probabilistic framework. Two reliability analysis techniques are utilized in this work. The first approach, the Reliability Method for Passive Systems, provides a mechanistic technique employing deterministic models and conventional static event trees. The second approach, a simulation-based technique, utilizes discrete dynamic event trees to treat time-dependent phenomena during scenario evolution. For this demonstration analysis, both reliability assessment techniques are used to analyze an extended station blackout in a pool-type sodium fast reactor (SFR) coupled with a reactor cavity cooling system (RCCS). This work demonstrates the entire process of a passive system reliability analysis, including identification of important parameters and failure metrics, treatment of uncertainties and analysis of results.
Chaudhary, Richa; Grover, Chander; Bhattacharya, S N; Sharma, Arun
2017-01-01
The assessment of dermatology undergraduates has been done through computer assisted objective structured clinical examination at our institution for the last 4 years. We attempted to compare objective structured clinical examination (OSCE) and computer assisted objective structured clinical examination (CA-OSCE) as assessment tools. To assess the relative effectiveness of CA-OSCE and OSCE as assessment tools for undergraduate dermatology trainees. Students underwent CA-OSCE as well as OSCE-based evaluation of equal weightage as an end of posting assessment. The attendance as well as the marks in both the examination formats were meticulously recorded and statistically analyzed using SPSS version 20.0. Intercooled Stata V9.0 was used to assess the reliability and internal consistency of the examinations conducted. Feedback from both students and examiners was also recorded. The mean attendance for the study group was 77% ± 12.0%. The average score on CA-OSCE and OSCE was 47.4% ± 19.8% and 53.5% ± 18%, respectively. These scores showed a mutually positive correlation, with Spearman's coefficient being 0.593. Spearman's rank correlation coefficient between attendance scores and assessment score was 0.485 for OSCE and 0.451 for CA-OSCE. The Cronbach's alpha coefficient for all the tests ranged from 0.76 to 0.87 indicating high reliability. The comparison was based on a single batch of 139 students. Such an evaluation on more students in a larger number of batches over successive years could help throw more light on the subject. Computer assisted objective structured clinical examination was found to be a valid, reliable and effective format for dermatology assessment, being rated as the preferred format by examiners.
Invited review: Animal-based indicators for on-farm welfare assessment for dairy goats.
Battini, M; Vieira, A; Barbieri, S; Ajuda, I; Stilwell, G; Mattiello, S
2014-11-01
This paper reviews animal-based welfare indicators to develop a valid, reliable, and feasible on-farm welfare assessment protocol for dairy goats. The indicators were considered in the light of the 4 accepted principles (good feeding, good housing, good health, appropriate behavior) subdivided into 12 criteria developed by the European Welfare Quality program. We will only examine the practical indicators to be used on-farm, excluding those requiring the use of specific instruments or laboratory analysis and those that are recorded at the slaughterhouse. Body condition score, hair coat condition, and queuing at the feed barrier or at the drinker seem the most promising indicators for the assessment of the "good feeding" principle. As to "good housing," some indicators were considered promising for assessing "comfort around resting" (e.g., resting in contact with a wall) or "thermal comfort" (e.g., panting score for the detection of heat stress and shivering score for the detection of cold stress). Several indicators related to "good health," such as lameness, claw overgrowth, presence of external abscesses, and hair coat condition, were identified. As to the "appropriate behavior" principle, different criteria have been identified: agonistic behavior is largely used as the "expression of social behavior" criterion, but it is often not feasible for on-farm assessment. Latency to first contact and the avoidance distance test can be used as criteria for assessing the quality of the human-animal relationship. Qualitative behavior assessment seems to be a promising indicator for addressing the "positive emotional state" criterion. Promising indicators were identified for most of the considered criteria; however, no valid indicator has been identified for "expression of other behaviors." 
Interobserver reliability has rarely been assessed and warrants further attention; in contrast, short-term intraobserver reliability is frequently assessed and some studies consider mid- and long-term reliability. The feasibility of most of the reviewed indicators in commercial farms still needs to be carefully evaluated, as several studies were performed under experimental conditions. Our review highlights some aspects of goat welfare that have been widely studied, but some indicators need to be investigated further and drafted before being included in a valid, reliable, and feasible welfare assessment protocol. The indicators selected and examined may be an invaluable starting point for the development of an on-farm welfare assessment protocol for dairy goats. Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Soja, Stacie L.; Pandharipande, Pratik P.; Fleming, Sloan B.; Cotton, Bryan A.; Miller, Leanna R.; Weaver, Stefanija G.; Lee, Byron T.; Ely, E. Wesley
2013-01-01
Objective To implement delirium monitoring, test reliability, and monitor compliance of performing the Confusion Assessment Method for the Intensive Care Unit (CAM-ICU) in trauma patients. Design and setting Prospective, observational study in a Level 1 trauma unit of a tertiary care, university-based medical center. Patients Acutely injured patients admitted to the trauma unit from February 1, 2006–April 16, 2006. Measurements and Results Following web-based teaching modules and group in-services, bedside nurses evaluated patients daily for depth of sedation with the Richmond Agitation-Sedation Scale (RASS) and for the presence of delirium with the CAM-ICU. On randomly assigned days over a 10-week period, evaluations by nursing staff were followed by evaluations by an expert evaluator of the RASS and the CAM-ICU, in order to assess compliance and reliability of the CAM-ICU in trauma patients. Following the audit period, the nurses completed a post-implementation survey. One thousand and eleven random CAM-ICU assessments were performed by the expert evaluator, within 1 hour of the bedside nurses’ assessments. Nurses completed the CAM-ICU assessments in 84% (849 of 1011) of evaluations. Overall agreement (κ) between nurses and the expert evaluator was 0.77 (0.721, 0.822; p<0.0001). In TBI patients κ was 0.75 (0.667, 0.829; p<0.0001), while in mechanically-ventilated patients κ was 0.62 (0.534, 0.704; p<0.0001). The survey revealed nurses were confident in performing the CAM-ICU, realized the importance of delirium, and were satisfied with the training they received. The survey also acknowledged obstacles to implementation including nursing time and failure of physicians/surgeons to address treatment approaches for delirium. Conclusions The CAM-ICU can be successfully implemented in a university-based trauma unit with high compliance and reliability. 
Quality improvement projects seeking to implement delirium monitoring would be wise to address potential pitfalls including time complaints and the negative impact of physician indifference regarding this form of organ dysfunction. PMID:18297270
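The κ values above summarise chance-corrected agreement between the bedside nurses and the expert evaluator. A minimal sketch of that statistic follows, assuming unweighted Cohen's κ for two raters (the abstract does not name the variant). The function name and the ratings are hypothetical illustrations, not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1.0 - expected)


# Hypothetical delirium ratings (1 = CAM-ICU positive, 0 = negative).
nurse = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]
expert = [1, 1, 0, 0, 0, 0, 0, 1, 0, 1]
kappa = cohens_kappa(nurse, expert)  # approximately 0.58
```

Raw agreement in this toy example is 80%, yet κ is about 0.58, which shows why chance-corrected agreement is usually reported alongside or instead of percent agreement.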
Satoh, Masayuki; Mori, Chika; Matsuda, Kana; Ueda, Yukito; Tabei, Ken-ichi; Kida, Hirotaka; Tomimoto, Hidekazu
2016-01-01
Background/Aims Constructional apraxia (CA) is usually diagnosed by having patients draw figures; however, the reported assessments only evaluate the drawn figure. We designed a new assessment battery for CA (the Mie Constructional Apraxia Scale, MCAS) which includes both the shape and drawing process, and investigated its utility against other assessment methods. Methods We designed the MCAS, and evaluated inter- and intrarater reliability. We also investigated the sensitivity, specificity, and positive and negative predictive values in dementia patients, and compared MCAS assessment with other reported batteries in the same subjects. Results Moderate interrater reliability was shown for speech therapists with limited experience. Moderate to substantial intrarater reliability was shown several weeks after initial assessment. When cutoff scores and times were set at 2/3 points and 39/40 s, sensitivity and specificity were 77.1 and 70.4%, respectively, with positive and negative predictive values of 80.0 and 66.7%, respectively. Dementia patients had significantly worse scores and times for Necker cube drawing than an elderly control group on the MCAS, and on other assessments. Conclusions We conclude that the MCAS, which includes both the assessment of the drawn Necker cube shape and the drawing process, is useful for detecting even mild CA. PMID:27790241
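The four accuracy figures above all derive from one 2×2 table of test result against reference diagnosis. The sketch below shows the arithmetic. The counts are hypothetical and do not reproduce the study's data, which the abstract does not give.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from 2x2 confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased cases correctly flagged
        "specificity": tn / (tn + fp),  # healthy cases correctly cleared
        "ppv": tp / (tp + fp),          # flagged cases that are truly diseased
        "npv": tn / (tn + fn),          # cleared cases that are truly healthy
    }


# Hypothetical counts: 40 patients with the condition, 20 without.
metrics = diagnostic_metrics(tp=30, fp=5, fn=10, tn=15)
```

Sensitivity and specificity are properties of the test at a given cutoff, whereas the predictive values also depend on how common the condition is in the sample.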
Childress, M O; Fulkerson, C M; Lahrman, S A; Weng, H-Y
2016-08-01
The purpose of this study was to assess reliability of lymph node measurements between and within raters in dogs with nodal lymphomas. Three raters measured lymph nodes from 20 dogs twice prior to and once after administering chemotherapy. Sum tumour volume (TV) and sum longest diameter (LD) of all lymph nodes at each time point, and the percent change in measurements following chemotherapy, were calculated for each dog. Inter- and intra-rater reliability were assessed with the intraclass correlation coefficient (ICC). ICC for inter-rater sum TV and sum LD prior to chemotherapy were 0.86 and 0.80, respectively. ICC for inter-rater sum TV and sum LD after chemotherapy were 0.95 and 0.91, respectively. ICC for percent change in sum TV and sum LD were 0.96 and 0.94, respectively. ICC for intra-rater reliability ranged from 0.90 to 0.98 for each rater. Inter- and intra-rater reliability in measurements among the three raters was good to excellent. © 2014 John Wiley & Sons Ltd.
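The abstract does not state which form of the ICC was used. As an illustration only, the sketch below computes ICC(2,1), the two-way random-effects, absolute-agreement, single-rater form, from a subjects-by-raters table. The function name and the example measurements are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1) for a table with one row per subject and one column per rater."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects, raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical measurements: rater 2 reads exactly one unit higher than rater 1.
offset_example = icc_2_1([[1, 2], [2, 3], [3, 4], [4, 5]])
```

Because this form penalises systematic differences between raters, a constant offset lowers the coefficient even though the two raters rank every subject identically.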
Do aggressive signals evolve towards higher reliability or lower costs of assessment?
Ręk, P
2014-12-01
It has been suggested that the evolution of signals must be a wasteful process for the signaller, aimed at the maximization of signal honesty. However, the reliability of communication depends not only on the costs paid by signallers but also on the costs paid by receivers during assessment, and less attention has been given to the interaction between these two types of costs during the evolution of signalling systems. A signaller and receiver may accept some level of signal dishonesty by choosing signals that are cheaper in terms of assessment but that are stabilized with less reliable mechanisms. I studied the potential trade-off between signal reliability and the costs of signal assessment in the corncrake (Crex crex). I found that the birds prefer signals that are less costly regarding assessment rather than more reliable. Despite the fact that the fundamental frequency of calls was a strong predictor of male size, it was ignored by receivers unless they could directly compare signal variants. My data revealed a response advantage of costly signals when comparison between calls differing in fundamental frequency is fast and straightforward, whereas cheap signalling is preferred in natural conditions. These data might improve our understanding of the influence of receivers on signal design because they support the hypothesis that fully honest signalling systems may be prone to dishonesty based on the effects of receiver costs and be replaced by signals that are cheaper in production and reception but more susceptible to cheating. Journal of Evolutionary Biology © 2014 European Society For Evolutionary Biology.
Byram, Jessica N; Seifert, Mark F; Brooks, William S; Fraser-Cotlin, Laura; Thorp, Laura E; Williams, James M; Wilson, Adam B
2017-03-01
With integrated curricula and multidisciplinary assessments becoming more prevalent in medical education, there is a continued need for educational research to explore the advantages, consequences, and challenges of integration practices. This retrospective analysis investigated the number of items needed to reliably assess anatomical knowledge in the context of gross anatomy and histology. A generalizability analysis was conducted on gross anatomy and histology written and practical examination items that were administered in a discipline-based format at Indiana University School of Medicine and in an integrated fashion at the University of Alabama School of Medicine and Rush University Medical College. Examination items were analyzed using a partially nested design s×(i:o) in which items were nested within occasions (i:o) and crossed with students (s). A reliability standard of 0.80 was used to determine the minimum number of items needed across examinations (occasions) to make reliable and informed decisions about students' competence in anatomical knowledge. Decision study plots are presented to demonstrate how the number of items per examination influences the reliability of each administered assessment. Using the example of a curriculum that assesses gross anatomy knowledge over five summative written and practical examinations, the results of the decision study estimated that 30 and 25 items would be needed on each written and practical examination to reach a reliability of 0.80, respectively. This study is particularly relevant to educators who may question whether the amount of anatomy content assessed in multidisciplinary evaluations is sufficient for making judgments about the anatomical aptitude of students. Anat Sci Educ 10: 109-119. © 2016 American Association of Anatomists.
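A decision study for the s×(i:o) design projects reliability for different numbers of items per occasion. The sketch below assumes the standard relative-error form of the generalizability coefficient for this design, Eρ² = σ²(s) / [σ²(s) + σ²(so)/n_o + σ²(si:o,e)/(n_i·n_o)]. The variance components are hypothetical placeholders, not the study's estimates.

```python
def g_coefficient(var_s, var_so, var_si_o, n_items, n_occasions):
    """Generalizability coefficient for relative decisions in an s x (i:o) design."""
    relative_error = var_so / n_occasions + var_si_o / (n_items * n_occasions)
    return var_s / (var_s + relative_error)


def min_items_per_occasion(var_s, var_so, var_si_o, n_occasions,
                           target=0.80, max_items=500):
    """Smallest number of items per occasion that reaches the target, or None."""
    for n_items in range(1, max_items + 1):
        if g_coefficient(var_s, var_so, var_si_o, n_items, n_occasions) >= target:
            return n_items
    return None


# Hypothetical variance components for a curriculum with five examinations.
needed = min_items_per_occasion(var_s=0.010, var_so=0.005, var_si_o=0.200,
                                n_occasions=5)
```

Plotting `g_coefficient` against `n_items` would give the kind of decision study curve the abstract describes.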
Barnett, Lisa M; Ridgers, Nicola D; Zask, Avigdor; Salmon, Jo
2015-01-01
To determine reliability and face validity of an instrument to assess young children's perceived fundamental movement skill competence. Validation and reliability study. A pictorial instrument based on the Test of Gross Motor Development-2 assessed perceived locomotor (six skills) and object control (six skills) competence using the format and item structure from the physical competence subscale of the Pictorial Scale of Perceived Competence and Acceptance for Young Children. Sample 1 completed object control items in May (n=32) and locomotor items in October 2012 (n=23) at two time points seven days apart. At the end of the test-retest, children were asked to describe what was happening in each picture to determine face validity. Sample 2 (n=58) completed 12 items in November 2012 on a single occasion to test internal reliability only. Sample 1 children were aged 5-7 years (M=6.0, SD=0.8) at object control assessment and 5-8 years at locomotor assessment (M=6.5, SD=0.9). Sample 2 children were aged 6-8 years (M=7.2, SD=0.73). Intra-class correlations assessed in Sample 1 children were excellent for object control (intra-class correlation=0.78), locomotor (intra-class correlation=0.82) and all 12 skills (intra-class correlation=0.83). Face validity was acceptable. Internal consistency was adequate in both samples for each subscale and all 12 skills (alpha range 0.60-0.81). This study has provided preliminary evidence for instrument reliability and face validity. This enables future alignment between the measurement of perceived and actual fundamental movement skill competence in young children. Crown Copyright © 2014. Published by Elsevier Ltd. All rights reserved.
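The internal consistency figures above are alpha coefficients. The sketch below assumes the usual Cronbach's alpha formula, α = k/(k−1) · (1 − Σ item variances / variance of total scores). The responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` holds one list of k item scores per respondent."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    item_columns = list(zip(*responses))
    sum_item_variances = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1.0 - sum_item_variances / total_variance)


# Hypothetical scores for four children on a two-item subscale.
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3], [4, 4]])
```

Alpha rises with both the number of items and their average intercorrelation, so short subscales such as the six-skill ones described here tend to produce lower values.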
Nicholson, Patricia; Griffin, Patrick; Gillis, Shelley; Wu, Margaret; Dunning, Trisha
2013-09-01
Concern about the process of identifying underlying competencies that contribute to effective nursing performance has been debated with a lack of consensus surrounding an approved measurement instrument for assessing clinical performance. Although a number of methodologies are noted in the development of competency-based assessment measures, these studies are not without criticism. The primary aim of the study was to develop and validate a Performance Based Scoring Rubric, which included both analytical and holistic scales. The aim included examining the validity and reliability of the rubric, which was designed to measure clinical competencies in the operating theatre. The fieldwork observations of 32 nurse educators and preceptors assessing the performance of 95 instrument nurses in the operating theatre were used in the calibration of the rubric. The Rasch model, a particular model among Item Response Models, was used in the calibration of each item in the rubric in an attempt at improving the measurement properties of the scale. This is done by establishing the 'fit' of the data to the conditions demanded by the Rasch model. Acceptable reliability estimates, specifically a high Cronbach's alpha reliability coefficient (0.940), as well as empirical support for construct and criterion validity for the rubric were achieved. Calibration of the Performance Based Scoring Rubric using Rasch model revealed that the fit statistics for most items were acceptable. The use of the Rasch model offers a number of features in developing and refining healthcare competency-based assessments, improving confidence in measuring clinical performance. The Rasch model was shown to be useful in developing and validating a competency-based assessment for measuring the competence of the instrument nurse in the operating theatre with implications for use in other areas of nursing practice. Crown Copyright © 2012. Published by Elsevier Ltd. All rights reserved.
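The core of the Rasch model is a logistic function of the difference between a person's ability and an item's difficulty. The sketch below shows the dichotomous form only. A rubric with analytical and holistic scales would more plausibly use a polytomous extension, so treat this as an illustrative assumption rather than the study's exact model.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of success on a dichotomous item under the Rasch model (logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def expected_score(ability, difficulties):
    """Expected number of items passed by a person across a set of items."""
    return sum(rasch_probability(ability, b) for b in difficulties)


# Hypothetical item difficulties in logits.
p_matched = rasch_probability(0.0, 0.0)          # ability equals difficulty
score = expected_score(0.5, [-1.0, 0.0, 1.0])
```

Fit statistics of the kind mentioned in the abstract compare observed responses with these model-expected probabilities.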
NASA Astrophysics Data System (ADS)
Buczyński, P.
2018-05-01
This article presents a new approach to reliability assessment of the road structure in which the base layer will be constructed in the process of cold deep recycling with foamed bitumen. In order to properly assess the reliability of the structure with the recycled base, it is necessary to determine the distribution of stress and strain in typical pavement layer systems. The true stress and strain values were established for particular structural layers using the complex modulus (E*) determined based on the master curves. The complex modulus was determined by the direct tension-compression test on cylindrical specimens (DTC-CY) at five temperatures (-7°C, 5°C, 13°C, 25°C, 40°C) and six loading frequencies (0.1 Hz, 0.3 Hz, 1 Hz, 3 Hz, 10 Hz, 20 Hz) in accordance with EN 12697-26 in the linear viscoelasticity (LVE) range for small strains ranging from 25 to 50 με. The master curves of the complex modulus were constructed using the Richards model for the mixtures typically incorporated in structural layers, i.e., SMA11, AC16W, AC22P and MCAS. The values of the modulus characterizing particular layers were determined with temperature distribution in the structure taken into account, when the surface temperature was 40°C. The stress distribution was established for those calculation models. The stress values were used to evaluate the fatigue life under controlled stress conditions (IT-FT). This evaluation, with the controlled stress corresponding to that in the structure, facilitated the quality assessment of the rehabilitated recycled base course. Results showed that the recycled base mixtures having the indirect tensile strength (ITSDRY) similar to the stress in the structure under analysis needed an additional fatigue life evaluation in the indirect tensile test ITT. This approach to the recycled base quality assessment will allow eliminating the damage induced by overloading.
ERIC Educational Resources Information Center
Wesolowski, Brian C.; Amend, Ross M.; Barnstead, Thomas S.; Edwards, Andrew S.; Everhart, Matthew; Goins, Quentin R.; Grogan, Robert J., III; Herceg, Amanda M.; Jenkins, S. Ira; Johns, Paul M.; McCarver, Christopher J.; Schaps, Robin E.; Sorrell, Gary W.; Williams, Jonathan D.
2017-01-01
The purpose of this study was to describe the development of a valid and reliable rubric to assess secondary-level solo instrumental music performance based on principles of invariant measurement. The research questions that guided this study included (1) What is the psychometric quality (i.e., validity, reliability, and precision) of a scale…
ERIC Educational Resources Information Center
Sampson, James P., Jr.; Peterson, Gary W.; Reardon, Robert C.; Lenz, Janet G.
2000-01-01
Responds to Jepsen's (this issue [2000]) commentary on Sampson et al.'s theory-based approach to using readiness assessment to improve career services. Three topics are included: the reliability and utility of using readiness assessment measures; verbal ability and the use of cognitive information-processing theory in practice; and the potential…
ERIC Educational Resources Information Center
Lakshmipathy, K.
2015-01-01
The objectives of the present study were to 1) assess student attitudes to physiology, 2) evaluate student opinions about the influence of an objective structured practical examination (OSPE) on competence, and 3) assess the validity and reliability of an indigenously designed feedback questionnaire. A structured questionnaire containing 16 item…
The Infant Motor Profile: A Standardized and Qualitative Method to Assess Motor Behaviour in Infancy
ERIC Educational Resources Information Center
Heineman, Kirsten R.; Bos, Arend F.; Hadders-Algra, Mijna
2008-01-01
A reliable and valid instrument to assess neuromotor condition in infancy is a prerequisite for early detection of developmental motor disorders. We developed a video-based assessment of motor behaviour, the Infant Motor Profile (IMP), to evaluate motor abilities, movement variability, ability to select motor strategies, movement symmetry, and…
[Psychometric properties and diagnostic value of 'lexical screening for aphasias'].
Pena-Chavez, R; Martinez-Jimenez, L; Lopez-Espinoza, M
2014-09-16
INTRODUCTION. Language assessment in persons with brain injury makes it possible to know whether they require language rehabilitation or not. Given the importance of a precise evaluation, assessment instruments must be valid and reliable, so as to avoid mistaken and subjective diagnoses. AIM. To validate 'lexical screening for aphasias' in a sample of 58 Chilean individuals. SUBJECTS AND METHODS. A screening-type language test, lasting 20 minutes and based on the lexical processing model devised by Patterson and Shewell (1987), was constructed. The sample was made up of two groups containing 29 aphasic subjects and 29 control subjects from different health centres in the regions of Biobio and Maule, Chile. Their ages ranged between 24 and 79 years, and they had between 0 and 17 years of schooling. Tests were carried out to determine discriminating validity, concurrent validity with the aphasia disorder assessment battery, reliability, sensitivity and specificity. RESULTS. The statistical analysis showed a high discriminating validity (p < 0.001), an acceptable mean concurrent validity with aphasia disorder assessment battery (rs = 0.65), high mean reliability (alpha = 0.87), moderate mean sensitivity (69%) and high mean specificity (86%). CONCLUSION. 'Lexical screening for aphasias' is valid and reliable for assessing language in persons with aphasias; it is sensitive for detecting aphasic subjects and is specific for precluding language disorders in persons with normal language abilities.
Assessing child and adolescent pragmatic language competencies: toward evidence-based assessments.
Russell, Robert L; Grizzle, Kenneth L
2008-06-01
Using language appropriately and effectively in social contexts requires pragmatic language competencies (PLCs). Increasingly, deficits in PLCs are linked to child and adolescent disorders, including autism spectrum, externalizing, and internalizing disorders. As the role of PLCs expands in diagnosis and treatment of developmental psychopathology, psychologists and educators will need to appraise and select clinical and research PLC instruments for use in assessments and/or studies. To assist in this appraisal, 24 PLC instruments, containing 1,082 items, are assessed by addressing four questions: (1) Can PLC domains targeted by assessment items be reliably identified?, (2) What are the core PLC domains that emerge across the 24 instruments?, (3) Do PLC questionnaires and tests assess similar PLC domains?, and (4) Do the instruments achieve content, structural, diagnostic, and ecological validity? Results indicate that test and questionnaire items can be reliably categorized into PLC domains, that PLC domains featured in questionnaires and tests significantly differ, and that PLC instruments need empirical confirmation of their dimensional structure, content validity across all developmental age bands, and ecological validity. Progress in building a better evidence base for PLC assessments should be a priority in future research.
West, Robert; Evans, Adam; Michie, Susan
2011-12-01
To develop a reliable coding scheme for components of group-based behavioral support for smoking cessation, to establish the frequency of inclusion in English Stop-Smoking Service (SSS) treatment manuals of specific components, and to investigate the associations between inclusion of behavior change techniques (BCTs) and service success rates. A taxonomy of BCTs specific to group-based behavioral support was developed and reliability of use assessed. All English SSSs (n = 145) were contacted to request their group-support treatment manuals. BCTs included in the manuals were identified using this taxonomy. Associations between inclusion of specific BCTs and short-term (4-week) self-reported quit outcomes were assessed. Fourteen group-support BCTs were identified with >90% agreement between coders. One hundred and seven services responded to the request for group-support manuals of which 30 had suitable documents. On average, 7 BCTs were included in each manual. Two were positively associated with 4-week quit rates: "communicate group member identities" and a "betting game" (a financial deposit that is lost if a stop-smoking "buddy" relapses). It is possible to reliably code group-specific BCTs for smoking cessation. Fourteen such techniques are present in guideline documents of which 2 appear to be associated with higher short-term self-reported quit rates when included in treatment manuals of English SSSs.
Graffigna, Guendalina; Barello, Serena; Bonanomi, Andrea; Lozza, Edoardo
2015-01-01
Beyond the rhetorical call for increasing patients' engagement, policy makers recognize the urgency to have an evidence-based measure of patients' engagement and capture its effect when planning and implementing initiatives aimed at sustaining the engagement of consumers in their health. In this paper, the authors describe the Patient Health Engagement Scale (PHE-scale), a measure of patient engagement that is grounded in rigorous conceptualization and appropriate psychometric methods. The scale was developed based on our previous conceptualization of patient engagement (the PHE-model). In particular, the items of the PHE-scale were developed based on the findings from the literature review and from interviews with chronic patients. Initial psychometric analysis was performed to pilot test a preliminary version of the items. The items were then refined and administered to a national sample of chronic patients (N = 382) to assess the measure's psychometric performance. A final phase of test-retest reliability was performed. The analysis showed that the PHE Scale has good psychometric properties with good correlation with concurrent measures and solid reliability. Having a valid and reliable measure to assess patient engagement is the first step in understanding patient engagement and its role in health care quality, outcomes, and cost containment. The PHE Scale shows a promising clinical relevance, indicating that it can be used to tailor intervention and assess changes after patient engagement interventions. PMID:25870566
Newton, Robert L; Thomson, Jessica L; Rau, Kristi K; Ragusa, Shelly A; Sample, Alicia D; Singleton, Nakisha N; Anton, Stephen D; Webber, Larry S; Williamson, Donald A
2011-01-01
To evaluate the implementation of intervention components of the Louisiana Health study, which was a multicomponent childhood obesity prevention program conducted in rural schools. Content analysis. Process evaluation assessed implementation in classrooms, gym classes, and cafeterias. Classroom teachers (n = 232), physical education teachers (n = 53), food service managers (n = 33), and trained observers (n = 9). Five process evaluation measures were created: Physical Education Questionnaire (PEQ), Intervention Questionnaire (IQ), Food Service Manager Questionnaire (FSMQ), Classroom Observation (CO), and School Nutrition Environment Observation (SNEO). Interrater reliability and internal consistency were assessed on all measures. Analysis of variance and χ² were used to compare differences across study groups on questionnaires and observations. The PEQ and one subscale from the FSMQ were eliminated because their reliability coefficients fell below acceptable standards. The subscale internal consistencies for the IQ, FSMQ, CO, and SNEO (all Cronbach α > .60) were acceptable. After the initial 4 months of intervention, there was evidence that the Louisiana Health intervention was being implemented as it was designed. In summary, four process evaluation measures were found to be sufficiently reliable and valid for assessing the delivery of various aspects of a school-based obesity prevention program. These process measures could be modified to evaluate the delivery of other similar school-based interventions.
Federal Register 2010, 2011, 2012, 2013, 2014
2010-01-15
... that is based on rigorous scientifically based research methods to assess the effectiveness of a...) Relies on measurements or observational methods that provide reliable and valid data across evaluators... of innovative, cohesive models that are based on research and have demonstrated that they effectively...
Evaluation Criteria for Micro-CAI: A Psychometric Approach
Wallace, Douglas; Slichter, Mark; Bolwell, Christine
1985-01-01
The increased use of microcomputer-based instructional programs has resulted in a greater need for third-party evaluation of the software. This in turn has prompted the development of micro-CAI evaluation tools. The present project sought to develop a prototype instrument to assess the impact of CAI program presentation characteristics on students. Data analysis and scale construction was conducted using standard item reliability analyses and factor analytic techniques. Adequate subscale reliabilities and factor structures were found, suggesting that a psychometric approach to CAI evaluation may possess some merit. Efforts to assess the utility of the resultant instrument are currently underway.
Fatigue reliability of deck structures subjected to correlated crack growth
NASA Astrophysics Data System (ADS)
Feng, G. Q.; Garbatov, Y.; Guedes Soares, C.
2013-12-01
The objective of this work is to analyse fatigue reliability of deck structures subjected to correlated crack growth. The stress intensity factors of the correlated cracks are obtained by finite element analysis, from which the geometry correction functions are derived. Monte Carlo simulations are applied to predict the statistical descriptors of correlated cracks based on the Paris-Erdogan equation. A probabilistic model of crack growth as a function of time is used to analyse the fatigue reliability of deck structures accounting for the crack propagation correlation. A deck structure is modelled as a series system of stiffened panels, where a stiffened panel is regarded as a parallel system composed of plates and longitudinals. It has been proven that the method developed here can be conveniently applied to perform the fatigue reliability assessment of structures subjected to correlated crack growth.
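The combination of the Paris-Erdogan law with Monte Carlo sampling can be sketched as follows. This is a heavily simplified illustration and not the authors' model. It assumes a constant geometry factor Y instead of finite-element-derived correction functions, a lognormal Paris coefficient C correlated between cracks through a shared normal factor, and a plain series system that fails when the first crack reaches critical size. All numerical values are hypothetical.

```python
import math
import random


def cycles_to_critical(a0, a_crit, c, m, stress_range, y=1.12):
    """Cycles to grow a crack from a0 to a_crit (metres), integrating
    da/dN = C * (Y * dS * sqrt(pi * a))**m in closed form (constant Y, m != 2)."""
    k = c * (y * stress_range * math.sqrt(math.pi)) ** m
    e = 1.0 - m / 2.0
    return (a_crit ** e - a0 ** e) / (k * e)


def series_failure_probability(design_cycles, rho, n_cracks=2,
                               n_samples=20000, seed=0):
    """Monte Carlo estimate of P(any crack reaches critical size before design_cycles)."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_samples):
        shared = rng.gauss(0.0, 1.0)  # factor common to all cracks
        lives = []
        for _ in range(n_cracks):
            z = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            c = math.exp(math.log(3e-12) + 0.5 * z)  # hypothetical lognormal C
            lives.append(cycles_to_critical(0.001, 0.05, c, 3.0, 50.0))
        if min(lives) < design_cycles:
            failures += 1
    return failures / n_samples


p_independent = series_failure_probability(1e7, rho=0.0)
p_fully_correlated = series_failure_probability(1e7, rho=1.0)
```

In a series system, stronger positive correlation between cracks lowers the estimated failure probability, because the cracks then tend to fail together instead of providing independent chances of failure.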
Beyhun, Nazim Ercument; Can, Gamze; Tiryaki, Ahmet; Karakullukcu, Serdar; Bulut, Bekir; Yesilbas, Sehbal; Kavgaci, Halil; Topbas, Murat
2016-01-01
Background Needs based biopsychosocial distress instrument for cancer patients (CANDI) is a scale based on needs arising due to the effects of cancer. Objectives The aim of this research was to determine the reliability and validity of the CANDI scale in the Turkish language. Patients and Methods The study was performed with the participation of 172 cancer patients aged 18 and over. Factor analysis (principal components analysis) was used to assess construct validity. Criterion validities were tested by computing Spearman correlation between CANDI and hospital anxiety depression scale (HADS), and brief symptom inventory (BSI) (convergent validity) and quality of life scales (FACT-G) (divergent validity). Test-retest reliabilities and internal consistencies were measured with intraclass correlation (ICC) and Cronbach-α. Results A three-factor solution (emotional, physical and social) was found with factor analysis. Internal reliability (α = 0.94) and test-retest reliability (ICC = 0.87) were significantly high. Correlations between CANDI and HADS (rs = 0.67), and BSI (rs = 0.69) and FACT-G (rs = -0.76) were moderate and significant in the expected direction. Conclusions CANDI is a valid and reliable scale in cancer patients with a three-factor structure (emotional, physical and social) in the Turkish language. PMID:27621931
Bruyn, George A W; Hanova, Petra; Iagnocco, Annamaria; d'Agostino, Maria-Antonietta; Möller, Ingrid; Terslev, Lene; Backhaus, Marina; Balint, Peter V; Filippucci, Emilio; Baudoin, Paul; van Vugt, Richard; Pineda, Carlos; Wakefield, Richard; Garrido, Jesus; Pecha, Ondrej; Naredo, Esperanza
2014-11-01
To develop the first ultrasound scoring system of tendon damage in rheumatoid arthritis (RA) and assess its intraobserver and interobserver reliability. We conducted a Delphi study on ultrasound-defined tendon damage and ultrasound scoring system of tendon damage in RA among 35 international rheumatologists with experience in musculoskeletal ultrasound. Twelve patients with RA were included and assessed twice by 12 rheumatologists-sonographers. Ultrasound examination for tendon damage in B mode of five wrist extensor compartments (extensor carpi radialis brevis and longus; extensor pollicis longus; extensor digitorum communis; extensor digiti minimi; extensor carpi ulnaris) and one ankle tendon (tibialis posterior) was performed blindly, independently and bilaterally in each patient. Intraobserver and interobserver reliability were calculated by κ coefficients. A three-grade semiquantitative scoring system was agreed for scoring tendon damage in B mode. The mean intraobserver reliability for tendon damage scoring was excellent (κ value 0.91). The mean interobserver reliability assessment showed good κ values (κ value 0.75). The most reliable were the extensor digiti minimi, the extensor carpi ulnaris, and the tibialis posterior tendons. An ultrasound reference image atlas of tenosynovitis and tendon damage was also developed. Ultrasound is a reproducible tool for evaluating tendon damage in RA. This study strongly supports a new reliable ultrasound scoring system for tendon damage. Published by the BMJ Publishing Group Limited.
Hoben, Matthias; Estabrooks, Carole A.; Squires, Janet E.; Behrens, Johann
2016-01-01
We translated the Canadian residential long term care versions of the Alberta Context Tool (ACT) and the Conceptual Research Utilization (CRU) Scale into German, to study the association between organizational context factors and research utilization in German nursing homes. The rigorous translation process was based on best practice guidelines for tool translation, and we previously published methods and results of this process in two papers. Both instruments are self-report questionnaires used with care providers working in nursing homes. The aim of this study was to assess the factor structure, reliability, and measurement invariance (MI) between care provider groups responding to these instruments. In a stratified random sample of 38 nursing homes in one German region (Metropolregion Rhein-Neckar), we collected questionnaires from 273 care aides, 196 regulated nurses, 152 allied health providers, 6 quality improvement specialists, 129 clinical leaders, and 65 nursing students. The factor structure was assessed using confirmatory factor models. The first model included all 10 ACT concepts. We also decided a priori to run two separate models for the scale-based and the count-based ACT concepts as suggested by the instrument developers. The fourth model included the five CRU Scale items. Reliability scores were calculated based on the parameters of the best-fitting factor models. Multiple-group confirmatory factor models were used to assess MI between provider groups. Rather than the hypothesized ten-factor structure of the ACT, confirmatory factor models suggested 13 factors. The one-factor solution of the CRU Scale was confirmed. The reliability was acceptable (>0.7 in the entire sample and in all provider groups) for 10 of 13 ACT concepts, and high (0.90–0.96) for the CRU Scale. We could demonstrate partial strong MI for both ACT models and partial strict MI for the CRU Scale. 
Our results suggest that the scores of the German ACT and the CRU Scale for nursing homes are acceptably reliable and valid. However, as the ACT lacked strict MI, observed variables (or scale scores based on them) cannot be compared between provider groups. Rather, group comparisons should be based on latent variable models, which consider the different residual variances of each group. PMID:27656156
Applications of computerized adaptive testing (CAT) to the assessment of headache impact.
Ware, John E; Kosinski, Mark; Bjorner, Jakob B; Bayliss, Martha S; Batenhorst, Alice; Dahlöf, Carl G H; Tepper, Stewart; Dowson, Andrew
2003-12-01
To evaluate the feasibility of computerized adaptive testing (CAT) and the reliability and validity of CAT-based estimates of headache impact scores in comparison with 'static' surveys. Responses to the 54-item Headache Impact Test (HIT) were re-analyzed for recent headache sufferers (n = 1016) who completed telephone interviews during the National Survey of Headache Impact (NSHI). Item response theory (IRT) calibrations and the computerized dynamic health assessment (DYNHA) software were used to simulate CAT assessments by selecting the most informative items for each person and estimating impact scores according to pre-set precision standards (CAT-HIT). Results were compared with IRT estimates based on all items (total-HIT), computerized 6-item dynamic estimates (CAT-HIT-6), and a developmental version of a 'static' 6-item form (HIT-6-D). Analyses focused on: respondent burden (survey length and administration time), score distributions ('ceiling' and 'floor' effects), reliability and standard errors, and clinical validity (diagnosis, level of severity). A random sample (n = 245) was re-assessed to test responsiveness. A second study (n = 1103) compared actual CAT surveys and an improved 'static' HIT-6 among current headache sufferers sampled on the Internet. Respondents completed measures from the first study and the generic SF-8 Health Survey; some (n = 540) were re-tested on the Internet after 2 weeks. In the first study, simulated CAT-HIT and total-HIT scores were highly correlated (r = 0.92) without 'ceiling' or 'floor' effects and with a substantial reduction (90.8%) in respondent burden. Six of the 54 items accounted for the great majority of item administrations (3603/5028, 77.6%). 
CAT-HIT reliability estimates were very high (0.975-0.992) in the range where 95% of respondents scored, and relative validity (RV) coefficients were high for diagnosis (RV = 0.87) and severity (RV = 0.89); patient-level classifications were accurate 91.3% for a diagnosis of migraine. For all three criteria of change, CAT-HIT scores were more responsive than all other measures. In the second study, estimates of respondent burden, item usage, reliability and clinical validity were replicated. The test-retest reliability of CAT-HIT was 0.79 and alternate forms coefficients ranged from 0.85 to 0.91. All correlations with the generic SF-8 were negative. CAT-based administrations of headache impact items achieved very large reductions in respondent burden without compromising validity for purposes of patient screening or monitoring changes in headache impact over time. IRT models and CAT-based dynamic health assessments warrant testing among patients with other conditions.
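The item-selection step described above (administering the most informative item for each person) can be sketched as follows. This is a minimal illustration under a two-parameter logistic (2PL) IRT model, which is an assumption made here for brevity; the abstract does not state the model used, and the item parameters and function names are hypothetical.

```python
import math

def item_information(theta, a, b):
    """Fisher information of a 2PL item (discrimination a, location b) at level theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def pick_next_item(theta, items, administered):
    """Return the index of the not-yet-administered item with maximum information."""
    candidates = [i for i in range(len(items)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *items[i]))

# Hypothetical (a, b) pairs for illustration only; these are not HIT calibrations.
items = [(1.0, -2.0), (1.5, 0.0), (0.8, 1.0)]
first = pick_next_item(0.0, items, administered=set())
second = pick_next_item(0.0, items, administered={first})
```

In a full CAT the estimate of theta would be updated after each response, and administration would stop once the pre-set precision standard is reached.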
ERIC Educational Resources Information Center
Keller, Lisa A.; Clauser, Brian E.; Swanson, David B.
2010-01-01
In recent years, demand for performance assessments has continued to grow. However, performance assessments are notorious for lower reliability, and in particular, low reliability resulting from task specificity. Since reliability analyses typically treat the performance tasks as randomly sampled from an infinite universe of tasks, these estimates…
Guo, Yi; Bian, Jiang; Leavitt, Trevor; Vincent, Heather K; Vander Zalm, Lindsey; Teurlings, Tyler L; Smith, Megan D; Modave, François
2017-03-07
Regular physical activity can not only help with weight management, but also lower cardiovascular risks, cancer rates, and chronic disease burden. Yet, only approximately 20% of Americans currently meet the physical activity guidelines recommended by the US Department of Health and Human Services. With the rapid development of mobile technologies, mobile apps have the potential to improve participation rates in exercise programs, particularly if they are evidence-based and are of sufficient content quality. The goal of this study was to develop and test an instrument designed to score the content quality of exercise program apps with respect to the exercise guidelines set forth by the American College of Sports Medicine (ACSM). We conducted two focus groups (N=14) to elicit input for developing a preliminary 27-item scoring instrument based on the ACSM exercise prescription guidelines. Three reviewers who were not sports medicine experts independently scored 28 exercise program apps using the instrument. Inter- and intra-rater reliability was assessed among the 3 reviewers. An expert reviewer, a Fellow of the ACSM, also scored the 28 apps to create criterion scores. Criterion validity was assessed by comparing nonexpert reviewers' scores to the criterion scores. Overall, inter- and intra-rater reliability was high with most coefficients being greater than .7. Inter-rater reliability coefficients ranged from .59 to .99, and intra-rater reliability coefficients ranged from .47 to 1.00. All reliability coefficients were statistically significant. Criterion validity was found to be excellent, with the weighted kappa statistics ranging from .67 to .99, indicating a substantial agreement between the scores of expert and nonexpert reviewers. Finally, all apps scored poorly against the ACSM exercise prescription guidelines. None of the apps received a score greater than 35, out of a possible maximal score of 70. 
We have developed and presented valid and reliable scoring instruments for exercise program apps. Our instrument may be useful for consumers and health care providers who are looking for apps that provide safe, progressive general exercise programs for health and fitness. ©Yi Guo, Jiang Bian, Trevor Leavitt, Heather K Vincent, Lindsey Vander Zalm, Tyler L Teurlings, Megan D Smith, François Modave. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 07.03.2017.
How reliable are Functional Movement Screening scores? A systematic review of rater reliability.
Moran, Robert W; Schneiders, Anthony G; Major, Katherine M; Sullivan, S John
2016-05-01
Several physical assessment protocols to identify intrinsic risk factors for injury aetiology related to movement quality have been described. The Functional Movement Screen (FMS) is a standardised, field-expedient test battery intended to assess movement quality and has been used clinically in preparticipation screening and in sports injury research. To critically appraise and summarise research investigating the reliability of scores obtained using the FMS battery. Systematic literature review. Systematic search of Google Scholar, Scopus (including ScienceDirect and PubMed), EBSCO (including Academic Search Complete, AMED, CINAHL, Health Source: Nursing/Academic Edition), MEDLINE and SPORTDiscus. Studies meeting eligibility criteria were assessed by 2 reviewers for risk of bias using the Quality Appraisal of Reliability Studies checklist. Overall quality of evidence was determined using van Tulder's levels of evidence approach. 12 studies were appraised. Overall, there was a 'moderate' level of evidence in favour of 'acceptable' (intraclass correlation coefficient ≥0.6) inter-rater and intra-rater reliability for composite scores derived from live scoring. For inter-rater reliability of composite scores derived from video recordings there was 'conflicting' evidence, and 'limited' evidence for intra-rater reliability. For inter-rater reliability based on live scoring of individual subtests there was 'moderate' evidence of 'acceptable' reliability (κ≥0.4) for 4 subtests (Deep Squat, Shoulder Mobility, Active Straight-leg Raise, Trunk Stability Push-up) and 'conflicting' evidence for the remaining 3 (Hurdle Step, In-line Lunge, Rotary Stability). This review found 'moderate' evidence that raters can achieve acceptable levels of inter-rater and intra-rater reliability of composite FMS scores when using live ratings. 
Overall, there were few high-quality studies, and the quality of several studies was impacted by poor study reporting particularly in relation to rater blinding. Published by the BMJ Publishing Group Limited.
Comprehensive clinical assessment in community setting: applicability of the MDS-HC.
Morris, J N; Fries, B E; Steel, K; Ikegami, N; Bernabei, R; Carpenter, G I; Gilgen, R; Hirdes, J P; Topinková, E
1997-08-01
To describe the results of an international trial of the home care version of the MDS assessment and problem identification system (the MDS-HC), including reliability estimates, a comparison of MDS-HC reliabilities with reliabilities of the same items in the MDS 2.0 nursing home assessment instrument, and an examination of the types of problems found in home care clients using the MDS-HC. Independent, dual assessment of clients of home-care agencies by trained clinicians using a draft of the MDS-HC, with additional descriptive data regarding problem profiles for home care clients. Reliability data from dual assessments of 241 randomly selected clients of home care agencies in five countries, all of whom volunteered to test the MDS-HC. Also included are an expanded sample of 780 home care assessments from these countries and 187 dually assessed residents from 21 nursing homes in the United States. The array of MDS-HC assessment items included measures in the following areas: personal items, cognitive patterns, communication/hearing, vision, mood and behavior, social functioning, informal support services, physical functioning, continence, disease diagnoses health conditions and preventive health measures, nutrition/hydration, dental status, skin condition, environmental assessment, service utilization, and medications. Forty-seven percent of the functional, health status, social environment, and service items in the MDS-HC were taken from the MDS 2.0 for nursing homes. For this item set, it is estimated that the average weighted Kappa is .74 for the MDS-HC and .75 for the MDS 2.0. Similarly, high reliability values were found for items newly introduced in the MDS-HC (weighted Kappa = .70). Descriptive findings also characterize the problems of home care clients, with subanalyses within cognitive performance levels. Findings indicate that the core set of items in the MDS 2.0 work equally well in community and nursing home settings. New items are highly reliable. 
In tandem, these instruments can be used within the international community, assisting and planning care for older adults within a broad spectrum of service settings, including nursing homes and home care programs. With this community-based, second-generation problem and care plan-driven assessment instrument, disability assessment can be performed consistently across the world.
Klein, Britt; Meyer, Denny; Austin, David William; Abbott, Jo-Anne M
2015-01-01
Background Internet-based assessment has the potential to assist with the diagnosis of mental health disorders and overcome the barriers associated with traditional services (eg, cost, stigma, distance). Further to existing online screening programs available, there is an opportunity to deliver more comprehensive and accurate diagnostic tools to supplement the assessment and treatment of mental health disorders. Objective The aim was to evaluate the diagnostic criterion validity and test-retest reliability of the electronic Psychological Assessment System (e-PASS), an online, self-report, multidisorder, clinical assessment and referral system. Methods Participants were 616 adults residing in Australia, recruited online, and representing prospective e-PASS users. Following e-PASS completion, 158 participants underwent a telephone-administered structured clinical interview and 39 participants repeated the e-PASS within 25 days of initial completion. Results With structured clinical interview results serving as the gold standard, diagnostic agreement with the e-PASS varied considerably from fair (eg, generalized anxiety disorder: κ=.37) to strong (eg, panic disorder: κ=.62). Although the e-PASS’ sensitivity also varied (0.43-0.86) the specificity was generally high (0.68-1.00). The e-PASS sensitivity generally improved when reducing the e-PASS threshold to a subclinical result. Test-retest reliability ranged from moderate (eg, specific phobia: κ=.54) to substantial (eg, bulimia nervosa: κ=.87). Conclusions The e-PASS produces reliable diagnostic results and performs generally well in excluding mental disorders, although at the expense of sensitivity. For screening purposes, the e-PASS subclinical result generally appears better than a clinical result as a diagnostic indicator. Further development and evaluation is needed to support the use of online diagnostic assessment programs for mental disorders. 
Trial Registration Australian and New Zealand Clinical Trials Registry ACTRN121611000704998; http://www.anzctr.org.au/trial_view.aspx?ID=336143 (Archived by WebCite at http://www.webcitation.org/618r3wvOG). PMID:26392066
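The sensitivity and specificity figures quoted in this record compare a test result against a gold standard (here, the structured clinical interview). A minimal sketch of the two ratios is given below; the counts are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp/fn: gold-standard positives the test did / did not flag.
    tn/fp: gold-standard negatives the test cleared / wrongly flagged.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Hypothetical counts for illustration only.
sens, spec = sensitivity_specificity(tp=30, fn=10, tn=100, fp=18)
```

Lowering a diagnostic threshold (for example, from a clinical to a subclinical result) typically moves cases from `fn` to `tp` and from `tn` to `fp`, which raises sensitivity at some cost in specificity.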
Caries-based treatment need assessment by clinical dental nurses in Anguilla, British West Indies.
Adewakun, Adenike Adejoke; Amaechi, Bennett Tochukwu
2005-09-01
To determine the ability of dental nurses in Anguilla to assess treatment need following training in WHO criteria. Sixty-six randomly selected schoolchildren aged 6, 9, 12, 14, 15 and 17 years. Point prevalence study involving three different groups of schoolchildren [n = 20 (C0), 23 (D1), 23 (D2)] and four plaster casts comprising 52 extracted teeth (T0). Tooth- and person-based inter and intra-examiner agreement. The only three government dental nurses in Anguilla were trained and calibrated by a benchmark dentist in June 2000 using WHO criteria. Tooth-based Treatment Need and person-based Treatment Urgency were assessed during 466 evaluations involving 1,733 teeth. Examiner agreement levels were compared during two calibration exercises (T0, C0) and two duplicate examinations (D1 and D2). The treatment components were classified as preventive (diet modification, prophylaxis, OHI, sealants); restorative (restorations, pulp care and crowns); and rehabilitative (tooth removal). All scores presented are Kappa scores. Substantial agreement was obtained at T0 (0.614-0.764), and almost perfect agreement at C0(0.832-0.872), D1(0.917-0.954) and D2(0.966-0.977). Almost perfect reliability occurred at C0(0.963-0.991) and D1(0.971-0.992) while perfect reliability was attained by all examiners at D2(1.0). Substantial and almost perfect agreement was obtained for all treatment modalities irrespective of caries prevalence and severity. Agreement levels increased as the survey progressed. Perfect agreement was obtained for all categories of treatment urgency. Dental nurses in Anguilla can validly and reliably assess treatment need provided training is adequate, of realistic duration and they are involved in all aspects of the exercise.
Filippou, Georgios; Scirè, Carlo A; Damjanov, Nemanja; Adinolfi, Antonella; Carrara, Greta; Picerno, Valentina; Toscano, Carmela; Bruyn, George A; D'Agostino, Maria Antonietta; Delle Sedie, Andrea; Filippucci, Emilio; Gutierrez, Marwin; Micu, Mihaela; Möller, Ingrid; Naredo, Esperanza; Pineda, Carlos; Porta, Francesco; Schmidt, Wolfgang A; Terslev, Lene; Vlad, Violeta; Zufferey, Pascal; Iagnocco, Annamaria
2017-11-01
To define the ultrasonographic characteristics of calcium pyrophosphate crystal (CPP) deposits in joints and periarticular tissues and to evaluate the intra- and interobserver reliability of expert ultrasonographers in the assessment of CPP deposition disease (CPPD) according to the new definitions. After a systematic literature review, a Delphi survey was circulated among a group of expert ultrasonographers, who were members of the CPPD Ultrasound (US) Outcome Measures in Rheumatology (OMERACT) subtask force, to obtain definitions of the US characteristics of CPPD at the level of fibrocartilage (FC), hyaline cartilage (HC), tendon, and synovial fluid (SF). Subsequently, the reliability of US in assessing CPPD at knee and wrist levels according to the agreed definitions was tested in static images and in patients with CPPD. Cohen's κ was used for statistical analysis. HC and FC of the knee yielded the highest interobserver κ values among all the structures examined, in both the Web-based (0.73 for HC and 0.58 for FC) and patient-based exercises (0.55 for the HC and 0.64 for the FC). Kappa values for the other structures were lower, ranging from 0.28 in tendons to 0.50 in SF in the static exercise and from 0.09 (proximal patellar tendon) to 0.27 (triangular FC of the wrist) in the patient-based exercise. The new OMERACT definitions for the US identification of CPPD proved to be reliable at the level of the HC and FC of the knee. Further studies are needed to better define the US characteristics of CPPD and optimize the scanning technique in other anatomical sites.
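Cohen's κ, the agreement statistic used in this study, corrects observed agreement between two raters for the agreement expected by chance. The sketch below shows the standard unweighted two-rater calculation; the function name and the example ratings are illustrative and are not the study's data.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of cases")
    n = len(rater_a)
    # Proportion of cases on which the two raters agree.
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)

# Hypothetical ratings: 1 = deposit judged present, 0 = absent.
kappa = cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```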
Gagné, Myriam; Boulet, Louis-Philippe; Pérez, Norma; Moisan, Jocelyne
2018-04-30
To systematically identify the measurement properties of patient-reported outcome instruments (PROs) that evaluate adherence to inhaled maintenance medication in adults with asthma. We conducted a systematic review of six databases. Two reviewers independently included studies on the measurement properties of PROs that evaluated adherence in asthmatic participants aged ≥18 years. Based on the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN), the reviewers (1) extracted data on internal consistency, reliability, measurement error, content validity, structural validity, hypotheses testing, cross-cultural validity, criterion validity, and responsiveness; (2) assessed the methodological quality of the included studies; (3) assessed the quality of the measurement properties (positive or negative); and (4) summarised the level of evidence (limited, moderate, or strong). We screened 6,068 records and included 15 studies (14 PROs). No studies evaluated measurement error or responsiveness. Based on methodological and measurement property quality assessments, we found limited positive evidence of: (a) internal consistency of the Adherence Questionnaire, Refined Medication Adherence Reason Scale (MAR-Scale), Medication Adherence Report Scale for Asthma (MARS-A), and Test of the Adherence to Inhalers (TAI); (b) reliability of the TAI; and (c) structural validity of the Adherence Questionnaire, MAR-Scale, MARS-A, and TAI. We also found limited negative evidence of: (d) hypotheses testing of Adherence Questionnaire; (e) reliability of the MARS-A; and (f) criterion validity of the MARS-A and TAI. Our results highlighted the need to conduct further high-quality studies that will positively evaluate the reliability, validity, and responsiveness of the available PROs.
Kim, Sara; Brock, Doug; Prouty, Carolyn D; Odegard, Peggy Soule; Shannon, Sarah E; Robins, Lynne; Boggs, Jim G; Clark, Fiona J; Gallagher, Thomas
2011-01-01
Multiple-choice exams are not well suited for assessing communication skills. Standardized patient assessments are costly and patient and peer assessments are often biased. Web-based assessment using video content offers the possibility of reliable, valid, and cost-efficient means for measuring complex communication skills, including interprofessional communication. We report development of the Web-based Team-Oriented Medical Error Communication Assessment Tool, which uses videotaped cases for assessing skills in error disclosure and team communication. Steps in development included (a) defining communication behaviors, (b) creating scenarios, (c) developing scripts, (d) filming video with professional actors, and (e) writing assessment questions targeting team communication during planning and error disclosure. Using valid data from 78 participants in the intervention group, coefficient alpha estimates of internal consistency were calculated based on the Likert-scale questions and ranged from α=.79 to α=.89 for each set of 7 Likert-type discussion/planning items and from α=.70 to α=.86 for each set of 8 Likert-type disclosure items. The preliminary test-retest Pearson correlation based on the scores of the intervention group was r=.59 for discussion/planning and r=.25 for error disclosure sections, respectively. Content validity was established through reliance on empirically driven published principles of effective disclosure as well as integration of expert views across all aspects of the development process. In addition, data from 122 medicine and surgical physicians and nurses showed high ratings for video quality (4.3 of 5.0), acting (4.3), and case content (4.5). Web assessment of communication skills appears promising. Physicians and nurses across specialties respond favorably to the tool.
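Coefficient alpha, reported above for each set of Likert-type items, is conventionally computed from the item variances and the variance of the summed score. The sketch below uses that standard formula; the function name and the scores are hypothetical and do not come from the study.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Coefficient alpha for a list of items, each a list of per-respondent scores."""
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score for each respondent across all items.
    totals = [sum(responses) for responses in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1.0 - item_variance_sum / variance(totals))

# Hypothetical scores: two items answered by four respondents.
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]])
```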
Development, scoring, and reliability of the Microscale Audit of Pedestrian Streetscapes (MAPS)
2013-01-01
Background Streetscape (microscale) features of the built environment can influence people’s perceptions of their neighborhoods’ suitability for physical activity. Many microscale audit tools have been developed, but few have published systematic scoring methods. We present the development, scoring, and reliability of the Microscale Audit of Pedestrian Streetscapes (MAPS) tool and its theoretically-based subscales. Methods MAPS was based on prior instruments and was developed to assess details of streetscapes considered relevant for physical activity. MAPS sections (route, segments, crossings, and cul-de-sacs) were scored by two independent raters for reliability analyses. There were 290 route pairs, 516 segment pairs, 319 crossing pairs, and 53 cul-de-sac pairs in the reliability sample. Individual inter-rater item reliability analyses were computed using Kappa, intra-class correlation coefficient (ICC), and percent agreement. A conceptual framework for subscale creation was developed using theory, expert consensus, and policy relevance. Items were grouped into subscales, and subscales were analyzed for inter-rater reliability at tiered levels of aggregation. Results There were 160 items included in the subscales (out of 201 items total). Of those included in the subscales, 80 items (50.0%) had good/excellent reliability, 41 items (25.6%) had moderate reliability, and 18 items (11.3%) had low reliability, with limited variability in the remaining 21 items (13.1%). Seventeen of the 20 route section subscales, valence (positive/negative) scores, and overall scores (85.0%) demonstrated good/excellent reliability and 3 demonstrated moderate reliability. Of the 16 segment subscales, valence scores, and overall scores, 12 (75.0%) demonstrated good/excellent reliability, three demonstrated moderate reliability, and one demonstrated poor reliability. 
Of the 8 crossing subscales, valence scores, and overall scores, 6 (75.0%) demonstrated good/excellent reliability, and 2 demonstrated moderate reliability. The cul-de-sac subscale demonstrated good/excellent reliability. Conclusions MAPS items and subscales predominantly demonstrated moderate to excellent reliability. The subscales and scoring system represent a theoretically based framework for using these complex microscale data and may be applicable to other similar instruments. PMID:23621947
Gutiérrez-Vilahú, Lourdes; Massó-Ortigosa, Núria; Rey-Abella, Ferran; Costa-Tutusaus, Lluís; Guerra-Balic, Myriam
2016-05-01
People with Down syndrome present skeletal abnormalities in their feet that can be analyzed by commonly used gold standard indices (the Hernández-Corvo index, the Chippaux-Smirak index, the Staheli arch index, and the Clarke angle) based on footprint measurements. The use of Photoshop CS5 software (Adobe Systems Software Ireland Ltd, Dublin, Ireland) to measure footprints has been validated in the general population. The present study aimed to assess the reliability and validity of this footprint assessment technique in the population with Down syndrome. Using optical podography and photography, 44 footprints from 22 patients with Down syndrome (11 men [mean ± SD age, 23.82 ± 3.12 years] and 11 women [mean ± SD age, 24.82 ± 6.81 years]) were recorded in a static bipedal standing position. A blinded observer performed the measurements using a validated manual method three times during the 4-month study, with 2 months between measurements. Test-retest was used to check the reliability of the Photoshop CS5 software measurements. Validity and reliability were obtained by intraclass correlation coefficient (ICC). The reliability test for all of the indices showed very good values for the Photoshop CS5 method (ICC, 0.982-0.995). Validity testing also found no differences between the techniques (ICC, 0.988-0.999). The Photoshop CS5 software method is reliable and valid for the study of footprints in young people with Down syndrome.
Elaboration and Validation of the Medication Prescription Safety Checklist
Pires, Aline de Oliveira Meireles; Ferreira, Maria Beatriz Guimarães; do Nascimento, Kleiton Gonçalves; Felix, Márcia Marques dos Santos; Pires, Patrícia da Silva; Barbosa, Maria Helena
2017-01-01
ABSTRACT Objective: to elaborate and validate a checklist to identify compliance with the recommendations for the structure of medication prescriptions, based on the Protocol of the Ministry of Health and the Brazilian Health Surveillance Agency. Method: methodological research, conducted through the validation and reliability analysis process, using a sample of 27 electronic prescriptions. Results: the analyses confirmed the content validity and reliability of the tool. The content validity, obtained by expert assessment, was considered satisfactory as it covered items that represent the compliance with the recommendations regarding the structure of the medication prescriptions. The reliability, assessed through interrater agreement, was excellent (ICC=1.00) and showed perfect agreement (K=1.00). Conclusion: the Medication Prescription Safety Checklist showed to be a valid and reliable tool for the group studied. We hope that this study can contribute to the prevention of adverse events, as well as to the improvement of care quality and safety in medication use. PMID:28793128
Wagner, Flávia; Martel, Michelle M; Cogo-Moreira, Hugo; Maia, Carlos Renato Moreira; Pan, Pedro Mario; Rohde, Luis Augusto; Salum, Giovanni Abrahão
2016-01-01
The best structural model for attention-deficit/hyperactivity disorder (ADHD) symptoms remains a matter of debate. The objective of this study is to test the fit and factor reliability of competing models of the dimensional structure of ADHD symptoms in a sample of randomly selected and high-risk children and pre-adolescents from Brazil. Our sample comprised 2512 children aged 6-12 years from 57 schools in Brazil. The ADHD symptoms were assessed using parent report on the development and well-being assessment (DAWBA). Fit indexes from confirmatory factor analysis were used to test unidimensional, correlated, and bifactor models of ADHD, the latter including "g" ADHD and "s" symptom domain factors. Reliability of all models was measured with omega coefficients. A bifactor model with one general factor and three specific factors (inattention, hyperactivity, impulsivity) exhibited the best fit to the data, according to fit indices, as well as the most consistent factor loadings. However, based on omega reliability statistics, the specific inattention, hyperactivity, and impulsivity dimensions provided very little reliable information after accounting for the reliable general ADHD factor. Our study presents some psychometric evidence that ADHD specific ("s") factors might be unreliable after taking common ("g" factor) variance into account. These results are in accordance with the lack of longitudinal stability among subtypes, the absence of dimension-specific molecular genetic findings and non-specific effects of treatment strategies. Therefore, researchers and clinicians might most effectively rely on the "g" ADHD to characterize ADHD dimensional phenotype, based on currently available symptom items.
Learmonth, Yvonne C; Dlugonski, Deirdre D; Pilutti, Lara A; Sandroff, Brian M; Motl, Robert W
2013-11-01
Assessing walking impairment in those with multiple sclerosis (MS) is common; however, little is known about the reliability, precision and clinically important change of walking outcomes. The purpose of this study was to determine the reliability, precision and clinically important change of the Timed 25-Foot Walk (T25FW), Six-Minute Walk (6MW), Multiple Sclerosis Walking Scale-12 (MSWS-12) and accelerometry. Data were collected from 82 persons with MS at two time points, six months apart. Analyses were undertaken for the whole sample and stratified based on disability level and usage of walking aids. Intraclass correlation coefficient (ICC) analyses established reliability; standard error of measurement (SEM) and coefficient of variation (CV) determined precision; and minimal detectable change (MDC) defined clinically important change. All outcome measures were reliable with precision and MDC varying between measures in the whole sample: T25FW: ICC=0.991; SEM=1 s; CV=6.2%; MDC=2.7 s (36%), 6MW: ICC=0.959; SEM=32 m; CV=6.2%; MDC=88 m (20%), MSWS-12: ICC=0.927; SEM=8; CV=27%; MDC=22 (53%), accelerometry counts/day: ICC=0.883; SEM=28450; CV=17%; MDC=78860 (52%), accelerometry steps/day: ICC=0.907; SEM=726; CV=16%; MDC=2011 (45%). Variation in these estimates was seen based on disability level and walking aid. The reliability of these outcomes is good and falls within acceptable ranges. Precision and clinically important change estimates provide guidelines for interpreting these outcomes in clinical and research settings.
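The abstract does not state how SEM and MDC were derived. A common convention, assumed here, is SEM = SD × √(1 − ICC) and MDC at 95% confidence = 1.96 × √2 × SEM; the reported SEM and MDC pairs are close to what the second formula gives. The function names below are illustrative.

```python
import math

def sem_from_icc(sd, icc):
    """Standard error of measurement from a between-subject SD and an ICC (assumed convention)."""
    return sd * math.sqrt(1.0 - icc)

def mdc95(sem):
    """Minimal detectable change at 95% confidence for a test-retest difference."""
    return 1.96 * math.sqrt(2.0) * sem

# SEM values as reported in the abstract for the two accelerometry outcomes.
mdc_counts = mdc95(28450)  # compare with the reported MDC of 78860 counts/day
mdc_steps = mdc95(726)     # compare with the reported MDC of 2011 steps/day
```

The small differences from the reported values are consistent with the SEM figures having been rounded before publication.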
NASA Astrophysics Data System (ADS)
Yusmaita, E.; Nasra, Edi
2018-04-01
This research aims to produce an instrument for assessing chemical literacy in basic chemistry courses on the topic of solubility. The construction of the instrument is adapted to the characteristics of PISA (Programme for International Student Assessment) problems and to the Basic Chemistry syllabus in the KKNI (Indonesian National Qualification Framework). PISA is a cross-country study conducted periodically to monitor learners' achievement in each participating country. So far, PISA studies have covered reading literacy, mathematical literacy and scientific literacy. With reference to the scientific competencies of the PISA study on science literacy, an assessment was designed to measure the chemical literacy of students in the chemistry department at UNP. The research model used is the MER (Model of Educational Reconstruction). The validity and reliability of the discourse questions were measured using the ANATES software. On the basis of these values, valid and reliable chemical literacy questions were obtained. There are seven limited-response question items on the topic of solubility in the valid category, the test reliability is 0.86, and the items have good difficulty and discrimination indices.
Promoting the Quality of Health Research-based News: Introduction of a Tool
Ashoorkhani, Mahnaz; Majdzadeh, Reza; Nedjat, Saharnaz; Gholami, Jaleh
2017-01-01
Introduction: While disseminating health research findings to the public, it is very important to present appropriate and accurate information to give the target audience a correct understanding of the subject matter. The objective of this study was to design and psychometrically evaluate a checklist for health journalists to help them prepare news of appropriate accuracy and authenticity. Methods: The study consisted of two phases, checklist design and psychometrics. Literature review and expert opinion were used to extract the items of the checklist in the first phase. In the second phase, to assess content and face validity, the judgment of 38 persons (epidemiologists with a tool production history, editors-in-chief, and health journalists) was used to check the items’ understandability, nonambiguity, relevancy, and clarity. Reliability was assessed by the test–retest method using intraclass correlation coefficient (ICC) indices in the two phases. Cronbach's alpha was used to assess the internal consistency of the checklist. Results: Based on the participants’ opinions, the items were reduced from 20 to 14 in number. The items were categorized into the following three domains: (a) items assessing the source of news and its validity, (b) items addressing the presentation of complete and accurate information on research findings, and (c) items which, if adhered to, lead to the target audiences’ better understanding. The checklist was approved for content and face validity. The reliability of the checklist was assessed in the last stage; the ICC was 1 for 12 items and above 0.8 for the other two. Internal consistency (Cronbach's alpha) was 0.98. Discussion and Conclusions: The resultant indices of the study indicate that the checklist has appropriate validity and reliability. Hence, it can be used by health journalists to develop health research-based news. PMID:29184638
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bucknor, Matthew; Grabaskas, David; Brunett, Acacia J.
We report that many advanced reactor designs rely on passive systems to fulfill safety functions during accident sequences. These systems depend heavily on boundary conditions to induce a motive force, meaning the system can fail to operate as intended because of deviations in boundary conditions, rather than as the result of physical failures. Furthermore, passive systems may operate in intermediate or degraded modes. These factors make passive system operation difficult to characterize within a traditional probabilistic framework that only recognizes discrete operating modes and does not allow for the explicit consideration of time-dependent boundary conditions. Argonne National Laboratory has been examining various methodologies for assessing passive system reliability within a probabilistic risk assessment for a station blackout event at an advanced small modular reactor. This paper provides an overview of a passive system reliability demonstration analysis for an external event. Considering an earthquake with the possibility of site flooding, the analysis focuses on the behavior of the passive Reactor Cavity Cooling System following potential physical damage and system flooding. The assessment approach seeks to combine mechanistic and simulation-based methods to leverage the benefits of the simulation-based approach without the need to substantially deviate from conventional probabilistic risk assessment techniques. Lastly, although this study is presented as only an example analysis, the results appear to demonstrate a high level of reliability of the Reactor Cavity Cooling System (and the reactor system in general) for the postulated transient event.
Evaluating the use of in-store measures in retail food stores and restaurants in Brazil.
Duran, Ana Clara; Lock, Karen; Latorre, Maria do Rosario D O; Jaime, Patricia Constante
2015-01-01
To assess inter-rater reliability, test-retest reliability, and construct validity of retail food store, open-air food market, and restaurant observation tools adapted to the Brazilian urban context. This study is part of a cross-sectional observation survey conducted in 13 districts across the city of Sao Paulo, Brazil in 2010-2011. Food store and restaurant observational tools were developed based on previously available tools and then tested. They included measures on the availability, variety, quality, pricing, and promotion of fruits and vegetables and ultra-processed foods. We used Kappa statistics and intra-class correlation coefficients to assess inter-rater reliability in samples of 142 restaurants and 97 retail food stores (including open-air food markets), and test-retest reliability in samples of 62 restaurants and 45 retail food stores (including open-air food markets). Construct validity, defined as the tools' ability to discriminate between store types and between income contexts, was assessed in the entire sample: 305 retail food stores, 8 fruit and vegetable markets, and 472 restaurants. Inter-rater and test-retest reliability were generally high, with most Kappa values greater than 0.70 (range 0.49-1.00). Both tools discriminated between store types and neighborhoods with different median income. Fruits and vegetables were more likely to be found in middle to higher-income neighborhoods, while soda, fruit-flavored drink mixes, cookies, and chips were cheaper and more likely to be found in lower-income neighborhoods. The measures were reliable and able to reveal significant differences across store types and different contexts. Although some items may require revision, results suggest that the tools may be used to reliably measure the food stores and restaurant food environment in urban settings of middle-income countries. Such studies can help inform health promotion interventions and policies in these contexts.
Evaluating the use of in-store measures in retail food stores and restaurants in Brazil
Duran, Ana Clara; Lock, Karen; Latorre, Maria do Rosario D O; Jaime, Patricia Constante
2015-01-01
ABSTRACT OBJECTIVE To assess inter-rater reliability, test-retest reliability, and construct validity of retail food store, open-air food market, and restaurant observation tools adapted to the Brazilian urban context. METHODS This study is part of a cross-sectional observation survey conducted in 13 districts across the city of Sao Paulo, Brazil in 2010-2011. Food store and restaurant observational tools were developed based on previously available tools and then tested. They included measures on the availability, variety, quality, pricing, and promotion of fruits and vegetables and ultra-processed foods. We used Kappa statistics and intra-class correlation coefficients to assess inter-rater reliability in samples of 142 restaurants and 97 retail food stores (including open-air food markets), and test-retest reliability in samples of 62 restaurants and 45 retail food stores (including open-air food markets). Construct validity, defined as the tools' ability to discriminate between store types and between income contexts, was assessed in the entire sample: 305 retail food stores, 8 fruit and vegetable markets, and 472 restaurants. RESULTS Inter-rater and test-retest reliability were generally high, with most Kappa values greater than 0.70 (range 0.49-1.00). Both tools discriminated between store types and neighborhoods with different median income. Fruits and vegetables were more likely to be found in middle to higher-income neighborhoods, while soda, fruit-flavored drink mixes, cookies, and chips were cheaper and more likely to be found in lower-income neighborhoods. CONCLUSIONS The measures were reliable and able to reveal significant differences across store types and different contexts. Although some items may require revision, results suggest that the tools may be used to reliably measure the food stores and restaurant food environment in urban settings of middle-income countries. Such studies can help inform health promotion interventions and policies in these contexts. PMID:26538101
Assessment Literacy: Building a Base for Better Teaching and Learning
ERIC Educational Resources Information Center
Rogler, Dawn
2014-01-01
This article presents principles and practices of effective assessment, outlining seven key concepts--usefulness, reliability, validity, practicality, washback, authenticity, and transparency--and demonstrating how to apply them in creating an exam blueprint. The article also discusses the importance of providing feedback after a test has been…
Voice Recognition: A New Assessment Tool?
ERIC Educational Resources Information Center
Jones, Darla
2005-01-01
This article presents the results of a study conducted in Anchorage, Alaska, that evaluated the accuracy and efficiency of using voice recognition (VR) technology to collect oral reading fluency data for classroom-based assessments. The primary research question was as follows: Is voice recognition technology a valid and reliable alternative to…
Assessing Young Adolescents' Personality with the Five-Factor Personality Inventory
ERIC Educational Resources Information Center
Hendriks, A. A. Jolijn; Kuyper, Hans; Offringa, G. Johan; Van der Werf, Margaretha P. C.
2008-01-01
The Five-Factor Personality Inventory (FFPI) assesses a person's position on the (Dutch) psycholexically based Big Five factors: Extraversion, Agreeableness, Conscientiousness, Emotional Stability, and Autonomy. FFPI factor scores are reliable and valid if ratings are made by adults. The present study yields preliminary evidence of whether young…
Assessing Student Learning: A Collection of Evaluation Tools
ERIC Educational Resources Information Center
Gottfried, Gail M.; Johnson, Kathy E.; Vosmik, Jordan R.
2009-01-01
Whereas grading systems based on tacit knowledge may be the norm in practice, the recent trend toward educational accountability--from granting organizations, accreditation boards, journals on the teaching of psychology, and even tenure/promotion committees--suggests a real need for reliable, validated assessment measures that can be used to…
METHOD FOR MEASURING BASE/NEUTRAL AND CARBAMATE PESTICIDES IN PERSONAL DIETARY SAMPLES
Dietary uptake may be a significant pathway of exposure to contaminants. As such, dietary exposure assessments should be considered an important part of the total exposure assessment process. The objective of this work was to develop reliable methods that are applicable to a wide ...
Zijlstra, Agnes; Zijlstra, Wiebren
2013-09-01
Inverted pendulum (IP) models of human walking allow for wearable motion-sensor based estimations of spatio-temporal gait parameters during unconstrained walking in daily-life conditions. At present it is unclear to what extent different IP based estimations yield different results, and reliability and validity have not been investigated in older persons without a specific medical condition. The aim of this study was to compare reliability and validity of four different IP based estimations of mean step length in independent-living older persons. Participants were assessed twice and walked at different speeds while wearing a tri-axial accelerometer at the lower back. For all step-length estimators, test-retest intra-class correlations approached or were above 0.90. Intra-class correlations with reference step length were above 0.92 with a mean error of 0.0 cm when (1) multiplying the estimated center-of-mass displacement during a step by an individual correction factor in a simple IP model, or (2) adding an individual constant for bipedal stance displacement to the estimated displacement during single stance in a 2-phase IP model. When applying generic corrections or constants in all subjects (i.e. multiplication by 1.25, or adding 75% of foot length), correlations were above 0.75 with a mean error of respectively 2.0 and 1.2 cm. Although the results indicate that an individual adjustment of the IP models provides better estimations of mean step length, the ease of a generic adjustment can be favored when merely evaluating intra-individual differences. Further studies should determine the validity of these IP based estimations for assessing gait in daily life. Copyright © 2013 Elsevier B.V. All rights reserved.
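The geometry behind a simple IP estimate can be sketched as follows. This is a minimal illustration, assuming the center of mass swings on a rigid pendulum of length equal to leg length, so that a vertical excursion h corresponds to a horizontal chord of 2·sqrt(2·l·h − h²). The function name, the example leg length and the example excursion are hypothetical; only the generic multiplier 1.25 comes from the abstract, and the authors' actual processing may differ.

```python
import math


def ip_step_length(h, leg_length, correction=1.0):
    """Step length (m) from vertical center-of-mass excursion h (m).

    Assumes a rigid inverted pendulum of length leg_length (m);
    correction is an optional multiplicative factor.
    """
    # Horizontal half-chord of a pendulum that rises by h: sqrt(l^2 - (l - h)^2)
    half_chord = math.sqrt(2.0 * leg_length * h - h * h)
    return correction * 2.0 * half_chord


# Hypothetical values: 3 cm vertical excursion, 0.90 m leg length
raw = ip_step_length(0.03, 0.90)
generic = ip_step_length(0.03, 0.90, correction=1.25)
print(round(raw, 3), round(generic, 3))
```

An individual correction factor would replace 1.25 with a value calibrated per participant against a reference step length.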
Quantitative Rapid Assessment of Leukoaraiosis in CT: Comparison to Gold Standard MRI.
Hanning, Uta; Sporns, Peter Bernhard; Schmidt, Rene; Niederstadt, Thomas; Minnerup, Jens; Bier, Georg; Knecht, Stefan; Kemmling, André
2017-10-20
The severity of white matter lesions (WML) is a risk factor of hemorrhage and predictor of clinical outcome after ischemic stroke; however, in contrast to magnetic resonance imaging (MRI) reliable quantification for this surrogate marker is limited for computed tomography (CT), the leading stroke imaging technique. We aimed to present and evaluate a CT-based automated rater-independent method for quantification of microangiopathic white matter changes. Patients with suspected minor stroke (National Institutes of Health Stroke scale, NIHSS < 4) were screened for the analysis of non-contrast computerized tomography (NCCT) at admission and compared to follow-up MRI. The MRI-based WML volume and visual Fazekas scores were assessed as the gold standard reference. We employed a recently published probabilistic brain segmentation algorithm for CT images to determine the tissue-specific density of WM space. All voxel-wise densities were quantified in WM space and weighted according to partial probabilistic WM content. The resulting mean weighted density of WM space in NCCT, the surrogate of WML, was correlated with reference to MRI-based WML parameters. The process of CT-based tissue-specific segmentation was reliable in 79 cases with varying severity of microangiopathy. Voxel-wise weighted density within WM spaces showed a noticeable correlation (r = -0.65) with MRI-based WML volume. Particularly in patients with moderate or severe lesion load according to the visual Fazekas score the algorithm provided reliable prediction of MRI-based WML volume. Automated observer-independent quantification of voxel-wise WM density in CT significantly correlates with microangiopathic WM disease in gold standard MRI. This rapid surrogate of white matter lesion load in CT may support objective WML assessment and therapeutic decision-making during acute stroke triage.
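The weighting step described above can be illustrated with a short sketch, assuming each voxel carries a CT density value and a white matter probability between 0 and 1. The function name and the toy numbers are hypothetical and do not reproduce the published segmentation algorithm.

```python
def weighted_mean_density(densities, wm_probs):
    """Mean voxel density weighted by partial white matter probability."""
    total_weight = sum(wm_probs)
    if total_weight == 0:
        raise ValueError("no white matter content in the supplied voxels")
    weighted_sum = sum(d * p for d, p in zip(densities, wm_probs))
    return weighted_sum / total_weight


# Toy example: three voxels; the third has zero WM probability and is ignored
densities = [30.0, 28.0, 35.0]
wm_probs = [1.0, 0.5, 0.0]
mean_density = weighted_mean_density(densities, wm_probs)
print(round(mean_density, 2))
```

Voxels with low white matter probability contribute little to the mean, which is the intent of the partial-probability weighting.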
ERIC Educational Resources Information Center
Farwell, Tricia M.; Alligood, Leon; Fitzgerald, Sharon; Blake, Ken
2016-01-01
This article introduces an objective grammar and math assessment and evaluates the assessment's outcome and reliability when fielded among eighty-one students in media writing courses. In addition, the article proposes a rubric for grading straight news leads and compares the rubric's reliability with the reliability of rating straight news leads…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
1987-03-01
An assessment of needs was completed, and a five-year project plan was developed with extensive input from private industry. Objective is to develop the industrial technology base required for reliable ceramics for application in advanced automotive heat engines. The project approach includes determining the mechanisms controlling reliability, improving processes for fabricating existing ceramics, developing new materials with increased reliability, and testing these materials in simulated engine environments to confirm reliability. Although this is a generic materials project, the focus is on structural ceramics for advanced gas turbine and diesel engines, ceramic bearings and attachments, and ceramic coatings for thermal barrier and wear applications in these engines.
Automated reliability assessment for spectroscopic redshift measurements
NASA Astrophysics Data System (ADS)
Jamal, S.; Le Brun, V.; Le Fèvre, O.; Vibert, D.; Schmitt, A.; Surace, C.; Copin, Y.; Garilli, B.; Moresco, M.; Pozzetti, L.
2018-03-01
Context. Future large-scale surveys, such as the ESA Euclid mission, will produce a large set of galaxy redshifts (≥10^6) that will require fully automated data-processing pipelines to analyze the data, extract crucial information and ensure that all requirements are met. A fundamental element in these pipelines is to associate to each galaxy redshift measurement a quality, or reliability, estimate. Aim. In this work, we introduce a new approach to automate the spectroscopic redshift reliability assessment based on machine learning (ML) and characteristics of the redshift probability density function. Methods: We propose to rephrase the spectroscopic redshift estimation into a Bayesian framework, in order to incorporate all sources of information and uncertainties related to the redshift estimation process and produce a redshift posterior probability density function (PDF). To automate the assessment of a reliability flag, we exploit key features in the redshift posterior PDF and machine learning algorithms. Results: As a working example, public data from the VIMOS VLT Deep Survey is exploited to present and test this new methodology. We first tried to reproduce the existing reliability flags using supervised classification in order to describe different types of redshift PDFs, but due to the subjective definition of these flags (classification accuracy 58%), we soon opted for a new homogeneous partitioning of the data into distinct clusters via unsupervised classification. After assessing the accuracy of the new clusters via resubstitution and test predictions (classification accuracy 98%), we projected unlabeled data from preliminary mock simulations for the Euclid space mission into this mapping to predict their redshift reliability labels.
Conclusions: Through the development of a methodology in which a system can build its own experience to assess the quality of a parameter, we are able to set a preliminary basis of an automated reliability assessment for spectroscopic redshift measurements. This newly-defined method is very promising for next-generation large spectroscopic surveys from the ground and in space, such as Euclid and WFIRST. A table of the reclassified VVDS redshifts and reliability is only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/611/A53
Helfrich, Christian D; Li, Yu-Fang; Sharp, Nancy D; Sales, Anne E
2009-01-01
Background The Promoting Action on Research Implementation in Health Services, or PARIHS, framework is a theoretical framework widely promoted as a guide to implement evidence-based clinical practices. However, it has as yet no pool of validated measurement instruments that operationalize the constructs defined in the framework. The present article introduces an Organizational Readiness to Change Assessment instrument (ORCA), organized according to the core elements and sub-elements of the PARIHS framework, and reports on initial validation. Methods We conducted scale reliability and factor analyses on cross-sectional, secondary data from three quality improvement projects (n = 80) conducted in the Veterans Health Administration. In each project, identical 77-item ORCA instruments were administered to one or more staff from each facility involved in quality improvement projects. Items were organized into 19 subscales and three primary scales corresponding to the core elements of the PARIHS framework: (1) Strength and extent of evidence for the clinical practice changes represented by the QI program, assessed with four subscales, (2) Quality of the organizational context for the QI program, assessed with six subscales, and (3) Capacity for internal facilitation of the QI program, assessed with nine subscales. Results Cronbach's alpha for scale reliability were 0.74, 0.85 and 0.95 for the evidence, context and facilitation scales, respectively. The evidence scale and its three constituent subscales failed to meet the conventional threshold of 0.80 for reliability, and three individual items were eliminated from evidence subscales following reliability testing. In exploratory factor analysis, three factors were retained. Seven of the nine facilitation subscales loaded onto the first factor; five of the six context subscales loaded onto the second factor; and the three evidence subscales loaded on the third factor. 
Two subscales failed to load significantly on any factor. One measured resources in general (from the context scale), and one clinical champion role (from the facilitation scale). Conclusion We find general support for the reliability and factor structure of the ORCA. However, there was poor reliability among measures of evidence, and factor analysis results for measures of general resources and clinical champion role did not conform to the PARIHS framework. Additional validation is needed, including criterion validation. PMID:19594942
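For readers unfamiliar with the scale reliability statistic reported above, the following is a minimal sketch of Cronbach's alpha, assuming a respondents-by-items matrix of scores. The function name and the toy data are illustrative assumptions; they are not the ORCA data or the authors' code.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    items = list(zip(*scores))  # transpose: one tuple of scores per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Toy data: 5 respondents, 3 items (hypothetical)
toy = [[3, 4, 3], [4, 5, 5], [2, 2, 3], [5, 5, 4], [1, 2, 2]]
alpha = cronbach_alpha(toy)
print(round(alpha, 3))
```

Alpha rises as items covary more strongly relative to their individual variances, which is why a scale whose subscales measure loosely related things can fall below a chosen threshold such as 0.80.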
Alghadir, Ahmad; Anwer, Shahnawaz; Iqbal, Zaheen Ahmed; Alsanawi, Hisham Abdulaziz
2016-01-01
We adapted the reduced Western Ontario and McMaster Universities Osteoarthritis (WOMAC) index for the Arabic language and tested its metric properties in patients with knee osteoarthritis (OA). One hundred and twenty-one consecutive patients who were referred for physiotherapy to the outpatient department were asked to answer the Arabic version of the reduced WOMAC index (ArWOMAC). After the completion of the ArWOMAC, the intensity of knee pain and general health status were assessed using the visual analog scale (VAS) and the 12-item short form health survey (SF-12), respectively. A second assessment was performed at least 48 h after the first session to assess test-retest reliability. The test-retest reliability was quantified using the intra-class correlation coefficient (ICC), and Cronbach's alpha was calculated to assess the internal consistency of the Arabic questionnaire. The construct validity was assessed using Spearman rank correlation coefficients. The total ArWOMAC scale and pain and function subscales were internally consistent with Cronbach's coefficient alpha of 0.91, 0.89 and 0.90, respectively. Test-retest reliability was good to excellent with ICC of 0.91, 0.89 and 0.90, respectively. SF-12 and VAS scores correlated significantly with the ArWOMAC index (p < 0.01), which supports the construct validity. The standard error of measurement (SEM) of the total scale was 2.94, based on repeated measurements for test-retest. The minimum detectable change based on the SEM for test-retest was 8.15. The ArWOMAC index is a reliable and valid instrument for evaluating the severity of knee OA, with metric properties in agreement with the original version. Although the reduced WOMAC index has been used clinically within the Saudi population, the Arabic version of this instrument had not previously been validated for an Arab population to measure lower limb functional disability caused by OA.
The Arabic version of reduced WOMAC (ArWOMAC) index is a reliable and valid scale to measure lower limb functional disability in patients with knee OA. The ArWOMAC index could be suitable in Saudi Arabia and other Arab countries where the language, culture and the life style are similar.
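The relation between the two reported figures can be checked with a short calculation. The sketch below assumes the common convention MDC95 = 1.96 × √2 × SEM; the abstract does not state which formula was used, so this is an assumption that happens to reproduce the reported value. The function name is hypothetical.

```python
import math


def mdc95(sem):
    """Minimum detectable change at 95% confidence from the SEM.

    Assumes the conventional formula 1.96 * sqrt(2) * SEM.
    """
    return 1.96 * math.sqrt(2) * sem


mdc = mdc95(2.94)  # SEM reported for the total ArWOMAC scale
print(round(mdc, 2))  # 8.15
```

The factor √2 reflects that a change score combines measurement error from two administrations of the questionnaire.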
Helfrich, Christian D; Li, Yu-Fang; Sharp, Nancy D; Sales, Anne E
2009-07-14
The Promoting Action on Research Implementation in Health Services, or PARIHS, framework is a theoretical framework widely promoted as a guide to implement evidence-based clinical practices. However, it has as yet no pool of validated measurement instruments that operationalize the constructs defined in the framework. The present article introduces an Organizational Readiness to Change Assessment instrument (ORCA), organized according to the core elements and sub-elements of the PARIHS framework, and reports on initial validation. We conducted scale reliability and factor analyses on cross-sectional, secondary data from three quality improvement projects (n = 80) conducted in the Veterans Health Administration. In each project, identical 77-item ORCA instruments were administered to one or more staff from each facility involved in quality improvement projects. Items were organized into 19 subscales and three primary scales corresponding to the core elements of the PARIHS framework: (1) Strength and extent of evidence for the clinical practice changes represented by the QI program, assessed with four subscales, (2) Quality of the organizational context for the QI program, assessed with six subscales, and (3) Capacity for internal facilitation of the QI program, assessed with nine subscales. Cronbach's alpha for scale reliability were 0.74, 0.85 and 0.95 for the evidence, context and facilitation scales, respectively. The evidence scale and its three constituent subscales failed to meet the conventional threshold of 0.80 for reliability, and three individual items were eliminated from evidence subscales following reliability testing. In exploratory factor analysis, three factors were retained. Seven of the nine facilitation subscales loaded onto the first factor; five of the six context subscales loaded onto the second factor; and the three evidence subscales loaded on the third factor. Two subscales failed to load significantly on any factor. 
One measured resources in general (from the context scale), and one clinical champion role (from the facilitation scale). We find general support for the reliability and factor structure of the ORCA. However, there was poor reliability among measures of evidence, and factor analysis results for measures of general resources and clinical champion role did not conform to the PARIHS framework. Additional validation is needed, including criterion validation.
Villota, Orlando; Diaz, Mario; Ceron, Carmen; Moller, Ingrid; Naredo, Esperanza; Saaibi, Diego Luis
2017-07-28
To assess the intra- and inter-observer reliability of ultrasound (US) in scoring B-mode, Doppler synovitis and combined B-mode and Doppler synovitis scores in different peripheral joints of rheumatoid arthritis (RA) patients. Four rheumatologists with formal training in musculoskeletal US (MSKUS), particularly focused on definitions and scoring of synovitis on B-mode and Doppler mode, participated in a patient-based reliability exercise on 16 active RA patients. The four rheumatologists independently and consecutively performed a B-mode and power Doppler (PD) US assessment of 7 joints of each patient in two rounds in a blinded fashion. Each joint was semiquantitatively scored from 0 to 3 for B-mode synovitis (BS), Doppler synovitis (DS), and combined B-mode/Doppler synovitis (CS). Intraobserver reliability was assessed by Cohen's κ. Interobserver reliability was assessed by unweighted Light's κ. The mean prevalence of synovitis on B-mode was 83% of joints, with scores ranging from grade 1 in 18% of joints to grade 3 in 33%. Synovial PD signal was detected in 55% of joints, and the distribution of scores ranged from 14% of joints for grade 3 to 26% for grade 2. After a total of 448 joints scanned with 896 acquired images, intraobserver and interobserver reliability was good to excellent for most of the joints. Formal, structured and continuous training in musculoskeletal ultrasound would bring good to excellent reproducibility in rheumatological hands, with high reliability in real-time acquisition of BS, DS and CS modalities for scoring synovitis in patients with active rheumatoid arthritis. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
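The two agreement statistics named above can be sketched in a few lines, assuming each rater's scores are given as an equal-length list of grades 0 to 3. Light's κ is taken here as the mean of Cohen's κ over all rater pairs. The function names and toy ratings are hypothetical and are not the study data.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels.

    Undefined (division by zero) when expected agreement equals 1.
    """
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    p_expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


def light_kappa(ratings):
    """Light's kappa: mean of Cohen's kappa over all pairs of raters."""
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Toy grades (0-3) for eight joints from three hypothetical raters
rater1 = [0, 1, 2, 3, 3, 2, 1, 0]
rater2 = [0, 1, 2, 3, 2, 2, 1, 1]
rater3 = list(rater1)
k12 = cohen_kappa(rater1, rater2)
k_light = light_kappa([rater1, rater2, rater3])
print(round(k12, 3), round(k_light, 3))
```

Because the unweighted form treats a disagreement of one grade the same as a disagreement of three grades, it is a conservative choice for ordinal scores.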
Randall Simpson, Janis; Gumbley, Jillian; Whyte, Kylie; Lac, Jane; Morra, Crystal; Rysdale, Lee; Turfryer, Mary; McGibbon, Kim; Beyers, Joanne; Keller, Heather
2015-09-01
Nutrition is vital for optimal growth and development of young children. Nutrition risk screening can facilitate early intervention when followed by nutritional assessment and treatment. NutriSTEP (Nutrition Screening Tool for Every Preschooler) is a valid and reliable nutrition risk screening questionnaire for preschoolers (aged 3-5 years). A need was identified for a similar questionnaire for toddlers (aged 18-35 months). The purpose was to develop a reliable and valid Toddler NutriSTEP. Toddler NutriSTEP was developed in 4 phases. Content and face validity were determined with a literature review, parent focus groups (n = 6; 48 participants), and experts (n = 13) (phase A). A draft questionnaire was refined with key intercept interviews of 107 parents/caregivers (phase B). Test-retest reliability (phase C), based on intra-class correlations (ICC), Kappa (κ) statistics, and Wilcoxon tests was assessed with 133 parents/caregivers. Criterion validity (phase D) was assessed using Receiver Operating Characteristic (ROC) curves by comparing scores on the Toddler NutriSTEP to a comprehensive nutritional assessment of 200 toddlers with a registered dietitian (RD). The Toddler NutriSTEP was reliable between 2 administrations (ICC = 0.951, F = 20.53, p < 0.001); most questions had moderate (κ ≥ 0.6) or excellent (κ ≥ 0.8) agreement. Scores on the RD nutrition risk rating and the Toddler NutriSTEP were correlated (r = 0.67, p < 0.000). The area under the ROC curve for moderate and high RD risk ratings were 84.6% and 82.7%, respectively. Cut-points of ≥21 (sensitivity 86%; specificity 61%) (moderate risk) and ≥26 (sensitivity 95%; specificity 63%) (high risk) were determined. The Toddler NutriSTEP questionnaire is both reliable and valid for screening for nutritional risk in toddlers.
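How a cut-point yields a sensitivity and specificity pair can be shown with a small sketch, assuming a screening score per child and a reference rating of whether the child is at risk. The cut-point of 21 is taken from the abstract; the function name and the toy data are hypothetical, so the resulting figures do not match the study's.

```python
def sensitivity_specificity(scores, at_risk, cut_point):
    """Classify score >= cut_point as screen-positive and compare to reference."""
    pairs = list(zip(scores, at_risk))
    tp = sum(1 for s, r in pairs if s >= cut_point and r)
    fn = sum(1 for s, r in pairs if s < cut_point and r)
    tn = sum(1 for s, r in pairs if s < cut_point and not r)
    fp = sum(1 for s, r in pairs if s >= cut_point and not r)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data: eight questionnaire scores with hypothetical reference ratings
scores = [12, 18, 22, 25, 30, 19, 27, 15]
at_risk = [False, False, True, True, True, True, False, False]
sens, spec = sensitivity_specificity(scores, at_risk, cut_point=21)
print(sens, spec)
```

Sweeping the cut-point over all observed scores and plotting sensitivity against 1 − specificity traces the ROC curve from which such cut-points are chosen.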
DiClemente, Carlo C; Crouch, Taylor Berens; Norwood, Amber E Q; Delahanty, Janine; Welsh, Christopher
2015-03-01
Screening, brief intervention, and referral to treatment (SBIRT) has become an empirically supported and widely implemented approach in primary and specialty care for addressing substance misuse. Accordingly, training of providers in SBIRT has increased exponentially in recent years. However, the quality and fidelity of training programs and subsequent interventions are largely unknown because of the lack of SBIRT-specific evaluation tools. The purpose of this study was to create a coding scale to assess quality and fidelity of SBIRT interactions addressing alcohol, tobacco, illicit drugs, and prescription medication misuse. The scale was developed to evaluate performance in an SBIRT residency training program. Scale development was based on training protocol and competencies with consultation from Motivational Interviewing coding experts. Trained medical residents practiced SBIRT with standardized patients during 10- to 15-min videotaped interactions. This study included 25 tapes from the Family Medicine program coded by 3 unique coder pairs with varying levels of coding experience. Interrater reliability was assessed for overall scale components and individual items via intraclass correlation coefficients. Coder pair-specific reliability was also assessed. Interrater reliability was excellent overall for the scale components (>.85) and nearly all items. Reliability was higher for more experienced coders, though still adequate for the trained coder pair. Descriptive data demonstrated a broad range of adherence and skills. Subscale correlations supported concurrent and discriminant validity. Data provide evidence that the MD3 SBIRT Coding Scale is a psychometrically reliable coding system for evaluating SBIRT interactions and can be used to evaluate implementation skills for fidelity, training, assessment, and research. Recommendations for refinement and further testing of the measure are discussed. (PsycINFO Database Record (c) 2015 APA, all rights reserved).
Gao, Wenjun; Yuan, Changrong; Wang, Jichuan; Du, Jiarui; Wu, Huiqiao; Qian, Xiaojie; Hinds, Pamela S
2013-01-01
The City of Hope Quality of Life-Ostomy Questionnaire is a widely accepted scale to assess quality of life in ostomy patients. However, the validity and reliability of the Chinese version (C-COH) have not been studied. The objective of the study was to assess the validity and reliability of the C-COH among ostomy patients sampled from Shanghai from August 2010 to June 2011. Content validity was examined based on the reviews of a panel of 10 experts; test-retest was conducted to assess the item reliabilities of the scale; a pilot sample (n = 274) was selected to explore the factorial structure of the C-COH using exploratory factor analysis; a validation sample (n = 370) was selected to confirm the findings from the exploratory study using confirmatory factor analysis (CFA). Statistical package SPSS version 16.0 was used for the exploratory factor analysis, and Amos 17.0 was used for the CFA. The C-COH was developed by modifying 1 item and excluding 11 items from the original scale. Four factors/subscales (physical well-being, psychological well-being, social well-being, and spiritual well-being) were identified and confirmed in the C-COH. The scale reliabilities estimated from the CFA results for the 4 subscales were 0.860, 0.885, 0.864, and 0.686, respectively. Findings support the reliability and validity of the C-COH. The C-COH could be a useful measure of the level of quality of life among Chinese patients with a stoma and may provide important intervention implications for healthcare providers to help improve the life quality of patients with a stoma.
Beehler, Sarah; Ahern, Jennifer; Balmer, Brandi; Kuhlman, Jennifer
2017-01-01
This pilot study evaluated the validity and reliability of an Experience of Neighborhood (EON) measure developed to assess neighborhood characteristics that shape reintegration opportunities for returning service members and their families. A total of 91 post-9/11 veterans and spouses completed a survey administered at the Minnesota State Fair. Participants self-reported on their reintegration status (veterans), social functioning (spouses), social support, and mental health. EON factor structure, internal consistency reliability, and validity (discriminant, content, criterion) were analyzed. The EON measure showed adequate reliability, discriminant validity, and content validity. More work is needed to assess criterion validity because EON scores were not correlated with scores on a Census-based index used to measure quality of military neighborhoods. The EON may be useful in assessing broad local factors influencing health among returning veterans and spouses. More research is needed to understand geographic variation in neighborhood conditions and how those affect reintegration and mental health for military families.
Nunes, Andreia; Limpo, Teresa; Lima, César F.; Castro, São Luís
2018-01-01
The importance of quickly assessing personality traits in many studies prompted the development of brief scales such as the Ten-Item Personality Inventory (TIPI), a measure of five personality traits (extraversion, agreeableness, conscientiousness, emotional stability, and openness). In the current study, we present the Portuguese version of TIPI and examine its psychometric properties, based on a sample of 333 Portuguese adults aged 18 to 65 years. The results revealed reliability coefficients similar to the original version (α = 0.39–0.72), very good 4-week test–retest reliability (n = 81, rs > 0.71), expected factorial structure, high convergent validity with the Big-Five Inventory (rs > 0.60), and correlations with self-esteem, affect, and aggressiveness similar to those found with standard measures of personality traits. Overall, our findings suggest that the Portuguese TIPI is a reliable and valid alternative to longer measures: it offers a promising tool for research contexts in which the available time for personality assessment is highly limited. PMID:29674989
Beehler, Sarah; Ahern, Jennifer; Balmer, Brandi; Kuhlman, Jennifer
2017-01-01
This pilot study evaluated the validity and reliability of an Experience of Neighborhood (EON) measure developed to assess neighborhood characteristics that shape reintegration opportunities for returning service members and their families. A total of 91 post-9/11 veterans and spouses completed a survey administered at the Minnesota State Fair. Participants self-reported on their reintegration status (veterans), social functioning (spouses), social support, and mental health. EON factor structure, internal consistency reliability, and validity (discriminant, content, criterion) were analyzed. The EON measure showed adequate reliability, discriminant validity, and content validity. More work is needed to assess criterion validity because EON scores were not correlated with scores on a Census-based index used to measure quality of military neighborhoods. The EON may be useful in assessing broad local factors influencing health among returning veterans and spouses. More research is needed to understand geographic variation in neighborhood conditions and how those affect reintegration and mental health for military families. PMID:28936370
The Reliability and Predictive Validity of the Stalking Risk Profile.
McEwan, Troy E; Shea, Daniel E; Daffern, Michael; MacKenzie, Rachel D; Ogloff, James R P; Mullen, Paul E
2018-03-01
This study assessed the reliability and validity of the Stalking Risk Profile (SRP), a structured measure for assessing stalking risks. The SRP was administered at the point of assessment or retrospectively from file review for 241 adult stalkers (91% male) referred to a community-based forensic mental health service. Interrater reliability was high for stalker type, and moderate-to-substantial for risk judgments and domain scores. Evidence for predictive validity and discrimination between stalking recidivists and nonrecidivists for risk judgments depended on follow-up duration. Discrimination was moderate (area under the curve = 0.66-0.68) and positive and negative predictive values good over the full follow-up period (Mdn = 170.43 weeks). At 6 months, discrimination was better than chance only for judgments related to stalking of new victims (area under the curve = 0.75); however, high-risk stalkers still reoffended against their original victim(s) 2 to 4 times as often as low-risk stalkers. Implications for the clinical utility and refinement of the SRP are discussed.
The long case and its modifications: a literature review.
Ponnamperuma, Gominda G; Karunathilake, Indika M; McAleer, Sean; Davis, Margery H
2009-10-01
This review provides a summary of the published literature on the suitability of the long case and its modifications for high-stakes assessment. Databases related to medicine were searched for articles published from 2000 to 2008, using the keywords 'long case', 'clinical examinations' and 'clinical assessment'. Reference lists of review articles were hand-searched. Articles related to the objective structured clinical examination were eliminated. Research-based articles with hard data were given more emphasis in this review than those based on opinion. Eighteen articles were identified. The main disadvantage of the long case is its inability to sample the curriculum widely, resulting in low reliability. The main advantage of the long case is its ability to assess the candidate's overall (holistic) approach to the patient. Modifications to the long case attempt to: structure the format and the marking scheme; increase the number of examiners; observe the candidate's behaviour, and increase the number of cases. The long case is a traditional clinical examination format for the assessment of clinical competence and assessment at this level is important. The starting point for the majority of recent research on the long case has been an acceptance of its low reliability and modifications to the format have been proposed. Further evidence of the efficacy of these modifications is required, however, before they can be recommended for summative assessment. If further research is to be undertaken on the long case, it should focus on finding practicable ways of sampling the curriculum widely to increase reliability while maintaining the holistic approach towards the patient, which represents the attraction of the long case.
Hand assessment in older adults with musculoskeletal hand problems: a reliability study.
Myers, Helen L; Thomas, Elaine; Hay, Elaine M; Dziedzic, Krysia S
2011-01-07
Musculoskeletal hand pain is common in the general population. This study aims to investigate the inter- and intra-observer reliability of two trained observers conducting a simple clinical interview and physical examination for hand problems in older adults. The reliability of applying the American College of Rheumatology (ACR) criteria for hand osteoarthritis to community-dwelling older adults will also be investigated. Fifty-five participants aged 50 years and over with a current self-reported hand problem and registered with one general practice were recruited from a previous health questionnaire study. Participants underwent a standardised, structured clinical interview and physical examination by two independent trained observers and again by one of these observers a month later. Agreement beyond chance was summarised using Kappa statistics and intra-class correlation coefficients. Median values for inter- and intra-observer reliability for clinical interview questions were found to be "substantial" and "moderate" respectively [median agreement beyond chance (Kappa) was 0.75 (range: -0.03, 0.93) for inter-observer ratings and 0.57 (range: -0.02, 1.00) for intra-observer ratings]. Inter- and intra-observer reliability for physical examination items was variable, with good reliability observed for some items, such as grip and pinch strength, and poor reliability observed for others, notably assessment of altered sensation, pain on resisted movement and judgements based on observation and palpation of individual features at single joints, such as bony enlargement, nodes and swelling. Moderate agreement was observed both between and within observers when applying the ACR criteria for hand osteoarthritis. Standardised, structured clinical interview is reliable for taking a history in community-dwelling older adults with self-reported hand problems. Agreement between and within observers for physical examination items is variable.
Low Kappa values may have resulted, in part, from a low prevalence of clinical signs and symptoms in the study participants. The decision to use clinical interview and hand assessment variables in clinical practice or further research in primary care should include consideration of clinical applicability and training alongside reliability. Further investigation is required to determine the relationship between these clinical questions and assessments and the clinical course of hand pain and hand problems in community-dwelling older adults.
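The remark above that low Kappa values may partly reflect low prevalence can be illustrated with a small worked example. The sketch below computes Cohen's kappa from a 2x2 agreement table; the function name and the counts are hypothetical illustrations for 55 participants, not data from the study. Both tables have the same raw agreement (50 of 55), yet kappa drops sharply when the clinical sign is rare.

```python
def cohen_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 agreement table between two observers.

    a: both observers record the sign as present
    d: both observers record the sign as absent
    b, c: the two kinds of disagreement
    """
    n = a + b + c + d
    observed = (a + d) / n
    # Agreement expected by chance, from each observer's marginal proportions.
    p1_present = (a + b) / n
    p2_present = (a + c) / n
    expected = p1_present * p2_present + (1 - p1_present) * (1 - p2_present)
    # Note: undefined (division by zero) if expected agreement equals 1.
    return (observed - expected) / (1 - expected)


# Hypothetical counts, chosen only to show the prevalence effect.
balanced = cohen_kappa(a=25, b=3, c=2, d=25)  # sign present in about half
rare = cohen_kappa(a=1, b=3, c=2, d=49)       # sign present in very few

print(f"balanced prevalence: kappa = {balanced:.2f}")
print(f"low prevalence:      kappa = {rare:.2f}")
```

With these assumed counts, kappa is about 0.82 in the balanced table and about 0.24 in the low-prevalence table, although percent agreement is identical. This is one reason the abstract suggests interpreting reliability alongside clinical applicability.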
Beronius, Anna; Molander, Linda; Zilliacus, Johanna; Rudén, Christina; Hanberg, Annika
2018-05-28
The Science in Risk Assessment and Policy (SciRAP) web-based platform was developed to promote and facilitate structure and transparency in the evaluation of ecotoxicity and toxicity studies for hazard and risk assessment of chemicals. The platform includes sets of criteria and a colour-coding tool for evaluating the reliability and relevance of individual studies. The SciRAP method for evaluating in vivo toxicity studies was first published in 2014 and the aim of the work presented here was to evaluate and develop that method further. Toxicologists and risk assessors from different sectors and geographical areas were invited to test the SciRAP criteria and tool on a specific set of in vivo toxicity studies and to provide feedback concerning the scientific soundness and user-friendliness of the SciRAP approach. The results of this expert assessment were used to refine and improve both the evaluation criteria and the colour-coding tool. It is expected that the SciRAP web-based platform will continue to be developed and enhanced to keep up to date with the needs of end-users. Copyright © 2018 John Wiley & Sons, Ltd.
Benjamin, Sara E; Neelon, Brian; Ball, Sarah C; Bangdiwala, Shrikant I; Ammerman, Alice S; Ward, Dianne S
2007-01-01
Background Few assessment instruments have examined the nutrition and physical activity environments in child care, and none are self-administered. Given the emerging focus on child care settings as a target for intervention, a valid and reliable measure of the nutrition and physical activity environment is needed. Methods To measure inter-rater reliability, 59 child care center directors and 109 staff completed the self-assessment concurrently, but independently. Three weeks later, a repeat self-assessment was completed by a sub-sample of 38 directors to assess test-retest reliability. To assess criterion validity, a researcher-administered environmental assessment was conducted at 69 centers and was compared to a self-assessment completed by the director. A weighted kappa test statistic and percent agreement were calculated to assess agreement for each question on the self-assessment. Results For inter-rater reliability, kappa statistics ranged from 0.20 to 1.00 across all questions. Test-retest reliability of the self-assessment yielded kappa statistics that ranged from 0.07 to 1.00. The inter-quartile kappa statistic ranges for inter-rater and test-retest reliability were 0.45 to 0.63 and 0.27 to 0.45, respectively. When percent agreement was calculated, questions ranged from 52.6% to 100% for inter-rater reliability and 34.3% to 100% for test-retest reliability. Kappa statistics for validity ranged from -0.01 to 0.79, with an inter-quartile range of 0.08 to 0.34. Percent agreement for validity ranged from 12.9% to 93.7%. Conclusion This study provides estimates of criterion validity, inter-rater reliability and test-retest reliability for an environmental nutrition and physical activity self-assessment instrument for child care. Results indicate that the self-assessment is a stable and reasonably accurate instrument for use with child care interventions. 
We therefore recommend the Nutrition and Physical Activity Self-Assessment for Child Care (NAP SACC) instrument to researchers and practitioners interested in conducting healthy weight intervention in child care. However, a more robust, less subjective measure would be more appropriate for researchers seeking an outcome measure to assess intervention impact. PMID:17615078
Cheung, Gordon; Goonewardene, Mithran Suresh; Islam, Syed Mohammed Shamsul; Murray, Kevin; Koong, Bernard
2013-05-01
To assess the validity of using Jugale (J) and Antegonion (Ag) on Posterior-Anterior cephalograms (PAC) as landmarks for transverse intermaxillary analysis when compared with Cone Beam Computed Tomography (CBCT). Conventional PAC and CBCT images were taken of 28 dry skulls. Craniometric measurements between the bilateral landmarks, Antegonion and Jugale, were obtained from the skulls using a microscribe and recorded as the base standard. The corresponding landmarks were identified and measured on CBCT and PAC and compared with the base standard measurements. The accuracy and reliability of the measurements were statistically evaluated and the validity was assessed by comparing the ability of the two image modalities to accurately diagnose an arbitrarily selected J-J/Ag-Ag ratio. All measurements were repeated at least 7 weeks apart. Intra-class correlations (ICC) and Bland-Altman plots were used to analyse the data. All three methods were shown to be reliable as all had a mean error of less than 0.5 mm between repeated measurements. When compared with the base standard, CBCT measurements were shown to have higher agreement (ICC: 0.861-0.964) compared with measurements taken from PAC (ICC: 0.794-0.796). When the arbitrary J-J/Ag-Ag ratio was assessed, 18 per cent of cases were incorrectly diagnosed with a transverse discrepancy on the PAC compared with the CBCT which incorrectly diagnosed 8.7 per cent. CBCT was shown to be more reliable in assessing intermaxillary transverse discrepancy compared with PAC when using J-J/Ag-Ag ratios.
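The abstract above names Bland-Altman plots as one of its analyses. As a rough illustration of the quantities such a plot displays, the sketch below computes the mean difference (bias) and the conventional 95% limits of agreement (bias ± 1.96 standard deviations of the differences) between an imaging method and a reference. The function name and the millimetre values are hypothetical and are not measurements from the study.

```python
from statistics import mean, stdev


def bland_altman(method, reference):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias is the mean of (method - reference); the limits of agreement are
    bias -/+ 1.96 times the sample standard deviation of the differences.
    """
    diffs = [m - r for m, r in zip(method, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical J-J widths in mm (illustrative values only).
reference_mm = [60.0, 62.0, 64.0, 66.0]
imaging_mm = [60.5, 61.5, 64.5, 65.5]

bias, lower, upper = bland_altman(imaging_mm, reference_mm)
print(f"bias = {bias:.2f} mm, limits of agreement = [{lower:.2f}, {upper:.2f}] mm")
```

A bias near zero with narrow limits suggests the two methods can be used interchangeably for the purpose at hand; how narrow is narrow enough is a clinical judgement rather than a statistical one.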
ERIC Educational Resources Information Center
Contino, Julie
2013-01-01
In a standards-based system, it is important for all components of the system to align in order to achieve the intended goals. No Child Left Behind law mandates that assessments be fully aligned with state standards, be valid, reliable and fair, be reported to all stakeholders, and provide evidence that all students in the state are meeting the…
Chen, Yu-Cheng; Coble, Joseph B; Deziel, Nicole C; Ji, Bu-Tian; Xue, Shouzheng; Lu, Wei; Stewart, Patricia A; Friesen, Melissa C
2014-11-01
The reliability and validity of six experts' exposure ratings were evaluated for 64 nickel-exposed and 72 chromium-exposed workers from six Shanghai electroplating plants based on airborne and urinary nickel and chromium measurements. Three industrial hygienists and three occupational physicians independently ranked the exposure intensity of each metal on an ordinal scale (1-4) for each worker's job in two rounds: the first round was based on responses to an occupational history questionnaire and the second round also included responses to an electroplating industry-specific questionnaire. The Spearman correlation (r(s)) was used to compare each rating's validity to its corresponding subject-specific arithmetic mean of four airborne or four urinary measurements. Reliability was moderately high (weighted kappa range=0.60-0.64). Validity was poor to moderate (r(s)=-0.37-0.46) for both airborne and urinary concentrations of both metals. For airborne nickel concentrations, validity differed by plant. For dichotomized metrics, sensitivity and specificity were higher based on urinary measurements (47-78%) than airborne measurements (16-50%). Few patterns were observed by metal, assessment round, or expert type. These results suggest that, for electroplating exposures, experts can achieve moderately high agreement and (reasonably) distinguish between low and high exposures when reviewing responses to in-depth questionnaires used in population-based case-control studies.
Chen, Yu-Cheng; Coble, Joseph B; Deziel, Nicole C; Ji, Bu-Tian; Xue, Shouzheng; Lu, Wei; Stewart, Patricia A; Friesen, Melissa C
2014-01-01
The reliability and validity of six experts’ exposure ratings were evaluated for 64 nickel-exposed and 72 chromium-exposed workers from six Shanghai electroplating plants based on airborne and urinary nickel and chromium measurements. Three industrial hygienists and three occupational physicians independently ranked the exposure intensity of each metal on an ordinal scale (1–4) for each worker's job in two rounds: the first round was based on responses to an occupational history questionnaire and the second round also included responses to an electroplating industry-specific questionnaire. Spearman correlation (rs) was used to compare each rating's validity to its corresponding subject-specific arithmetic mean of four airborne or four urinary measurements. Reliability was moderately-high (weighted kappa range=0.60–0.64). Validity was poor to moderate (rs= -0.37–0.46) for both airborne and urinary concentrations of both metals. For airborne nickel concentrations, validity differed by plant. For dichotomized metrics, sensitivity and specificity were higher based on urinary measurements (47–78%) than airborne measurements (16–50%). Few patterns were observed by metal, assessment round, or expert type. These results suggest that, for electroplating exposures, experts can achieve moderately-high agreement and (reasonably) distinguish between low and high exposures when reviewing responses to in-depth questionnaires used in population-based case-control studies. PMID:24736099
Esteba-Castillo, Susanna; Torrents-Rodas, David; García-Alba, Javier; Ribas-Vidal, Núria; Novell-Alsina, Ramon
2016-12-21
The Health of the Nation Outcome Scales for People with Learning Disabilities (HoNOS-LD) is a brief instrument that assesses functioning in people with intellectual development disorder and mental health problems/behaviour disorders. The aim of the present study was to examine the evidence on the validity of the scores based on the Spanish version of the HoNOS-LD. The study included 111 participants who were assessed with the Spanish version of the HoNOS-LD and other questionnaires that measured different variables related to the scale. Thirty-three participants were assessed by 2 examiners, and retested 7 days later, in order to study inter-examiner and test-retest reliabilities. Based on clinical and conceptual criteria, and on the results of the parallel analysis, a factorial solution with one factor was selected. Internal consistency was good (Omega coefficient of 0.87). Inter-examiner and test-retest reliabilities were excellent (intraclass correlation coefficients of 0.95 and 0.98, respectively). Correlations between sections of the HoNOS-LD and the related instruments showed the expected direction, and were highly significant (P<.001), and the HoNOS-LD score increased with the intensity of the support required by the participants. These results showed evidence of the validity of association with other external variables. The Spanish version of the HoNOS-LD is a brief, valid and reliable instrument, which will enable a routine assessment of functioning for different uses, including diagnosis and intervention. Copyright © 2016 SEP y SEPB. Published by Elsevier España, S.L.U. All rights reserved.
Saub, R; Locker, D; Allison, P; Disman, M
2007-09-01
The aim of this project was to develop an oral health-related quality of life measure for the Malaysian adult population aged 18 and above by cross-cultural adaptation of the Oral Health Impact Profile (OHIP). The adaptation of the OHIP was based on the framework proposed by Herdman et al (1998). The OHIP was translated into the Malay language using a forward-backward translation technique. Thirty-six patients were interviewed to assess the conceptual equivalence and relevancy of each item. Based on the translation process and interview results, a Malaysian version of the OHIP questionnaire was produced that contained 45 items. It was designated as the OHIP(M). This questionnaire was pre-tested on 20 patients to assess its face validity. A short 14-item version of the questionnaire was completed by 171 patients to assess the suitability of the Likert-type response format. Field-testing was conducted in order to assess the suitability of two modes of administration (mail and interview) and to establish the psychometric properties of the adapted measure. The pre-testing revealed that the OHIP(M) has good face validity. It was found that the five-point frequency Likert scale could be used for the Malaysian population. The OHIP(M) was reliable: the scale Cronbach's alpha was 0.95 and the ICC value for test-retest reliability was 0.79. Three out of four construct validity hypotheses tested were confirmed. The OHIP(M) works as well as the English version. The OHIP(M) was found to be reliable and valid regardless of the mode of administration. However, this study only provides initial evidence for the reliability and validity of the measure. Further study is recommended to collect more evidence to support these results.
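The abstract above reports internal consistency as Cronbach's alpha. For readers unfamiliar with the statistic, the sketch below shows the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function name and the five-point Likert responses are hypothetical; they are not OHIP(M) data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Uses sample variances for both the items and the total scores.
    """
    k = len(responses[0])
    # Variance of each item (column) across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 5 respondents x 3 items on a 1-5 scale.
likert = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 4],
]

alpha = cronbach_alpha(likert)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Alpha rises when items vary together across respondents, so a value such as the 0.95 reported above indicates that the items track a common construct closely; very high values can also signal redundant items.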
Mapping functional connectivity
Peter Vogt; Joseph R. Ferrari; Todd R. Lookingbill; Robert H. Gardner; Kurt H. Riitters; Katarzyna Ostapowicz
2009-01-01
An objective and reliable assessment of wildlife movement is important in theoretical and applied ecology. The identification and mapping of landscape elements that may enhance functional connectivity is usually a subjective process based on visual interpretations of species movement patterns. New methods based on mathematical morphology provide a generic, flexible,...