ERIC Educational Resources Information Center
Spaan, Mary
2007-01-01
This article follows the development of test items (see "Language Assessment Quarterly", Volume 3 Issue 1, pp. 71-79 for the article "Test and Item Specifications Development"), beginning with a review of test and item specifications, then proceeding to writing and editing of items, pretesting and analysis, and finally selection of an item for a…
Tepe, Rodger; Tepe, Chabha
2015-03-01
To develop and psychometrically evaluate an information literacy (IL) self-efficacy survey and an IL knowledge test. In this test-retest reliability study, a 25-item IL self-efficacy survey and a 50-item IL knowledge test were developed and administered to a convenience sample of 53 chiropractic students. Item analyses were performed on all questions. The IL self-efficacy survey demonstrated good reliability (test-retest correlation = 0.81) and good/very good internal consistency (mean κ = .56 and Cronbach's α = .92). A total of 25 questions with the best item analysis characteristics were chosen from the 50-item IL knowledge test, resulting in a 25-item IL knowledge test that demonstrated good reliability (test-retest correlation = 0.87), very good internal consistency (mean κ = .69, KR20 = 0.85), and good item discrimination (mean point-biserial = 0.48). This study resulted in the development of three instruments: a 25-item IL self-efficacy survey, a 50-item IL knowledge test, and a 25-item IL knowledge test. The information literacy self-efficacy survey and the 25-item version of the information literacy knowledge test have shown preliminary evidence of adequate reliability and validity to justify continuing study with these instruments.
Tepe, Rodger; Tepe, Chabha
2015-01-01
Objective To develop and psychometrically evaluate an information literacy (IL) self-efficacy survey and an IL knowledge test. Methods In this test–retest reliability study, a 25-item IL self-efficacy survey and a 50-item IL knowledge test were developed and administered to a convenience sample of 53 chiropractic students. Item analyses were performed on all questions. Results The IL self-efficacy survey demonstrated good reliability (test–retest correlation = 0.81) and good/very good internal consistency (mean κ = .56 and Cronbach's α = .92). A total of 25 questions with the best item analysis characteristics were chosen from the 50-item IL knowledge test, resulting in a 25-item IL knowledge test that demonstrated good reliability (test–retest correlation = 0.87), very good internal consistency (mean κ = .69, KR20 = 0.85), and good item discrimination (mean point-biserial = 0.48). Conclusions This study resulted in the development of three instruments: a 25-item IL self-efficacy survey, a 50-item IL knowledge test, and a 25-item IL knowledge test. The information literacy self-efficacy survey and the 25-item version of the information literacy knowledge test have shown preliminary evidence of adequate reliability and validity to justify continuing study with these instruments. PMID:25517736
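The reliability and discrimination indices reported for the knowledge test above (KR-20 and mean point-biserial) are direct functions of a 0/1 scored response matrix. Below is a minimal sketch of those generic computations, assuming such a matrix; the simulated data, shapes, and function names are illustrative, not taken from the study.

```python
import numpy as np

def kr20(responses):
    """Kuder-Richardson 20 for a 0/1 item-response matrix (rows = examinees)."""
    k = responses.shape[1]                     # number of items
    p = responses.mean(axis=0)                 # item difficulties (proportion correct)
    q = 1.0 - p
    total_var = responses.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - p.dot(q) / total_var)

def point_biserial(responses):
    """Corrected point-biserial: each item vs. total score excluding that item."""
    totals = responses.sum(axis=1)
    rpb = []
    for j in range(responses.shape[1]):
        rest = totals - responses[:, j]        # remove the item from its own criterion
        rpb.append(np.corrcoef(responses[:, j], rest)[0, 1])
    return np.array(rpb)

# Illustrative run with simulated data (53 examinees, 25 items)
rng = np.random.default_rng(0)
responses = (rng.random((53, 25)) < 0.6).astype(int)
print("KR-20:", round(float(kr20(responses)), 3))
print("mean point-biserial:", round(float(point_biserial(responses).mean()), 3))
```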
ERIC Educational Resources Information Center
Hewitt, Margaret A.; Homan, Susan P.
2004-01-01
Test validity issues considered by test developers and school districts rarely include individual item readability levels. In this study, items from a major standardized test were examined for individual item readability level and item difficulty. The Homan-Hewitt Readability Formula was applied to items across three grade levels. Results of…
ERIC Educational Resources Information Center
Matlock, Ki Lynn; Turner, Ronna
2016-01-01
When constructing multiple test forms, the number of items and the total test difficulty are often equivalent. Not all test developers match the number of items and/or average item difficulty within subcontent areas. In this simulation study, six test forms were constructed having an equal number of items and average item difficulty overall.…
Developing a Strategy for Using Technology-Enhanced Items in Large-Scale Standardized Tests
ERIC Educational Resources Information Center
Bryant, William
2017-01-01
As large-scale standardized tests move from paper-based to computer-based delivery, opportunities arise for test developers to make use of items beyond traditional selected and constructed response types. Technology-enhanced items (TEIs) have the potential to provide advantages over conventional items, including broadening construct measurement,…
Park, In Sook; Suh, Yeon Ok; Park, Hae Sook; Kang, So Young; Kim, Kwang Sung; Kim, Gyung Hee; Choi, Yeon-Hee; Kim, Hyun-Ju
2017-01-01
The purpose of this study was to improve the quality of items on the Korean Nursing Licensing Examination by developing and evaluating case-based items that reflect integrated nursing knowledge. We conducted a cross-sectional observational study to develop new case-based items. The methods for developing test items included expert workshops, brainstorming, and verification of content validity. After a mock examination of undergraduate nursing students using the newly developed case-based items, we evaluated the appropriateness of the items through classical test theory and item response theory. A total of 50 case-based items were developed for the mock examination, and content validity was evaluated. The question items integrated 34 discrete elements of integrated nursing knowledge. The mock examination was taken by 741 baccalaureate students in their fourth year of study at 13 universities. Their average score on the mock examination was 57.4, and the examination showed a reliability of 0.40. According to classical test theory, the average item difficulty was 57.4% (80%-100% for 12 items; 60%-80% for 13 items; and less than 60% for 25 items). The mean discrimination index was 0.19; it was above 0.30 for 11 items and between 0.20 and 0.29 for 15 items. According to item response theory, the item discrimination parameter (in the logistic model) was zero for 10 items (0.00), very low for 20 items (0.01 to 0.34), low for 12 items (0.35 to 0.64), moderate for 6 items (0.65 to 1.34), high for 1 item (1.35 to 1.69), and very high for 1 item (above 1.70). The item difficulty was very easy for 24 items (below -2.0), easy for 8 items (-2.0 to -0.5), medium for 6 items (-0.5 to 0.5), hard for 3 items (0.5 to 2.0), and very hard for 9 items (2.0 or above). A goodness-of-fit test of the 2-parameter item response model within the range of 0.5 to 2.0 indicated that 12 items had an ideal correct-answer rate. We surmised that the low reliability of the mock examination was influenced by the timing of the test for the examinees and the inappropriate difficulty of the items. Our study suggested a methodology for the development of future case-based items for the Korean Nursing Licensing Examination.
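The classical indices reported for this mock examination (percentage-correct difficulty and a discrimination index) can be computed directly from a scored response matrix. Below is a minimal sketch of that generic computation, assuming a 0/1 item-score array; the data shape and the 27% upper-lower tail convention are illustrative assumptions, not details taken from the study.

```python
import numpy as np

def classical_item_analysis(responses, tail=0.27):
    """Item difficulty (proportion correct) and upper-lower discrimination index."""
    n, _ = responses.shape
    difficulty = responses.mean(axis=0)            # p-value per item
    order = np.argsort(responses.sum(axis=1))      # examinees ranked by total score
    n_tail = max(1, int(round(tail * n)))          # size of the upper and lower groups
    lower, upper = order[:n_tail], order[-n_tail:]
    discrimination = responses[upper].mean(axis=0) - responses[lower].mean(axis=0)
    return difficulty, discrimination

# Illustrative run with simulated data shaped like the mock exam (741 examinees, 50 items)
rng = np.random.default_rng(1)
responses = (rng.random((741, 50)) < 0.57).astype(int)
difficulty, discrimination = classical_item_analysis(responses)
print("mean difficulty:", round(float(difficulty.mean()), 3))
print("items with discrimination >= 0.30:", int((discrimination >= 0.30).sum()))
```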
The development of a science process assessment for fourth-grade students
NASA Astrophysics Data System (ADS)
Smith, Kathleen A.; Welliver, Paul W.
In this study, a multiple-choice test entitled the Science Process Assessment was developed to measure the science process skills of students in grade four. Based on the Recommended Science Competency Continuum for Grades K to 6 for Pennsylvania Schools, this instrument measured the skills of (1) observing, (2) classifying, (3) inferring, (4) predicting, (5) measuring, (6) communicating, (7) using space/time relations, (8) defining operationally, (9) formulating hypotheses, (10) experimenting, (11) recognizing variables, (12) interpreting data, and (13) formulating models. To prepare the instrument, classroom teachers and science educators were invited to participate in two science education workshops designed to develop an item bank of test questions applicable to measuring process skill learning. Participants formed writing teams and generated 65 test items representing the 13 process skills. After a comprehensive group critique of each item, 61 items were identified for inclusion into the Science Process Assessment item bank. To establish content validity, the item bank was submitted to a select panel of science educators for the purpose of judging item acceptability. This analysis yielded 55 acceptable test items and produced the Science Process Assessment, Pilot 1. Pilot 1 was administered to 184 fourth-grade students. Students were given a copy of the test booklet; teachers read each test aloud to the students. Upon completion of this first administration, data from the item analysis yielded a reliability coefficient of 0.73. Subsequently, 40 test items were identified for the Science Process Assessment, Pilot 2. Using the test-retest method, the Science Process Assessment, Pilot 2 (Test 1 and Test 2) was administered to 113 fourth-grade students. Reliability coefficients of 0.80 and 0.82, respectively, were ascertained. The correlation between Test 1 and Test 2 was 0.77. The results of this study indicate that (1) the Science Process Assessment, Pilot 2, is a valid and reliable instrument applicable to measuring the science process skills of students in grade four, (2) using educational workshops as a means of developing item banks of test questions is viable and productive in the test development process, and (3) involving classroom teachers and science educators in the test development process is educationally efficient and effective.
A Review of Classical Methods of Item Analysis.
ERIC Educational Resources Information Center
French, Christine L.
Item analysis is an important consideration in the test development process. It is a statistical procedure that combines methods for evaluating the important characteristics of test items, such as the difficulty, discrimination, and distractor functioning of the items in a test. This paper reviews some of the classical methods for…
Current State of Test Development, Administration, and Analysis: A Study of Faculty Practices.
Bristol, Timothy J; Nelson, John W; Sherrill, Karin J; Wangerin, Virginia S
Developing valid and reliable test items is a critical skill for nursing faculty. This research analyzed the test item writing practice of 674 nursing faculty. Relationships between faculty characteristics and their test item writing practices were analyzed. Findings reveal variability in practice and a gap in implementation of evidence-based standards when developing and evaluating teacher-made examinations.
Development and Validation of a Computer Adaptive EFL Test
ERIC Educational Resources Information Center
He, Lianzhen; Min, Shangchao
2017-01-01
The first aim of this study was to develop a computer adaptive EFL test (CALT) that assesses test takers' listening and reading proficiency in English with dichotomous items and polytomous testlets. We reported in detail on the development of the CALT, including item banking, determination of suitable item response theory (IRT) models for item…
A Comparison of Three Types of Test Development Procedures Using Classical and Latent Trait Methods.
ERIC Educational Resources Information Center
Benson, Jeri; Wilson, Michael
Three methods of item selection were used to select sets of 38 items from a 50-item verbal analogies test and the resulting item sets were compared for internal consistency, standard errors of measurement, item difficulty, biserial item-test correlations, and relative efficiency. Three groups of 1,500 cases each were used for item selection. First…
The Effects of Test Length and Sample Size on Item Parameters in Item Response Theory
ERIC Educational Resources Information Center
Sahin, Alper; Anil, Duygu
2017-01-01
This study investigates the effects of sample size and test length on item-parameter estimation in test development utilizing three unidimensional dichotomous models of item response theory (IRT). For this purpose, a real language test comprised of 50 items was administered to 6,288 students. Data from this test was used to obtain data sets of…
Evaluating the Psychometric Characteristics of Generated Multiple-Choice Test Items
ERIC Educational Resources Information Center
Gierl, Mark J.; Lai, Hollis; Pugh, Debra; Touchie, Claire; Boulais, André-Philippe; De Champlain, André
2016-01-01
Item development is a time- and resource-intensive process. Automatic item generation integrates cognitive modeling with computer technology to systematically generate test items. To date, however, items generated using cognitive modeling procedures have received limited use in operational testing situations. As a result, the psychometric…
Evaluation of Northwest University, Kano Post-UTME Test Items Using Item Response Theory
ERIC Educational Resources Information Center
Bichi, Ado Abdu; Hafiz, Hadiza; Bello, Samira Abdullahi
2016-01-01
High-stakes testing is used for the purposes of providing results that have important consequences. Validity is the cornerstone upon which all measurement systems are built. This study applied item response theory principles to analyse Northwest University Kano Post-UTME Economics test items. The fifty (50) developed economics test items were…
Item Specifications, Science Grade 8. Blue Prints for Testing Minimum Performance Test.
ERIC Educational Resources Information Center
Arkansas State Dept. of Education, Little Rock.
These item specifications were developed as a part of the Arkansas "Minimum Performance Testing Program" (MPT). There is one item specification for each instructional objective included in the MPT. The purpose of an item specification is to provide an overview of the general content and format of test items used to measure an…
Item Specifications, Science Grade 6. Blue Prints for Testing Minimum Performance Test.
ERIC Educational Resources Information Center
Arkansas State Dept. of Education, Little Rock.
These item specifications were developed as a part of the Arkansas "Minimum Performance Testing Program" (MPT). There is one item specification for each instructional objective included in the MPT. The purpose of an item specification is to provide an overview of the general content and format of test items used to measure an…
Item difficulty and item validity for the Children's Group Embedded Figures Test.
Rusch, R R; Trigg, C L; Brogan, R; Petriquin, S
1994-02-01
The validity and reliability of the Children's Group Embedded Figures Test were reported for students in Grade 2 by Cromack and Stone in 1980; however, a search of the literature indicates no evidence for internal consistency or item analysis. Hence the purpose of this study was to examine the item difficulty and item validity of the test with children in Grades 1 and 2. Confusion in the literature over the development and use of this test was seemingly resolved through analysis of these descriptions and through an interview with the test developer. One early-appearing item was unreasonably difficult. Two or three other items were quite difficult and made little contribution to the total score. Caution is recommended, however, in any reordering or elimination of items based on these findings, given the limited number of subjects (n = 84).
Survey Development to Assess College Students' Perceptions of the Campus Environment.
Sowers, Morgan F; Colby, Sarah; Greene, Geoffrey W; Pickett, Mackenzie; Franzen-Castle, Lisa; Olfert, Melissa D; Shelnutt, Karla; Brown, Onikia; Horacek, Tanya M; Kidd, Tandalayo; Kattelmann, Kendra K; White, Adrienne A; Zhou, Wenjun; Riggsbee, Kristin; Yan, Wangcheng; Byrd-Bredbenner, Carol
2017-11-01
We developed and tested a College Environmental Perceptions Survey (CEPS) to assess college students' perceptions of the healthfulness of their campus. CEPS was developed in 3 stages: questionnaire development, validity testing, and reliability testing. Questionnaire development was based on an extensive literature review and input from an expert panel to establish content validity. Face validity was established with the target population using cognitive interviews with 100 college students. Concurrent-criterion validity was established with in-depth interviews (N = 30) of college students compared to surveys completed by the same 30 students. Surveys completed by college students from 8 universities (N = 1147) were used to test internal structure (factor analysis) and internal consistency (Cronbach's alpha). After development and testing, 15 items remained from the original 48 items. A 5-factor solution emerged: physical activity (4 items, α = .635), water (3 items, α = .773), vending (2 items, α = .680), healthy food (2 items, α = .631), and policy (2 items, α = .573). The mean total score for all universities was 62.71 (±11.16) on a 100-point scale. CEPS appears to be a valid and reliable tool for assessing college students' perceptions of their health-related campus environment.
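The per-factor internal consistency figures reported for CEPS are Cronbach's alpha values computed over each factor's items. Below is a minimal sketch of that computation, assuming a respondents-by-items matrix of Likert scores; the factor-to-column mapping and data shape are hypothetical, not the actual CEPS structure.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_vars / total_var)

# Hypothetical factor structure: column indices per subscale
factors = {
    "physical_activity": [0, 1, 2, 3],
    "water": [4, 5, 6],
    "vending": [7, 8],
}
rng = np.random.default_rng(2)
scores = rng.integers(1, 6, size=(1147, 15))   # e.g. 5-point Likert responses
for name, cols in factors.items():
    print(name, round(float(cronbach_alpha(scores[:, cols])), 3))
```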
ERIC Educational Resources Information Center
New South Wales Dept. of Education, Sydney (Australia).
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
ERIC Educational Resources Information Center
New South Wales Dept. of Education, Sydney (Australia).
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
ERIC Educational Resources Information Center
New South Wales Dept. of Education, Sydney (Australia).
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
Garcia, Sofia F.; Hahn, Elizabeth A.; Magasi, Susan; Lai, Jin-Shei; Semik, Patrick; Hammel, Joy; Heinemann, Allen W.
2014-01-01
Objective To describe the development of new self-report measures of social attitudes that act as environmental facilitators or barriers to the participation of people with disabilities in society. Design A mixed methods approach included a literature review; item classification, selection and writing; cognitive interviews and field testing with participants with spinal cord injury (SCI), traumatic brain injury (TBI) or stroke; and rating scale analysis to evaluate initial psychometric properties. Setting General community. Participants Nine individuals with SCI, TBI or stroke participated in cognitive interviews; 305 community residents with those same conditions participated in field testing. Interventions None. Main Outcome Measure(s) Self-report item pool of social attitudes that act as facilitators or barriers to people with disabilities participating in society. Results An interdisciplinary team of experts classified 710 existing social environment items into content areas and wrote 32 new items. Additional qualitative item review included item refinement and winnowing of the pool prior to cognitive interviews and field testing 82 items. Field test data indicated that the pool satisfies a one-parameter item response theory measurement model and would be appropriate for development into a calibrated item bank. Conclusions Our qualitative item review process supported a social environment conceptual framework that includes both social support and social attitudes. We developed a new social attitudes self-report item pool. Calibration testing of that pool is underway with a larger sample in order to develop a social attitudes item bank for persons with disabilities. PMID:25045803
Garcia, Sofia F; Hahn, Elizabeth A; Magasi, Susan; Lai, Jin-Shei; Semik, Patrick; Hammel, Joy; Heinemann, Allen W
2015-04-01
To describe the development of new self-report measures of social attitudes that act as environmental facilitators or barriers to the participation of people with disabilities in society. A mixed-methods approach included a literature review; item classification, selection, and writing; cognitive interviews and field testing of participants with spinal cord injury (SCI), traumatic brain injury (TBI), or stroke; and rating scale analysis to evaluate initial psychometric properties. General community. Individuals with SCI, TBI, or stroke participated in cognitive interviews (n=9); community residents with those same conditions participated in field testing (n=305). None. Self-report item pool of social attitudes that act as facilitators or barriers to people with disabilities participating in society. An interdisciplinary team of experts classified 710 existing social environment items into content areas and wrote 32 new items. Additional qualitative item review included item refinement and winnowing of the pool prior to cognitive interviews and field testing of 82 items. Field test data indicated that the pool satisfies a 1-parameter item response theory measurement model and would be appropriate for development into a calibrated item bank. Our qualitative item review process supported a social environment conceptual framework that includes both social support and social attitudes. We developed a new social attitudes self-report item pool. Calibration testing of that pool is underway with a larger sample to develop a social attitudes item bank for persons with disabilities. Copyright © 2015 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Kisala, Pamela A.; Victorson, David; Pace, Natalie; Heinemann, Allen W.; Choi, Seung W.; Tulsky, David S.
2015-01-01
Objective To describe the development and psychometric properties of the SCI-QOL Psychological Trauma item bank and short form. Design Using a mixed-methods design, we developed and tested a Psychological Trauma item bank with patient and provider focus groups, cognitive interviews, and item response theory based analytic approaches, including tests of model fit, differential item functioning (DIF) and precision. Setting We tested a 31-item pool at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Veterans Administration hospital. Participants A total of 716 individuals with SCI completed the trauma items. Results The 31 items fit a unidimensional model (CFI=0.952; RMSEA=0.061) and demonstrated good precision (theta range between 0.6 and 2.5). Nine items demonstrated negligible DIF with little impact on score estimates. The final calibrated item bank contains 19 items. Conclusion The SCI-QOL Psychological Trauma item bank is a psychometrically robust measurement tool from which a short form and a computer adaptive test (CAT) version are available. PMID:26010967
Machine Shop. Criterion-Referenced Test (CRT) Item Bank.
ERIC Educational Resources Information Center
Davis, Diane, Ed.
This criterion-referenced test item bank is keyed to the machine shop competency profile developed by industry and education professionals in Missouri. The 16 references used for drafting the test items are listed. Test items are arranged under these categories: orientation to machine shop; performing mathematical calculations; performing…
ERIC Educational Resources Information Center
Gierl, Mark J.; Lai, Hollis
2013-01-01
Changes to the design and development of our educational assessments are resulting in the unprecedented demand for a large and continuous supply of content-specific test items. One way to address this growing demand is with automatic item generation (AIG). AIG is the process of using item models to generate test items with the aid of computer…
An Effect Size Measure for Raju's Differential Functioning for Items and Tests
ERIC Educational Resources Information Center
Wright, Keith D.; Oshima, T. C.
2015-01-01
This study established an effect size measure for noncompensatory differential item functioning (NCDIF) within Raju's differential functioning for items and tests (DFIT) framework. The Mantel-Haenszel parameter served as the benchmark for developing NCDIF's effect size measure for reporting moderate and large differential item functioning in test items. The effect size of…
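The Mantel-Haenszel procedure used here as a benchmark pools 2x2 tables of group membership by item correctness across total-score strata. Below is a minimal sketch of the common odds ratio and its delta-scale transform, assuming 0/1 item scores and a binary group indicator; the variable names are illustrative, and this is the generic MH computation rather than the study's own code.

```python
import numpy as np

def mantel_haenszel_dif(item, group, total):
    """Mantel-Haenszel common odds ratio and ETS delta for one item.

    item  : 0/1 correctness for each examinee
    group : 0 = reference group, 1 = focal group
    total : matching variable (e.g. total test score)
    """
    num, den = 0.0, 0.0
    for s in np.unique(total):
        idx = total == s
        nt = idx.sum()
        if nt == 0:
            continue
        a = np.sum(idx & (group == 0) & (item == 1))   # reference, correct
        b = np.sum(idx & (group == 0) & (item == 0))   # reference, incorrect
        c = np.sum(idx & (group == 1) & (item == 1))   # focal, correct
        d = np.sum(idx & (group == 1) & (item == 0))   # focal, incorrect
        num += a * d / nt
        den += b * c / nt
    alpha_mh = num / den if den > 0 else np.nan
    delta_mh = -2.35 * np.log(alpha_mh)                # ETS delta scale
    return alpha_mh, delta_mh
```

Under the commonly used ETS convention, absolute delta values below about 1.0 are treated as negligible DIF and values above about 1.5 as large.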
Detecting a Gender-Related DIF Using Logistic Regression and Transformed Item Difficulty
ERIC Educational Resources Information Center
Abedlaziz, Nabeel; Ismail, Wail; Hussin, Zaharah
2011-01-01
Test items are designed to provide information about the examinees. Difficult items are designed to be more demanding and easy items less so. However, test items sometimes carry with them demands other than those intended by the test developer (Scheuneman & Gerritz, 1990). When personal attributes such as gender systematically affect…
ERIC Educational Resources Information Center
New South Wales Dept. of Education, Sydney (Australia).
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
ERIC Educational Resources Information Center
New South Wales Dept. of Education, Sydney (Australia).
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
ERIC Educational Resources Information Center
New South Wales Dept. of Education, Sydney (Australia).
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
Development of knowledge tests for multi-disciplinary emergency training: a review and an example.
Sørensen, J L; Thellesen, L; Strandbygaard, J; Svendsen, K D; Christensen, K B; Johansen, M; Langhoff-Roos, P; Ekelund, K; Ottesen, B; Van Der Vleuten, C
2015-01-01
The literature is sparse on written test development in a post-graduate multi-disciplinary setting. Developing and evaluating knowledge tests for use in multi-disciplinary post-graduate training is challenging. The objective of this study was to describe the process of developing and evaluating a multiple-choice question (MCQ) test for use in a multi-disciplinary training program in obstetric-anesthesia emergencies. A multi-disciplinary working committee with 12 members representing six professional healthcare groups and another 28 participants were involved. Recurrent revisions of the MCQ items were undertaken, followed by a statistical analysis. The MCQ items were developed stepwise, including decisions on aims and content, followed by testing for face and content validity, construct validity, item-total correlation, and reliability. To obtain acceptable content validity, 40 of the original 50 items were included in the final MCQ test. The MCQ test was able to distinguish between levels of competence, and good construct validity was indicated by a significant difference in the mean score between consultants and first-year trainees, as well as between first-year trainees and medical and midwifery students. Evaluation of the item-total correlation analysis in the 40-item set revealed that 11 items needed re-evaluation, four of which addressed content issues in local clinical guidelines. A Cronbach's alpha of 0.83 was found for reliability, which is acceptable. Content and construct validity and reliability were acceptable. The presented template for the development of this MCQ test could be useful to others when developing knowledge tests and may enhance the overall quality of test development. © 2014 The Acta Anaesthesiologica Scandinavica Foundation. Published by John Wiley & Sons Ltd.
Validation of a clinical critical thinking skills test in nursing.
Shin, Sujin; Jung, Dukyoo; Kim, Sungeun
2015-01-27
The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Of the initial 30 items, 11 were excluded after analysis of the difficulty and discrimination parameters. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. From these results, evidence of response process validity was demonstrated, indicating that subjects responded as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and therefore represents a more convenient measurement of critical thinking ability.
Validation of a clinical critical thinking skills test in nursing
2015-01-01
Purpose: The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. Methods: This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Results: Of the initial 30 items, 11 were excluded after analysis of the difficulty and discrimination parameters. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. Conclusion: From these results, evidence of response process validity was demonstrated, indicating that subjects responded as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and therefore represents a more convenient measurement of critical thinking ability. PMID:25622716
ERIC Educational Resources Information Center
Quaigrain, Kennedy; Arhin, Ato Kwamina
2017-01-01
Item analysis is essential in improving items which will be used again in later tests; it can also be used to eliminate misleading items in a test. The study focused on item and test quality and explored the relationship between difficulty index (p-value) and discrimination index (DI) with distractor efficiency (DE). The study was conducted among…
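Distractor efficiency of the kind examined here is typically summarized by flagging non-functioning distractors, i.e. incorrect options selected by fewer than roughly 5% of examinees. Below is a minimal sketch of that generic check, assuming raw option choices and an answer key; the 5% threshold, data shape, and variable names are illustrative assumptions.

```python
import numpy as np

def distractor_analysis(choices, key, threshold=0.05):
    """Flag non-functioning distractors (wrong options chosen by < threshold of examinees).

    choices : examinees-by-items array of selected options, e.g. 'A'..'D'
    key     : sequence of correct options, one per item
    """
    report = []
    for j, correct in enumerate(key):
        col = choices[:, j]
        options, counts = np.unique(col, return_counts=True)
        props = dict(zip(options, counts / len(col)))
        non_functioning = [o for o, p in props.items() if o != correct and p < threshold]
        report.append({"item": j, "proportions": props, "non_functioning": non_functioning})
    return report

# Illustrative run: 4-option items with a hypothetical answer key
rng = np.random.default_rng(3)
choices = rng.choice(list("ABCD"), size=(200, 5), p=[0.55, 0.25, 0.15, 0.05])
key = list("AABCD")
for row in distractor_analysis(choices, key):
    print(row["item"], "non-functioning distractors:", row["non_functioning"])
```

An item's distractor efficiency can then be reported as the proportion of its distractors that are functioning.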
The Role of Item Models in Automatic Item Generation
ERIC Educational Resources Information Center
Gierl, Mark J.; Lai, Hollis
2012-01-01
Automatic item generation represents a relatively new but rapidly evolving research area where cognitive and psychometric theories are used to produce tests that include items generated using computer technology. Automatic item generation requires two steps. First, test development specialists create item models, which are comparable to templates…
An Efficiency Balanced Information Criterion for Item Selection in Computerized Adaptive Testing
ERIC Educational Resources Information Center
Han, Kyung T.
2012-01-01
Successful administration of computerized adaptive testing (CAT) programs in educational settings requires that test security and item exposure control issues be taken seriously. Developing an item selection algorithm that strikes the right balance between test precision and level of item pool utilization is the key to successful implementation…
Development and validation of an energy-balance knowledge test for fourth- and fifth-grade students.
Chen, Senlin; Zhu, Xihe; Kang, Minsoo
2017-05-01
A valid test measuring children's energy-balance (EB) knowledge is lacking in research. This study developed and validated the energy-balance knowledge test (EBKT) for fourth and fifth grade students. The original EBKT contained 25 items but was reduced to 23 items based on pilot result and intensive expert panel discussion. De-identified data were collected from 468 fourth and fifth grade students enrolled in four schools to examine the psychometric properties of the EBKT items. The Rasch model analysis was conducted using the Winstep 3.65.0 software. Differential item functioning (DIF) analysis flagged 1 item (item #4) functioning differently between boys and girls, which was deleted. The final 22-item EBKT showed desirable model-data fit indices. The items had large variability ranging from -3.58 logit (item #10, the easiest) to 1.70 logit (item #3, the hardest). The average person ability on the test was 0.28 logit (SD = .78). Additional analyses supported known-group difference validity of the EBKT scores in capturing gender- and grade-based ability differences. The test was overall valid but could be further improved by expanding test items to discern various ability levels. For lack of a better test, researchers and practitioners may use the EBKT to assess fourth- and fifth-grade students' EB knowledge.
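Rasch item difficulties in logits, like those reported for the EBKT, come from dedicated estimation software; a very rough first approximation can nevertheless be obtained by taking the centered log-odds of each item's facility. The sketch below shows only that crude approximation under the assumption of 0/1 scores; it is not the estimation the study's software performs, and the simulated data are illustrative.

```python
import numpy as np

def approx_rasch_difficulty(responses):
    """Centered log-odds of item facility as a rough Rasch difficulty estimate (logits)."""
    p = responses.mean(axis=0).clip(0.01, 0.99)   # item facility, bounded away from 0/1
    difficulty = np.log((1 - p) / p)              # harder items -> higher logits
    return difficulty - difficulty.mean()         # center the scale at 0 logits

# Illustrative run shaped like the validation sample (468 students, 22 items)
rng = np.random.default_rng(4)
responses = (rng.random((468, 22)) < rng.uniform(0.3, 0.95, 22)).astype(int)
d = approx_rasch_difficulty(responses)
print("easiest item logit:", round(float(d.min()), 2), " hardest:", round(float(d.max()), 2))
```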
McInnes, Matthew D F; Moher, David; Thombs, Brett D; McGrath, Trevor A; Bossuyt, Patrick M; Clifford, Tammy; Cohen, Jérémie F; Deeks, Jonathan J; Gatsonis, Constantine; Hooft, Lotty; Hunt, Harriet A; Hyde, Christopher J; Korevaar, Daniël A; Leeflang, Mariska M G; Macaskill, Petra; Reitsma, Johannes B; Rodin, Rachel; Rutjes, Anne W S; Salameh, Jean-Paul; Stevens, Adrienne; Takwoingi, Yemisi; Tonelli, Marcello; Weeks, Laura; Whiting, Penny; Willis, Brian H
2018-01-23
Systematic reviews of diagnostic test accuracy synthesize data from primary diagnostic studies that have evaluated the accuracy of 1 or more index tests against a reference standard, provide estimates of test performance, allow comparisons of the accuracy of different tests, and facilitate the identification of sources of variability in test accuracy. To develop the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) diagnostic test accuracy guideline as a stand-alone extension of the PRISMA statement. Modifications to the PRISMA statement reflect the specific requirements for reporting of systematic reviews and meta-analyses of diagnostic test accuracy studies and the abstracts for these reviews. Established standards from the Enhancing the Quality and Transparency of Health Research (EQUATOR) Network were followed for the development of the guideline. The original PRISMA statement was used as a framework on which to modify and add items. A group of 24 multidisciplinary experts used a systematic review of articles on existing reporting guidelines and methods, a 3-round Delphi process, a consensus meeting, pilot testing, and iterative refinement to develop the PRISMA diagnostic test accuracy guideline. The final version of the PRISMA diagnostic test accuracy guideline checklist was approved by the group. The systematic review (produced 64 items) and the Delphi process (provided feedback on 7 proposed items; 1 item was later split into 2 items) identified 71 potentially relevant items for consideration. The Delphi process reduced these to 60 items that were discussed at the consensus meeting. Following the meeting, pilot testing and iterative feedback were used to generate the 27-item PRISMA diagnostic test accuracy checklist. To reflect specific or optimal contemporary systematic review methods for diagnostic test accuracy, 8 of the 27 original PRISMA items were left unchanged, 17 were modified, 2 were added, and 2 were omitted. The 27-item PRISMA diagnostic test accuracy checklist provides specific guidance for reporting of systematic reviews. The PRISMA diagnostic test accuracy guideline can facilitate the transparent reporting of reviews, and may assist in the evaluation of validity and applicability, enhance replicability of reviews, and make the results from systematic reviews of diagnostic test accuracy studies more useful.
A Stepwise Test Characteristic Curve Method to Detect Item Parameter Drift
ERIC Educational Resources Information Center
Guo, Rui; Zheng, Yi; Chang, Hua-Hua
2015-01-01
An important assumption of item response theory is item parameter invariance. Sometimes, however, item parameters are not invariant across different test administrations due to factors other than sampling error; this phenomenon is termed item parameter drift. Several methods have been developed to detect drifted items. However, most of the…
Shen, Linjun; Li, Feiming; Wattleworth, Roberta; Filipetto, Frank
2010-10-01
The Comprehensive Osteopathic Medical Licensing Examination conducted a trial of multimedia items in the 2008-2009 Level 3 testing cycle to determine (1) if multimedia items were able to test additional elements of medical knowledge and skills and (2) how to develop effective multimedia items. Forty-four content-matched multimedia and text multiple-choice items were randomly delivered to Level 3 candidates. Logistic regression and paired-samples t tests were used for pairwise and group-level comparisons, respectively. Nine pairs showed significant differences in difficulty and/or discrimination. Content analysis found that, if text narrations were less direct, multimedia materials could make items easier. When textbook terminologies were replaced by multimedia presentations, multimedia items could become more difficult. Moreover, a multimedia item was found not to be uniformly difficult for candidates at different ability levels, possibly because multimedia and text items tested different elements of the same concept. Multimedia items may be capable of measuring some constructs different from those that text items can measure. Effective multimedia items with reasonable psychometric properties can be intentionally developed.
Flens, Gerard; Smits, Niels; Terwee, Caroline B; Dekker, Joost; Huijbrechts, Irma; de Beurs, Edwin
2017-03-01
We developed a Dutch-Flemish version of the patient-reported outcomes measurement information system (PROMIS) adult V1.0 item bank for depression as input for computerized adaptive testing (CAT). As item bank, we used the Dutch-Flemish translation of the original PROMIS item bank (28 items) and additionally translated 28 U.S. depression items that failed to make the final U.S. item bank. Through psychometric analysis of a combined clinical and general population sample ( N = 2,010), 8 added items were removed. With the final item bank, we performed several CAT simulations to assess the efficiency of the extended (48 items) and the original item bank (28 items), using various stopping rules. Both item banks resulted in highly efficient and precise measurement of depression and showed high similarity between the CAT simulation scores and the full item bank scores. We discuss the implications of using each item bank and stopping rule for further CAT development.
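CAT simulations like those described, which repeatedly administer the most informative remaining item at the current ability estimate and stop once a precision criterion is met, can be sketched for 2-parameter-logistic items as follows. The item parameters, EAP grid, and stopping rule below are illustrative assumptions, not the PROMIS depression bank's actual configuration.

```python
import numpy as np

def p2pl(theta, a, b):
    """2PL probability of endorsing/answering an item correctly."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of each item at ability theta."""
    p = p2pl(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def simulate_cat(a, b, true_theta, se_stop=0.32, max_items=12, rng=None):
    """Select items by maximum information; score by EAP on a fixed grid."""
    rng = rng or np.random.default_rng()
    grid = np.linspace(-4, 4, 81)
    posterior = np.exp(-0.5 * grid ** 2)           # standard normal prior
    administered, theta_hat, se = [], 0.0, np.inf
    while len(administered) < max_items and se > se_stop:
        info = item_information(theta_hat, a, b)
        for used in administered:
            info[used] = -np.inf                   # do not reuse items
        j = int(np.argmax(info))
        administered.append(j)
        correct = rng.random() < p2pl(true_theta, a[j], b[j])   # simulated response
        like = p2pl(grid, a[j], b[j]) if correct else 1.0 - p2pl(grid, a[j], b[j])
        posterior = posterior * like
        w = posterior / posterior.sum()
        theta_hat = float((grid * w).sum())                        # EAP ability estimate
        se = float(np.sqrt(((grid - theta_hat) ** 2 * w).sum()))   # posterior SD
    return theta_hat, se, administered

# Illustrative run with a simulated 48-item bank
rng = np.random.default_rng(5)
a = rng.uniform(0.8, 2.5, 48)      # discriminations
b = rng.uniform(-2.0, 2.0, 48)     # difficulties
print(simulate_cat(a, b, true_theta=1.0, rng=rng))
```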
Classen, Sherrilene; Winter, Sandra M.; Velozo, Craig A.; Bédard, Michel; Lanford, Desiree N.; Brumback, Babette; Lutz, Barbara J.
2010-01-01
OBJECTIVE We report on item development and validity testing of a self-report older adult safe driving behaviors measure (SDBM). METHOD On the basis of theoretical frameworks (Precede–Proceed Model of Health Promotion, Haddon’s matrix, and Michon’s model), existing driving measures, and previous research and guided by measurement theory, we developed items capturing safe driving behavior. Item development was further informed by focus groups. We established face validity using peer reviewers and content validity using expert raters. RESULTS Peer review indicated acceptable face validity. Initial expert rater review yielded a scale content validity index (CVI) rating of 0.78, with 44 of 60 items rated ≥0.75. Sixteen unacceptable items (≤0.5) required major revision or deletion. The next CVI scale average was 0.84, indicating acceptable content validity. CONCLUSION The SDBM has relevance as a self-report to rate older drivers. Future pilot testing of the SDBM comparing results with on-road testing will define criterion validity. PMID:20437917
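The content validity index figures reported for the SDBM (item-level ratings against a 0.75 threshold and a scale-level average) are simple proportions over expert relevance ratings. Below is a minimal sketch of that generic computation, assuming a raters-by-items matrix of 1-4 relevance ratings; the matrix shape and rating scale are illustrative assumptions.

```python
import numpy as np

def content_validity_index(ratings, relevant_min=3):
    """I-CVI per item and S-CVI/Ave from an experts-by-items rating matrix (1-4 scale)."""
    relevant = ratings >= relevant_min
    i_cvi = relevant.mean(axis=0)          # proportion of experts rating each item relevant
    s_cvi_ave = i_cvi.mean()               # scale-level CVI (averaging method)
    return i_cvi, s_cvi_ave

# Illustrative run: 6 expert raters, 60 candidate items
rng = np.random.default_rng(6)
ratings = rng.integers(1, 5, size=(6, 60))
i_cvi, s_cvi = content_validity_index(ratings)
print("items with I-CVI >= 0.75:", int((i_cvi >= 0.75).sum()), " S-CVI/Ave:", round(float(s_cvi), 2))
```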
Item Pool Design for an Operational Variable-Length Computerized Adaptive Test
ERIC Educational Resources Information Center
He, Wei; Reckase, Mark D.
2014-01-01
For computerized adaptive tests (CATs) to work well, they must have an item pool with sufficient numbers of good quality items. Many researchers have pointed out that, in developing item pools for CATs, not only is the item pool size important but also the distribution of item parameters and practical considerations such as content distribution…
Test item linguistic complexity and assessments for deaf students.
Cawthon, Stephanie
2011-01-01
Linguistic complexity of test items is one test format element that has been studied in the context of struggling readers and their participation in paper-and-pencil tests. The present article presents findings from an exploratory study on the potential relationship between linguistic complexity and test performance for deaf readers. A total of 64 students completed 52 multiple-choice items, 32 in mathematics and 20 in reading. These items were coded for linguistic complexity components of vocabulary, syntax, and discourse. Mathematics items had higher linguistic complexity ratings than reading items, but there were no significant relationships between item linguistic complexity scores and student performance on the test items. The discussion addresses issues related to the subject area, student proficiency levels in the test content, factors to look for in determining a "linguistic complexity effect," and areas for further research in test item development and deaf students.
NASA Astrophysics Data System (ADS)
Wren, David A.
The research presented in this dissertation culminated in a 10-item Thermochemistry Concept Inventory (TCI). The development of the TCI can be divided into two main phases: qualitative studies and quantitative studies. Both phases focused on the primary stakeholders of the TCI, college-level general chemistry instructors and students. Each phase was designed to collect evidence for the validity of the interpretations and uses of TCI testing data. A central use of TCI testing data is to identify student conceptual misunderstandings, which are represented as incorrect options of multiple-choice TCI items. Therefore, quantitative and qualitative studies focused heavily on collecting evidence at the item level, where important interpretations may be made by TCI users. Qualitative studies included student interviews (N = 28) and online expert surveys (N = 30). Think-aloud student interviews (N = 12) were used to identify conceptual misunderstandings used by students. Novice response process validity interviews (N = 16) helped provide information on how students interpreted and answered TCI items and were the basis of item revisions. Practicing general chemistry instructors (N = 18), or experts, defined the boundaries of thermochemistry content included on the TCI. Once TCI items were in the later stages of development, an online version of the TCI was used in an expert response process validity survey (N = 12) to provide expert feedback on item content, format, and consensus on the correct answer for each item. Quantitative studies included three phases: beta testing of TCI items (N = 280), pilot testing of a 12-item TCI (N = 485), and a large data collection using a 10-item TCI (N = 1331). In addition to traditional classical test theory analysis, Rasch model analysis was also used for evaluation of testing data at the test and item level. The TCI was administered in both formative assessment (beta and pilot testing) and summative assessment (large data collection) settings, with items performing well in both. One item, item K, did not have acceptable psychometric properties when the TCI was used as a quiz (summative assessment), but was retained in the final version of the TCI based on the acceptable psychometric properties displayed in pilot testing (formative assessment).
Fajrianthi; Zein, Rizqy Amelia
2017-01-01
This study aimed to develop an emotional intelligence (EI) test suitable for the Indonesian workplace context. The Airlangga Emotional Intelligence Test (Tes Kecerdasan Emosi Airlangga [TKEA]) was designed to measure three EI domains: 1) emotional appraisal, 2) emotional recognition, and 3) emotional regulation. TKEA consisted of 120 items, with 40 items for each subset. TKEA was developed based on the Situational Judgment Test (SJT) approach. To ensure its psychometric qualities, categorical confirmatory factor analysis (CCFA) and item response theory (IRT) were applied to test its validity and reliability. The study was conducted on 752 participants, and the results showed that the test information function (TIF) was 3.414 for subset 1 (ability level = 0), 12.183 for subset 2 (ability level = -2), and 2.398 for subset 3 (ability level = -2). It is concluded that TKEA performs very well in measuring individuals with a low level of EI ability. It is worth noting that TKEA is currently at the development stage; therefore, in this study, we investigated TKEA's item analysis and the dimensionality of each TKEA subset.
Kalpakjian, Claire Z.; Tate, Denise G.; Kisala, Pamela A.; Tulsky, David S.
2015-01-01
Objective To describe the development and psychometric properties of the Spinal Cord Injury-Quality of Life (SCI-QOL) Self-esteem item bank. Design Using a mixed-methods design, we developed and tested a self-esteem item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory- (IRT) based analytic approaches, including tests of model fit, differential item functioning (DIF) and precision. Setting We tested a pool of 30 items at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital, and the James J. Peters/Bronx Department of Veterans Affairs hospital. Participants A total of 717 individuals with SCI completed the self-esteem items. Results A unidimensional model was observed (CFI = 0.946; RMSEA = 0.087) and measurement precision was good (theta range between −2.7 and 0.7). Eleven items were flagged for DIF; however, effect sizes were negligible with little practical impact on score estimates. The final calibrated item bank resulted in 23 retained items. Conclusion This study indicates that the SCI-QOL Self-esteem item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available. PMID:26010972
Kalpakjian, Claire Z; Tate, Denise G; Kisala, Pamela A; Tulsky, David S
2015-05-01
To describe the development and psychometric properties of the Spinal Cord Injury-Quality of Life (SCI-QOL) Self-esteem item bank. Using a mixed-methods design, we developed and tested a self-esteem item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory-(IRT) based analytic approaches, including tests of model fit, differential item functioning (DIF) and precision. We tested a pool of 30 items at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital, and the James J. Peters/Bronx Department of Veterans Affairs hospital. A total of 717 individuals with SCI completed the self-esteem items. A unidimensional model was observed (CFI=0.946; RMSEA=0.087) and measurement precision was good (theta range between -2.7 and 0.7). Eleven items were flagged for DIF; however, effect sizes were negligible with little practical impact on score estimates. The final calibrated item bank resulted in 23 retained items. This study indicates that the SCI-QOL Self-esteem item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available.
Victorson, David; Tulsky, David S; Kisala, Pamela A; Kalpakjian, Claire Z; Weiland, Brian; Choi, Seung W
2015-05-01
To describe the development and psychometric properties of the Spinal Cord Injury--Quality of Life (SCI-QOL) Resilience item bank and short form. Using a mixed-methods design, we developed and tested a resilience item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory based analytic approaches, including tests of model fit and differential item functioning (DIF). We tested a 32-item pool at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Department of Veterans Affairs medical center. A total of 717 individuals with SCI completed the Resilience items. A unidimensional model was observed (CFI=0.968; RMSEA=0.074) and measurement precision was good (theta range between -3.1 and 0.9). Ten items were flagged for DIF, however, after examination of effect sizes we found this to be negligible with little practical impact on score estimates. The final calibrated item bank resulted in 21 retained items. This study indicates that the SCI-QOL Resilience item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available.
Victorson, David; Tulsky, David S.; Kisala, Pamela A.; Kalpakjian, Claire Z.; Weiland, Brian; Choi, Seung W.
2015-01-01
Objective To describe the development and psychometric properties of the Spinal Cord Injury - Quality of Life (SCI-QOL) Resilience item bank and short form. Design Using a mixed-methods design, we developed and tested a resilience item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory based analytic approaches, including tests of model fit and differential item functioning (DIF). Setting We tested a 32-item pool at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Department of Veterans Affairs medical center. Participants A total of 717 individuals with SCI completed the Resilience items. Results A unidimensional model was observed (CFI = 0.968; RMSEA = 0.074) and measurement precision was good (theta range between −3.1 and 0.9). Ten items were flagged for DIF, however, after examination of effect sizes we found this to be negligible with little practical impact on score estimates. The final calibrated item bank resulted in 21 retained items. Conclusion This study indicates that the SCI-QOL Resilience item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available. PMID:26010971
Development of an item bank and computer adaptive test for role functioning.
Anatchkova, Milena D; Rose, Matthias; Ware, John E; Bjorner, Jakob B
2012-11-01
Role functioning (RF) is a key component of health and well-being and an important outcome in health research. The aim of this study was to develop an item bank to measure impact of health on role functioning. A set of different instruments including 75 newly developed items asking about the impact of health on role functioning was completed by 2,500 participants. Established item response theory methods were used to develop an item bank based on the generalized partial credit model. Comparison of group mean bank scores of participants with different self-reported general health status and chronic conditions was used to test the external validity of the bank. After excluding items that did not meet established requirements, the final item bank consisted of a total of 64 items covering three areas of role functioning (family, social, and occupational). Slopes in the bank ranged between .93 and 4.37; the mean threshold range was -1.09 to -2.25. Item bank-based scores were significantly different for participants with and without chronic conditions and with different levels of self-reported general health. An item bank assessing health impact on RF across three content areas has been successfully developed. The bank can be used for development of short forms or computerized adaptive tests to be applied in the assessment of role functioning as one of the common denominators across applications of generic health assessment.
Bleau Lavigne, Maude; Reeves, Isabelle; Sasseville, Marie-Josée; Loignon, Christine
The primary purpose of this study was to develop 2 survey tools to explore factors influencing adoption of best practices for diabetic foot ulcer offloading treatment in primary health care settings. One survey was intended for patients receiving care for a diabetic foot ulcer in primary health care settings and the other was intended for the health professionals providing treatment. The second purpose of this study was to evaluate the psychometric properties of the 2 surveys. Development and validation of survey instruments. Two surveys were developed using a published guide. Following review of pertinent literature and identification of variables to be measured, a bank of items was developed and pretested to determine clarity of the items and responses. Psychometric testing comprised measurement of the content validity index (CVI) and intraclass correlation coefficient (ICC). Only items obtaining satisfactory CVI and ICC scores were included in the final version of the surveys. The final version of the patient survey contained 41 items and the final version of the survey for health care professionals contained 21 items. The patient-intended survey's items demonstrated high content validity scores and satisfactory test-retest reliability scores. The overall CVI score was 0.98. Forty of the 49 items eligible for testing obtained satisfactory ICC scores. One item's test-retest reliability could not be tested, but it was retained based on its high CVI. The health professional-intended survey had an overall CVI score of 0.91, but its items had lower ICC scores: 63% (31 of the 49 items) did not achieve a satisfactory ICC score for inclusion in the final instrument. This project led to the development of 2 instruments designed to identify and explore factors influencing adoption of best practices for diabetic foot ulcer offloading treatment in the primary health care setting. Future research and testing is required to translate these French surveys into English and additional languages in order to reach a broader population.
ERIC Educational Resources Information Center
Senarat, Somprasong; Tayraukham, Sombat; Piyapimonsit, Chatsiri; Tongkhambanjong, Sakesan
2013-01-01
The purpose of this research is to develop a multidimensional computerized adaptive test for diagnosing the cognitive process of grade 7 students in learning algebra by applying multidimensional item response theory. The research is divided into 4 steps: 1) the development of item bank of algebra, 2) the development of the multidimensional…
75 FR 43515 - National Assessment Governing Board; Meeting
Federal Register 2010, 2011, 2012, 2013, 2014
2010-07-26
... frameworks, developing appropriate student achievement levels for each grade and subject tested, developing... 12 economics, grades 4 and 8 reading, and grades 4 and 8 writing. The writing items are for the 2011 operational assessment; the reading items are for the 2013 pilot test; and the economics items are for the...
ERIC Educational Resources Information Center
Yoon, Su-Youn; Lee, Chong Min; Houghton, Patrick; Lopez, Melissa; Sakano, Jennifer; Loukina, Anastasia; Krovetz, Bob; Lu, Chi; Madani, Nitin
2017-01-01
In this study, we developed assistive tools and resources to support TOEIC® Listening test item generation. There has recently been an increased need for a large pool of items for these tests. This need has, in turn, inspired efforts to increase the efficiency of item generation while maintaining the quality of the created items. We aimed to…
ERIC Educational Resources Information Center
Nissan, Susan; And Others
One of the item types in the Listening Comprehension section of the Test of English as a Foreign Language (TOEFL) test is the dialogue. Because the dialogue item pool needs to have an appropriate balance of items at a range of difficulty levels, test developers have examined items at various difficulty levels in an attempt to identify their…
Exposure Control Using Adaptive Multi-Stage Item Bundles.
ERIC Educational Resources Information Center
Luecht, Richard M.
This paper presents a multistage adaptive testing test development paradigm that promises to handle content balancing and other test development needs, psychometric reliability concerns, and item exposure. The bundled multistage adaptive testing (BMAT) framework is a modification of the computer-adaptive sequential testing framework introduced by…
Science Library of Test Items. Volume Two.
ERIC Educational Resources Information Center
New South Wales Dept. of Education, Sydney (Australia).
The second volume of test items in the Science Library of Test Items is intended as a resource to assist teachers in implementing and evaluating science courses in the first 4 years of Australian secondary school. The items were selected from questions submitted to the School Certificate Development Unit by teachers in New South Wales. Only the…
The Development of a Pediatric Inpatient Experience of Care Measure: Child HCAHPS®
Toomey, Sara L.; Zaslavsky, Alan M.; Elliott, Marc N.; Gallagher, Patricia M.; Fowler, Floyd J.; Klein, David J.; Shulman, Shanna; Ratner, Jessica; McGovern, Caitriona; LeBlanc, Jessica L.; Schuster, Mark A.
2016-01-01
CMS uses Adult HCAHPS® scores for public reporting and pay-for-performance for most U.S. hospitals, but no publicly available standardized survey of inpatient experience of care exists for pediatrics. To fill the gap, CMS/AHRQ commissioned the development of the Consumer Assessment of Healthcare Providers and Systems Hospital Survey – Child Version (Child HCAHPS), a survey of parents/guardians of pediatric patients (<18 years old) who were recently hospitalized. This Special Article describes the development of Child HCAHPS, which included an extensive review of the literature and quality measures, expert interviews, focus groups, cognitive testing, pilot testing of the draft survey, a national field test with 69 hospitals in 34 states, psychometric analysis, and end-user testing of the final survey. We conducted extensive validity and reliability testing to determine which items would be included in the final survey instrument and to develop composite measures. We analyzed national field test data from 17,727 surveys collected from 11/12-1/14 from parents of recently hospitalized children. The final Child HCAHPS instrument has 62 items, including 39 patient experience items, 10 screeners, 12 demographic/descriptive items, and 1 open-ended item. The 39 experience items are categorized based on testing into 18 composite and single-item measures. Our composite and single-item measures demonstrated good to excellent hospital-level reliability at 300 responses per hospital. Child HCAHPS was developed to be a publicly available standardized survey of pediatric inpatient experience of care. It can be used to benchmark pediatric inpatient experience across hospitals and assist in efforts to improve the quality of inpatient care. PMID:26195542
Evaluation of Item Candidates: The PROMIS Qualitative Item Review
DeWalt, Darren A.; Rothrock, Nan; Yount, Susan; Stone, Arthur A.
2009-01-01
One of the PROMIS (Patient-Reported Outcome Measurement Information System) network's primary goals is the development of a comprehensive item bank for patient-reported outcomes of chronic diseases. For its first set of item banks, PROMIS chose to focus on pain, fatigue, emotional distress, physical function, and social function. An essential step for the development of an item pool is the identification, evaluation, and revision of extant questionnaire items for the core item pool. In this work, we also describe the systematic process wherein items are classified for subsequent statistical processing by the PROMIS investigators. Six phases of item development are documented: identification of extant items, item classification and selection, item review and revision, focus group input on domain coverage, cognitive interviews with individual items, and final revision before field testing. Identification of items refers to the systematic search for existing items in currently available scales. Expert item review and revision was conducted by trained professionals who reviewed the wording of each item and revised as appropriate for conventions adopted by the PROMIS network. Focus groups were used to confirm domain definitions and to identify new areas of item development for future PROMIS item banks. Cognitive interviews were used to examine individual items. Items successfully screened through this process were sent to field testing and will be subjected to innovative scale construction procedures. PMID:17443114
ERIC Educational Resources Information Center
Doran, Rodney L.; Pella, Milton O.
The purpose of this study was to develop test items with a minimum reading demand for use with pupils at grade levels two through six. An item was judged to be acceptable if it satisfied at least four of six criteria. Approximately 250 students in grades 2-6 participated in the study. Half of the students were given instruction to develop…
ERIC Educational Resources Information Center
Howard, Melissa M.; Weiler, Robert M.; Haddox, J. David
2009-01-01
Background: The purpose of this study was to develop and test the reliability of self-report survey items designed to monitor the nonmedical use of prescription drugs among adolescents. Methods: Eighteen nonmedical prescription drug items designed to be congruent with the substance abuse items in the US Centers for Disease Control and Prevention's…
Yost, Kathleen J; Webster, Kimberly; Baker, David W; Choi, Seung W; Bode, Rita K; Hahn, Elizabeth A
2009-06-01
Current health literacy measures are too long, imprecise, or have questionable equivalence of English and Spanish versions. The purpose of this paper is to describe the development and pilot testing of a new bilingual computer-based health literacy assessment tool. We analyzed literacy data from three large studies. Using a working definition of health literacy, we developed new prose, document and quantitative items in English and Spanish. Items were pilot tested on 97 English- and 134 Spanish-speaking participants to assess item difficulty. Items covered topics relevant to primary care patients and providers. English- and Spanish-speaking participants understood the tasks involved in answering each type of question. The English Talking Touchscreen was easy to use and the English and Spanish items provided good coverage of the difficulty continuum. Qualitative and quantitative results provided useful information on computer acceptability and initial item difficulty. After the items have been administered on the Talking Touchscreen (la Pantalla Parlanchina) to 600 English-speaking (and 600 Spanish-speaking) primary care patients, we will develop a computer adaptive test. This health literacy tool will enable clinicians and researchers to more precisely determine the level at which low health literacy adversely affects health and healthcare utilization.
Factors Affecting Item Difficulty in English Listening Comprehension Tests
ERIC Educational Resources Information Center
Sung, Pei-Ju; Lin, Su-Wei; Hung, Pi-Hsia
2015-01-01
Task difficulty is a critical issue affecting test developers. Controlling or balancing the item difficulty of an assessment improves its validity and discrimination. Test developers construct tests from the cognitive perspective, by making the test constructing process more scientific and efficient; thus, the scores obtained more precisely…
NASA Astrophysics Data System (ADS)
Siswaningsih, W.; Firman, H.; Zackiyah; Khoirunnisa, A.
2017-02-01
The aim of this study was to develop a two-tier pictorial-based diagnostic test for identifying student misconceptions on the mole concept. The method used in this study was development and validation. The test was developed through four phases: item development, validation, key determination, and test application. The test was developed in a pictorial format consisting of two tiers: the first tier consists of four possible answers and the second tier consists of four possible reasons. Based on the content validity results for 20 items using the CVR (Content Validity Ratio), 18 items were declared valid. Based on the reliability analysis using SPSS, 17 items were obtained with a Cronbach's alpha value of 0.703, which means the items were acceptable. A total of 10 items were administered to 35 senior high school students who had studied the mole concept at one of the high schools in Cimahi. Based on the results of the application test, student misconceptions were identified for each concept label within the mole concept, with percentages of misconception for the mole concept (60.15%), Avogadro's number (34.28%), relative atomic mass (62.84%), relative molecular mass (77.08%), molar mass (68.53%), molar volume of gas (57.11%), molarity (71.32%), chemical equations (82.77%), limiting reactants (91.40%), and molecular formula (77.13%).
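The content validity ratio (CVR) applied to these 20 items is usually Lawshe's index, CVR = (n_e - N/2) / (N/2), where n_e is the number of panelists rating an item "essential" out of N panelists; the short sketch below shows the formula with hypothetical panel counts rather than the study's validator data.

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's CVR: -1 when no panelist rates the item essential, +1 when all do."""
    half = n_panelists / 2
    return (n_essential - half) / half

# Hypothetical panel of 10 experts.
print(content_validity_ratio(9, 10))  # 0.8  -> item likely retained
print(content_validity_ratio(5, 10))  # 0.0  -> exactly half the panel rates it essential
```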
Bernhardt, Jay M; Stellefson, Michael; Weiler, Robert M; Anderson-Lewis, Charkarra; Miller, M David; MacInnes, Jann
2015-01-01
Background Social media can promote healthy behaviors by facilitating engagement and collaboration among health professionals and the public. Thus, social media is quickly becoming a vital tool for health promotion. While guidelines and trainings exist for public health professionals, there are currently no standardized measures to assess individual social media competency among Certified Health Education Specialists (CHES) and Master Certified Health Education Specialists (MCHES). Objective The aim of this study was to design, develop, and test the Social Media Competency Inventory (SMCI) for CHES and MCHES. Methods The SMCI was designed in three sequential phases: (1) Conceptualization and Domain Specifications, (2) Item Development, and (3) Inventory Testing and Finalization. Phase 1 consisted of a literature review, concept operationalization, and expert reviews. Phase 2 involved an expert panel (n=4) review, think-aloud sessions with a small representative sample of CHES/MCHES (n=10), a pilot test (n=36), and classical test theory analyses to develop the initial version of the SMCI. Phase 3 included a field test of the SMCI with a random sample of CHES and MCHES (n=353), factor and Rasch analyses, and development of SMCI administration and interpretation guidelines. Results Six constructs adapted from the unified theory of acceptance and use of technology and the integrated behavioral model were identified for assessing social media competency: (1) Social Media Self-Efficacy, (2) Social Media Experience, (3) Effort Expectancy, (4) Performance Expectancy, (5) Facilitating Conditions, and (6) Social Influence. The initial item pool included 148 items. After the pilot test, 16 items were removed or revised because of low item discrimination (r<.30), high interitem correlations (Ρ>.90), or based on feedback received from pilot participants. During the psychometric analysis of the field test data, 52 items were removed due to low discrimination, evidence of content redundancy, low R-squared value, or poor item infit or outfit. Psychometric analyses of the data revealed acceptable reliability evidence for the following scales: Social Media Self-Efficacy (alpha=.98, item reliability=.98, item separation=6.76), Social Media Experience (alpha=.98, item reliability=.98, item separation=6.24), Effort Expectancy(alpha =.74, item reliability=.95, item separation=4.15), Performance Expectancy (alpha =.81, item reliability=.99, item separation=10.09), Facilitating Conditions (alpha =.66, item reliability=.99, item separation=16.04), and Social Influence (alpha =.66, item reliability=.93, item separation=3.77). There was some evidence of local dependence among the scales, with several observed residual correlations above |.20|. Conclusions Through the multistage instrument-development process, sufficient reliability and validity evidence was collected in support of the purpose and intended use of the SMCI. The SMCI can be used to assess the readiness of health education specialists to effectively use social media for health promotion research and practice. Future research should explore associations across constructs within the SMCI and evaluate the ability of SMCI scores to predict social media use and performance among CHES and MCHES. PMID:26399428
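The item-discrimination screen described above (revising or dropping items with corrected item-total correlations below .30) can be illustrated with a short classical test theory sketch; the response matrix below is invented for demonstration and is not the SMCI pilot data.

```python
import numpy as np

def corrected_item_total(responses):
    """Corrected item-total correlation for each column of a respondents x items matrix."""
    X = np.asarray(responses, dtype=float)
    total = X.sum(axis=1)
    corrs = []
    for j in range(X.shape[1]):
        rest = total - X[:, j]  # total score excluding item j
        corrs.append(np.corrcoef(X[:, j], rest)[0, 1])
    return np.array(corrs)

# Hypothetical 6 respondents x 4 Likert items.
X = [[5, 4, 5, 2],
     [4, 4, 4, 3],
     [2, 1, 2, 5],
     [3, 3, 3, 2],
     [5, 5, 4, 1],
     [1, 2, 1, 4]]
r = corrected_item_total(X)
print(np.round(r, 2), "flag for revision:", np.where(r < 0.30)[0])
```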
Alber, Julia M; Bernhardt, Jay M; Stellefson, Michael; Weiler, Robert M; Anderson-Lewis, Charkarra; Miller, M David; MacInnes, Jann
2015-09-23
Social media can promote healthy behaviors by facilitating engagement and collaboration among health professionals and the public. Thus, social media is quickly becoming a vital tool for health promotion. While guidelines and trainings exist for public health professionals, there are currently no standardized measures to assess individual social media competency among Certified Health Education Specialists (CHES) and Master Certified Health Education Specialists (MCHES). The aim of this study was to design, develop, and test the Social Media Competency Inventory (SMCI) for CHES and MCHES. The SMCI was designed in three sequential phases: (1) Conceptualization and Domain Specifications, (2) Item Development, and (3) Inventory Testing and Finalization. Phase 1 consisted of a literature review, concept operationalization, and expert reviews. Phase 2 involved an expert panel (n=4) review, think-aloud sessions with a small representative sample of CHES/MCHES (n=10), a pilot test (n=36), and classical test theory analyses to develop the initial version of the SMCI. Phase 3 included a field test of the SMCI with a random sample of CHES and MCHES (n=353), factor and Rasch analyses, and development of SMCI administration and interpretation guidelines. Six constructs adapted from the unified theory of acceptance and use of technology and the integrated behavioral model were identified for assessing social media competency: (1) Social Media Self-Efficacy, (2) Social Media Experience, (3) Effort Expectancy, (4) Performance Expectancy, (5) Facilitating Conditions, and (6) Social Influence. The initial item pool included 148 items. After the pilot test, 16 items were removed or revised because of low item discrimination (r<.30), high interitem correlations (Ρ>.90), or based on feedback received from pilot participants. During the psychometric analysis of the field test data, 52 items were removed due to low discrimination, evidence of content redundancy, low R-squared value, or poor item infit or outfit. Psychometric analyses of the data revealed acceptable reliability evidence for the following scales: Social Media Self-Efficacy (alpha=.98, item reliability=.98, item separation=6.76), Social Media Experience (alpha=.98, item reliability=.98, item separation=6.24), Effort Expectancy(alpha =.74, item reliability=.95, item separation=4.15), Performance Expectancy (alpha =.81, item reliability=.99, item separation=10.09), Facilitating Conditions (alpha =.66, item reliability=.99, item separation=16.04), and Social Influence (alpha =.66, item reliability=.93, item separation=3.77). There was some evidence of local dependence among the scales, with several observed residual correlations above |.20|. Through the multistage instrument-development process, sufficient reliability and validity evidence was collected in support of the purpose and intended use of the SMCI. The SMCI can be used to assess the readiness of health education specialists to effectively use social media for health promotion research and practice. Future research should explore associations across constructs within the SMCI and evaluate the ability of SMCI scores to predict social media use and performance among CHES and MCHES.
Computerized Adaptive Testing: Overview and Introduction.
ERIC Educational Resources Information Center
Meijer, Rob R.; Nering, Michael L.
1999-01-01
Provides an overview of computerized adaptive testing (CAT) and introduces contributions to this special issue. CAT elements discussed include item selection, estimation of the latent trait, item exposure, measurement precision, and item-bank development. (SLD)
Rasch Measurement and Item Banking: Theory and Practice.
ERIC Educational Resources Information Center
Nakamura, Yuji
The Rasch Model is a one-parameter item response theory model which states that the probability of a correct response on a test item is a function of the difficulty of the item and the ability of the candidate. Item banking is useful for language testing. The Rasch Model provides estimates of item difficulties that are meaningful,…
Development of a National Item Bank for Tests of Driving Knowledge.
ERIC Educational Resources Information Center
Pollock, William T.; McDole, Thomas L.
Materials intended for driving knowledge test development use by operational licensing and education agencies were prepared. Candidate test items were developed, using literature and operational practice sources, to reflect current state-of-knowledge with respect to principles of safe, efficient driving, to legal regulations, and to traffic…
ERIC Educational Resources Information Center
Köksal, Mustafa Serdar
2016-01-01
The purposes of this study were to develop a culture-specific critical thinking ability test for 6th, 7th, and 8th grade students in Turkey and to use it as an assessment instrument for giftedness. For these purposes, an item pool involving 22 items was formed by writing items focusing on the current and common events presented in (Turkish) media from…
Mathematics Library of Test Items. Volume One.
ERIC Educational Resources Information Center
Fraser, Graham, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from previous tests are made available to teachers for the construction of pretests or posttests, reference tests for inter-class comparisons and general assignments. The collection was reviewed for content…
Agriculture Library of Test Items.
ERIC Educational Resources Information Center
Sutherland, Duncan, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection is reviewed for content validity and reliability. The test…
NASA Astrophysics Data System (ADS)
Bhakti, Satria Seto; Samsudin, Achmad; Chandra, Didi Teguh; Siahaan, Parsaoran
2017-05-01
The aim of this research was to develop multiple-choice test items as tools for measuring scientific generic skills on the solar system topic. To achieve this aim, the researchers used the ADDIE model, consisting of Analysis, Design, Development, Implementation, and Evaluation, as the research method. The scientific generic skills in this research were limited to five indicators: (1) indirect observation, (2) awareness of scale, (3) logical inference, (4) causal relations, and (5) mathematical modeling. The participants were 32 students at one junior high school in Bandung. The results show that the constructed multiple-choice test items were declared valid by the expert validators, and the subsequent testing shows that the developed multiple-choice test items are able to measure scientific generic skills on the solar system topic.
Fajrianthi; Zein, Rizqy Amelia
2017-01-01
This study aimed to develop an emotional intelligence (EI) test that is suitable for the Indonesian workplace context. The Airlangga Emotional Intelligence Test (Tes Kecerdasan Emosi Airlangga [TKEA]) was designed to measure three EI domains: 1) emotional appraisal, 2) emotional recognition, and 3) emotional regulation. TKEA consisted of 120 items, with 40 items for each subset. TKEA was developed based on the Situational Judgment Test (SJT) approach. To ensure its psychometric qualities, categorical confirmatory factor analysis (CCFA) and item response theory (IRT) were applied to test its validity and reliability. The study was conducted on 752 participants, and the results showed that the test information function (TIF) was 3.414 for subset 1 (at ability level = 0), 12.183 for subset 2 (at ability level = −2), and 2.398 for subset 3 (at ability level = −2). It is concluded that TKEA performs very well in measuring individuals with a low level of EI ability. It is worth noting that TKEA is currently at the development stage; therefore, in this study, we investigated the item analysis and dimensionality of each TKEA subset. PMID:29238234
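Test information function (TIF) values like those reported for the TKEA subsets are sums of item information at a given ability level; under a generic two-parameter logistic model (used here only for illustration, with hypothetical parameters rather than the TKEA calibration) the item information is a^2 * P(theta) * (1 - P(theta)).

```python
import numpy as np

def tif_2pl(theta, a, b):
    """Test information at ability theta under a 2PL model with slopes a and difficulties b."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return float(np.sum(a ** 2 * p * (1 - p)))

# Hypothetical 5-item subset: information peaks where the item difficulties cluster.
a = [1.2, 0.8, 1.5, 1.0, 1.3]
b = [-2.0, -1.5, -1.0, 0.0, 1.0]
for theta in (-2, 0, 2):
    print(theta, round(tif_2pl(theta, a, b), 3))
```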
Efforts Toward the Development of Unbiased Selection and Assessment Instruments.
ERIC Educational Resources Information Center
Rudner, Lawrence M.
Investigations into item bias provide an empirical basis for the identification and elimination of test items which appear to measure different traits across populations or cultural groups. The psychometric rationales for six approaches to the identification of biased test items are reviewed: (1) Transformed item difficulties: within-group…
Devine, J; Otto, C; Rose, M; Barthel, D; Fischer, F; Mühlan, H; Mülhan, H; Nolte, S; Schmidt, S; Ottova-Jordan, V; Ravens-Sieberer, U
2015-04-01
Assessing health-related quality of life (HRQoL) via Computerized Adaptive Tests (CAT) provides greater measurement precision coupled with a lower test burden compared to conventional tests. Currently, there are no European pediatric HRQoL CATs available. This manuscript aims at describing the development of a HRQoL CAT for children and adolescents: the Kids-CAT, which was developed based on the established KIDSCREEN-27 HRQoL domain structure. The Kids-CAT was developed combining classical test theory and item response theory methods and using large archival data of European KIDSCREEN norm studies (n = 10,577-19,580). Methods were applied in line with the US PROMIS project. Item bank development included the investigation of unidimensionality, local independence, exploration of Differential Item Functioning (DIF), evaluation of Item Response Curves (IRCs), estimation and norming of item parameters as well as first CAT simulations. The Kids-CAT was successfully built covering five item banks (with 26-46 items each) to measure physical well-being, psychological well-being, parent relations, social support and peers, and school well-being. The Kids-CAT item banks demonstrated excellent psychometric properties: high content validity, unidimensionality, local independence, low DIF, and model-conform IRCs. In CAT simulations, seven items were needed to achieve a measurement precision between .8 and .9 (reliability). It has a child-friendly design, is easily accessible online, and gives immediate feedback reports of scores. The Kids-CAT has the potential to advance pediatric HRQoL measurement by making it less burdensome and enhancing patient-doctor communication.
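The reported precision target (reliability between .8 and .9 after about seven items) follows from the usual IRT relation between the standard error of the ability estimate and marginal reliability, reliability ≈ 1 - SE^2, when the latent trait is scaled to unit variance. The snippet below is a minimal sketch of a CAT-style stopping check under that assumption; the information values are invented.

```python
import math

def reliability_from_se(se):
    """Approximate IRT reliability when the latent trait has variance 1."""
    return 1.0 - se ** 2

def stop_cat(test_information, target_reliability=0.85):
    """Stop once the accumulated test information implies the target reliability."""
    se = 1.0 / math.sqrt(test_information)
    return reliability_from_se(se) >= target_reliability, se

# Hypothetical accumulated information after successive items.
for info in (2.0, 4.0, 6.7, 10.0):
    done, se = stop_cat(info)
    print(f"info={info:4.1f}  SE={se:.2f}  reliability={reliability_from_se(se):.2f}  stop={done}")
```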
Development and Validity Testing of an Arthritis Self-Management Assessment Tool.
Oh, HyunSoo; Han, SunYoung; Kim, SooHyun; Seo, WhaSook
Because of the chronic, progressive nature of arthritis and the substantial effects it has on quality of life, patients may benefit from self-management. However, no valid, reliable self-management assessment tool has been devised for patients with arthritis. This study was conducted to develop a comprehensive self-management assessment tool for patients with arthritis, that is, the Arthritis Self-Management Assessment Tool (ASMAT). To develop a list of qualified items corresponding to the conceptual definitions and attributes of arthritis self-management, a measurement model was established on the basis of theoretical and empirical foundations. Content validity testing was conducted to evaluate whether listed items were suitable for assessing arthritis self-management. Construct validity and reliability of the ASMAT were tested. Construct validity was examined using confirmatory factor analysis and nomological validity. The 32-item ASMAT was developed with a sample composed of patients in a clinic in South Korea. Content validity testing validated the 32 items, which comprised medical (10 items), behavioral (13 items), and psychoemotional (9 items) management subscales. Construct validity testing of the ASMAT showed that the 32 items properly corresponded with conceptual constructs of arthritis self-management, and were suitable for assessing self-management ability in patients with arthritis. Reliability was also well supported. The ASMAT devised in the present study may aid the evaluation of patient self-management ability and the effectiveness of self-management interventions. The authors believe the developed tool may also aid the identification of problems associated with the adoption of self-management practice, and thus improve symptom management, independence, and quality of life of patients with arthritis.
ERIC Educational Resources Information Center
Kouimanos, John, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
IRT Item Parameter Scaling for Developing New Item Pools
ERIC Educational Resources Information Center
Kang, Hyeon-Ah; Lu, Ying; Chang, Hua-Hua
2017-01-01
Increasing use of item pools in large-scale educational assessments calls for an appropriate scaling procedure to achieve a common metric among field-tested items. The present study examines scaling procedures for developing a new item pool under a spiraled block linking design. The three scaling procedures are considered: (a) concurrent…
Handbook for Driving Knowledge Testing.
ERIC Educational Resources Information Center
Pollock, William T.; McDole, Thomas L.
Materials intended for driving knowledge test development for use by operational licensing and education agencies are presented. A pool of 1,313 multiple choice test items is included, consisting of sets of specially developed and tested items covering principles of safe driving, legal regulations, and traffic control device knowledge pertinent to…
A Generative Approach to the Development of Hidden-Figure Items.
ERIC Educational Resources Information Center
Bejar, Issac I.; Yocom, Peter
This report explores an approach to item development and psychometric modeling which explicitly incorporates knowledge about the mental models used by examinees in the solution of items into a psychometric model that characterizes performance on a test, as well as incorporating that knowledge into the item development process. The paper focuses on…
Obbarius, Nina; Fischer, Felix; Obbarius, Alexander; Nolte, Sandra; Liegl, Gregor; Rose, Matthias
2018-04-10
To develop the first item bank to measure Stress Resilience (SR) in clinical populations. Qualitative item development resulted in an initial pool of 131 items covering a broad theoretical SR concept. These items were tested in n=521 patients at a psychosomatic outpatient clinic. Exploratory and Confirmatory Factor Analysis (CFA), as well as other state-of-the-art item analyses and IRT were used for item evaluation and calibration of the final item bank. Out of the initial item pool of 131 items, we excluded 64 items (54 factor loading <.5, 4 residual correlations >.3, 2 non-discriminative Item Response Curves, 4 Differential Item Functioning). The final set of 67 items indicated sufficient model fit in CFA and IRT analyses. Additionally, a 10-item short form with high measurement precision (SE≤.32 in a theta range between -1.8 and +1.5) was derived. Both the SR item bank and the SR short form were highly correlated with an existing static legacy tool (Connor-Davidson Resilience Scale). The final SR item bank and 10-item short form showed good psychometric properties. When further validated, they will be ready to be used within a framework of Computer-Adaptive Tests for a comprehensive assessment of the Stress-Construct. Copyright © 2018. Published by Elsevier Inc.
Development of a noise annoyance sensitivity scale
NASA Technical Reports Server (NTRS)
Bregman, H. L.; Pearson, R. G.
1972-01-01
Examining the problem of noise pollution from the psychological rather than the engineering view, a test of human sensitivity to noise was developed against the criterion of noise annoyance. Test development evolved from a previous study in which biographical, attitudinal, and personality data were collected on a sample of 166 subjects drawn from the adult community of Raleigh. Analysis revealed that only a small subset of the data collected was predictive of noise annoyance. Item analysis yielded 74 predictive items that composed the preliminary noise sensitivity test. This was administered to a sample of 80 adults who later rated the annoyance value of six sounds (equated in terms of peak sound pressure level) presented in a simulated home living-room environment. A predictive model involving 20 test items was developed using multiple regression techniques, and an item weighting scheme was evaluated.
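As a generic illustration of deriving item weights by multiple regression against an annoyance criterion (with entirely invented data, not the Raleigh sample), a least-squares fit might look like the following.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 80 respondents x 5 sensitivity items plus an annoyance criterion.
items = rng.integers(1, 6, size=(80, 5)).astype(float)
annoyance = items @ np.array([0.6, 0.2, 0.4, 0.1, 0.3]) + rng.normal(0.0, 1.0, 80)

# Least-squares item weights, with an intercept column prepended.
X = np.column_stack([np.ones(80), items])
weights, *_ = np.linalg.lstsq(X, annoyance, rcond=None)
predicted = X @ weights
r_squared = np.corrcoef(predicted, annoyance)[0, 1] ** 2
print(np.round(weights, 2), "R^2 =", round(float(r_squared), 2))
```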
ERIC Educational Resources Information Center
Lee, William M.; And Others
Projects to develop an automated item banking and test development system have been undertaken on several occasions at the Air Force Human Resources Laboratory (AFHRL) throughout the past 10 years. Such a system permits the construction of tests in far less time and with a higher degree of accuracy than earlier test construction procedures. This…
Geography Library of Test Items. Volume Four.
ERIC Educational Resources Information Center
Kouimanos, John, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Home Science Library of Test Items. Volume One.
ERIC Educational Resources Information Center
Smith, Jan, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection is reviewed for content validity and reliability. The test…
Languages Library of Test Items. Volume Two: German, Latin.
ERIC Educational Resources Information Center
Campbell, Thomas; And Others
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Languages Library of Test Items. Volume One: French, Indonesian.
ERIC Educational Resources Information Center
Campbell, Thomas; And Others
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Geography Library of Test Items. Volume Three.
ERIC Educational Resources Information Center
Kouimanos, John, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Commerce Library of Test Items. Volume One.
ERIC Educational Resources Information Center
Meeve, Brian, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Geography Library of Test Items. Volume Five.
ERIC Educational Resources Information Center
Kouimanos, John, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Textiles and Design Library of Test Items. Volume I.
ERIC Educational Resources Information Center
Smith, Jan, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection is reviewed for content validity and reliability. The test…
Commerce Library of Test Items. Volume Two.
ERIC Educational Resources Information Center
Meeve, Brian, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Geography Library of Test Items. Volume Six.
ERIC Educational Resources Information Center
Kouimanos, John, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Geography: Library of Test Items. Volume II.
ERIC Educational Resources Information Center
Kouimanos, John, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
How Big Is Big Enough? Sample Size Requirements for CAST Item Parameter Estimation
ERIC Educational Resources Information Center
Chuah, Siang Chee; Drasgow, Fritz; Luecht, Richard
2006-01-01
Adaptive tests offer the advantages of reduced test length and increased accuracy in ability estimation. However, adaptive tests require large pools of precalibrated items. This study looks at the development of an item pool for 1 type of adaptive administration: the computer-adaptive sequential test. An important issue is the sample size required…
Geography Library of Test Items. Volume One.
ERIC Educational Resources Information Center
Kouimanos, John, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
A Comparison of Linking and Concurrent Calibration under the Graded Response Model.
ERIC Educational Resources Information Center
Kim, Seock-Ho; Cohen, Allan S.
Applications of item response theory to practical testing problems including equating, differential item functioning, and computerized adaptive testing, require that item parameter estimates be placed onto a common metric. In this study, two methods for developing a common metric for the graded response model under item response theory were…
[Development of critical thinking skill evaluation scale for nursing students].
You, So Young; Kim, Nam Cho
2014-04-01
To develop a Critical Thinking Skill Test for Nursing Students. The construct concepts were drawn from a literature review and in-depth interviews with hospital nurses, and surveys were conducted among students (n=607) from nursing colleges. The data were collected from September 13 to November 23, 2012 and analyzed using the SAS program, version 9.2. The KR-20 coefficient for reliability, and the difficulty index, discrimination index, item-total correlation, and known-group technique for validity were computed. Four domains and 27 skills were identified and 35 multiple-choice items were developed. Thirty multiple-choice items which had scores higher than .80 on the content validity index were selected for the pretest. From the analysis of the pretest data, a modified set of 30 items was selected for the main test. In the main test, the KR-20 coefficient was .70 and the corrected item-total correlations ranged from .11 to .38. There was a statistically significant difference between the two academic systems (p=.001). The developed instrument is the first critical thinking skill test reflecting nursing perspectives in hospital settings and is expected to be utilized as a tool which contributes to improvement of the critical thinking ability of nursing students.
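The KR-20 coefficient reported for these dichotomously scored items can be computed directly from item pass rates and the variance of total scores; the sketch below uses a small invented response matrix rather than the study data.

```python
import numpy as np

def kr20(responses):
    """Kuder-Richardson Formula 20 for a respondents x items matrix of 0/1 scores."""
    X = np.asarray(responses, dtype=float)
    k = X.shape[1]
    p = X.mean(axis=0)                      # proportion answering each item correctly
    q = 1.0 - p
    var_total = X.sum(axis=1).var(ddof=1)   # variance of examinees' total scores
    return (k / (k - 1)) * (1.0 - np.sum(p * q) / var_total)

# Hypothetical 8 examinees x 5 items.
X = [[1, 1, 1, 0, 1],
     [1, 0, 1, 0, 0],
     [0, 0, 0, 0, 0],
     [1, 1, 1, 1, 1],
     [1, 1, 0, 1, 1],
     [0, 1, 0, 0, 0],
     [1, 1, 1, 1, 0],
     [0, 0, 1, 0, 0]]
print(round(kr20(X), 2))
```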
Kisala, Pamela A; Tulsky, David S; Pace, Natalie; Victorson, David; Choi, Seung W; Heinemann, Allen W
2015-05-01
To develop a calibrated item bank and computer adaptive test (CAT) to assess the effects of stigma on health-related quality of life in individuals with spinal cord injury (SCI). Grounded-theory based qualitative item development methods, large-scale item calibration field testing, confirmatory factor analysis, and item response theory (IRT)-based psychometric analyses. Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. SCI-QOL Stigma Item Bank. A sample of 611 individuals with traumatic SCI completed 30 items assessing SCI-related stigma. After 7 items were iteratively removed, factor analyses confirmed a unidimensional pool of items. Graded Response Model IRT analyses were used to estimate slopes and thresholds for the final 23 items. The SCI-QOL Stigma item bank is unique not only in the assessment of SCI-related stigma but also in the inclusion of individuals with SCI in all phases of its development. Use of confirmatory factor analytic and IRT methods provides flexibility and precision of measurement. The item bank may be administered as a CAT or as a 10-item fixed-length short form and can be used for research and clinical applications.
Kisala, Pamela A.; Tulsky, David S.; Pace, Natalie; Victorson, David; Choi, Seung W.; Heinemann, Allen W.
2015-01-01
Objective To develop a calibrated item bank and computer adaptive test (CAT) to assess the effects of stigma on health-related quality of life in individuals with spinal cord injury (SCI). Design Grounded-theory based qualitative item development methods, large-scale item calibration field testing, confirmatory factor analysis, and item response theory (IRT)-based psychometric analyses. Setting Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Participants Adults with traumatic SCI. Main Outcome Measures SCI-QOL Stigma Item Bank Results A sample of 611 individuals with traumatic SCI completed 30 items assessing SCI-related stigma. After 7 items were iteratively removed, factor analyses confirmed a unidimensional pool of items. Graded Response Model IRT analyses were used to estimate slopes and thresholds for the final 23 items. Conclusions The SCI-QOL Stigma item bank is unique not only in the assessment of SCI-related stigma but also in the inclusion of individuals with SCI in all phases of its development. Use of confirmatory factor analytic and IRT methods provides flexibility and precision of measurement. The item bank may be administered as a CAT or as a 10-item fixed-length short form and can be used for research and clinical applications. PMID:26010973
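Under the graded response model used to estimate the final slopes and thresholds, category probabilities are differences between adjacent cumulative (boundary) probabilities; the sketch below uses made-up parameters for a single 5-category item, not the SCI-QOL Stigma calibration.

```python
import numpy as np

def grm_probs(theta, slope, thresholds):
    """Category probabilities for one polytomous item under Samejima's graded response model."""
    thresholds = np.asarray(thresholds, dtype=float)
    # Boundary probabilities P(X >= k) for k = 1..m, padded with 1 and 0.
    p_star = 1.0 / (1.0 + np.exp(-slope * (theta - thresholds)))
    bounds = np.concatenate(([1.0], p_star, [0.0]))
    return bounds[:-1] - bounds[1:]  # P(X = k) for k = 0..m

# Hypothetical item with slope 2.1 and ordered thresholds; the probabilities sum to 1.
print(grm_probs(theta=0.0, slope=2.1, thresholds=[-1.5, -0.5, 0.5, 1.5]))
```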
ERIC Educational Resources Information Center
Masters, James S.
2010-01-01
With the need for larger and larger banks of items to support adaptive testing and to meet security concerns, large-scale item generation is a requirement for many certification and licensure programs. As part of the mass production of items, it is critical that the difficulty and the discrimination of the items be known without the need for…
Adaptive Mental Testing: The State of the Art
1979-11-01
typically vary in their psychometric properties --particularly in their difficulty--the test designer must decide what configuration of these item...psychometric properties best suits the test's purpose. There are two extreme rationales to guide that decision. One rationale is to choose items that are...development of item response theory (Rasch, 1960; Lord, 1952, 1970, 1974a; Birnbaum, 1968) that provided the needed invariance properties for item
Forkmann, Thomas; Boecker, Maren; Norra, Christine; Eberle, Nicole; Kircher, Tilo; Schauerte, Patrick; Mischke, Karl; Westhofen, Martin; Gauggel, Siegfried; Wirtz, Markus
2009-05-01
The calibration of item banks provides the basis for computerized adaptive testing that ensures high diagnostic precision and minimizes participants' test burden. The present study aimed at developing a new item bank that allows for assessing depression in persons with mental and persons with somatic diseases. The sample consisted of 161 participants treated for a depressive syndrome, and 206 participants with somatic illnesses (103 cardiologic, 103 otorhinolaryngologic; overall mean age = 44.1 years, SD =14.0; 44.7% women) to allow for validation of the item bank in both groups. Persons answered a pool of 182 depression items on a 5-point Likert scale. Evaluation of Rasch model fit (infit < 1.3), differential item functioning, dimensionality, local independence, item spread, item and person separation (>2.0), and reliability (>.80) resulted in a bank of 79 items with good psychometric properties. The bank provides items with a wide range of content coverage and may serve as a sound basis for computerized adaptive testing applications. It might also be useful for researchers who wish to develop new fixed-length scales for the assessment of depression in specific rehabilitation settings. (PsycINFO Database Record (c) 2009 APA, all rights reserved).
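The infit criterion used for item selection (infit < 1.3) is an information-weighted mean square of residuals between observed and model-expected responses. For the 5-point items in this bank it would be computed from polytomous expectations, but the dichotomous Rasch sketch below, with hypothetical data, conveys the idea.

```python
import numpy as np

def rasch_infit(x, theta, b):
    """Infit mean square for one dichotomous Rasch item.

    x     : observed 0/1 responses to the item
    theta : person ability estimates
    b     : item difficulty
    """
    x, theta = np.asarray(x, dtype=float), np.asarray(theta, dtype=float)
    p = 1.0 / (1.0 + np.exp(-(theta - b)))   # expected score under the Rasch model
    w = p * (1.0 - p)                        # model variance of each response
    return float(np.sum((x - p) ** 2) / np.sum(w))

# Hypothetical responses and abilities; values near 1.0 indicate acceptable fit.
x = [1, 1, 0, 1, 0, 0, 1, 0]
theta = [1.2, 0.8, -0.3, 0.5, -1.0, -0.6, 1.5, 0.1]
print(round(rasch_infit(x, theta, b=0.2), 2))
```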
Optimal Stratification of Item Pools in a-Stratified Computerized Adaptive Testing.
ERIC Educational Resources Information Center
Chang, Hua-Hua; van der Linden, Wim J.
2003-01-01
Developed a method based on 0-1 linear programming to stratify an item pool optimally for use in alpha-stratified adaptive testing. Applied the method to a previous item pool from the computerized adaptive test of the Graduate Record Examinations. Results show the new method performs well in practical situations. (SLD)
The Development and Validation of a Formula for Measuring Single-Sentence Test Item Readability.
ERIC Educational Resources Information Center
Homan, Susan; And Others
1994-01-01
A study was conducted with 782 elementary school students to determine whether the Homan-Hewitt Readability Formula could identify the readability of a single-sentence test item. Results indicate that a relationship exists between students' reading grade levels and responses to test items written at higher readability levels. (SLD)
The Development and Management of Banks of Performance Based Test Items.
ERIC Educational Resources Information Center
Curtis, H. A., Ed.
Symposium papers presented at an Annual Meeting of the National Council on Measurement in Education (Chicago, 1972), all of which concern banks of test items for use in constructing criterion referenced tests, comprise this document. The first paper, "Locally Produced Item Banks" by Thomas J. Slocum, presents information on the…
Developing an item bank and short forms that assess the impact of asthma on quality of life.
Stucky, Brian D; Edelen, Maria Orlando; Sherbourne, Cathy D; Eberhart, Nicole K; Lara, Marielena
2014-02-01
The present work describes the process of developing an item bank and short forms that measure the impact of asthma on quality of life (QoL) that avoids confounding QoL with asthma symptomatology and functional impairment. Using a diverse national sample of adults with asthma (N = 2032) we conducted exploratory and confirmatory factor analyses, and item response theory and differential item functioning analyses to develop a 65-item unidimensional item bank and separate short form assessments. A psychometric evaluation of the RAND Impact of Asthma on QoL item bank (RAND-IAQL) suggests that though the concept of asthma impact on QoL is multi-faceted, it may be measured as a single underlying construct. The performance of the bank was then evaluated with a real-data simulated computer adaptive test. From the RAND-IAQL item bank we then developed two short forms consisting of 4 and 12 items (reliability = 0.86 and 0.93, respectively). A real-data simulated computer adaptive test suggests that as few as 4-5 items from the bank are needed to obtain highly precise scores. Preliminary validity results indicate that the RAND-IAQL measures distinguish between levels of asthma control. To measure the impact of asthma on QoL, users of these items may choose from two highly reliable short forms, computer adaptive test administration, or content-specific subsets of items from the bank tailored to their specific needs. Copyright © 2013 Elsevier Ltd. All rights reserved.
Gopichandran, Vijayaprasad; Wouters, Edwin; Chetlapalli, Satish Kumar
2015-05-03
Trust in physicians is the unwritten covenant between the patient and the physician that the physician will do what is in the best interest of the patient. This forms the undercurrent of all healthcare relationships. Several scales exist for assessment of trust in physicians in developed healthcare settings, but to our knowledge none of these have been developed in a developing country context. To develop and validate a new trust in physician scale for a developing country setting. Dimensions of trust in physicians, which were identified in a previous qualitative study in the same setting, were used to develop a scale. This scale was administered among 616 adults selected from urban and rural areas of Tamil Nadu, south India, using a multistage sampling cross sectional survey method. The individual items were analysed using a classical test approach as well as item response theory. Cronbach's α was calculated and the item to total correlation of each item was assessed. After testing for unidimensionality and absence of local dependence, a 2-parameter logistic Samejima graded response model was fitted and item characteristics were assessed. Competence, assurance of treatment, respect for the physician and loyalty to the physician were important dimensions of trust. A total of 31 items were developed using these dimensions. Of these, 22 were selected for final analysis. The Cronbach's α was 0.928. The item to total correlations were acceptable for all the 22 items. The item response analysis revealed good item characteristic curves and item information for all the items. Based on the item parameters and item information, a final 12 item scale was developed. The scale performs optimally in the low to moderate trust range. The final 12 item trust in physician scale has good construct validity and internal consistency.
Gopichandran, Vijayaprasad; Wouters, Edwin; Chetlapalli, Satish Kumar
2015-01-01
Trust in physicians is the unwritten covenant between the patient and the physician that the physician will do what is in the best interest of the patient. This forms the undercurrent of all healthcare relationships. Several scales exist for assessment of trust in physicians in developed healthcare settings, but to our knowledge none of these have been developed in a developing country context. Objectives To develop and validate a new trust in physician scale for a developing country setting. Methods Dimensions of trust in physicians, which were identified in a previous qualitative study in the same setting, were used to develop a scale. This scale was administered among 616 adults selected from urban and rural areas of Tamil Nadu, south India, using a multistage sampling cross sectional survey method. The individual items were analysed using a classical test approach as well as item response theory. Cronbach's α was calculated and the item to total correlation of each item was assessed. After testing for unidimensionality and absence of local dependence, a 2-parameter logistic Samejima graded response model was fitted and item characteristics were assessed. Results Competence, assurance of treatment, respect for the physician and loyalty to the physician were important dimensions of trust. A total of 31 items were developed using these dimensions. Of these, 22 were selected for final analysis. The Cronbach's α was 0.928. The item to total correlations were acceptable for all the 22 items. The item response analysis revealed good item characteristic curves and item information for all the items. Based on the item parameters and item information, a final 12 item scale was developed. The scale performs optimally in the low to moderate trust range. Conclusions The final 12 item trust in physician scale has good construct validity and internal consistency. PMID:25941182
Development of an item bank for computerized adaptive test (CAT) measurement of pain.
Petersen, Morten Aa; Aaronson, Neil K; Chie, Wei-Chu; Conroy, Thierry; Costantini, Anna; Hammerlid, Eva; Hjermstad, Marianne J; Kaasa, Stein; Loge, Jon H; Velikova, Galina; Young, Teresa; Groenvold, Mogens
2016-01-01
Patient-reported outcomes should ideally be adapted to the individual patient while maintaining comparability of scores across patients. This is achievable using computerized adaptive testing (CAT). The aim here was to develop an item bank for CAT measurement of the pain domain as measured by the EORTC QLQ-C30 questionnaire. The development process consisted of four steps: (1) literature search, (2) formulation of new items and expert evaluations, (3) pretesting and (4) field-testing and psychometric analyses for the final selection of items. In step 1, we identified 337 pain items from the literature. Twenty-nine new items fitting the QLQ-C30 item style were formulated in step 2; these were reduced to 26 items by expert evaluations. Based on interviews with 31 patients from Denmark, France and the UK, the list was further reduced to 21 items in step 3. In step 4, responses were obtained from 1103 cancer patients from five countries. Psychometric evaluations showed that 16 items could be retained in a unidimensional item bank. Evaluations indicated that use of the CAT measure may reduce sample size requirements by 15-25% compared to using the QLQ-C30 pain scale. We have established an item bank of 16 items suitable for CAT measurement of pain. While being backward compatible with the QLQ-C30, the new item bank will significantly improve the measurement precision of pain. We recommend initiating CAT measurement by screening for pain using the two original QLQ-C30 pain items. The EORTC pain CAT is currently available for "experimental" purposes.
NASA Astrophysics Data System (ADS)
Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan
2016-12-01
This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discriminatory, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the parscale program. The TUV ability is an ability parameter, here estimated assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed from item response curves (IRC) that represent simplified IRT. Data were gathered on 2392 science and engineering freshmen, from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information, and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test's distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by the classical analysis methods of tests. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.
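The three-parameter logistic model fitted to the TUV items characterizes each item by a discrimination a, a difficulty b, and a guessing parameter c; the sketch below evaluates the item characteristic curve at a few ability levels using hypothetical parameters, not the published TUV estimates.

```python
import numpy as np

def icc_3pl(theta, a, b, c):
    """Probability of a correct response under the three-parameter logistic (3PL) model."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

# Hypothetical multiple-choice item; the guessing floor c reflects a 4-option format.
a, b, c = 1.1, 0.3, 0.25
for theta in (-3, -1, 0, 1, 3):
    print(theta, round(float(icc_3pl(theta, a, b, c)), 3))
```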
Developing and testing new smoking measures for the Health Plan Employer Data and Information Set.
Pbert, Lori; Vuckovic, Nancy; Ockene, Judith K; Hollis, Jack F; Riedlinger, Karen
2003-04-01
To develop and test items for the Health Plan Employer Data and Information Set (HEDIS) that assess delivery of the full range of provider-delivered tobacco interventions. The authors identified potential items via literature review; items were reviewed by national experts. Face validity of candidate items was tested in focus groups. The final survey was sent to a random sample of 1711 adult primary care patients; the re-test survey was sent to self-identified smokers. The process identified reliable items to capture provider assessment of motivation and provision of assistance and follow-up. One can reliably assess patient self-report of provider delivery of the full range of brief tobacco interventions. Such assessment and feedback to health plans and providers may increase use of evidence-based brief interventions.
Development of The Science Processes Test.
ERIC Educational Resources Information Center
Ludeman, Robert R.
Presented is a description and copy of a test manual developed to include items in the test on the basis of children's performance; each item correlated highly with performance on an external criterion. The external criterion was the Individual Competency Measures of the elementary science program Science - A Process Approach (SAPA). The test…
The Australian Science Item Bank Project
ERIC Educational Resources Information Center
Kings, Clive B.; Cropley, Murray C.
1974-01-01
Describes the development of multiple-choice test item bank for grade ten science by the Australian Council for Educational Research. Other item banks are also being developed at the grade ten level in mathematics and social science. (RH)
Buck, Harleah G; Harkness, Karen; Ali, Muhammad Usman; Carroll, Sandra L; Kryworuchko, Jennifer; McGillion, Michael
2017-04-01
Caregivers (CGs) contribute important assistance with heart failure (HF) self-care, including daily maintenance, symptom monitoring, and management. Until CGs' contributions to self-care can be quantified, it is impossible to characterize them, account for their impact on patient outcomes, or perform meaningful cost analyses. The purpose of this study was to conduct psychometric testing and item reduction on the recently developed 34-item Caregiver Contribution to Heart Failure Self-care (CACHS) instrument using classical and item response theory methods. Fifty CGs (mean age 63±12.84 years; 70% female) recruited from an HF clinic completed the CACHS in 2014, and results were evaluated using classical test theory and item response theory. Items were to be deleted for low (<.05) or high (>.95) endorsement, low (<.3) or high (>.7) corrected item-total correlations, significant pairwise correlation coefficients, floor or ceiling effects, relatively low latent trait and item information function levels (<1.5 and p > .5), and differential item functioning. After analysis, 14 items were excluded, resulting in a 20-item instrument (self-care maintenance, eight items; monitoring, seven items; and management, five items). Most items demonstrated moderate to high discrimination (median 2.13, minimum .77, maximum 5.05) and appropriate item difficulty (-2.7 to 1.4). Internal consistency reliability was excellent (Cronbach α = .94, average inter-item correlation = .41), with no ceiling effects. The newly developed 20-item version of the CACHS is supported by rigorous instrument development and represents a novel instrument to measure CGs' contribution to HF self-care. © 2016 Wiley Periodicals, Inc.
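The classical screening criteria listed above lend themselves to a short mechanical check. The sketch below applies the endorsement and corrected item-total correlation rules to a simulated dichotomous response matrix; the thresholds follow the abstract, but the data, the 0/1 scoring, and the variable names are invented (the CACHS itself is not a dichotomous instrument), so this illustrates the logic rather than reproducing the analysis.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cgs, n_items = 50, 34
p = rng.uniform(0.2, 0.8, size=n_items)
X = (rng.random((n_cgs, n_items)) < p).astype(float)   # simulated 0/1 responses

endorsement = X.mean(axis=0)
total = X.sum(axis=1)
to_delete = []
for j in range(n_items):
    rest = total - X[:, j]                       # total with the candidate item removed
    r_it = np.corrcoef(X[:, j], rest)[0, 1]      # corrected item-total correlation
    if (endorsement[j] < 0.05 or endorsement[j] > 0.95
            or r_it < 0.3 or r_it > 0.7):
        to_delete.append(j)

# With purely random data most items fail the correlation rule; items that
# share a common trait, as in the real instrument, would mostly survive it.
print("items flagged for deletion:", to_delete)
```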
FIM-Minimum Data Set Motor Item Bank: Short Forms Development and Precision Comparison in Veterans.
Li, Chih-Ying; Romero, Sergio; Simpson, Annie N; Bonilha, Heather S; Simpson, Kit N; Hong, Ickpyo; Velozo, Craig A
2018-03-01
To improve the practical use of the short forms (SFs) developed from the item bank, we compared the measurement precision of the 4- and 8-item SFs generated from a motor item bank composed of the FIM and the Minimum Data Set (MDS). The FIM-MDS motor item bank allowed scores generated from different instruments to be co-calibrated. The 4- and 8-item SFs were developed based on Rasch analysis procedures. This article compared person strata, ceiling/floor effects, and test SE plots for each administration form and examined 95% confidence interval error bands of anchored person measures with the corresponding SFs. We used an SE of 0.3 as a criterion to reflect a reliability level of .90. Veterans' inpatient rehabilitation facilities and community living centers. Veterans (N=2500) who had both FIM and MDS data collected within 6 days of each other during 2008 through 2010. Not applicable. Four- and 8-item SFs of the FIM, the MDS, and the FIM-MDS motor item bank. Six SFs were generated with 4 and 8 items across a range of difficulty levels from the FIM-MDS motor item bank. The three 8-item SFs all had higher correlations with the item bank (r=.82-.95), higher person strata, and less test error than the corresponding 4-item SFs (r=.80-.90). The three 4-item SFs did not meet the criterion of SE <0.3 for any theta values. Eight-item SFs could improve clinical use of an item bank composed of existing instruments across the continuum of care in veterans. We also found that the number of items, not test specificity, determines the precision of the instrument. Copyright © 2017 American Congress of Rehabilitation Medicine. All rights reserved.
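The 0.3-SE criterion quoted above maps onto a reliability of roughly .90 through the usual Rasch relation reliability = 1 - SE^2/SD^2. The snippet below assumes person measures with a standard deviation near 1 logit, which the article does not state explicitly, so treat the exact mapping as an illustration.

```python
def person_reliability(se, person_sd=1.0):
    """Reliability implied by a constant measurement SE, given the person SD."""
    return 1.0 - (se ** 2) / (person_sd ** 2)

print(person_reliability(0.3))   # 0.91, i.e. roughly the .90 level cited
```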
DEVELOPMENT OF DIAGNOSTIC ANALYTICAL AND MECHANICAL ABILITY TESTS THROUGH FACET DESIGN AND ANALYSIS.
ERIC Educational Resources Information Center
Guttman, Louis; Schlesinger, I. M.
Methodology based on facet theory (modified set theory) was used in test construction and analysis to provide an efficient tool of evaluation for vocational guidance and vocational school use. The type of test development undertaken was limited to the use of nonverbal pictorial items. Items for testing ability to identify elements belonging to an…
Development of Thermodynamic Conceptual Evaluation
NASA Astrophysics Data System (ADS)
Talaeb, P.; Wattanakasiwich, P.
2010-07-01
This research aims to develop a test for assessing student understanding of fundamental principles in thermodynamics. Misconceptions identified in previous physics education research were used to develop the test. Its topics include heat and temperature, the zeroth and first laws of thermodynamics, and thermodynamic processes. Content validity was assessed by three physics experts. The test was then administered to freshmen, sophomores, and juniors majoring in physics to determine the item difficulties and item discrimination of the test. A few items were eliminated from the test. Finally, the test will be administered to students taking the Physics I course to evaluate the effectiveness of Interactive Lecture Demonstrations, which will be used for the first time at Chiang Mai University.
ERIC Educational Resources Information Center
Weiss, David J., Ed.
This symposium consists of five papers and presents some recent developments in adaptive testing which have applications to several military testing problems. The overview, by James R. McBride, defines adaptive testing and discusses some of its item selection and scoring strategies. Item response theory, or item characteristic curve theory, is…
Cupani, Marcos; Zamparella, Tatiana Castro; Piumatti, Gisella; Vinculado, Grupo
The calibration of item banks provides the basis for computerized adaptive testing, which ensures high diagnostic precision and minimizes participants' test burden. This study aims to develop a bank of items measuring knowledge of biology using the Rasch model. The sample consisted of 1219 participants studying in different faculties of the National University of Cordoba (mean age = 21.85 years, SD = 4.66; 66.9% women). The items were organized in different forms and into separate subtests, with some common items across subtests. The students were told they had to answer 60 biology knowledge questions. Evaluation of Rasch model fit (Zstd > |2.0|), differential item functioning, dimensionality, local independence, item and person separation (>2.0), and reliability (>.80) resulted in a bank of 180 items with good psychometric properties. The bank provides items with a wide range of content coverage and may serve as a sound basis for computerized adaptive testing applications. The contribution of this work is significant in the field of educational assessment in Argentina.
Tarrant, Marie; Ware, James; Mohammed, Ahmed M
2009-07-07
Four- or five-option multiple choice questions (MCQs) are the standard in health-science disciplines, both on certification-level examinations and on in-house developed tests. Previous research has shown, however, that few MCQs have three or four functioning distractors. The purpose of this study was to investigate non-functioning distractors in teacher-developed tests in one nursing program at an English-language university in Hong Kong. Using item-analysis data, we assessed the proportion of non-functioning distractors on a sample of seven test papers administered to undergraduate nursing students. A total of 514 items were reviewed, including 2056 options (1542 distractors and 514 correct responses). Non-functioning options were defined as those chosen by fewer than 5% of examinees or those with a positive option discrimination statistic. The proportion of items containing 0, 1, 2, and 3 functioning distractors was 12.3%, 34.8%, 39.1%, and 13.8%, respectively. Overall, items contained an average of 1.54 (SD = 0.88) functioning distractors. Only 52.2% (n = 805) of all distractors were functioning effectively, and 10.2% (n = 158) had a choice frequency of 0. Items with more functioning distractors were more difficult and more discriminating. The low frequency of items with three functioning distractors among the four-option items in this study suggests that teachers have difficulty developing plausible distractors for most MCQs. Test items should consist of as many options as is feasible given the item content and the number of plausible distractors; in most cases this would be three. Item analysis results can be used to identify and remove non-functioning distractors from MCQs that have been used in previous tests.
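The two distractor rules used above (choice frequency below 5%, or a positive option discrimination) can be computed directly from an item-analysis table. The sketch below does so for one item of a simulated test, using the point-biserial correlation between choosing an option and the score on the remaining items as the option discrimination statistic; the data-generating model and the numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(2)
n_examinees, n_items, n_options = 300, 40, 4
key = rng.integers(0, n_options, size=n_items)
ability = rng.normal(size=n_examinees)
# Crude data-generating model: stronger examinees pick the keyed option more
# often; a miss falls on a random option (which may, by chance, be the key).
p_correct = 1.0 / (1.0 + np.exp(-(ability[:, None] - 0.2)))
picked_key = rng.random((n_examinees, n_items)) < p_correct
choices = np.where(picked_key, key,
                   rng.integers(0, n_options, size=(n_examinees, n_items)))

scored = (choices == key).astype(float)
item = 0
rest_score = scored.sum(axis=1) - scored[:, item]
for opt in range(n_options):
    chose = (choices[:, item] == opt).astype(float)
    freq = chose.mean()
    disc = np.corrcoef(chose, rest_score)[0, 1]        # option discrimination
    role = "key" if opt == key[item] else "distractor"
    non_functioning = role == "distractor" and (freq < 0.05 or disc > 0)
    print(f"option {opt} ({role}): freq={freq:.2f}, disc={disc:+.2f}, "
          f"non-functioning={non_functioning}")
```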
ERIC Educational Resources Information Center
Ryan, Ève; Brunfaut, Tineke
2016-01-01
It is not unusual for tests in less-commonly taught languages (LCTLs) to be developed by an experienced item writer with no proficiency in the language being tested, in collaboration with a language informant who is a speaker of the target language, but lacks language assessment expertise. How this approach to item writing works in practice, and…
ERIC Educational Resources Information Center
Harris, Margaret L.; Tabachnick, B. Robert
This paper describes test development efforts for measuring achievement of selected concepts in social studies. It includes descriptive item and test statistics for the tests developed. Twelve items were developed for each of 30 concepts. Subject specialists categorized the concepts into three major areas: Geographic Region, Man and Society, and…
Kalpakjian, Claire Z.; Tulsky, David S.; Kisala, Pamela A.; Bombardier, Charles H.
2015-01-01
Objective To develop an item response theory (IRT) calibrated Grief and Loss item bank as part of the Spinal Cord Injury – Quality of Life (SCI-QOL) measurement system. Design A literature review guided framework development of grief/loss. New items were created from focus groups. Items were revised based on expert review and patient feedback and were then field tested. Analyses included confirmatory factor analysis (CFA), graded response IRT modeling and evaluation of differential item functioning (DIF). Setting We tested a 20-item pool at several rehabilitation centers across the United States, including the University of Michigan, Kessler Foundation, Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Department of Veterans Affairs hospital. Participants A total of 717 individuals with SCI answered the grief and loss questions. Results The final calibrated item bank retained 17 items. A unidimensional model was observed (CFI = 0.976; RMSEA = 0.078) and measurement precision was good (theta range −1.48 to 2.48). Ten items were flagged for DIF; however, examination of effect sizes showed the DIF to be negligible, with little practical impact on score estimates. Conclusions This study indicates that the SCI-QOL Grief and Loss item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available. PMID:26010969
ERIC Educational Resources Information Center
Davis, Laurie Laughlin
2004-01-01
Choosing a strategy for controlling item exposure has become an integral part of test development for computerized adaptive testing (CAT). This study investigated the performance of six procedures for controlling item exposure in a series of simulated CATs under the generalized partial credit model. In addition to a no-exposure control baseline…
Osmosis and Diffusion Conceptual Assessment
Fisher, Kathleen M.; Williams, Kathy S.; Lineback, Jennifer Evarts
2011-01-01
Biology student mastery regarding the mechanisms of diffusion and osmosis is difficult to achieve. To monitor comprehension of these processes among students at a large public university, we developed and validated an 18-item Osmosis and Diffusion Conceptual Assessment (ODCA). This assessment includes two-tiered items, some adopted or modified from the previously published Diffusion and Osmosis Diagnostic Test (DODT) and some newly developed items. The ODCA, a validated instrument containing fewer items than the DODT and emphasizing different content areas within the realm of osmosis and diffusion, better aligns with our curriculum. Creation of the ODCA involved removal of six DODT item pairs, modification of another six DODT item pairs, and development of three new item pairs addressing basic osmosis and diffusion concepts. Responses to ODCA items testing the same concepts as the DODT were remarkably similar to responses to the DODT collected from students 15 yr earlier, suggesting that student mastery regarding the mechanisms of diffusion and osmosis remains elusive. PMID:22135375
Boston, Raymond C.; Coyne, James C.; Farrar, John T.
2010-01-01
Objective To develop and psychometrically test an owner self-administered questionnaire designed to assess severity and impact of chronic pain in dogs with osteoarthritis. Sample Population 70 owners of dogs with osteoarthritis and 50 owners of clinically normal dogs. Procedures Standard methods for the stepwise development and testing of instruments designed to assess subjective states were used. Items were generated through focus groups and an expert panel. Items were tested for readability and ambiguity, and poorly performing items were removed. The reduced set of items was subjected to factor analysis, reliability testing, and validity testing. Results Severity of pain and interference with function were 2 factors identified and named on the basis of the items contained in them. Cronbach’s α was 0.93 and 0.89, respectively, suggesting that the items in each factor could be assessed as a group to compute factor scores (ie, severity score and interference score). The test-retest analysis revealed κ values of 0.75 for the severity score and 0.81 for the interference score. Scores correlated moderately well (r = 0.51 and 0.50, respectively) with the overall quality-of-life (QOL) question, such that as severity and interference scores increased, QOL decreased. Clinically normal dogs had significantly lower severity and interference scores than dogs with osteoarthritis. Conclusions and Clinical Relevance A psychometrically sound instrument was developed. Responsiveness testing must be conducted to determine whether the questionnaire will be useful in reliably obtaining quantifiable assessments from owners regarding the severity and impact of chronic pain and its treatment on dogs with osteoarthritis. PMID:17542696
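The internal-consistency figures reported above (Cronbach's α of 0.93 and 0.89) come from the standard formula relating the item variances to the variance of the summed scale. A minimal implementation on simulated data is shown below; the scores are invented and merely stand in for the severity or interference items.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) matrix of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

rng = np.random.default_rng(3)
latent = rng.normal(size=(70, 1))                       # one underlying construct
scores = latent + rng.normal(scale=0.7, size=(70, 6))   # six correlated items
print(round(cronbach_alpha(scores), 2))
```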
Item Selection and Pre-equating with Empirical Item Characteristic Curves.
ERIC Educational Resources Information Center
Livingston, Samuel A.
An empirical item characteristic curve shows the probability of a correct response as a function of the student's total test score. These curves can be estimated from large-scale pretest data. They enable test developers to select items that discriminate well in the score region where decisions are made. A similar set of curves can be used to…
ERIC Educational Resources Information Center
Brown, Frank N.; And Others
The successful Wisconsin Title 1 project item bank offers a valid, flexible, and efficient means of providing migrant student tests in reading and mathematics tailored to instructor curricula. The item bank system consists of nine PASCAL computer programs which maintain, search, and select from approximately 1,000 test items stored on floppy disks…
ERIC Educational Resources Information Center
Rudner, Lawrence
This digest discusses the advantages and disadvantages of using item banks, and it provides useful information for those who are considering implementing an item banking project in their school districts. The primary advantage of item banking is in test development. Using an item response theory method, such as the Rasch model, items from multiple…
Air Force Officer Qualifying Test Form O: Development and Standardization.
ERIC Educational Resources Information Center
Rogers, Deborah L.; And Others
This report presents the rationale, development, and standardization of the Air Force Officer Qualifying Test (AFOQT) Form O. The test is used to select individuals for officer commissioning programs, and candidates for pilot and navigator training. Form O contains 380 items organized in 16 subtests. All items are administered in a single test…
ERIC Educational Resources Information Center
Samejima, Fumiko; Changas, Paul S.
Methods and approaches for estimating the operating characteristics of discrete item responses without assuming any mathematical form have been developed and expanded. They make it possible to use a given test as the Old Test even if its test information function is not constant over the ability interval of interest.…
Development and Validation of the Numeracy Understanding in Medicine Instrument Short Form
Schapira, Marilyn M.; Walker, Cindy M.; Miller, Tamara; Fletcher, Kathlyn A; Ganschow, Pamela G.; Jacobs, Elizabeth A; Imbert, Diana; O'Connell, Maria; Neuner, Joan M.
2014-01-01
Background Health numeracy can be defined as the ability to understand and use numeric information and quantitative concepts in the context of health. We previously reported the development of the Numeracy Understanding in Medicine Instrument (NUMi), a 20-item test developed using item response theory. We now report the development and validation of a short form of the NUMi. Methods Item statistics were used to identify a subset of 8 items representing a range of difficulty and content areas. Internal reliability was evaluated with Cronbach's alpha. Divergent and convergent validity were assessed by comparing scores on the S-NUMi with existing measures of education, print and numeric health literacy, mathematic achievement, cognitive reasoning, and the original NUMi. Results The 8-item scale had adequate reliability (Cronbach's alpha: 0.72) and was strongly correlated with the 20-item NUMi (0.92). S-NUMi scores were strongly correlated with the Lipkus numeracy test (0.62), the Wide Range Achievement Test-Mathematics (WRAT-M) (0.72), and the Wonderlic cognitive reasoning test (0.76). Moderate correlation was found with education level (0.58) and print literacy as measured by the TOFHLA (0.49). Conclusion The short Numeracy Understanding in Medicine Instrument (S-NUMi) is a reliable and valid measure of health numeracy feasible for use in clinical and research settings. PMID:25315596
Victorson, David E; Choi, Seung; Judson, Marc A; Cella, David
2014-05-01
Sarcoidosis is a multisystem disease that can negatively impact health-related quality of life (HRQL) across generic (e.g., physical, social and emotional wellbeing) and disease-specific (e.g., pulmonary, ocular, dermatologic) domains. Measurement of HRQL in sarcoidosis has largely relied on generic patient-reported outcome tools, with few disease-specific measures available. The purpose of this paper is to present the development and testing of disease-specific item banks and short forms for lung, skin and eye problems, which are part of a new patient-reported outcome (PRO) instrument called the Sarcoidosis Assessment Tool. After prioritizing and selecting the most important disease-specific domains, we wrote new items to reflect disease-specific problems by drawing from patient focus group and clinician expert survey data that were used to create our conceptual model of HRQL in sarcoidosis. Item pools underwent cognitive interviews by sarcoidosis patients (n = 13), and minor modifications were made. These items were administered in a multi-site study (n = 300) to obtain item calibrations and create calibrated short forms using item response theory (IRT) approaches. From the available item pools, we created four new item banks and short forms: (1) skin problems, (2) skin stigma, (3) lung problems, and (4) eye problems. We also created and tested supplemental forms covering the most common constitutional symptoms and the negative effects of corticosteroids. Several new sarcoidosis-specific PROs were developed and tested using IRT approaches. These new measures can advance more precise and targeted HRQL assessment in sarcoidosis clinical trials and clinical practice.
Procedures to develop a computerized adaptive test to assess patient-reported physical functioning.
McCabe, Erin; Gross, Douglas P; Bulut, Okan
2018-06-07
The purpose of this paper is to demonstrate the procedures to develop and implement a computerized adaptive patient-reported outcome (PRO) measure using secondary analysis of a dataset and items from fixed-format legacy measures. We conducted secondary analysis of a dataset of responses from 1429 persons with work-related lower extremity impairment. We calibrated three measures of physical functioning on the same metric, based on item response theory (IRT). We evaluated the efficiency and measurement precision of various computerized adaptive test (CAT) designs using computer simulations. IRT and confirmatory factor analyses support combining the items from the three scales into a CAT item bank of 31 items. The item parameters for IRT were calculated using the generalized partial credit model. CAT simulations show that reducing the test length from the full 31 items to a maximum of 8 or 20 items is possible without a significant loss of information (95% and 99% correlations, respectively, with legacy measure scores). We demonstrated the feasibility and efficiency of using CAT for PRO measurement of physical functioning. The procedures we outlined are straightforward and can be applied to other PRO measures. Additionally, we have included all the information necessary to implement the CAT of physical functioning in the electronic supplementary material of this paper.
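To make the simulation step concrete, here is a stripped-down post hoc CAT of the kind described above: maximum-information item selection, expected a posteriori (EAP) scoring on a grid, a fixed maximum test length, and a comparison of CAT scores with scores from the full bank. For brevity it uses dichotomous 2PL items with invented parameters, whereas the study calibrated polytomous items with the generalized partial credit model, so the numbers below illustrate the workflow rather than reproduce the paper's results.

```python
import numpy as np

rng = np.random.default_rng(4)
n_items, n_persons, max_len = 31, 1000, 8
a = rng.uniform(0.8, 2.0, n_items)            # discriminations (invented)
b = rng.normal(0.0, 1.0, n_items)             # difficulties (invented)
theta_true = rng.normal(size=n_persons)
grid = np.linspace(-4, 4, 81)
prior = np.exp(-0.5 * grid**2)
prior /= prior.sum()

def p(theta, j):
    return 1.0 / (1.0 + np.exp(-a[j] * (theta - b[j])))

resp = (rng.random((n_persons, n_items))
        < p(theta_true[:, None], np.arange(n_items))).astype(int)

def eap(items, answers):
    """Expected a posteriori ability estimate on the grid."""
    like = np.ones_like(grid)
    for j, u in zip(items, answers):
        pj = p(grid, j)
        like = like * (pj**u) * ((1 - pj)**(1 - u))
    post = like * prior
    post /= post.sum()
    return float((grid * post).sum())

def info(theta_hat, j):
    pj = p(theta_hat, j)
    return a[j]**2 * pj * (1 - pj)            # 2PL item information

cat_theta, full_theta = [], []
for i in range(n_persons):
    administered, answers, theta_hat = [], [], 0.0
    for _ in range(max_len):
        remaining = [j for j in range(n_items) if j not in administered]
        j = max(remaining, key=lambda jj: info(theta_hat, jj))
        administered.append(j)
        answers.append(resp[i, j])
        theta_hat = eap(administered, answers)
    cat_theta.append(theta_hat)
    full_theta.append(eap(range(n_items), resp[i]))

print("correlation of 8-item CAT scores with full 31-item scores:",
      round(np.corrcoef(cat_theta, full_theta)[0, 1], 3))
```

Under these invented parameters the 8-item CAT's correlation with the full-bank score typically comes out above .9, which is the kind of saving the abstract describes.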
Methodology for the development and calibration of the SCI-QOL item banks
Tulsky, David S.; Kisala, Pamela A.; Victorson, David; Choi, Seung W.; Gershon, Richard; Heinemann, Allen W.; Cella, David
2015-01-01
Objective To develop a comprehensive, psychometrically sound, and conceptually grounded patient reported outcomes (PRO) measurement system for individuals with spinal cord injury (SCI). Methods Individual interviews (n = 44) and focus groups (n = 65 individuals with SCI and n = 42 SCI clinicians) were used to select key domains for inclusion and to develop PRO items. Verbatim items from other cutting-edge measurement systems (i.e. PROMIS, Neuro-QOL) were included to facilitate linkage and cross-population comparison. Items were field tested in a large sample of individuals with traumatic SCI (n = 877). Dimensionality was assessed with confirmatory factor analysis. Local item dependence and differential item functioning were assessed, and items were calibrated using the item response theory (IRT) graded response model. Finally, computer adaptive tests (CATs) and short forms were administered in a new sample (n = 245) to assess test-retest reliability and stability. Participants and Procedures A calibration sample of 877 individuals with traumatic SCI across five SCI Model Systems sites and one Department of Veterans Affairs medical center completed SCI-QOL items in interview format. Results We developed 14 unidimensional calibrated item banks and 3 calibrated scales across physical, emotional, and social health domains. When combined with the five Spinal Cord Injury – Functional Index physical function banks, the final SCI-QOL system consists of 22 IRT-calibrated item banks/scales. Item banks may be administered as CATs or short forms. Scales may be administered in a fixed-length format only. Conclusions The SCI-QOL measurement system provides SCI researchers and clinicians with a comprehensive, relevant and psychometrically robust system for measurement of physical-medical, physical-functional, emotional, and social outcomes. All SCI-QOL instruments are freely available on Assessment CenterSM. PMID:26010963
Methodology for the development and calibration of the SCI-QOL item banks.
Tulsky, David S; Kisala, Pamela A; Victorson, David; Choi, Seung W; Gershon, Richard; Heinemann, Allen W; Cella, David
2015-05-01
To develop a comprehensive, psychometrically sound, and conceptually grounded patient reported outcomes (PRO) measurement system for individuals with spinal cord injury (SCI). Individual interviews (n=44) and focus groups (n=65 individuals with SCI and n=42 SCI clinicians) were used to select key domains for inclusion and to develop PRO items. Verbatim items from other cutting-edge measurement systems (i.e. PROMIS, Neuro-QOL) were included to facilitate linkage and cross-population comparison. Items were field tested in a large sample of individuals with traumatic SCI (n=877). Dimensionality was assessed with confirmatory factor analysis. Local item dependence and differential item functioning were assessed, and items were calibrated using the item response theory (IRT) graded response model. Finally, computer adaptive tests (CATs) and short forms were administered in a new sample (n=245) to assess test-retest reliability and stability. A calibration sample of 877 individuals with traumatic SCI across five SCI Model Systems sites and one Department of Veterans Affairs medical center completed SCI-QOL items in interview format. We developed 14 unidimensional calibrated item banks and 3 calibrated scales across physical, emotional, and social health domains. When combined with the five Spinal Cord Injury--Functional Index physical function banks, the final SCI-QOL system consists of 22 IRT-calibrated item banks/scales. Item banks may be administered as CATs or short forms. Scales may be administered in a fixed-length format only. The SCI-QOL measurement system provides SCI researchers and clinicians with a comprehensive, relevant and psychometrically robust system for measurement of physical-medical, physical-functional, emotional, and social outcomes. All SCI-QOL instruments are freely available on Assessment CenterSM.
Development of a short version of the new brief job stress questionnaire.
Inoue, Akiomi; Kawakami, Norito; Shimomitsu, Teruichi; Tsutsumi, Akizumi; Haratani, Takashi; Yoshikawa, Toru; Shimazu, Akihito; Odagiri, Yuko
2014-01-01
This study aimed to investigate the test-retest reliability and validity of a short version of the New Brief Job Stress Questionnaire (New BJSQ), in which each scale consists of one item selected from the standard version. Based on the results of an anonymous web-based questionnaire of occupational health staff and personnel/labor staff, we selected higher-priority scales from the standard version. After selecting the one item with the highest item-total correlation coefficient from each scale, a 23-item questionnaire was developed. A nationally representative survey was administered to Japanese employees (n=1,633) to examine test-retest reliability and validity. Most scales (or items) showed modest but adequate levels of test-retest reliability (r>0.50). Furthermore, job demands and job resources scales (or items) were associated with mental and physical stress reactions, while job resources scales (or items) were also associated with positive outcomes. These findings provided evidence that the short version of the New BJSQ is reliable and valid.
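The one-item-per-scale selection rule described above reduces to taking, within each scale, the item whose score correlates most strongly with the scale total. A small sketch follows; the scale composition, response scale, and data are hypothetical, and the total is computed here with the candidate item removed, a detail the abstract does not specify.

```python
import numpy as np

def best_item(scale_scores):
    """Index of the item with the highest item-total correlation
    (total computed with the candidate item removed)."""
    total = scale_scores.sum(axis=1)
    r = [np.corrcoef(scale_scores[:, j], total - scale_scores[:, j])[0, 1]
         for j in range(scale_scores.shape[1])]
    return int(np.argmax(r))

rng = np.random.default_rng(5)
# One hypothetical 4-item scale answered by 200 respondents on a 1-4 scale.
trait = rng.normal(size=(200, 1))
scale = np.clip(np.rint(2.5 + trait + rng.normal(scale=0.8, size=(200, 4))), 1, 4)
print("item retained for the short version:", best_item(scale))
```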
Development of a Short Version of the New Brief Job Stress Questionnaire
INOUE, Akiomi; KAWAKAMI, Norito; SHIMOMITSU, Teruichi; TSUTSUMI, Akizumi; HARATANI, Takashi; YOSHIKAWA, Toru; SHIMAZU, Akihito; ODAGIRI, Yuko
2014-01-01
This study aimed to investigate the test-retest reliability and validity of a short version of the New Brief Job Stress Questionnaire (New BJSQ), in which each scale consists of one item selected from the standard version. Based on the results of an anonymous web-based questionnaire of occupational health staff and personnel/labor staff, we selected higher-priority scales from the standard version. After selecting the one item with the highest item-total correlation coefficient from each scale, a 23-item questionnaire was developed. A nationally representative survey was administered to Japanese employees (n=1,633) to examine test-retest reliability and validity. Most scales (or items) showed modest but adequate levels of test-retest reliability (r>0.50). Furthermore, job demands and job resources scales (or items) were associated with mental and physical stress reactions, while job resources scales (or items) were also associated with positive outcomes. These findings provided evidence that the short version of the New BJSQ is reliable and valid. PMID:24975108
Kisala, Pamela A; Tulsky, David S; Kalpakjian, Claire Z; Heinemann, Allen W; Pohlig, Ryan T; Carle, Adam; Choi, Seung W
2015-05-01
To develop a calibrated item bank and computer adaptive test to assess anxiety symptoms in individuals with spinal cord injury (SCI), transform scores to the Patient Reported Outcomes Measurement Information System (PROMIS) metric, and create a statistical linkage with the Generalized Anxiety Disorder (GAD)-7, a widely used anxiety measure. Grounded-theory-based qualitative item development methods; large-scale item calibration field testing; confirmatory factor analysis; graded response model item response theory analyses; statistical linking techniques to transform scores to a PROMIS metric; and linkage with the GAD-7. Setting Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Participants Adults with traumatic SCI. Main outcome measure Spinal Cord Injury-Quality of Life (SCI-QOL) Anxiety Item Bank. Seven hundred sixteen individuals with traumatic SCI completed 38 items assessing anxiety, 17 of which were PROMIS items. After 13 items (including 2 PROMIS items) were removed, factor analyses confirmed unidimensionality. Item response theory analyses were used to estimate slopes and thresholds for the final 25 items (15 from PROMIS). The observed Pearson correlation between the SCI-QOL Anxiety and GAD-7 scores was 0.67. The SCI-QOL Anxiety item bank demonstrates excellent psychometric properties and is available as a computer adaptive test or short form for research and clinical applications. SCI-QOL Anxiety scores have been transformed to the PROMIS metric and we provide a method to link SCI-QOL Anxiety scores with those of the GAD-7.
Predicting Item Difficulty in a Reading Comprehension Test with an Artificial Neural Network.
ERIC Educational Resources Information Center
Perkins, Kyle; And Others
This paper reports the results of using a three-layer backpropagation artificial neural network to predict item difficulty in a reading comprehension test. Two network structures were developed, one with and one without a sigmoid function in the output processing unit. The data set, which consisted of a table of coded test items and corresponding…
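A rough modern analogue of the network described above can be put together with a general-purpose library: a small feed-forward network trained by backpropagation to predict item difficulty from coded item features. The feature coding, the data, and the use of scikit-learn's MLPRegressor are all assumptions for illustration; they are not the paper's network or data set.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(6)
# Hypothetical coded item features: passage length, word-frequency index,
# proposition count, abstractness rating (all standardized).
X = rng.normal(size=(120, 4))
true_w = np.array([0.4, -0.6, 0.3, 0.5])
y = X @ true_w + rng.normal(scale=0.2, size=120)    # synthetic item difficulty

net = MLPRegressor(hidden_layer_sizes=(8,), activation="logistic",
                   solver="lbfgs", max_iter=5000, random_state=0)
net.fit(X[:100], y[:100])                           # train on 100 items
print("held-out R^2:", round(net.score(X[100:], y[100:]), 2))
```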
ERIC Educational Resources Information Center
Gutl, Christian; Lankmayr, Klaus; Weinhofer, Joachim; Hofler, Margit
2011-01-01
Research on the automated creation of test items for assessment purposes has become increasingly important in recent years. Automatic question creation makes it possible to support personalized and self-directed learning activities by preparing appropriate, individualized test items with relatively little effort or even fully…
Higgins, Johanne; Finch, Lois E; Kopec, Jacek; Mayo, Nancy E
2010-02-01
To create and illustrate the development of a method to parsimoniously and hierarchically assess upper extremity function in persons after stroke. Data were analyzed using Rasch analysis. Re-analysis of data from 8 studies involving persons after stroke. Over 4000 patients with stroke who participated in various studies in Montreal and elsewhere in Canada. Data comprised 17 tests or indices of upper extremity function and health-related quality of life, for a total of 99 items related to upper extremity function. Tests and indices included, among others, the Box and Block Test, the Nine-Hole Peg Test and the Stroke Impact Scale. Data were collected at various times post-stroke, from 3 days to 1 year. Once the data fit the model, a bank of items measuring upper extremity function was produced, with persons and items organized hierarchically by ability and difficulty in log units. This bank forms the basis for eventual computer adaptive testing. The calibration of the items should be tested further psychometrically, as should the interpretation of the metric arising from using the item calibrations to measure the upper extremity function of individuals.
Giuffrida, Michelle A; Brown, Dorothy Cimino; Ellenberg, Susan S; Farrar, John T
2018-05-01
OBJECTIVE To describe development and initial psychometric testing of an owner-reported questionnaire designed to standardize measurement of general quality of life (QOL) in dogs with cancer. DESIGN Key-informant interviews, questionnaire development, and field trial. SAMPLE Owners of 25 dogs with cancer for item development and pretesting and owners of 90 dogs with cancer for reliability and validity testing. PROCEDURES Standard methods for development and testing of questionnaire instruments intended to measure subjective states were used. Items were generated, selected, scaled, and pretested for content, meaning, and readability. Response items were evaluated with exploratory factor analysis and by assessing internal consistency (Cronbach α) and convergence with global QOL as determined with a visual analog scale. Preliminary tests of stability and responsiveness were performed. RESULTS The final questionnaire-which was named the Canine Owner-Reported Quality of Life (CORQ) questionnaire-contained 17 items related to observable behaviors commonly used by owners to evaluate QOL in their dogs. Several items pertaining to physical symptoms performed poorly and were omitted. The 17 items were assigned to 4 factors-vitality, companionship, pain, and mobility-on the basis of the items they contained. The CORQ questionnaire and its factors had high internal consistency (Cronbach α = 0.68 to 0.90) and moderate to strong correlations (r = 0.49 to 0.71) with global QOL as measured on a visual analog scale. Preliminary testing indicated good test-retest reliability and responsiveness to improvements in overall QOL. CONCLUSIONS AND CLINICAL RELEVANCE The CORQ questionnaire was a valid, reliable owner-reported questionnaire that measured general QOL in dogs with cancer and showed promise as a clinical trial outcome measure for quantifying changes in individual dog QOL occurring in response to cancer treatment and progression.
Monitoring Items in Real Time to Enhance CAT Security
ERIC Educational Resources Information Center
Zhang, Jinming; Li, Jie
2016-01-01
An IRT-based sequential procedure is developed to monitor items for enhancing test security. The procedure uses a series of statistical hypothesis tests to examine whether the statistical characteristics of each item under inspection have changed significantly during CAT administration. This procedure is compared with a previously developed…
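One simple way to picture such monitoring, without claiming it is the authors' statistic, is a running standardized residual for a single item: as examinees receive the item during CAT, accumulate observed-minus-expected correctness under the item's calibrated parameters and flag the item when the standardized sum exceeds a critical value. The parameters, the drift, and the threshold below are invented, and a fixed threshold applied at every look inflates false alarms, which a proper sequential procedure would control.

```python
import numpy as np

rng = np.random.default_rng(7)
a, b = 1.3, 0.0           # calibrated (bank) 2PL parameters for the item
b_drifted = -0.8          # the item has in fact become easier (e.g. due to exposure)

def p2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

obs_minus_exp, var_sum, flagged_at = 0.0, 0.0, None
for n in range(1, 1001):                      # examinees who receive this item
    theta = rng.normal()
    p_model = p2pl(theta, a, b)               # what the item bank predicts
    p_true = p2pl(theta, a, b_drifted)        # how examinees actually respond
    u = rng.random() < p_true
    obs_minus_exp += u - p_model
    var_sum += p_model * (1.0 - p_model)
    z = obs_minus_exp / np.sqrt(var_sum)
    if flagged_at is None and abs(z) > 3.0:   # conservative critical value
        flagged_at = n
print("item flagged after", flagged_at, "administrations")
```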
DIFAS: Differential Item Functioning Analysis System. Computer Program Exchange
ERIC Educational Resources Information Center
Penfield, Randall D.
2005-01-01
Differential item functioning (DIF) is an important consideration in assessing the validity of test scores (Camilli & Shepard, 1994). A variety of statistical procedures have been developed to assess DIF in tests of dichotomous (Hills, 1989; Millsap & Everson, 1993) and polytomous (Penfield & Lam, 2000; Potenza & Dorans, 1995) items. Some of these…
Distinctions between Item Format and Objectivity in Scoring.
ERIC Educational Resources Information Center
Terwilliger, James S.
This paper clarifies important distinctions in item writing and item scoring and considers the implications of these distinctions for developing guidelines related to test construction for training teachers. The terminology used to describe and classify paper and pencil test questions frequently confuses two distinct features of questions:…
Toward a More Systematic Assessment of Smoking: Development of a Smoking Module for PROMIS®
Tucker, Joan S.; Shadel, William G.; Stucky, Brian D.; Cai, Li
2012-01-01
Introduction The aim of the PROMIS® Smoking Initiative is to develop, evaluate, and standardize item banks to assess cigarette smoking behavior and biopsychosocial constructs associated with smoking for both daily and non-daily smokers. Methods We used qualitative methods to develop the item pool (following the PROMIS® approach: e.g., literature search, “binning and winnowing” of items, and focus groups and cognitive interviews to finalize wording and format), and quantitative methods (e.g., factor analysis) to develop the item banks. Results We considered a total of 1622 extant items, and 44 new items for inclusion in the smoking item banks. A final set of 277 items representing 11 conceptual domains was selected for field testing in a national sample of smokers. Using data from 3021 daily smokers in the field test, an iterative series of exploratory factor analyses and project team discussions resulted in six item banks: Positive Consequences of Smoking (40 items), Smoking Dependence/Craving (55 items), Health Consequences of Smoking (26 items), Psychosocial Consequences of Smoking (37 items), Coping Aspects of Smoking (30 items), and Social Factors of Smoking (23 items). Conclusions Inclusion of a smoking domain in the PROMIS® framework will standardize measurement of key smoking constructs using state-of-the-art psychometric methods, and make them widely accessible to health care providers, smoking researchers and the large community of researchers using PROMIS® who might not otherwise include an assessment of smoking in their design. Next steps include reducing the number of items in each domain, conducting confirmatory analyses, and duplicating the process for non-daily smokers. PMID:22770824
Toward a more systematic assessment of smoking: development of a smoking module for PROMIS®.
Edelen, Maria O; Tucker, Joan S; Shadel, William G; Stucky, Brian D; Cai, Li
2012-11-01
The aim of the PROMIS® Smoking Initiative is to develop, evaluate, and standardize item banks to assess cigarette smoking behavior and biopsychosocial constructs associated with smoking for both daily and non-daily smokers. We used qualitative methods to develop the item pool (following the PROMIS® approach: e.g., literature search, "binning and winnowing" of items, and focus groups and cognitive interviews to finalize wording and format), and quantitative methods (e.g., factor analysis) to develop the item banks. We considered a total of 1622 extant items, and 44 new items for inclusion in the smoking item banks. A final set of 277 items representing 11 conceptual domains was selected for field testing in a national sample of smokers. Using data from 3021 daily smokers in the field test, an iterative series of exploratory factor analyses and project team discussions resulted in six item banks: Positive Consequences of Smoking (40 items), Smoking Dependence/Craving (55 items), Health Consequences of Smoking (26 items), Psychosocial Consequences of Smoking (37 items), Coping Aspects of Smoking (30 items), and Social Factors of Smoking (23 items). Inclusion of a smoking domain in the PROMIS® framework will standardize measurement of key smoking constructs using state-of-the-art psychometric methods, and make them widely accessible to health care providers, smoking researchers and the large community of researchers using PROMIS® who might not otherwise include an assessment of smoking in their design. Next steps include reducing the number of items in each domain, conducting confirmatory analyses, and duplicating the process for non-daily smokers. Copyright © 2012 Elsevier Ltd. All rights reserved.
The Development of the Motivation for Critical Reasoning in Online Discussions Inventory (MCRODI)
ERIC Educational Resources Information Center
Zhang, Tianyi; Koehler, Matthew J.; Spatariu, Alexandru
2009-01-01
This study was conducted to develop an inventory that measures students' motivation to engage in critical reasoning in online discussions. Inventory items were developed based on theoretical frameworks and then tested on 168 participants. Using exploratory factor analysis, test-retest reliability, and internal consistency, twenty-two items were…
Mitchell, Alex J; Smith, Adam B; Al-salihy, Zerak; Rahim, Twana A; Mahmud, Mahmud Q; Muhyaldin, Asma S
2011-10-01
We aimed to redefine the optimal self-report symptoms of depression suitable for creation of an item bank that could be used in computer adaptive testing or to develop a simplified screening tool for DSM-V. Four hundred subjects (200 patients with primary depression and 200 non-depressed subjects) living in Iraqi Kurdistan were interviewed. The Mini International Neuropsychiatric Interview (MINI) was used to define the presence of major depression (DSM-IV criteria). We examined symptoms of depression using four well-known scales delivered in Kurdish. The Partial Credit Model was applied to each instrument. Common-item equating was subsequently used to create an item bank, and differential item functioning (DIF) was explored for known subgroups. A symptom-level Rasch analysis reduced the original 45 items to 24 after the exclusion of 21 misfitting items. A further six items (CESD13 and CESD17, HADS-D4, HADS-D5 and HADS-D7, and CDSS3 and CDSS4) were removed due to misfit as the items were combined to form the item bank, and two items were subsequently removed following the DIF analysis by diagnosis (CESD20 and CDSS9, both of which were harder to endorse for women). The remaining optimal item bank therefore consisted of 17 items and produced an area under the curve (AUC) of 0.987. Using a bank restricted to the optimal nine items revealed only minor loss of accuracy (AUC = 0.989; sensitivity 96%, specificity 95%). Finally, when restricted to only four items, accuracy was still high (AUC = 0.976; sensitivity 93%, specificity 96%). An item bank of 17 items may be useful in computer adaptive testing, and nine or even four items may be used to develop a simplified screening tool for DSM-V major depressive disorder (MDD). Further examination of this item bank should be conducted in different cultural settings.
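The accuracy summaries above (AUC, sensitivity, specificity) are easy to reproduce mechanically once a short symptom score and a diagnostic reference standard are available. The sketch below uses simulated scores and an arbitrary cut-off, not the Kurdish-language data, with scikit-learn supplying the AUC.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(8)
depressed = np.repeat([1, 0], 200)                       # diagnostic reference standard
score = np.concatenate([rng.normal(10, 3, 200),          # short-form score, cases
                        rng.normal(4, 3, 200)])          # short-form score, controls
print("AUC:", round(roc_auc_score(depressed, score), 3))

cutoff = 7                                               # illustrative cut-off
positive = score >= cutoff
sensitivity = positive[depressed == 1].mean()
specificity = (~positive[depressed == 0]).mean()
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```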
Tulsky, David S; Kisala, Pamela A; Tate, Denise G; Spungen, Ann M; Kirshblum, Steven C
2015-05-01
To describe the development and psychometric properties of the Spinal Cord Injury--Quality of Life (SCI-QOL) Bladder Management Difficulties and Bowel Management Difficulties item banks and Bladder Complications scale. Using a mixed-methods design, a pool of items assessing bladder- and bowel-related concerns was developed using focus groups with individuals with spinal cord injury (SCI) and SCI clinicians, cognitive interviews, and item response theory (IRT) analytic approaches, including tests of model fit and differential item functioning. Thirty-eight bladder items and 52 bowel items were tested at the University of Michigan, Kessler Foundation Research Center, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital, and the James J. Peters VA Medical Center, Bronx, NY. Seven hundred fifty-seven adults with traumatic SCI. The final item banks demonstrated unidimensionality (Bladder Management Difficulties CFI=0.965, RMSEA=0.093; Bowel Management Difficulties CFI=0.955, RMSEA=0.078) and acceptable fit to a graded response IRT model. The final calibrated Bladder Management Difficulties bank includes 15 items, and the final Bowel Management Difficulties item bank consists of 26 items. Additionally, 5 items related to urinary tract infections (UTI) did not fit with the larger Bladder Management Difficulties item bank but performed relatively well independently (CFI=0.992, RMSEA=0.050) and were thus retained as a separate scale. The SCI-QOL Bladder Management Difficulties and Bowel Management Difficulties item banks are psychometrically robust and are available as computer adaptive tests or short forms. The SCI-QOL Bladder Complications scale is a brief, fixed-length outcome instrument for individuals with a UTI.
Developing self-concept instrument for pre-service mathematics teachers
NASA Astrophysics Data System (ADS)
Afgani, M. W.; Suryadi, D.; Dahlan, J. A.
2018-01-01
This study aimed to develop a self-concept instrument for undergraduate students of mathematics education in Palembang, Indonesia. The study was development research of a non-test instrument in questionnaire form. Construct validity of the instrument was tested using Pearson product-moment correlations and factor analysis, while reliability was tested using Cronbach's alpha. The instrument was administered to 65 undergraduate students of mathematics education at one of the universities in Palembang, Indonesia. The instrument consisted of 43 items covering 7 aspects of self-concept: individual concern, social identity, individual personality, view of the future, the influence of others who become role models, the influence of the environment inside or outside the classroom, and view of mathematics. The validity test showed that one item was invalid because its Pearson's r was 0.107, less than the critical value (0.244; α = 0.05). The item belonged to the social identity aspect. After the invalid item was removed, construct validity testing with factor analysis generated only one factor. The Kaiser-Meyer-Olkin (KMO) coefficient was 0.846 and the reliability coefficient was 0.91. From these results, we concluded that the 42-item self-concept instrument for undergraduate students of mathematics education in Palembang, Indonesia was valid and reliable.
Cappelleri, Joseph C; Jason Lundy, J; Hays, Ron D
2014-05-01
The US Food and Drug Administration's guidance for industry document on patient-reported outcomes (PRO) defines content validity as "the extent to which the instrument measures the concept of interest" (FDA, 2009, p. 12). According to Strauss and Smith (2009), construct validity "is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity" (p. 7). Hence, both qualitative and quantitative information are essential in evaluating the validity of measures. We review classical test theory and item response theory (IRT) approaches to evaluating PRO measures, including the frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which the hypothesized "difficulty" (severity) order of items is represented by observed responses. If a researcher has little qualitative data and wants to get preliminary information about the content validity of the instrument, then descriptive assessments using classical test theory should be the first step. As the sample size grows during subsequent stages of instrument development, confidence in the numerical estimates from Rasch and other IRT models (as well as those of classical test theory) would also grow. Classical test theory and IRT can be useful in providing a quantitative assessment of items and scales during the content-validity phase of PRO-measure development. Depending on the particular type of measure and the specific circumstances, classical test theory and/or IRT should be considered to help maximize the content validity of PRO measures. Copyright © 2014 Elsevier HS Journals, Inc. All rights reserved.
A validation study of public health knowledge, skills, social responsibility and applied learning.
Vackova, Dana; Chen, Coco K; Lui, Juliana N M; Johnston, Janice M
2018-06-22
To design and validate a questionnaire to measure medical students' Public Health (PH) knowledge, skills, social responsibility and applied learning as indicated in the four domains recommended by the Association of Schools & Programmes of Public Health (ASPPH). A cross-sectional study was conducted to develop an evaluation tool for PH undergraduate education through item generation, reduction, refinement and validation. The 74 preliminary items derived from the existing literature were reduced to 55 items based on an expert panel review that included people with expertise in PH, psychometrics and medical education, as well as medical students. Psychometric properties of the preliminary questionnaire were assessed as follows: frequency of endorsement for item variance; principal component analysis (PCA) with varimax rotation for item reduction and factor estimation; and Cronbach's Alpha, item-total correlation and test-retest validity for internal consistency and reliability. PCA yielded five factors: PH Learning Experience (6 items); PH Risk Assessment and Communication (5 items); Future Use of Evidence in Practice (6 items); Recognition of PH as a Scientific Discipline (4 items); and PH Skills Development (3 items), explaining 72.05% of the variance. Internal consistency and reliability tests were satisfactory (Cronbach's Alpha ranged from 0.87 to 0.90; item-total correlation > 0.59). Lower paired test-retest correlations reflected instability in a social science environment. An evaluation tool for community-centred PH education has been developed and validated. The tool measures PH knowledge, skills, social responsibilities and applied learning as recommended by the internationally recognised Association of Schools & Programmes of Public Health (ASPPH).
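The item-reduction step above, principal components followed by varimax rotation, can be sketched with a standard PCA plus the classical varimax update. The response matrix below is simulated with two latent traits rather than the study's five factors, and all names and values are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

def varimax(loadings, gamma=1.0, max_iter=50, tol=1e-6):
    """Orthogonal varimax rotation of a loading matrix (standard algorithm)."""
    p, k = loadings.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        u, s, vh = np.linalg.svd(
            loadings.T @ (L**3 - (gamma / p) * L @ np.diag(np.diag(L.T @ L))))
        R = u @ vh
        d_new = s.sum()
        if d != 0.0 and d_new / d < 1 + tol:
            break
        d = d_new
    return loadings @ R

rng = np.random.default_rng(9)
latent = rng.normal(size=(300, 2))                  # two latent traits
pattern = np.zeros((10, 2))
pattern[:5, 0] = 0.8                                # items 0-4 load on trait 1
pattern[5:, 1] = 0.8                                # items 5-9 load on trait 2
items = latent @ pattern.T + rng.normal(scale=0.6, size=(300, 10))

z = (items - items.mean(axis=0)) / items.std(axis=0)
pca = PCA(n_components=2).fit(z)
raw = pca.components_.T * np.sqrt(pca.explained_variance_)
print(np.round(varimax(raw), 2))    # rotated loadings: look for simple structure
```

Items that fail to load cleanly on a single rotated component are the ones a reduction step like the one described would drop.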
ERIC Educational Resources Information Center
Abed, Eman Rasmi; Al-Absi, Mohammad Mustafa; Abu shindi, Yousef Abdelqader
2016-01-01
The purpose of the present study is developing a test to measure the numerical ability for students of education. The sample of the study consisted of (504) students from 8 universities in Jordan. The final draft of the test contains 45 items distributed among 5 dimensions. The results revealed that acceptable psychometric properties of the test;…
Bhandari, T R; Dangal, G; Sarma, P S; Kutty, V R
2014-01-01
Women's autonomy is one of the predictors of maternal health care service utilization. This study aimed to construct and validate a scale for measuring women's autonomy with relevance to developing countries. We constructed and validated the scale in the Rupandehi district of Nepal and further validated it in the Kapilvastu district. After defining the construct of women's autonomy, we pooled 194 items and selected 24 items to develop a preliminary scale; we then administered this 24-item preliminary scale and finalized a 23-item scale using psychometric tests. The scale development process followed several steps, i.e., definition of the construct, generation of an item pool, pretesting, psychometric analysis, and further validation. The new scale was strongly supported by its Cronbach's alpha value (0.84), test-retest Pearson correlation (0.87), average content validity ratio (0.8), and overall item agreement (kappa = 0.83), all of which were satisfactory. One item was removed from the preliminary draft; from the factor analysis, the remaining 23 items were selected for the final scale and loaded on five factors. With loadings below an absolute value of 0.45 suppressed, each item loaded on a single factor, and the average loading within each factor exceeded 0.60. The factors and their loaded items showed good convergent and discriminant validity, further supporting the measurement capacity of the scale. The new scale is a reliable tool for assessing women's autonomy in developing countries. We recommend further use and validation of the scale to confirm its measurement capacity.
Flens, Gerard; Smits, Niels; Terwee, Caroline B; Dekker, Joost; Huijbrechts, Irma; Spinhoven, Philip; de Beurs, Edwin
2017-12-01
We used the Dutch-Flemish version of the USA PROMIS adult V1.0 item bank for Anxiety as input for developing a computerized adaptive test (CAT) to measure the entire latent anxiety continuum. First, psychometric analysis of a combined clinical and general population sample (N = 2,010) showed that the 29-item bank has the psychometric properties required for CAT administration. Second, a post hoc CAT simulation showed efficient and highly precise measurement, with an average of 8.64 items administered for the clinical sample and 9.48 items for the general population sample. Furthermore, the accuracy of our CAT version was highly similar to that of the full item bank administration, both in final score estimates and in distinguishing clinical subjects from persons without a mental health disorder. We discuss the future directions and limitations of CAT development with the Dutch-Flemish version of the PROMIS Anxiety item bank.
Jacobson, C. Jeffrey; Kashikar-Zuck, Susmita; Farrell, Jennifer; Barnett, Kimberly; Goldschneider, Ken; Dampier, Carlton; Cunningham, Natoshia; Crosby, Lori; DeWitt, Esi Morgan
2015-01-01
As initial steps in a broader effort to develop and test pediatric Pain Behavior and Pain Quality item banks for the Patient Reported Outcomes Measurement Information System (PROMIS®), we employed qualitative interview and item review methods to 1) evaluate the overall conceptual scope and content validity of the PROMIS pain domain framework among children with chronic /recurrent pain conditions, and 2) develop item candidates for further psychometric testing. To elicit the experiential and conceptual scope of pain outcomes across a variety of pediatric recurrent/chronic pain conditions, we conducted semi-structured individual (32) and focus-group interviews (2) with children and adolescents (8–17 years), and parents of children with pain (individual (32) and focus group (2)). Interviews with pain experts (10) explored the operational limits of pain measurement in children. For item bank development, we identified existing items from measures in the literature, grouped them by concept, removed redundancies, and modified remaining items to match PROMIS formatting. New items were written as needed and cognitive debriefing was completed with children and their parents, resulting in 98 Pain Behavior (47 self, 51 proxy), 54 Quality and 4 Intensity items for further testing. Qualitative content analyses suggest that reportable pain outcomes that matter to children with pain are captured within and consistent with the pain domain framework in PROMIS. PMID:26335990
Sung, Vivian W.; Griffith, James W.; Rogers, Rebecca G.; Raker, Christina A.; Clark, Melissa A.
2016-01-01
Purpose Current patient-reported outcomes for female urinary incontinence (UI) are limited by their inability to be tailored. Our objective is to describe the development and field-testing of 7 item banks designed to measure domains identified as important in UI in females (UIf). We also describe the calibration and validation properties of the UIf item banks, which allow for more efficient computerized adaptive testing (CAT) in the future. METHODS The UIf measures included 168 items covering 7 domains: Stress UI (SUI), Overactive Bladder (OAB), Urinary Frequency, Physical, Social and Emotional Health Impact, and Adaptation. Items underwent rigorous qualitative development and psychometric testing across 2 sites. Items were calibrated using item response theory and evaluated for internal consistency, construct validity and responsiveness. RESULTS 750 women (249 SUI, 249 OAB, and 252 mixed UI) participated. Mean age was 55±14 years; 23% were Hispanic and 80% were white. In addition to face and content validity, the measures demonstrated good internal consistency (coefficient alpha 0.92-0.98) and unidimensionality. There was evidence for construct validity, with moderate to strong correlations with the UDI (r ≥ 0.6) and IIQ (r ≥ 0.6) scales. The measures were responsive to change for SUI treatment (paired t-test p <.001, ES range=1.3 to 2.9; SRM range=1.3 to 2.5) and OAB treatment (paired t-test p <.05 for all domains except Social Health Impact and Adaptation, ES range=.3 to 1.5, SRM range=0.4 to 1.0). The measures were responsive based on concurrent changes with the UDI and IIQ (p < 0.05). CAT versions were developed and pilot tested. CONCLUSIONS The UIf item banks demonstrate good psychometric characteristics and are a sufficiently valid set of customizable tools for measuring UI symptoms and life impact. PMID:26732514
Clinton-McHarg, Tara; Carey, Mariko; Sanson-Fisher, Rob; D'Este, Catherine; Shakeshaft, Anthony
2012-01-30
Adolescents and young adult (AYA) cancer survivors may have unique physical, psychological and social needs due to their cancer occurring at a critical phase of development. The aim of this study was to develop a psychometrically rigorous measure of unmet need to capture the specific needs of this group. Items were developed following a comprehensive literature review, focus groups with AYAs, and feedback from health care providers, researchers and other professionals. The measure was pilot tested with 32 AYA cancer survivors recruited through a state-based cancer registry to establish face and content validity. A main sample of 139 AYA cancer patients and survivors were recruited through seven treatment centres and invited to complete the questionnaire. To establish test-retest reliability, a sub-sample of 34 participants completed the measure a second time. Exploratory factor analysis was performed and the measure was assessed for internal consistency, discriminative validity, potential responsiveness and acceptability. The Cancer Needs Questionnaire - Young People (CNQ-YP) has established face and content validity, and acceptability. The final measure has 70 items and six factors: Treatment Environment and Care (33 items); Feelings and Relationships (14 items); Daily Life (12 items); Information and Activities (5 items); Education (3 items); and Work (3 items). All domains achieved Cronbach's alpha values greater than 0.80. Item-to-item test-retest reliability was also high, with all but four items reaching weighted kappa values above 0.60. The CNQ-YP is the first multi-dimensional measure of unmet need which has been developed specifically for AYA cancer patients and survivors. The measure displays a strong factor structure, and excellent internal consistency and test-retest reliability. However, the small sample size has implications for the reliability of the statistical analyses undertaken, particularly the exploratory factor analysis. Future studies with a larger sample are recommended to confirm the factor structure of the measure. Longitudinal studies to establish responsiveness and predictive validity should also be undertaken.
2012-01-01
Background Adolescents and young adult (AYA) cancer survivors may have unique physical, psychological and social needs due to their cancer occurring at a critical phase of development. The aim of this study was to develop a psychometrically rigorous measure of unmet need to capture the specific needs of this group. Methods Items were developed following a comprehensive literature review, focus groups with AYAs, and feedback from health care providers, researchers and other professionals. The measure was pilot tested with 32 AYA cancer survivors recruited through a state-based cancer registry to establish face and content validity. A main sample of 139 AYA cancer patients and survivors were recruited through seven treatment centres and invited to complete the questionnaire. To establish test-retest reliability, a sub-sample of 34 participants completed the measure a second time. Exploratory factor analysis was performed and the measure was assessed for internal consistency, discriminative validity, potential responsiveness and acceptability. Results The Cancer Needs Questionnaire - Young People (CNQ-YP) has established face and content validity, and acceptability. The final measure has 70 items and six factors: Treatment Environment and Care (33 items); Feelings and Relationships (14 items); Daily Life (12 items); Information and Activities (5 items); Education (3 items); and Work (3 items). All domains achieved Cronbach's alpha values greater than 0.80. Item-to-item test-retest reliability was also high, with all but four items reaching weighted kappa values above 0.60. Conclusions The CNQ-YP is the first multi-dimensional measure of unmet need which has been developed specifically for AYA cancer patients and survivors. The measure displays a strong factor structure, and excellent internal consistency and test-retest reliability. However, the small sample size has implications for the reliability of the statistical analyses undertaken, particularly the exploratory factor analysis. Future studies with a larger sample are recommended to confirm the factor structure of the measure. Longitudinal studies to establish responsiveness and predictive validity should also be undertaken. PMID:22284545
Development, Validation, and Use of an Item Bank for Police Promotion Examinations.
ERIC Educational Resources Information Center
Enger, John M.
In Arkansas, in reaction to complaints about traditional methods of selection for promotion, the civil service commission has chosen to base promotions in the police department solely on scores on locally-developed objective tests. Items developed and loaded into a computerized test bank were selected from six areas of responsibility: (1) criminal…
Item Response Models for Examinee-Selected Items
ERIC Educational Resources Information Center
Wang, Wen-Chung; Jin, Kuan-Yu; Qiu, Xue-Lan; Wang, Lei
2012-01-01
In some tests, examinees are required to choose a fixed number of items from a set of given items to answer. This practice creates a challenge to standard item response models, because more capable examinees may have an advantage by making wiser choices. In this study, we developed a new class of item response models to account for the choice…
Carbonneau, Elise; Robitaille, Julie; Lamarche, Benoît; Corneau, Louise; Lemieux, Simone
2017-08-01
The present study aimed to develop and validate a questionnaire assessing perceived food environment in a French-Canadian population. A questionnaire, the Perceived Food Environment Questionnaire, was developed assessing perceived accessibility to healthy (nine items) and unhealthy foods (three items). A pre-test sample was recruited for pilot testing of the questionnaire. For the validation study, another sample was recruited and completed the questionnaire twice. Exploratory factor analysis was performed on the items to assess the number of factors (subscales). Cronbach's α was used to measure internal consistency reliability. Test-retest reliability was assessed with Pearson correlations. Online survey. Men and women from the Québec City area (n = 31 in the pre-test sample; n = 150 in the validation study sample). The pilot testing did not lead to any change in the questionnaire. The exploratory factor analysis revealed a two-subscale structure. The first subscale is composed of six items assessing accessibility to healthy foods and the second includes three items related to accessibility to unhealthy foods. Three items were removed from the questionnaire due to low loading on the two subscales. The subscales demonstrated adequate internal consistency (Cronbach's α = 0.77 for healthy foods and 0.62 for unhealthy foods) and test-retest reliability (r = 0.59 and 0.60, respectively; both P < 0.0001). The Perceived Food Environment Questionnaire was developed for a French-Canadian population and demonstrated good psychometric properties. Further validation is recommended if the questionnaire is to be used in other populations.
The development of Metacognition test in genetics laboratory for undergraduate students
NASA Astrophysics Data System (ADS)
A-nongwech, Nattapong; Pruekpramool, Chaninan
2018-01-01
The purpose of this research was to develop a metacognition test for a genetics laboratory course for undergraduate students. The participants were 30 undergraduate students of a Rajabhat university in the Rattanakosin group, selected by purposive sampling in the second semester of the 2016 academic year. The research instruments consisted of 1) the metacognition test and 2) an expert evaluation form for the test, which focused on three main points: the accuracy of the content, the consistency between metacognitive experiences and the questions, and the appropriateness of the test. The quality of the test was analyzed using the Index of Consistency (IOC), discrimination, and reliability. The results were summarized as follows. 1) The metacognition test contained 56 open-ended items and was composed of: four scientific situations; fourteen open-ended questions in each scientific situation for evaluating the components of metacognition, which consisted of metacognitive knowledge (divided into person knowledge, task knowledge, and strategy knowledge) and metacognitive experience (divided into planning, monitoring, and evaluating); and fourteen sets of scoring criteria, each divided into four scales. 2) The item analysis found that the Index of Consistency between metacognitive experiences and questions ranged from 0.75 to 1.00, the accuracy of the content equaled 1.00, and the appropriateness of the test equaled 1.00 for all situations and items. The discrimination of the items ranged from 0.00 to 0.73, and the reliability of the test equaled 0.97.
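The item-quality indices reported here (IOC, discrimination, reliability) can be computed very simply. The sketch below shows one common way to obtain an Index of Consistency (the mean of expert ratings coded -1/0/+1 per item) and a classical discrimination index (the difference in mean item score between high- and low-scoring groups); the rating codes, group fraction, and data are assumptions for illustration rather than details from the study.

```python
import numpy as np

def ioc(expert_ratings):
    """Index of Consistency for one item: mean of expert ratings coded
    -1 (not congruent), 0 (unsure), +1 (congruent)."""
    return float(np.mean(expert_ratings))

def discrimination_index(scores, frac=0.27):
    """Classical discrimination: difference in mean item score between the
    top and bottom groups (here the top/bottom 27% on total score),
    scaled by the maximum observed item score."""
    scores = np.asarray(scores, dtype=float)        # rows = examinees, cols = items
    order = np.argsort(scores.sum(axis=1))
    n = max(1, int(round(frac * scores.shape[0])))
    low, high = scores[order[:n]], scores[order[-n:]]
    max_score = np.where(scores.max(axis=0) == 0, 1, scores.max(axis=0))
    return (high.mean(axis=0) - low.mean(axis=0)) / max_score

# Hypothetical inputs: 5 experts rating one item, 30 examinees x 4 items scored 0-4
print(ioc([1, 1, 0, 1, 1]))                                    # 0.8, item retained
rng = np.random.default_rng(1)
print(np.round(discrimination_index(rng.integers(0, 5, (30, 4))), 2))
```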
Online Calibration of Polytomous Items Under the Generalized Partial Credit Model
Zheng, Yi
2016-01-01
Online calibration is a technology-enhanced architecture for item calibration in computerized adaptive tests (CATs). Many CATs are administered continuously over a long term and rely on large item banks. To ensure test validity, these item banks need to be frequently replenished with new items, and these new items need to be pretested before being used operationally. Online calibration dynamically embeds pretest items in operational tests and calibrates their parameters as response data are gradually obtained through the continuous test administration. This study extends existing formulas, procedures, and algorithms for dichotomous item response theory models to the generalized partial credit model, a popular model for items scored in more than two categories. A simulation study was conducted to investigate the developed algorithms and procedures under a variety of conditions, including two estimation algorithms, three pretest item selection methods, three seeding locations, two numbers of score categories, and three calibration sample sizes. Results demonstrated acceptable estimation accuracy of the two estimation algorithms in some of the simulated conditions. A variety of findings were also revealed regarding the interaction effects of the included factors, and recommendations were made accordingly. PMID:29881063
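A minimal sketch of the core idea of online calibration for a polytomous item is given below: ability estimates from the operational test are treated as fixed, and the generalized partial credit model parameters of a single embedded pretest item are estimated by maximum likelihood from its accumulating responses. This fixed-ability approach is only one of several estimation algorithms studied in this literature, and the parameter values and sample are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def gpcm_probs(theta, a, b):
    """Generalized partial credit model category probabilities.
    theta: abilities (N,), a: discrimination, b: step difficulties (K-1,)."""
    theta = np.atleast_1d(theta)[:, None]
    z = np.cumsum(a * (theta - np.asarray(b)[None, :]), axis=1)       # k = 1..K-1
    z = np.concatenate([np.zeros((theta.shape[0], 1)), z], axis=1)    # k = 0 term
    ez = np.exp(z - z.max(axis=1, keepdims=True))
    return ez / ez.sum(axis=1, keepdims=True)

def neg_loglik(params, theta, x):
    """Negative log-likelihood for one pretest item, abilities treated as known."""
    a, b = params[0], params[1:]
    if a <= 0:
        return np.inf
    p = gpcm_probs(theta, a, b)
    return -np.log(p[np.arange(len(x)), x] + 1e-12).sum()

# Invented example: 500 operational theta estimates, one 4-category pretest item
rng = np.random.default_rng(2)
theta = rng.normal(0, 1, 500)
true_a, true_b = 1.2, np.array([-0.8, 0.1, 0.9])
x = np.array([rng.choice(4, p=p) for p in gpcm_probs(theta, true_a, true_b)])
fit = minimize(neg_loglik, x0=np.array([1.0, -1.0, 0.0, 1.0]),
               args=(theta, x), method="Nelder-Mead")
print("estimated a and step difficulties:", np.round(fit.x, 2))
```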
Aldekhayel, Salah A; Alselaim, Nahar A; Magzoub, Mohi Eldin; Al-Qattan, Mohammad M; Al-Namlah, Abdullah M; Tamim, Hani; Al-Khayal, Abdullah; Al-Habdan, Sultan I; Zamakhshary, Mohammed F
2012-10-24
The Script Concordance Test (SCT) is a new assessment tool that reliably assesses clinical reasoning skills. Previous descriptions of developing SCT question banks were merely subjective. This study addresses two gaps in the literature: 1) conducting the first phase of a multistep validation process of the SCT in Plastic Surgery, and 2) providing an objective methodology to construct a question bank based on the SCT. After developing a test blueprint, 52 test items were written. Five validation questions were developed and a validation survey was established online. Seven reviewers were asked to answer this survey. They were recruited from two countries, Saudi Arabia and Canada, to improve the test's external validity. Their ratings were transformed into percentages. Analysis was performed to compare reviewers' ratings by looking at correlations, ranges, means, medians, and overall scores. Scores of reviewers' ratings were between 76% and 95% (mean 86% ± 5). We found poor correlations between reviewers (Pearson's r: -0.22 to +0.38). Ratings of individual validation questions ranged between 0 and 4 (on a scale of 1-5). Means and medians of these ranges were computed for each test item (mean: 0.8 to 2.4; median: 1 to 3). A subset of test items comprising 27 items was generated based on a set of inclusion and exclusion criteria. This study proposes an objective methodology for validating an SCT question bank. Analysis of the validation survey is done from all angles, i.e., reviewers, validation questions, and test items. Finally, a subset of test items is generated based on a set of criteria.
Development and Validation of the Poverty Attributions Survey
ERIC Educational Resources Information Center
Bennett, Robert M.; Raiz, Lisa; Davis, Tamara S.
2016-01-01
This article describes the process of developing and testing the Poverty Attribution Survey (PAS), a measure of poverty attributions. The PAS is theory based and includes original items as well as items from previously tested poverty attribution instruments. The PAS was electronically administered to a sample of state-licensed professional social…
Physical performance testing in mucopolysaccharidosis I: a pilot study.
Dumas, Helene M; Fragala, Maria A; Haley, Stephen M; Skrinar, Alison M; Wraith, James E; Cox, Gerald F
2004-01-01
To develop and field-test a physical performance measure (MPS-PPM) for individuals with Mucopolysaccharidosis I (MPS I), a rare genetic disorder. Motor performance and endurance items were developed based on literature review, clinician feedback, feasibility, and equipment and training needs. A standardized testing protocol and scoring rules were created. The MPS-PPM includes: Arm Function (7 items), Leg Function (5 items), and Endurance (2 items). Pilot data were collected for 10 subjects (ages 5-29 years). We calculated Spearman's rho correlations between age, severity and summary z-scores on the MPS-PPM. Subjects had variable presentations, as correlations among the three sub-test scores were not significant. Increasing age was related to greater severity in physical performance (r = 0.72, p<0.05) and lower scores on the Leg Function (r = -0.67, p<0.05) and Endurance (r = -0.65, p<0.05) sub-tests. The MPS-PPM was sensitive to detecting physical performance deficits, as six subjects could not complete the full battery of Arm Function items and eight subjects were unable to complete all Leg Function items. Subjects walked more slowly and expended more energy than typically developing peers. Individuals with MPS I have difficulty with arm and leg function and reduced endurance. The MPS-PPM is a clinically feasible measure that detects limitations in physical performance and may have potential to quantify changes in function following intervention. Copyright 2004 Taylor and Francis Ltd.
Enhancing self-report assessment of PTSD: development of an item bank.
Del Vecchio, Nicole; Elwy, A Rani; Smith, Eric; Bottonari, Kathryn A; Eisen, Susan V
2011-04-01
The authors report results of work to enhance self-report posttraumatic stress disorder (PTSD) assessment by developing an item bank for use in a computer-adapted test. Computer-adapted tests have great potential to decrease the burden of PTSD assessment and outcomes monitoring. The authors conducted a systematic literature review of PTSD instruments, created a database of items, performed qualitative review and readability analysis, and conducted cognitive interviews with veterans diagnosed with PTSD. The systematic review yielded 480 studies in which 41 PTSD instruments comprising 993 items met inclusion criteria. The final PTSD item bank includes 104 items representing each of the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV; American Psychiatric Association [APA], 1994), PTSD symptom clusters (reexperiencing, avoidance, and hyperarousal), and 3 additional subdomains (depersonalization, guilt, and sexual problems) that expanded the assessment item pool. Copyright © 2011 International Society for Traumatic Stress Studies.
Pilkonis, Paul A.; Yu, Lan; Dodds, Nathan E.; Johnston, Kelly L.; Lawrence, Suzanne; Hilton, Thomas F.; Daley, Dennis C.; Patkar, Ashwin A.; McCarty, Dennis
2015-01-01
Background Two item banks for substance use were developed as part of the Patient-Reported Outcomes Measurement Information System (PROMIS®): severity of substance use and positive appeal of substance use. Methods Qualitative item analysis (including focus groups, cognitive interviewing, expert review, and item revision) reduced an initial pool of more than 5,300 items for substance use to 119 items included in field testing. Items were written in a first-person, past-tense format, with 5 response options reflecting frequency or severity. Both 30-day and 3-month time frames were tested. The calibration sample of 1,336 respondents included 875 individuals from the general population (ascertained through an internet panel) and 461 patients from addiction treatment centers participating in the National Drug Abuse Treatment Clinical Trials Network. Results Final banks of 37 and 18 items were calibrated for severity of substance use and positive appeal of substance use, respectively, using the two-parameter graded response model from item response theory (IRT). Initial calibrations were similar for the 30-day and 3-month time frames, and final calibrations used data combined across the time frames, making the items applicable with either interval. Seven-item static short forms were also developed from each item bank. Conclusions Test information curves showed that the PROMIS item banks provided substantial information in a broad range of severity, making them suitable for treatment, observational, and epidemiological research in both clinical and community settings. PMID:26423364
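The test information curves mentioned in the conclusions can be computed directly from the graded response model. The sketch below evaluates Fisher information for hypothetical polytomous items using the general formula I(theta) = sum_k P_k'(theta)^2 / P_k(theta), with derivatives taken numerically; the item parameters are invented for illustration and are not PROMIS calibrations.

```python
import numpy as np
from scipy.special import expit

def grm_probs(theta, a, b):
    """Graded response model category probabilities for one item
    (b: ordered thresholds, length K-1); returns an array of length K."""
    pstar = expit(a * (theta - np.asarray(b)))          # P(X >= k), k = 1..K-1
    return np.concatenate(([1.0], pstar)) - np.concatenate((pstar, [0.0]))

def item_information(theta, a, b, eps=1e-4):
    """Fisher information I(theta) = sum_k P_k'(theta)^2 / P_k(theta),
    with the derivatives approximated by central differences."""
    p = grm_probs(theta, a, b)
    dp = (grm_probs(theta + eps, a, b) - grm_probs(theta - eps, a, b)) / (2 * eps)
    return float(np.sum(dp ** 2 / np.maximum(p, 1e-12)))

# Hypothetical 5-category items (invented parameters)
bank = [(2.1, [-2.0, -0.8, 0.4, 1.6]), (1.6, [-1.2, 0.0, 1.0, 2.2])]
grid = np.linspace(-3, 3, 7)
test_info = [sum(item_information(t, a, b) for a, b in bank) for t in grid]
print(np.round(test_info, 2))        # test information; SE(theta) = 1 / sqrt(information)
```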
Development of Taiwan Undergraduates' Volunteer Service Motivation Scale
ERIC Educational Resources Information Center
Ho-Tang, Wu; Chin-Tang, Tu; Mei-Ju, Chou; Jing-Fang, Hou; Meng-Shan, Lei
2016-01-01
This study aims to develop a volunteer service motivation scale for Taiwan undergraduates. To begin with, an item pool was proposed on the basis of the literature. After the item pool was discussed with three Taiwan undergraduates, exploratory factor analysis (EFA) (N = 150) was conducted, in which three tests were carried out for the EFA: 1. Item analysis: comparisons of…
Jacobson, C Jeffrey; Kashikar-Zuck, Susmita; Farrell, Jennifer; Barnett, Kimberly; Goldschneider, Ken; Dampier, Carlton; Cunningham, Natoshia; Crosby, Lori; DeWitt, Esi Morgan
2015-12-01
As initial steps in a broader effort to develop and test pediatric pain behavior and pain quality item banks for the Patient-Reported Outcomes Measurement Information System (PROMIS), we used qualitative interview and item review methods to 1) evaluate the overall conceptual scope and content validity of the PROMIS pain domain framework among children with chronic/recurrent pain conditions, and 2) develop item candidates for further psychometric testing. To elicit the experiential and conceptual scope of pain outcomes across a variety of pediatric recurrent/chronic pain conditions, we conducted 32 semi-structured individual and 2 focus-group interviews with children and adolescents (8-17 years), and 32 individual and 2 focus-group interviews with parents of children with pain. Interviews with pain experts (10) explored the operational limits of pain measurement in children. For item bank development, we identified existing items from measures in the literature, grouped them by concept, removed redundancies, and modified the remaining items to match PROMIS formatting. New items were written as needed and cognitive debriefing was completed with the children and their parents, resulting in 98 pain behavior (47 self, 51 proxy), 54 quality, and 4 intensity items for further testing. Qualitative content analyses suggest that reportable pain outcomes that matter to children with pain are captured within and consistent with the pain domain framework in PROMIS. PROMIS pediatric pain behavior, quality, and intensity items were developed based on a theoretical framework of pain that was evaluated by multiple stakeholders in the measurement of pediatric pain, including researchers, clinicians, and children with pain and their parents, and the appropriateness of the framework was verified. Copyright © 2015 American Pain Society. Published by Elsevier Inc. All rights reserved.
Development of multiple choice pictorial test for measuring the dimensions of knowledge
NASA Astrophysics Data System (ADS)
Nahadi, Siswaningsih, Wiwi; Erna
2017-05-01
This study aims to develop a multiple choice pictorial test as a tool to measure dimensions of knowledge on the topic of chemical equilibrium. The method used was Research and Development with validation, conducted through preliminary studies and model development. The product is a multiple choice pictorial test. The test consisted of 22 items and was administered to 64 grade XII high school students. The quality of the test was determined by its validity, reliability, difficulty index, discrimination power, and distractor effectiveness. The validity of the test was determined by CVR calculation using 8 validators (4 university teachers and 4 high school teachers), with an average CVR value of 0.89. The reliability of the test was in the very high category, with a value of 0.87. Item discrimination power was very good for 32% of items, good for 59%, and sufficient for 20%. The test has a range of difficulty levels: 23% of items were in the difficult category, 50% in the medium category, and 27% in the easy category. Distractor effectiveness was very poor for 1% of items, poor for 1%, medium for 4%, good for 39%, and very good for 55%. The dimensions of knowledge measured consisted of factual knowledge, conceptual knowledge, and procedural knowledge. Based on the questionnaire, students responded quite well to the developed test, and most students preferred this kind of multiple choice pictorial test, which includes pictures as an evaluation tool, over narration-based tests dominated by text.
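Two of the statistics mentioned above have simple closed forms: Lawshe's content validity ratio, CVR = (n_e - N/2)/(N/2), and the proportion-correct difficulty index. A minimal sketch follows; the example counts are hypothetical, not figures from the study.

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR for one item: (n_e - N/2) / (N/2)."""
    return (n_essential - n_experts / 2) / (n_experts / 2)

def difficulty_index(n_correct, n_examinees):
    """Proportion-correct difficulty index; higher values mean easier items."""
    return n_correct / n_examinees

# Hypothetical counts: 7 of 8 validators rate an item essential,
# and 48 of 64 students answer it correctly.
print(round(content_validity_ratio(7, 8), 2))   # 0.75
print(round(difficulty_index(48, 64), 2))       # 0.75 (an easy item under many cut-offs)
```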
The development and psychometric validation of the Ethical Awareness Scale.
Milliken, Aimee; Ludlow, Larry; DeSanto-Madeya, Susan; Grace, Pamela
2018-04-19
To develop and psychometrically assess the Ethical Awareness Scale using Rasch measurement principles and a Rasch item response theory model. Critical care nurses must be equipped to provide good (ethical) patient care. This requires ethical awareness, which involves recognizing the ethical implications of all nursing actions. Ethical awareness is imperative in successfully addressing patient needs. Evidence suggests that the ethical import of everyday issues may often go unnoticed by nurses in practice. Assessing nurses' ethical awareness is a necessary first step in preparing nurses to identify and manage ethical issues in the highly dynamic critical care environment. A cross-sectional design was used in two phases of instrument development. Using Rasch principles, an item bank representing nursing actions was developed (33 items). Content validity testing was performed. Eighteen items were selected for face validity testing. Two rounds of operational testing were performed with critical care nurses in Boston between February-April 2017. A Rasch analysis suggests sufficient item invariance across samples and sufficient construct validity. The analysis further demonstrates a progression of items uniformly along a hierarchical continuum; items that match respondent ability levels; response categories that are sufficiently used; and adequate internal consistency. Mean ethical awareness scores were in the low/moderate range. The results suggest the Ethical Awareness Scale is a psychometrically sound, reliable and valid measure of ethical awareness in critical care nurses. © 2018 John Wiley & Sons Ltd.
Developing Computerized Tests for Classroom Teachers: A Pilot Study.
ERIC Educational Resources Information Center
Glowacki, Margaret L.; And Others
Two types of computerized testing have been defined: (1) computer-based testing, using a computer to administer conventional tests in which all examinees take the same set of items; and (2) adaptive tests, in which items are selected for administration by the computer, based on examinee's previous responses. This paper discusses an option for…
A Framework for Examining the Utility of Technology-Enhanced Items
ERIC Educational Resources Information Center
Russell, Michael
2016-01-01
Interest in and use of technology-enhanced items has increased over the past decade. Given the additional time required to administer many technology-enhanced items and the increased expense required to develop them, it is important for testing programs to consider the utility of technology-enhanced items. The Technology-Enhanced Item Utility…
Statistical Approaches to the Study of Item Difficulty.
ERIC Educational Resources Information Center
Olson, John F.; And Others
Traditionally, item difficulty has been defined in terms of the performance of examinees. For test development purposes, a more useful concept would be some kind of intrinsic item difficulty, defined in terms of the item's content, context, or characteristics and the task demands set by the item. In this investigation, the measurement literature…
The Multidimensional Assessment of Interoceptive Awareness (MAIA)
Mehling, Wolf E.; Price, Cynthia; Daubenmier, Jennifer J.; Acree, Mike; Bartmess, Elizabeth; Stewart, Anita
2012-01-01
This paper describes the development of a multidimensional self-report measure of interoceptive body awareness. The systematic mixed-methods process involved reviewing the current literature, specifying a multidimensional conceptual framework, evaluating prior instruments, developing items, and analyzing focus group responses to scale items by instructors and patients of body awareness-enhancing therapies. Following refinement by cognitive testing, items were field-tested in students and instructors of mind-body approaches. Final item selection was achieved by submitting the field test data to an iterative process using multiple validation methods, including exploratory cluster and confirmatory factor analyses, comparison between known groups, and correlations with established measures of related constructs. The resulting 32-item multidimensional instrument assesses eight concepts. The psychometric properties of these final scales suggest that the Multidimensional Assessment of Interoceptive Awareness (MAIA) may serve as a starting point for research and further collaborative refinement. PMID:23133619
Subjective health literacy: Development of a brief instrument for school-aged children.
Paakkari, Olli; Torppa, Minna; Kannas, Lasse; Paakkari, Leena
2016-12-01
The present paper focuses on the measurement of health literacy (HL), which is an important determinant of health and health behaviours. HL starts to develop in childhood and adolescence; hence, there is a need for instruments to monitor HL among younger age groups. These instruments are still rare. The aim of the project reported here was, therefore, to develop a brief, multidimensional, theory-based instrument to measure subjective HL among school-aged children. The development of the instrument covered four phases: item generation based on a conceptual framework; a pilot study (n = 405); test-retest (n = 117); and construction of the instrument (n = 3853). All the samples were taken from Finnish 7th and 9th graders. Initially, 65 items were generated, of which 32 items were selected for the pilot study. After item reduction, the instrument contained 16 items. The test-retest phase produced estimates of stability. In the final phase a 10-item instrument was constructed, referred to as Health Literacy for School-Aged Children (HLSAC). The instrument exhibited a high Cronbach alpha (0.93), and included two items from each of the five predetermined theoretical components (theoretical knowledge, practical knowledge, critical thinking, self-awareness, citizenship). The iterative and validity-driven development process made it possible to construct a brief multidimensional HLSAC instrument. Such instruments are suitable for large-scale studies, and for use with children and adolescents. Validation will require further testing for use in other countries.
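Cronbach's alpha, quoted here and in several of the other studies in this collection, is straightforward to compute from an examinee-by-item score matrix. The sketch below uses simulated Likert-type data purely for illustration.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (examinees x items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                          / items.sum(axis=1).var(ddof=1))

# Simulated responses of 200 students to a 10-item Likert-type instrument
rng = np.random.default_rng(3)
trait = rng.normal(0, 1, 200)
scores = np.clip(np.round(2.5 + trait[:, None] + rng.normal(0, 0.8, (200, 10))), 1, 4)
print(round(cronbach_alpha(scores), 2))
```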
Automatic Item Generation: A More Efficient Process for Developing Mathematics Achievement Items?
ERIC Educational Resources Information Center
Embretson, Susan E.; Kingston, Neal M.
2018-01-01
The continual supply of new items is crucial to maintaining quality for many tests. Automatic item generation (AIG) has the potential to rapidly increase the number of items that are available. However, the efficiency of AIG will be mitigated if the generated items must be submitted to traditional, time-consuming review processes. In two studies,…
Assessment of Differential Item Functioning under Cognitive Diagnosis Models: The DINA Model Example
ERIC Educational Resources Information Center
Li, Xiaomin; Wang, Wen-Chung
2015-01-01
The assessment of differential item functioning (DIF) is routinely conducted to ensure test fairness and validity. Although many DIF assessment methods have been developed in the context of classical test theory and item response theory, they are not applicable for cognitive diagnosis models (CDMs), as the underlying latent attributes of CDMs are…
A Comparison of Four Item-Selection Methods for Severely Constrained CATs
ERIC Educational Resources Information Center
He, Wei; Diao, Qi; Hauser, Carl
2014-01-01
This study compared four item-selection procedures developed for use with severely constrained computerized adaptive tests (CATs). Severely constrained CATs refer to those adaptive tests that seek to meet a complex set of constraints that are often not exclusive of each other (i.e., an item may contribute to the satisfaction of several…
Item response theory analysis of the mechanics baseline test
NASA Astrophysics Data System (ADS)
Cardamone, Caroline N.; Abbott, Jonathan E.; Rayyan, Saif; Seaton, Daniel T.; Pawl, Andrew; Pritchard, David E.
2012-02-01
Item response theory is useful in both the development and evaluation of assessments and in computing standardized measures of student performance. In item response theory, individual parameters (difficulty, discrimination) for each item or question are fit by item response models. These parameters provide a means for evaluating a test and offer a better measure of student skill than a raw test score, because each skill calculation considers not only the number of questions answered correctly, but the individual properties of all questions answered. Here, we present the results from an analysis of the Mechanics Baseline Test given at MIT during 2005-2010. Using the item parameters, we identify questions on the Mechanics Baseline Test that are not effective in discriminating between MIT students of different abilities. We show that a limited subset of the highest quality questions on the Mechanics Baseline Test returns accurate measures of student skill. We compare student skills as determined by item response theory to the more traditional measurement of the raw score and show that a comparable measure of learning gain can be computed.
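The following sketch illustrates the point made above about IRT-based skill estimates: under a two-parameter logistic model with known item parameters, two students with the same raw score can receive different ability estimates because they answered items of different difficulty and discrimination. The item parameters are invented for illustration and are not Mechanics Baseline Test calibrations.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import expit

def ability_mle(responses, a, b):
    """Maximum-likelihood ability estimate under a 2PL model with known
    item discriminations a and difficulties b."""
    responses = np.asarray(responses, dtype=float)
    a, b = np.asarray(a), np.asarray(b)

    def neg_loglik(theta):
        p = expit(a * (theta - b))
        return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))

    return minimize_scalar(neg_loglik, bounds=(-4, 4), method="bounded").x

# Invented parameters for a 5-item quiz: two students each answer 3 items
# correctly, but their ability estimates differ because the items they got
# right differ in difficulty and discrimination.
a = np.array([0.5, 1.0, 1.5, 1.8, 2.0])
b = np.array([-1.5, -0.5, 0.0, 0.8, 1.5])
print(round(ability_mle([1, 1, 1, 0, 0], a, b), 2))
print(round(ability_mle([0, 0, 1, 1, 1], a, b), 2))
```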
Development of the Attributed Dignity Scale.
Jacelon, Cynthia S; Dixon, Jane; Knafl, Kathleen A
2009-07-01
A sequential, multi-method approach to instrument development beginning with concept analysis, followed by (a) item generation from qualitative data, (b) review of items by expert and lay person panels, (c) cognitive appraisal interviews, (d) pilot testing, and (e) evaluating construct validity was used to develop a measure of attributed dignity in older adults. The resulting positively scored, 23-item scale has three dimensions: Self-Value, Behavioral Respect-Self, and Behavioral Respect-Others. Item-total correlations in the pilot study ranged from 0.39 to 0.85. Correlations between the Attributed Dignity Scale (ADS) and both Rosenberg's Self-Esteem Scale (0.17) and Crowne and Marlowe's Social Desirability Scale (0.36) were modest and in the expected direction, indicating attributed dignity is a related but independent concept. Next steps include testing the ADS with a larger sample to complete factor analysis, test-retest stability, and further study of the relationships between attributed dignity and other concepts.
Cohen, Matthew L; Kisala, Pamela A; Dyson-Hudson, Trevor A; Tulsky, David S
2018-05-01
To develop modern patient-reported outcome measures that assess pain interference and pain behavior after spinal cord injury (SCI). Grounded-theory based qualitative item development; large-scale item calibration field-testing; confirmatory factor analyses; graded response model item response theory analyses; statistical linking techniques to transform scores to the Patient Reported Outcome Measurement Information System (PROMIS) metric. Five SCI Model Systems centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. N/A. Spinal Cord Injury - Quality of Life (SCI-QOL) Pain Interference item bank, SCI-QOL Pain Interference short form, and SCI-QOL Pain Behavior scale. Seven hundred fifty-seven individuals with traumatic SCI completed 58 items addressing various aspects of pain. Items were then separated by whether they assessed pain interference or pain behavior, and poorly functioning items were removed. Confirmatory factor analyses confirmed that each set of items was unidimensional, and item response theory analyses were used to estimate slopes and thresholds for the items. Ultimately, 7 items (4 from PROMIS) comprised the Pain Behavior scale and 25 items (18 from PROMIS) comprised the Pain Interference item bank. Ten of these 25 items were selected to form the Pain Interference short form. The SCI-QOL Pain Interference item bank and the SCI-QOL Pain Behavior scale demonstrated robust psychometric properties. The Pain Interference item bank is available as a computer adaptive test or short form for research and clinical applications, and scores are transformed to the PROMIS metric.
Rosneck, James S; Hughes, Joel; Gunstad, John; Josephson, Richard; Noe, Donald A; Waechter, Donna
2014-01-01
This article describes the systematic construction and psychometric analysis of a knowledge assessment instrument for phase II cardiac rehabilitation (CR) patients measuring risk modification disease management knowledge and behavioral outcomes derived from national standards relevant to secondary prevention and management of cardiovascular disease. First, using adult curriculum based on disease-specific learning outcomes and competencies, a systematic test item development process was completed by clinical staff. Second, a panel of educational and clinical experts used an iterative process to identify test content domain and arrive at consensus in selecting items meeting criteria. Third, the resulting 31-question instrument, the Cardiac Knowledge Assessment Tool (CKAT), was piloted in CR patients to ensure use of application. Validity and reliability analyses were performed on 3638 adults before test administrations with additional focused analyses on 1999 individuals completing both pretreatment and posttreatment administrations within 6 months. Evidence of CKAT content validity was substantiated, with 85% agreement among content experts. Evidence of construct validity was demonstrated via factor analysis identifying key underlying factors. Estimates of internal consistency, for example, Cronbach's α = .852 and Spearman-Brown split-half reliability = 0.817 on pretesting, support test reliability. Item analysis, using point biserial correlation, measured relationships between performance on single items and total score (P < .01). Analyses using item difficulty and item discrimination indices further verified item stability and validity of the CKAT. A knowledge instrument specifically designed for an adult CR population was systematically developed and tested in a large representative patient population, satisfying psychometric parameters, including validity and reliability.
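The point-biserial correlation and Spearman-Brown split-half reliability reported for the CKAT can be computed as below; the response matrix is simulated for illustration, and the odd-even split is just one conventional way of forming halves.

```python
import numpy as np

def point_biserial(item, total):
    """Correlation between a dichotomous (0/1) item and the total score."""
    return float(np.corrcoef(item, total)[0, 1])

def split_half_reliability(items):
    """Spearman-Brown corrected split-half reliability (odd vs. even items)."""
    items = np.asarray(items, dtype=float)
    half1 = items[:, 0::2].sum(axis=1)
    half2 = items[:, 1::2].sum(axis=1)
    r = np.corrcoef(half1, half2)[0, 1]
    return 2 * r / (1 + r)

# Simulated 0/1 responses of 300 examinees to a 31-item knowledge test
rng = np.random.default_rng(4)
knowledge = rng.normal(0, 1, 300)
p = 1 / (1 + np.exp(-(knowledge[:, None] - rng.normal(0, 1, 31))))
responses = (rng.random((300, 31)) < p).astype(int)
total = responses.sum(axis=1)
print(round(point_biserial(responses[:, 0], total), 2))
print(round(split_half_reliability(responses), 2))
```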
A Framework for the Development of Computerized Adaptive Tests
ERIC Educational Resources Information Center
Thompson, Nathan A.; Weiss, David J.
2011-01-01
A substantial amount of research has been conducted over the past 40 years on technical aspects of computerized adaptive testing (CAT), such as item selection algorithms, item exposure controls, and termination criteria. However, there is little literature providing practical guidance on the development of a CAT. This paper seeks to collate some…
Issues and Procedures in the Development of Criterion Referenced Tests.
ERIC Educational Resources Information Center
Klein, Stephen P.; Kosecoff, Jacqueline
The basic steps and procedures in the development of criterion referenced tests (CRT), as well as the issues and problems associated with these activities are discussed. In the first section of the paper, the discussions focus upon the purpose and defining characteristics of CRTs, item construction and selection, improving item quality, content…
Development and Testing of the Church Environment Audit Tool.
Kaczynski, Andrew T; Jake-Schoffman, Danielle E; Peters, Nathan A; Dunn, Caroline G; Wilcox, Sara; Forthofer, Melinda
2018-05-01
In this paper, we describe development and reliability testing of a novel tool to evaluate the physical environment of faith-based settings pertaining to opportunities for physical activity (PA) and healthy eating (HE). Tool development was a multistage process including a review of similar tools, stakeholder review, expert feedback, and pilot testing. Final tool sections included indoor opportunities for PA, outdoor opportunities for PA, food preparation equipment, kitchen type, food for purchase, beverages for purchase, and media. Two independent audits were completed at 54 churches. Interrater reliability (IRR) was determined with Kappa and percent agreement. Of 218 items, 102 were assessed for IRR and 116 could not be assessed because they were not present at enough churches. Percent agreement for all 102 items was over 80%. For 42 items, the sample was too homogeneous to assess Kappa. Forty-six of the remaining items had Kappas greater than 0.60 (25 items 0.80-1.00; 21 items 0.60-0.79), indicating substantial to almost perfect agreement. The tool proved reliable and efficient for assessing church environments and identifying potential intervention points. Future work can focus on applications within faith-based partnerships to understand how church environments influence diverse health outcomes.
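Percent agreement and Cohen's kappa, the two inter-rater reliability statistics used above, are shown in the short sketch below with hypothetical ratings from two auditors; the counts are not data from the study.

```python
import numpy as np

def percent_agreement(r1, r2):
    return float(np.mean(np.asarray(r1) == np.asarray(r2)))

def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters' categorical codes on the same items."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    cats = np.unique(np.concatenate([r1, r2]))
    po = np.mean(r1 == r2)                                        # observed agreement
    pe = sum(np.mean(r1 == c) * np.mean(r2 == c) for c in cats)   # chance agreement
    return float((po - pe) / (1 - pe))

# Hypothetical present (1) / absent (0) codes for one audit item at 54 churches
rater1 = np.array([1] * 30 + [0] * 24)
rater2 = np.array([1] * 27 + [0] * 3 + [0] * 20 + [1] * 4)
print(round(percent_agreement(rater1, rater2), 2), round(cohens_kappa(rater1, rater2), 2))
```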
Development of and Field-Test Results for the CAHPS PCMH Survey
Scholle, Sarah Hudson; Vuong, Oanh; Ding, Lin; Fry, Stephanie; Gallagher, Patricia; Brown, Julie A.; Hays, Ron D.; Cleary, Paul D.
2017-01-01
Objective To develop and evaluate survey questions that assess processes of care relevant to Patient-Centered Medical Homes (PCMHs). Research Design We convened expert panels, reviewed evidence on effective care practices and existing surveys, elicited broad public input, and conducted cognitive interviews and a field test to develop items relevant to PCMHs that could be added to the CAHPS® Clinician & Group (CG-CAHPS) 1.0 Survey. Surveys were tested using a two-contact mail protocol in 10 adult and 33 pediatric practices (both private and community health centers) in Massachusetts. A total of 4,875 completed surveys were received (overall response rate of 25%). Analyses We calculated the rate of valid responses for each item. We conducted exploratory factor analyses and estimated item-to-total correlations, individual and site level reliability, and correlations among proposed multi-item composites. Results Ten items in four new domains (Comprehensiveness, Information, Self-Management Support, and Shared Decision-Making) and four items in two existing domains (Access and Coordination of Care) were selected to be supplemental items to be used in conjunction with the adult CG-CAHPS 1.0 survey. For the child version, four items in each of two new domains (Information and Self-Management Support) and five items in existing domains (Access, Comprehensiveness-Prevention, Coordination of Care) were selected. Conclusions This study provides support for the reliability and validity of new items to supplement the CG-CAHPS 1.0 survey to assess aspects of primary care that are important attributes of Patient-Centered Medical Homes. PMID:23064272
Widger, Kimberley; Tourangeau, Ann E; Steele, Rose; Streiner, David L
2015-01-01
The field of pediatric palliative care is hindered by the lack of a well-defined, reliable, and valid method for measuring the quality of end-of-life care. The study purpose was to develop and test an instrument to measure mothers' perspectives on the quality of care received before, at the time of, and following a child's death. In Phase 1, key components of quality end-of-life care for children were synthesized through a comprehensive review of research literature. These key components were validated in Phase 2 and then extended through focus groups with bereaved parents. In Phase 3, items were developed to assess structures, processes, and outcomes of quality end-of-life care then tested for content and face validity with health professionals. Cognitive testing was conducted through interviews with bereaved parents. In Phase 4, bereaved mothers were recruited through 10 children's hospitals/hospices in Canada to complete the instrument, and psychometric testing was conducted. Following review of 67 manuscripts and 3 focus groups with 10 parents, 141 items were initially developed. The overall content validity index for these items was 0.84 as rated by 7 health professionals. Based on feedback from health professionals and cognitive testing with 6 parents, a 144-item instrument was finalized for further testing. In Phase 4, 128 mothers completed the instrument, 31 of whom completed it twice. Test-retest reliability, internal consistency, and construct validity were demonstrated for six subscales: Connect With Families, Involve Parents, Share Information With Parents, Share Information Among Health Professionals, Support Parents, and Provide Care at Death. Additional items with content validity were grouped in four domains: Support the Child, Support Siblings, Provide Bereavement Follow-up, and Structures of Care. Forty-eight items were deleted through psychometric testing, leaving a 95-item instrument. There is good initial evidence for the reliability and validity of this new quality of end-of-life care instrument as a mechanism for evaluative feedback to health professionals, health systems, and policy makers to improve children's end-of-life care.
NASA Astrophysics Data System (ADS)
Ilich, Maria O.
Psychometricians and test developers evaluate standardized tests for potential bias against groups of test-takers by using differential item functioning (DIF). English language learners (ELLs) are a diverse group of students whose native language is not English. While they are still learning the English language, they must take their standardized tests for their school subjects, including science, in English. In this study, linguistic complexity was examined as a possible source of DIF that may result in test scores that confound science knowledge with a lack of English proficiency among ELLs. Two years of fifth-grade state science tests were analyzed for evidence of DIF using two DIF methods, the Simultaneous Item Bias Test (SIBTest) and logistic regression. The tests presented a unique challenge in that the test items were grouped together into testlets: groups of items referring to a scientific scenario to measure knowledge of different science content or skills. Very large samples of 10,256 students in 2006 and 13,571 students in 2007 were examined. Half of each sample was composed of Spanish-speaking ELLs; the balance was composed of native English speakers. The two DIF methods were in agreement about the items that favored non-ELLs and the items that favored ELLs. Logistic regression effect sizes were all negligible, while SIBTest flagged items with low to high DIF. A decrease in socioeconomic status and Spanish-speaking ELL diversity may have led to inconsistent SIBTest effect sizes for items used in both testing years. The DIF results for the testlets suggested that ELLs lacked sufficient opportunity to learn science content. The DIF results further suggest that those constructed-response test items requiring the student to draw a conclusion about a scientific investigation or to plan a new investigation tended to favor ELLs.
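A common logistic-regression DIF screen, of the kind named above, compares a model predicting item success from the matching total score with a model that adds group membership (a likelihood-ratio test for uniform DIF). The sketch below is a minimal illustration with simulated data; operational DIF analyses typically also apply effect-size criteria and test a score-by-group interaction for non-uniform DIF.

```python
import numpy as np
import statsmodels.api as sm

def logistic_dif(item, total, group):
    """Uniform-DIF screen for one dichotomous item: likelihood-ratio test
    comparing 'total score only' with 'total score + group membership'."""
    X1 = sm.add_constant(np.column_stack([total]))
    X2 = sm.add_constant(np.column_stack([total, group]))
    m1 = sm.Logit(item, X1).fit(disp=0)
    m2 = sm.Logit(item, X2).fit(disp=0)
    lr = 2 * (m2.llf - m1.llf)                 # approx. chi-square with 1 df
    return lr, m2.params[-1]                   # LR statistic, group log-odds effect

# Simulated data: 2000 examinees, half ELL, one item with a small uniform DIF effect
rng = np.random.default_rng(5)
group = np.repeat([0, 1], 1000)                                 # 1 = ELL
theta = rng.normal(0, 1, 2000)
total = np.clip(np.round(20 + 6 * theta + rng.normal(0, 2, 2000)), 0, 40)
p = 1 / (1 + np.exp(-(theta - 0.2 + 0.4 * group)))              # item favours ELLs at equal theta
item = (rng.random(2000) < p).astype(int)
print(logistic_dif(item, total, group))
```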
Richards, Rickelle; Brown, Lora Beth; Williams, D Pauline; Eggett, Dennis L
2017-02-01
Develop a questionnaire to measure students' knowledge, attitude, behavior, self-efficacy, and environmental factors related to the use of canned foods. The Knowledge-Attitude-Behavior Model, Social Cognitive Theory, and Canned Foods Alliance survey were used as frameworks for questionnaire development. Cognitive interviews were conducted with college students (n = 8). Nutrition and survey experts assessed content validity. Reliability was measured via Cronbach α and 2 rounds (1, n = 81; 2, n = 65) of test-retest statistics. Means and frequencies were used. The 65-item questionnaire had a test-retest reliability of .69. Cronbach α scores were .87 for knowledge (9 items), .86 for attitude (30 items), .80 for self-efficacy (12 items), .68 for canned foods use (8 items), and .30 for environment (6 items). A reliable questionnaire was developed to measure perceptions and use of canned foods. Nutrition educators may find this questionnaire useful to evaluate pretest-posttest changes from canned foods-based interventions among college students. Copyright © 2016 Society for Nutrition Education and Behavior. Published by Elsevier Inc. All rights reserved.
Rutgers, D R; van Raamt, F; van Lankeren, W; Ravesloot, C J; van der Gijp, A; Ten Cate, Th J; van Schaik, J P J
2018-05-01
To describe the development of the Dutch Radiology Progress Test (DRPT) for knowledge testing in radiology residency training in The Netherlands from its start in 2003 up to 2016. We reviewed all DRPTs conducted since 2003. We assessed key changes and events in the test throughout the years, as well as resident participation and dispensation for the DRPT, test reliability and discriminative power of test items. The DRPT has been conducted semi-annually since 2003, except for 2015 when one digital DRPT failed. Key changes in these years were improvements in test analysis and feedback, test digitalization (2013) and inclusion of test items on nuclear medicine (2016). From 2003 to 2016, resident dispensation rates increased (Pearson's correlation coefficient 0.74, P-value <0.01) to a maximum of 16%. Cronbach's alpha for test reliability varied between 0.83 and 0.93. The percentage of DRPT test items with negative item-rest correlations, indicating relatively poor discriminative power, varied between 4% and 11%. Progress testing has proven feasible and sustainable in Dutch radiology residency training, keeping up with innovations in the radiological profession. Test reliability and discriminative power of test items have remained fair over the years, while resident dispensation rates have increased. • Progress testing allows for monitoring knowledge development from novice to senior trainee. • In postgraduate medical training, progress testing is used infrequently. • Progress testing is feasible and sustainable in radiology residency training.
Harrison, Peter M C; Collins, Tom; Müllensiefen, Daniel
2017-06-15
Modern psychometric theory provides many useful tools for ability testing, such as item response theory, computerised adaptive testing, and automatic item generation. However, these techniques have yet to be integrated into mainstream psychological practice. This is unfortunate, because modern psychometric techniques can bring many benefits, including sophisticated reliability measures, improved construct validity, avoidance of exposure effects, and improved efficiency. In the present research we therefore use these techniques to develop a new test of a well-studied psychological capacity: melodic discrimination, the ability to detect differences between melodies. We calibrate and validate this test in a series of studies. Studies 1 and 2 respectively calibrate and validate an initial test version, while Studies 3 and 4 calibrate and validate an updated test version incorporating additional easy items. The results support the new test's viability, with evidence for strong reliability and construct validity. We discuss how these modern psychometric techniques may also be profitably applied to other areas of music psychology and psychological science in general.
Chen, Cheng-Te; Chen, Yu-Lan; Lin, Yu-Ching; Hsieh, Ching-Lin; Tzeng, Jeng-Yi
2018-01-01
Objective The purpose of this study was to construct a computerized adaptive test (CAT) for measuring self-care performance (the CAT-SC) in children with developmental disabilities (DD) aged from 6 months to 12 years in a content-inclusive, precise, and efficient fashion. Methods The study was divided into 3 phases: (1) item bank development, (2) item testing, and (3) a simulation study to determine the stopping rules for the administration of the CAT-SC. A total of 215 caregivers of children with DD were interviewed with the 73-item CAT-SC item bank. An item response theory model was adopted for examining the construct validity to estimate item parameters after investigation of the unidimensionality, equality of slope parameters, item fitness, and differential item functioning (DIF). In the last phase, the reliability and concurrent validity of the CAT-SC were evaluated. Results The final CAT-SC item bank contained 56 items. The stopping rules suggested were (a) reliability coefficient greater than 0.9 or (b) 14 items administered. The results of simulation also showed that 85% of the estimated self-care performance scores would reach a reliability higher than 0.9 with a mean test length of 8.5 items, and the mean reliability for the rest was 0.86. Administering the CAT-SC could reduce the number of items administered by 75% to 84%. In addition, self-care performances estimated by the CAT-SC and the full item bank were very similar to each other (Pearson r = 0.98). Conclusion The newly developed CAT-SC can efficiently measure self-care performance in children with DD whose performances are comparable to those of TD children aged from 6 months to 12 years as precisely as the whole item bank. The item bank of the CAT-SC has good reliability and a unidimensional self-care construct, and the CAT can estimate self-care performance with less than 25% of the items in the item bank. Therefore, the CAT-SC could be useful for measuring self-care performance in children with DD in clinical and research settings. PMID:29561879
Chen, Cheng-Te; Chen, Yu-Lan; Lin, Yu-Ching; Hsieh, Ching-Lin; Tzeng, Jeng-Yi; Chen, Kuan-Lin
2018-01-01
The purpose of this study was to construct a computerized adaptive test (CAT) for measuring self-care performance (the CAT-SC) in children with developmental disabilities (DD) aged from 6 months to 12 years in a content-inclusive, precise, and efficient fashion. The study was divided into 3 phases: (1) item bank development, (2) item testing, and (3) a simulation study to determine the stopping rules for the administration of the CAT-SC. A total of 215 caregivers of children with DD were interviewed with the 73-item CAT-SC item bank. An item response theory model was adopted for examining the construct validity to estimate item parameters after investigation of the unidimensionality, equality of slope parameters, item fitness, and differential item functioning (DIF). In the last phase, the reliability and concurrent validity of the CAT-SC were evaluated. The final CAT-SC item bank contained 56 items. The stopping rules suggested were (a) reliability coefficient greater than 0.9 or (b) 14 items administered. The results of simulation also showed that 85% of the estimated self-care performance scores would reach a reliability higher than 0.9 with a mean test length of 8.5 items, and the mean reliability for the rest was 0.86. Administering the CAT-SC could reduce the number of items administered by 75% to 84%. In addition, self-care performances estimated by the CAT-SC and the full item bank were very similar to each other (Pearson r = 0.98). The newly developed CAT-SC can efficiently measure self-care performance in children with DD whose performances are comparable to those of TD children aged from 6 months to 12 years as precisely as the whole item bank. The item bank of the CAT-SC has good reliability and a unidimensional self-care construct, and the CAT can estimate self-care performance with less than 25% of the items in the item bank. Therefore, the CAT-SC could be useful for measuring self-care performance in children with DD in clinical and research settings.
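The stopping rules described for the CAT-SC (stop once reliability exceeds 0.9, or after 14 items) can be mimicked in a small simulation. The sketch below uses a dichotomous 2PL item bank with invented parameters rather than the CAT-SC's actual polytomous bank, selects items by maximum Fisher information, estimates ability by EAP, and approximates reliability as 1 - SE^2.

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(6)
n_items = 56
a = rng.uniform(0.8, 2.0, n_items)          # invented discriminations
b = rng.normal(0.0, 1.0, n_items)           # invented difficulties

def info(theta, j):
    """Fisher information of item j at ability theta under the 2PL model."""
    p = expit(a[j] * (theta - b[j]))
    return a[j] ** 2 * p * (1 - p)

def eap(responses, items, grid=np.linspace(-4, 4, 161)):
    """Expected-a-posteriori ability estimate and posterior SD (standard error)."""
    post = np.exp(-0.5 * grid ** 2)                       # standard normal prior
    for x, j in zip(responses, items):
        p = expit(a[j] * (grid - b[j]))
        post = post * (p if x == 1 else 1 - p)
    post = post / post.sum()
    mean = np.sum(grid * post)
    sd = np.sqrt(np.sum((grid - mean) ** 2 * post))
    return mean, sd

def run_cat(true_theta, max_items=14, target_rel=0.9):
    used, resp, theta, se = [], [], 0.0, 1.0
    while len(used) < max_items and (1 - se ** 2) < target_rel:
        j = max((k for k in range(n_items) if k not in used), key=lambda k: info(theta, k))
        x = int(rng.random() < expit(a[j] * (true_theta - b[j])))   # simulated response
        used.append(j)
        resp.append(x)
        theta, se = eap(resp, used)
    return round(theta, 2), round(se, 2), len(used)

print(run_cat(0.8))    # (ability estimate, SE, number of items administered)
```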
Computer Assisted Assembly of Tests at Educational Testing Service.
ERIC Educational Resources Information Center
Educational Testing Service, Princeton, NJ.
Two basic requirements for the successful initiation of a program for test assembly are the development of detailed item content classification systems and the delineation of the professional judgements made in building a test from a pool of items to detailed content, ability, and statistical specifications in terms precise enough to be translated…
Single Event Effect (SEE) Test Planning 101
NASA Technical Reports Server (NTRS)
LaBel, Kenneth A.; Pellish, Jonathan; Berg, Melanie D.
2011-01-01
This is a course on SEE Test Plan development. It is an introductory discussion of the items that go into planning an SEE test, which should complement the SEE test methodology used. The material covers only heavy-ion SEE testing, not proton, laser, or other test methods, though many of the discussed items may be applicable. While standards and guidelines for how to perform single event effects (SEE) testing have existed almost since the first cyclotron testing, guidance on the development of SEE test plans has not been as easy to find. In this section of the short course, we attempt to rectify this lack. We consider the approach outlined here to be a "living" document: mission-specific constraints and new technology-related issues always need to be taken into account. We note that we will use the term "test planning" in the context of those items being included in a test plan.
Yun, Young Ho; Kang, Eun Kyo; Lee, Jihye; Choo, Jiyeon; Ryu, Hyewon; Yun, Hye-Min; Kang, Jung Hun; Kim, Tae You; Sim, Jin-Ah; Kim, Yaeji
2018-03-05
In this study, we aimed to develop and validate an instrument that could be used by patients with cancer to evaluate their quality of palliative care. Development of the questionnaire followed the four-phase process: item generation and reduction, construction, pilot testing, and field testing. Based on the literature, we constructed a list of items for the quality of palliative care from 104 quality care issues divided into 14 subscales. We constructed scales of 43 items that only the cancer patients were asked to answer. Using relevance and feasibility criteria and pilot testing, we developed a 44-item questionnaire. To assess the sensitivity and validity of the questionnaire, we recruited 220 patients over 18 years of age from three Korean hospitals. Factor analysis of the data and fit statistics process resulted in the 4-factor, 32-item Quality Care Questionnaire-Palliative Care (QCQ-PC), which covers appropriate communication with health care professionals (ten items), discussing value of life and goals of care (nine items), support and counseling for needs of holistic care (seven items), and accessibility and sustainability of care (six items). All subscales and total scores showed a high internal consistency (Cronbach alpha range, 0.89 to 0.97). Multi-trait scaling analysis showed good convergent (0.568-0.995) and discriminant (0.472-0.869) validity. The correlation between the total and subscale scores of QCQ-PC and those of EORTC QLQ-C15-PAL, MQOL, SAT-SF, and DCS was obtained. This study demonstrates that the QCQ-PC can be adopted to assess the quality of care in patients with cancer.
The EORTC CAT Core-The computer adaptive version of the EORTC QLQ-C30 questionnaire.
Petersen, Morten Aa; Aaronson, Neil K; Arraras, Juan I; Chie, Wei-Chu; Conroy, Thierry; Costantini, Anna; Dirven, Linda; Fayers, Peter; Gamper, Eva-Maria; Giesinger, Johannes M; Habets, Esther J J; Hammerlid, Eva; Helbostad, Jorunn; Hjermstad, Marianne J; Holzner, Bernhard; Johnson, Colin; Kemmler, Georg; King, Madeleine T; Kaasa, Stein; Loge, Jon H; Reijneveld, Jaap C; Singer, Susanne; Taphoorn, Martin J B; Thamsborg, Lise H; Tomaszewski, Krzysztof A; Velikova, Galina; Verdonck-de Leeuw, Irma M; Young, Teresa; Groenvold, Mogens
2018-06-21
To optimise measurement precision, relevance to patients and flexibility, patient-reported outcome measures (PROMs) should ideally be adapted to the individual patient/study while retaining direct comparability of scores across patients/studies. This is achievable using item banks and computerised adaptive tests (CATs). The European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Questionnaire Core 30 (QLQ-C30) is one of the most widely used PROMs in cancer research and clinical practice. Here we provide an overview of the research program to develop CAT versions of the QLQ-C30's 14 functional and symptom domains. The EORTC Quality of Life Group's strategy for developing CAT item banks consists of: literature search to identify potential candidate items; formulation of new items compatible with the QLQ-C30 item style; expert evaluations and patient interviews; field-testing and psychometric analyses, including factor analysis, item response theory calibration and simulation of measurement properties. In addition, software for setting up, running and scoring CAT has been developed. Across eight rounds of data collections, 9782 patients were recruited from 12 countries for the field-testing. The four phases of development resulted in a total of 260 unique items across the 14 domains. Each item bank consists of 7-34 items. Psychometric evaluations indicated higher measurement precision and increased statistical power of the CAT measures compared to the QLQ-C30 scales. Using CAT, sample size requirements may be reduced by approximately 20-35% on average without loss of power. The EORTC CAT Core represents a more precise, powerful and flexible measurement system than the QLQ-C30. It is currently being validated in a large independent, international sample of cancer patients. Copyright © 2018 Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Defeyter, Margaret Anne; Russo, Riccardo; McPartlin, Pamela Louise
2009-01-01
Items studied as pictures are better remembered than items studied as words even when test items are presented as words. The present study examined the development of this picture superiority effect in recognition memory. Four groups ranging in age from 7 to 20 years participated. They studied words and pictures, with test stimuli always presented…
ERIC Educational Resources Information Center
Brackenbury, Tim; Zickar, Michael J.; Munson, Benjamin; Storkel, Holly L.
2017-01-01
Purpose: Item response theory (IRT) is a psychometric approach to measurement that uses latent trait abilities (e.g., speech sound production skills) to model performance on individual items that vary by difficulty and discrimination. An IRT analysis was applied to preschoolers' productions of the words on the Goldman-Fristoe Test of…
Sol, Marleen Elisabeth; Verschuren, Olaf; de Groot, Laura; de Groot, Janke Frederike
2017-02-13
Wheelchair mobility skills (WMS) training is regarded by children using a manual wheelchair and their parents as an important factor to improve participation and daily physical activity. Currently, there is no outcome measure available for the evaluation of WMS in children. Several wheelchair mobility outcome measures have been developed for adults, but none of these have been validated in children. Therefore, the objective of this study is to develop a WMS outcome measure for children using the current knowledge from the literature in combination with the clinical expertise of health care professionals, children and their parents. Mixed methods approach. Phase 1: Identification of WMS items through a systematic review using the 'COnsensus-based Standards for the selection of health Measurement Instruments' (COSMIN) recommendations. Phase 2: Item selection and validation of relevant WMS items for children, using a focus group and interviews with children using a manual wheelchair, their parents and health care professionals. Phase 3: Feasibility of the newly developed Utrecht Pediatric Wheelchair Mobility Skills Test (UP-WMST) through pilot testing. Phase 1: Data analysis and synthesis of nine WMS-related outcome measures showed there is no widely used outcome measure with levels of evidence across all measurement properties. However, four outcome measures showed some levels of evidence on reliability and validity for adults. Twenty-two WMS items with the best clinimetric properties were selected for further analysis in phase 2. Phase 2: Fifteen items were deemed relevant for children, one item needed adaptation and six items were considered not relevant for assessing WMS in children. Phase 3: Two health care professionals administered the UP-WMST to eight children. The instructions of the UP-WMST were clear, but the scoring method of the height difference items needed adaptation. The outdoor items for rolling over soft surface and the side slope item were excluded from the final version of the UP-WMST due to logistic reasons. The newly developed 15-item UP-WMST is a validated outcome measure which is easy to administer in children using a manual wheelchair. More research regarding reliability, construct validity and responsiveness is warranted before the UP-WMST can be used in practice.
Michel, Pierre; Auquier, Pascal; Baumstarck, Karine; Pelletier, Jean; Loundou, Anderson; Ghattas, Badih; Boyer, Laurent
2015-09-01
Quality of life (QoL) measurements are considered important outcome measures both for research on multiple sclerosis (MS) and in clinical practice. Computerized adaptive testing (CAT) can improve the precision of measurements made using QoL instruments while reducing the burden of testing on patients. Moreover, a cross-cultural approach is also necessary to guarantee the wide applicability of CAT. The aim of this preliminary study was to develop a calibrated item bank that is available in multiple languages and measures QoL related to mental health by combining one generic (SF-36) and one disease-specific questionnaire (MusiQoL). Patients with MS were enrolled in this international, multicenter, cross-sectional study. The psychometric properties of the item bank were evaluated using classical test theory and item response theory approaches, including the evaluation of unidimensionality, item response theory model fitting, and analyses of differential item functioning (DIF). Convergent and discriminant validities of the item bank were examined according to socio-demographic, clinical, and QoL features. A total of 1992 patients with MS from 15 countries were enrolled to calibrate the 22-item bank developed in this study. The strict monotonicity of the Cronbach's alpha curve, the high eigenvalue ratio estimator (5.50), and the adequate CFA model fit (RMSEA = 0.07 and CFI = 0.95) indicated that a strong assumption of unidimensionality was warranted. The infit mean square statistic ranged from 0.76 to 1.27, indicating satisfactory item fit. DIF analyses revealed no item biases across geographical areas, confirming the cross-cultural equivalence of the item bank. External validity testing revealed that the item bank scores correlated significantly with QoL scores but also showed discriminant validity for socio-demographic and clinical characteristics. This work demonstrated satisfactory psychometric characteristics for a QoL item bank for MS in multiple languages. It may offer a common measure for the assessment of QoL in different cultural contexts and for international studies conducted on MS.
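The eigenvalue ratio estimator cited above (5.50) is one common screening check for a dominant first factor. As a simplified sketch, not the authors' procedure, the ratio of the first to the second eigenvalue of the inter-item correlation matrix can be computed on simulated one-factor data as follows.

```python
import numpy as np

def eigenvalue_ratio(item_scores: np.ndarray) -> float:
    """Ratio of the first to the second eigenvalue of the inter-item correlation matrix."""
    corr = np.corrcoef(item_scores, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]   # descending order
    return eigvals[0] / eigvals[1]

# Hypothetical responses: 500 respondents x 22 items driven by one latent trait
rng = np.random.default_rng(2)
theta = rng.normal(size=(500, 1))
items = 0.8 * theta + rng.normal(scale=0.6, size=(500, 22))
print(f"eigenvalue ratio = {eigenvalue_ratio(items):.2f}")  # large values suggest one dominant factor
```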
Dirven, Linda; Groenvold, Mogens; Taphoorn, Martin J B; Conroy, Thierry; Tomaszewski, Krzysztof A; Young, Teresa; Petersen, Morten Aa
2017-11-01
The European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Group is developing computerized adaptive testing (CAT) versions of all EORTC Quality of Life Questionnaire (QLQ-C30) scales with the aim of enhancing measurement precision. Here we present the results of the field-testing and psychometric evaluation of the item bank for cognitive functioning (CF). In previous phases (I-III), 44 candidate items were developed measuring CF in cancer patients. In phase IV, these items were psychometrically evaluated in a large sample of international cancer patients. This evaluation included an assessment of dimensionality, fit to the item response theory (IRT) model, differential item functioning (DIF), and measurement properties. A total of 1030 cancer patients completed the 44 candidate items on CF. Of these, 34 items could be included in a unidimensional IRT model, showing an acceptable fit. Although several items showed DIF, these had a negligible impact on CF estimation. Measurement precision of the item bank was much higher than that of the two original QLQ-C30 CF items alone, across the whole continuum. Moreover, CAT measurement may on average reduce study sample sizes by about 35-40% compared to the original QLQ-C30 CF scale, without loss of power. A CF item bank for CAT measurement consisting of 34 items was established, applicable to various cancer patients across countries. This CAT measurement system will facilitate precise and efficient assessment of HRQOL of cancer patients, without loss of comparability of results.
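The reported 35-40% reduction in required sample size follows from the higher measurement precision of the CAT scores. A back-of-the-envelope sketch of that logic is below; the reliability figures are invented, and the mapping (required sample size inversely proportional to relative efficiency) is the usual attenuation argument, not the authors' exact calculation.

```python
# Back-of-the-envelope link between measurement precision and sample size.
# Assumed (illustrative) reliabilities for the short fixed scale vs. the CAT.
rel_scale, rel_cat = 0.70, 0.85
# Relative efficiency: ratio of true-score variance captured by the two measures.
relative_efficiency = rel_cat / rel_scale
n_scale = 200                              # sample size needed with the fixed scale
n_cat = n_scale / relative_efficiency      # sample size needed with the CAT for equal power
reduction = 1 - n_cat / n_scale
print(f"n with CAT: {n_cat:.0f} ({reduction:.0%} smaller)")
```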
Development and validation of the Current Opioid Misuse Measure.
Butler, Stephen F; Budman, Simon H; Fernandez, Kathrine C; Houle, Brian; Benoit, Christine; Katz, Nathaniel; Jamison, Robert N
2007-07-01
Clinicians recognize the importance of monitoring aberrant medication-related behaviors of chronic pain patients while being prescribed opioid therapy. The purpose of this study was to develop and validate the Current Opioid Misuse Measure (COMM) for those pain patients already on long-term opioid therapy. An initial pool of 177 items was developed with input from 26 pain management and addiction specialists. Concept mapping identified six primary concepts underlying medication misuse, which were used to develop the initial item pool. Twenty-two pain and addiction specialists rated the items on importance and relevance, resulting in selection of a 40-item alpha COMM. Final item selection was based on empirical evaluation of items with patients taking opioids for chronic, noncancer pain (N=227). One-week test-retest reliability was examined with 55 participants. All participants were administered the alpha version of the COMM, the Prescription Drug Use Questionnaire (PDUQ) interview, and submitted a urine sample for toxicology screening. Physician ratings of patient aberrant behaviors were also obtained. Of the 40 items, 17 appeared to adequately measure aberrant behavior, demonstrating excellent internal consistency and test-retest reliability. Cutoff scores were examined using ROC curve analysis, and reasonable sensitivity and specificity were established. To evaluate the COMM's ability to capture change in patient status, it was tested on a subset of patients (N=86) who were followed and reassessed three months later. The COMM was found to have promise as a brief, self-report measure of current aberrant drug-related behavior. Further cross-validation and replication of these preliminary results are pending.
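Cutoff selection by ROC analysis, as mentioned above, can be illustrated generically. The sketch below simulates questionnaire scores and a binary criterion (it does not use the COMM data) and picks the threshold that maximizes Youden's J.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Simulated stand-ins for questionnaire totals and a 0/1 misuse criterion (hypothetical).
rng = np.random.default_rng(3)
misuse = rng.integers(0, 2, size=227)
scores = rng.normal(loc=10 + 6 * misuse, scale=4)     # misusers tend to score higher

fpr, tpr, thresholds = roc_curve(misuse, scores)
youden_j = tpr - fpr                                   # sensitivity + specificity - 1
best = np.argmax(youden_j)
print(f"AUC = {roc_auc_score(misuse, scores):.2f}")
print(f"suggested cutoff = {thresholds[best]:.1f} "
      f"(sensitivity {tpr[best]:.2f}, specificity {1 - fpr[best]:.2f})")
```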
Huang, Wenhao; Chapman-Novakofski, Karen M
2017-01-01
Background The extensive availability and increasing use of mobile apps for nutrition-based health interventions make evaluation of the quality of these apps crucial for integration of apps into nutritional counseling. Objective The goal of this research was the development, validation, and reliability testing of the app quality evaluation (AQEL) tool, an instrument for evaluating apps’ educational quality and technical functionality. Methods Items for evaluating app quality were adapted from website evaluations, with additional items added to evaluate the specific characteristics of apps, resulting in 79 initial items. Expert panels of nutrition and technology professionals and app users reviewed items for face and content validation. After recommended revisions, nutrition experts completed a second AQEL review to ensure clarity. On the basis of 150 sets of responses using the revised AQEL, principal component analysis was completed, reducing AQEL into 5 factors that underwent reliability testing, including internal consistency, split-half reliability, test-retest reliability, and interrater reliability (IRR). Two additional modifiable constructs for evaluating apps based on the age and needs of the target audience as selected by the evaluator were also tested for construct reliability. IRR testing using intraclass correlations (ICC) with all 7 constructs was conducted, with 15 dietitians evaluating one app. Results Development and validation resulted in the 51-item AQEL. These were reduced to 25 items in 5 factors after principal component analysis, plus 9 modifiable items in two constructs that were not included in principal component analysis. Internal consistency and split-half reliability of the following constructs derived from principal components analysis were good (Cronbach alpha >.80, Spearman-Brown coefficient >.80): behavior change potential, support of knowledge acquisition, app function, and skill development. App purpose split-half reliability was .65. Test-retest reliability showed no significant change over time (P>.05) for all but skill development (P=.001). Construct reliability was good for items assessing age appropriateness of apps for children, teens, and a general audience. In addition, construct reliability was acceptable for assessing app appropriateness for various target audiences (Cronbach alpha >.70). For the 5 main factors, ICC (1,k) was >.80, with a P value of <.05. When 15 nutrition professionals evaluated one app, ICC (2,15) was .98, with a P value of <.001 for all 7 constructs when the modifiable items were specified for adults seeking weight loss support. Conclusions Our preliminary effort shows that AQEL is a valid, reliable instrument for evaluating nutrition apps’ qualities for clinical interventions by nutrition clinicians, educators, and researchers. Further efforts in validating AQEL in various contexts are needed. PMID:29079554
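Split-half reliability with the Spearman-Brown correction, one of the statistics reported above, can be computed as in the following sketch; the odd-even split and the simulated ratings are illustrative only, not the AQEL data.

```python
import numpy as np

def split_half_reliability(items: np.ndarray) -> float:
    """Odd-even split-half correlation stepped up with the Spearman-Brown formula."""
    odd = items[:, ::2].sum(axis=1)
    even = items[:, 1::2].sum(axis=1)
    r = np.corrcoef(odd, even)[0, 1]
    return 2 * r / (1 + r)          # Spearman-Brown prophecy for the full-length scale

# Hypothetical ratings: 150 evaluators x 8 items from one factor, on a 1-5 scale
rng = np.random.default_rng(4)
latent = rng.normal(size=(150, 1))
ratings = np.clip(np.round(3 + latent + rng.normal(scale=0.8, size=(150, 8))), 1, 5)
print(f"split-half reliability = {split_half_reliability(ratings):.2f}")
```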
Su, Bi-ying; Liu, Shao-nan; Li, Xiao-yan
2011-11-01
To study the train of thought and procedures for developing the theoretical framework and the item pool of the peri-operative recovery scale for integrative medicine, thus preparing for the development of this scale and its psychometric testing. Under the guidance of Chinese medicine theories and principles for developing psychometric scales, the theoretical framework and the item pool of the scale were initially laid out through literature retrieval, expert consultation, etc. The scale covered the domains of physical function, mental function, activity function, pain, and general assessment. In addition, social function is included, which is suitable for pre-operative testing and long-term therapeutic efficacy testing after discharge from hospital. Each domain should cover correlated Zang-Fu organs, qi, blood, and the patient-reported outcomes. A total of 122 items were initially included in the item pool according to the theoretical framework of the scale. The peri-operative recovery scale of integrative medicine embodies the combination of Chinese medicine theories and patient-reported outcome concepts. The scale could reasonably assess the peri-operative recovery outcomes of patients treated with integrative medicine.
Chan, Raymond Javan; Yates, Patsy; McCarthy, Alexandra L
Fatigue is one of the most distressing and commonly experienced symptoms in patients with advanced cancer. Although the self-management (SM) of cancer-related symptoms has received increasing attention, no research instrument assessing fatigue SM outcomes for patients with advanced cancer is available. The aim of this study was to describe the development and preliminary testing of an interviewer-administered instrument for assessing the frequency and perceived levels of effectiveness and self-efficacy associated with fatigue SM behaviors in patients with advanced cancer. The development and testing of the Self-efficacy in Managing Symptoms Scale-Fatigue Subscale for Patients With Advanced Cancer (SMSFS-A) involved a number of procedures: item generation using a comprehensive literature review and semistructured interviews, content validity evaluation using expert panel reviews, and face validity and test-retest reliability evaluation using pilot testing. Initially, 23 items (22 specific behaviors with 1 global item) were generated from the literature review and semistructured interviews. After 2 rounds of expert panel review, the final scale was reduced to 17 items (16 behaviors with 1 global item). Participants in the pilot test (n = 10) confirmed that the questions in this scale were clear and easy to understand. Bland-Altman analysis showed agreement of results over a 1-week interval. The SMSFS-A items were generated using multiple sources. This tool demonstrated preliminary validity and reliability. The SMSFS-A has the potential to be used for clinical and research purposes. Nurses can use this instrument for collecting data to inform the initiation of appropriate fatigue SM support for this population.
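Bland-Altman agreement, used above to examine one-week stability, reduces to the mean difference and its 95% limits of agreement. A generic sketch with simulated test-retest totals (not the SMSFS-A pilot data) follows.

```python
import numpy as np

def bland_altman(test: np.ndarray, retest: np.ndarray):
    """Mean difference (bias) and 95% limits of agreement between two administrations."""
    diff = retest - test
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)
    return bias, bias - half_width, bias + half_width

# Hypothetical scale totals for 10 pilot participants, one week apart
rng = np.random.default_rng(5)
t1 = rng.uniform(20, 60, size=10)
t2 = t1 + rng.normal(scale=2.5, size=10)
bias, lower, upper = bland_altman(t1, t2)
print(f"bias = {bias:.2f}, 95% limits of agreement = [{lower:.2f}, {upper:.2f}]")
```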
Tulsky, David S; Kisala, Pamela A; Kalpakjian, Claire Z; Bombardier, Charles H; Pohlig, Ryan T; Heinemann, Allen W; Carle, Adam; Choi, Seung W
2015-05-01
To develop a calibrated spinal cord injury-quality of life (SCI-QOL) item bank, computer adaptive test (CAT), and short form to assess depressive symptoms experienced by individuals with SCI, transform scores to the Patient Reported Outcomes Measurement Information System (PROMIS) metric, and create a crosswalk to the Patient Health Questionnaire (PHQ)-9. We used grounded-theory based qualitative item development methods, large-scale item calibration field testing, confirmatory factor analysis, item response theory (IRT) analyses, and statistical linking techniques to transform scores to a PROMIS metric and to provide a crosswalk with the PHQ-9. Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. Spinal Cord Injury--Quality of Life (SCI-QOL) Depression Item Bank. Individuals with SCI were involved in all phases of SCI-QOL development. A sample of 716 individuals with traumatic SCI completed 35 items assessing depression, 18 of which were PROMIS items. After removing 7 non-PROMIS items, factor analyses confirmed a unidimensional pool of items. We used a graded response IRT model to estimate slopes and thresholds for the 28 retained items. The SCI-QOL Depression measure correlated 0.76 with the PHQ-9. The SCI-QOL Depression item bank provides a reliable and sensitive measure of depressive symptoms with scores reported in terms of general population norms. We provide a crosswalk to the PHQ-9 to facilitate comparisons between measures. The item bank may be administered as a CAT or as a short form and is suitable for research and clinical applications.
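The graded response IRT model referred to above assigns each polytomous item a discrimination (slope) and a set of ordered thresholds. A minimal sketch of its category probabilities is shown below, with made-up parameters rather than the calibrated SCI-QOL values.

```python
import numpy as np

def grm_category_probs(theta: float, a: float, thresholds: np.ndarray) -> np.ndarray:
    """Category response probabilities under Samejima's graded response model.

    `thresholds` are the ordered between-category difficulties b_1 < ... < b_{m-1}.
    """
    # Cumulative probabilities P(X >= k) for k = 1..m-1, bracketed by 1 and 0.
    cum = 1.0 / (1.0 + np.exp(-a * (theta - thresholds)))
    cum = np.concatenate(([1.0], cum, [0.0]))
    return cum[:-1] - cum[1:]        # P(X = k) for k = 0..m-1

# Hypothetical 5-category depression item with discrimination 1.8
probs = grm_category_probs(theta=0.3, a=1.8, thresholds=np.array([-1.5, -0.4, 0.6, 1.7]))
print(np.round(probs, 3), "sum =", probs.sum())
```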
Developing a prelicensure exam for Canada: an international collaboration.
Hobbins, Bonnie; Bradley, Pat
2013-01-01
Nine previously conducted studies indicate that Elsevier's HESI Exit Exam (E(2)) is 96.36%-99.16% accurate in predicting success on the National Council Licensure Examination for Registered Nurses. No similar standardized exam is available in Canada to predict Canadian Registered Nurse Examination (CRNE) success. Like the E(2), such an exam could be used to evaluate Canadian nursing students' preparedness for the CRNE, and scores on the numerous subject matter categories could be used to guide students' remediation efforts so that, ultimately, they are successful on their first attempt at taking the CRNE. The international collaboration between a HESI test construction expert and a nursing faculty member from Canada, who served as the content expert, resulted in the development of a 180-item, multiple-choice/single-answer prelicensure exam (PLE) that was pilot tested with Canadian nursing students (N = 175). Item analysis data obtained from this pilot testing were used to develop a 160-item PLE, which includes an additional 20 pilot test items. The estimated reliability of this exam is 0.91, and it exhibits congruent validity with the CRNE because the PLE test blueprint mimics the CRNE test blueprint. Copyright © 2013 Elsevier Inc. All rights reserved.
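The abstract does not state which reliability coefficient the estimate of 0.91 is; for a dichotomously scored multiple-choice exam it would typically be KR-20 (equivalent to coefficient alpha on 0/1 items). Under that assumption, a minimal computation looks like this, with simulated scored responses rather than the PLE data.

```python
import numpy as np

def kr20(responses: np.ndarray) -> float:
    """Kuder-Richardson 20 reliability for a persons-by-items 0/1 score matrix."""
    k = responses.shape[1]
    p = responses.mean(axis=0)                   # proportion correct per item
    q = 1 - p
    total_var = responses.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - (p * q).sum() / total_var)

# Hypothetical scored responses: 175 students x 160 dichotomous items
rng = np.random.default_rng(6)
ability = rng.normal(size=(175, 1))
difficulty = rng.normal(size=(1, 160))
responses = (rng.random((175, 160)) < 1 / (1 + np.exp(-(ability - difficulty)))).astype(int)
print(f"KR-20 = {kr20(responses):.2f}")
```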
ERIC Educational Resources Information Center
Dumas, Helene M.
2010-01-01
The PEDI-CAT is a new computer adaptive test (CAT) version of the Pediatric Evaluation of Disability Inventory (PEDI). Additional PEDI-CAT items specific to postacute pediatric hospital care were recently developed using expert reviews and cognitive interviewing techniques. Expert reviews established face and construct validity, providing positive…
Development of Abbreviated Nine-Item Forms of the Raven's Standard Progressive Matrices Test
ERIC Educational Resources Information Center
Bilker, Warren B.; Hansen, John A.; Brensinger, Colleen M.; Richard, Jan; Gur, Raquel E.; Gur, Ruben C.
2012-01-01
The Raven's Standard Progressive Matrices (RSPM) is a 60-item test for measuring abstract reasoning, considered a nonverbal estimate of fluid intelligence, and often included in clinical assessment batteries and research on patients with cognitive deficits. The goal was to develop and apply a predictive model approach to reduce the number of items…
Extending LMS to Support IRT-Based Assessment Test Calibration
NASA Astrophysics Data System (ADS)
Fotaris, Panagiotis; Mastoras, Theodoros; Mavridis, Ioannis; Manitsaris, Athanasios
Developing unambiguous and challenging assessment material for measuring educational attainment is a time-consuming, labor-intensive process. As a result, Computer Aided Assessment (CAA) tools are becoming widely adopted in academic environments in an effort to improve the quality of assessments and deliver reliable results of examinee performance. This paper introduces a methodological and architectural framework which embeds a CAA tool in a Learning Management System (LMS) so as to assist test developers in refining the items that constitute assessment tests. An Item Response Theory (IRT) based analysis is applied to a dynamic assessment profile provided by the LMS. Test developers define a set of validity rules for the statistical indices given by the IRT analysis. By applying those rules, the LMS can detect items with various discrepancies, which are then flagged for review of their content. Repeatedly executing this procedure can improve the overall efficiency of the testing process.
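The validity-rule mechanism described above is essentially rule-based screening of calibrated item statistics. A toy sketch is given below; the thresholds, record fields, and example items are invented for illustration, not taken from the paper.

```python
# Rule-based flagging of calibrated items, in the spirit of the framework described above.
# All thresholds and item records are illustrative.
items = [
    {"id": "Q01", "a": 1.20, "b": 0.30, "infit": 0.95},
    {"id": "Q02", "a": 0.25, "b": -2.80, "infit": 1.45},   # weak discrimination, misfitting
    {"id": "Q03", "a": 0.90, "b": 3.40, "infit": 1.05},    # too difficult for the cohort
]

rules = {
    "low discrimination": lambda it: it["a"] < 0.5,
    "extreme difficulty": lambda it: abs(it["b"]) > 3.0,
    "poor fit": lambda it: not 0.7 <= it["infit"] <= 1.3,
}

for it in items:
    reasons = [name for name, rule in rules.items() if rule(it)]
    if reasons:
        print(f"{it['id']} flagged for content review: {', '.join(reasons)}")
```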
76 FR 43286 - National Assessment Governing Board; Meeting
Federal Register 2010, 2011, 2012, 2013, 2014
2011-07-20
... levels for each grade and subject tested, developing standards and procedures for interstate and national... in closed session to review secure test items for the 2012 Economics assessment at grade 12 and the... meeting the ADC will complete their review of secure NAEP test items for the 2012 Economics assessment at...
Item response theory and the measurement of motor behavior.
Safrit, M J; Cohen, A S; Costa, M G
1989-12-01
Item response theory (IRT) has been the focus of intense research and development activity in educational and psychological measurement during the past decade. Because this theory can provide more precise information about test items than other theories usually used in measuring motor behavior, the application of IRT in physical education and exercise science merits investigation. In IRT, the difficulty level of each item (e.g., trial or task) can be estimated and placed on the same scale as the ability of the examinee. Using this information, the test developer can determine the ability levels at which the test functions best. Equating the scores of individuals on two or more items or tests can be handled efficiently by applying IRT. The precision of the identification of performance standards in a mastery test context can be enhanced, as can adaptive testing procedures. In this tutorial, several potential benefits of applying IRT to the measurement of motor behavior were described. An example is provided using bowling data and applying the graded-response form of the Rasch IRT model. The data were calibrated and the goodness of fit was examined. This analysis is described in a step-by-step approach. Limitations to using an IRT model with a test consisting of repeated measures were noted.
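The bowling example above used the graded-response form of the Rasch model; that calibration is not reproduced here. As a much simpler illustration of the Rasch idea, the sketch below shows the dichotomous Rasch success probability and a first-approximation (uniterated) set of item difficulties on the logit scale, using simulated data; a full calibration would iterate person and item estimates jointly.

```python
import numpy as np

def rasch_prob(theta: float, difficulty: float) -> float:
    """Probability of success on a dichotomous Rasch item."""
    return 1.0 / (1.0 + np.exp(-(theta - difficulty)))

def initial_item_logits(responses: np.ndarray) -> np.ndarray:
    """First-approximation item difficulties: log-odds of failure, centred at zero.

    This is only the starting point of a Rasch calibration, not a full estimation.
    """
    p = responses.mean(axis=0)        # proportion succeeding on each item
    d = np.log((1 - p) / p)
    return d - d.mean()               # centre difficulties at 0 logits

# Hypothetical dichotomized trial outcomes: 60 examinees x 10 scored tasks
rng = np.random.default_rng(7)
ability = rng.normal(size=(60, 1))
true_d = np.linspace(-1.5, 1.5, 10)
responses = (rng.random((60, 10)) < 1 / (1 + np.exp(-(ability - true_d)))).astype(int)
print(np.round(initial_item_logits(responses), 2))
print(f"P(success | theta=0.5, d=1.0) = {rasch_prob(0.5, 1.0):.2f}")
```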
Putting Interoperability to the Test: Building a Large Reusable Assessment Item Bank
ERIC Educational Resources Information Center
Sclater, Niall; MacDonald, Mary
2004-01-01
The COLA project has been developing a large bank of assessment items for units across the Scottish further education curriculum since May 2003. These will be made available to learners mainly via colleges' virtual learning environments (VLEs). Many people have been involved in the development of the COLA assessment item bank to ensure a high…
ERIC Educational Resources Information Center
Martinková, Patricia; Drabinová, Adéla; Liaw, Yuan-Ling; Sanders, Elizabeth A.; McFarland, Jenny L.; Price, Rebecca M.
2017-01-01
We provide a tutorial on differential item functioning (DIF) analysis, an analytic method useful for identifying potentially biased items in assessments. After explaining a number of methodological approaches, we test for gender bias in two scenarios that demonstrate why DIF analysis is crucial for developing assessments, particularly because…
Classification Scheme for Items in CAAT.
ERIC Educational Resources Information Center
Epstein, Marion G.
In planning the development of the system for computer assisted assembly of tests, it was agreed at the outset that one of the basic requirements for the successful initiation of any such system would be the development of a detailed item content classification system. The design of the system for classifying item content is a key element in…
ERIC Educational Resources Information Center
Kim, Jiyoung; Chi, Youngshin; Huensch, Amanda; Jun, Heesung; Li, Hongli; Roullion, Vanessa
2010-01-01
This article discusses a case study of an item writing process, reflecting on our practical experience in an item development project. The purpose of the article is to share our lessons from the experience, aiming to demystify the item writing process. The study investigated three issues that naturally emerged during the project: how item writers use…
An Instrument to Predict Job Performance of Home Health Aides--Testing the Reliability and Validity.
ERIC Educational Resources Information Center
Sturges, Jack; Quina, Patricia
The development of four paper-and-pencil tests, useful in assessing the effectiveness of inservice training provided to either nurses' aides or home health aides, was described. These tests were designed for use in employment selection and case assignment. Two tests of 37 multiple-choice items and two tests of 10 matching items were…
Using the Rasch Measurement Model in Psychometric Analysis of the Family Effectiveness Measure
McCreary, Linda L.; Conrad, Karen M.; Conrad, Kendon J.; Scott, Christy K; Funk, Rodney R.; Dennis, Michael L.
2013-01-01
Background Valid assessment of family functioning can play a vital role in optimizing client outcomes. Because family functioning is influenced by family structure, socioeconomic context, and culture, existing measures of family functioning--primarily developed with nuclear, middle-class European American families--may not be valid assessments of families in diverse populations. The Family Effectiveness Measure was developed to address this limitation. Objectives To test the Family Effectiveness Measure with data from a primarily low-income African American convenience sample, using the Rasch measurement model. Method A sample of 607 adult women completed the measure. Rasch analysis was used to assess unidimensionality, response category functioning, item fit, person reliability, differential item functioning by race and parental status, and item hierarchy. Criterion-related validity was tested using correlations with five other variables related to family functioning. Results The Family Effectiveness Measure measures two separate constructs: the effective family functioning construct was a psychometrically sound measure of the target construct that was more efficient due to the deletion of 22 items. The ineffective family functioning construct consisted of 16 of those deleted items but was not as strong psychometrically. Items in both constructs evidenced no differential item functioning by race. Criterion-related validity was supported for both. Discussion In contrast to the prevailing conceptualization that family functioning is a single construct, assessed by positively and negatively worded items, use of the Rasch analysis suggested the existence of two constructs. While the effective family functioning construct is a strong and efficient measure of family functioning, the ineffective family functioning construct will require additional item development and psychometric testing. PMID:23636342
Kelly, Laura; Ziebland, Sue; Jenkinson, Crispin
2015-11-01
Health-related websites have developed to be much more than information sites: they are used to exchange experiences and find support as well as information and advice. This paper documents the development of a tool to compare the potential consequences and experiences a person may encounter when using health-related websites. Questionnaire items were developed following a review of relevant literature and qualitative secondary analysis of interviews relating to experiences of health. Item reduction steps were performed on pilot survey data (n=167). Tests of validity and reliability were subsequently performed (n=170) to determine the psychometric properties of the questionnaire. Two independent item pools entered psychometric testing: (1) items relating to general views of using the internet in relation to health, and (2) items relating to the consequences of using a specific health-related website. Identified sub-scales were found to have high construct validity, internal consistency and test-retest reliability. Analyses confirmed good psychometric properties in the eHIQ-Part 1 (11 items) and the eHIQ-Part 2 (26 items). This tool will facilitate the measurement of the potential consequences of using websites containing different types of material (scientific facts and figures, blogs, experiences, images) across a range of health conditions. Copyright © 2015 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.
Role of Cognitive Testing in the Development of the CAHPS® Hospital Survey
Levine, Roger E; Fowler, Floyd J; Brown, Julie A
2005-01-01
Objective To describe how cognitive testing results were used to inform the modification and selection of items for the Consumer Assessment of Health Providers and Systems (CAHPS®) Hospital Survey pilot test instrument. Data Sources Cognitive interviews were conducted on 31 subjects in two rounds of testing: in December 2002–January 2003 and in February 2003. In both rounds, interviews were conducted in northern California, southern California, Massachusetts, and North Carolina. Study Design A common protocol served as the basis for cognitive testing activities in each round. This protocol was modified to enable testing of the items as interviewer-administered and self-administered items and to allow members of each of three research teams to use their preferred cognitive research tools. Data Collection/Extraction Methods Each research team independently summarized, documented, and reported their findings. Item-specific and general issues were noted. The results were reviewed and discussed by senior staff from each research team after each round of testing, to inform the acceptance, modification, or elimination of candidate items. Principal Findings Many candidate items required modification because respondents lacked the information required to answer them, respondents failed to understand them consistently, the items were not measuring the constructs they were intended to measure, the items were based on erroneous assumptions about what respondents wanted or experienced during their hospitalization, or the items were asking respondents to make distinctions that were too fine for them to make. Cognitive interviewing enabled the detection of these problems; an understanding of the etiology of the problem informed item revisions. However, for some constructs, the revisions proved to be inadequate. Accordingly, items could not be developed to provide acceptable measures of certain constructs such as shared decision making, coordination of care, and delays in the admissions process. Conclusions Cognitive testing is the most direct way of finding out whether respondents understand questions consistently, have the information needed to answer the questions, and can use the response alternatives provided to describe their experiences or their opinions accurately. Many of the candidate questions failed to meet these standards. Cognitive testing only evaluates the way in which respondents understand and answer questions. Although it does not directly assess the validity of the answers, it is a reasonable premise that cognitive problems will seriously compromise validity and reliability. PMID:16316437
ERIC Educational Resources Information Center
Eleje, Lydia I.; Esomonu, Nkechi P. M.
2018-01-01
A test to measure achievement in quantitative economics among secondary school students was developed and validated in this study. The test is made up of 20 multiple-choice test items constructed based on quantitative economics sub-skills. Six research questions guided the study. Preliminary validation was done by two experienced teachers in…
ERIC Educational Resources Information Center
Muiznieks, Viktors J.; Cox, John
The Computerized Test-Result Reporting System (CTRS), which consists of three programs written in the BASIC language, was developed to analyze objective tests, test items, and test results, and to provide the teacher-user with interpreted data about the performance of tests, test items, and students. This paper documents the three programs from the…
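The CTRS programs themselves are not shown in this record. As an illustration of the kind of item-level statistics such a reporting system produces, the sketch below computes classical item difficulty and corrected point-biserial discrimination from a hypothetical scored-response matrix.

```python
import numpy as np

def item_analysis(responses: np.ndarray) -> None:
    """Classical item statistics for a persons-by-items 0/1 score matrix."""
    total = responses.sum(axis=1)
    for j in range(responses.shape[1]):
        p = responses[:, j].mean()                        # item difficulty (proportion correct)
        rest = total - responses[:, j]                    # total score with the item removed
        r_pb = np.corrcoef(responses[:, j], rest)[0, 1]   # corrected point-biserial discrimination
        print(f"item {j + 1:2d}: difficulty = {p:.2f}, point-biserial = {r_pb:.2f}")

# Hypothetical class of 30 students answering 5 dichotomously scored items
rng = np.random.default_rng(8)
ability = rng.normal(size=(30, 1))
difficulty = np.array([[-1.0, -0.3, 0.0, 0.5, 1.2]])
scores = (rng.random((30, 5)) < 1 / (1 + np.exp(-(ability - difficulty)))).astype(int)
item_analysis(scores)
```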
Development of the PROMIS coping expectancies of smoking item banks.
Shadel, William G; Edelen, Maria Orlando; Tucker, Joan S; Stucky, Brian D; Hansen, Mark; Cai, Li
2014-09-01
Smoking is a coping strategy for many smokers who then have difficulty finding new ways to cope with negative affect when they quit. This paper describes analyses conducted to develop and evaluate item banks for assessing the coping expectancies of smoking for daily and nondaily smokers. Using data from a large sample of daily (N = 4,201) and nondaily (N = 1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning (DIF) analyses (according to gender, age, and ethnicity) to arrive at a unidimensional set of items for daily and nondaily smokers. We also evaluated performance of short forms (SFs) and computer adaptive tests (CATs) for assessing coping expectancies of smoking. For both daily and nondaily smokers, the unidimensional Coping Expectancies item banks (21 items) are relatively DIF free and are highly reliable (0.96 and 0.97, respectively). A common 4-item SF for daily and nondaily smokers also showed good reliability (0.85). Adaptive tests required an average of 4.3 and 3.7 items for simulated daily and nondaily respondents, respectively, and achieved reliabilities of 0.91 for both when the maximum test length was 10 items. This research provides a new set of items that can be used to reliably assess coping expectancies of smoking, through a SF, CAT, or a tailored set selected for a specific research purpose. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Suen, Yi-Nam; Cerin, Ester; Mellecker, Robin R
2014-07-18
Parents' perceived informal social control, defined as the informal ways residents intervene to create a safe and orderly neighbourhood environment, may influence young children's physical activity (PA) in the neighbourhood. This study aimed to develop and test the reliability of a scale of PA-related informal social control relevant to Chinese parents/caregivers of pre-schoolers (children aged 3 to 5 years) living in Hong Kong. Nominal Group Technique (NGT), a structured, multi-step brainstorming technique, was conducted with two groups of caregivers (mainly parents; n = 11) of Hong Kong pre-schoolers in June 2011. Items collected in the NGT sessions and those generated by a panel of experts were used to compile a list of items (n = 22) for a preliminary version of a questionnaire of informal social control. The newly-developed scale was tested with 20 Chinese-speaking parents/caregivers using cognitive interviews (August 2011). The modified scale, including all 22 original items of which a few were slightly reworded, was subsequently administered on two occasions, a week apart, to 61 Chinese parents/caregivers of Hong Kong pre-schoolers in early 2012. The test-retest reliability and internal consistency of the items and scale were examined using intraclass correlation coefficients (ICC), paired t-tests, relative percentages of shifts in responses to items, and Cronbach's α coefficient. Thirteen items generated by parents/caregivers and nine items generated by the panel of experts (total 22 items) were included in a first working version of the scale and classified into three subscales: "Personal involvement and general informal supervision", "Civic engagement for the creation of a better neighbourhood environment" and "Educating and assisting neighbourhood children". Twenty out of 22 items showed moderate to excellent test-retest reliability (ICC range: 0.40-0.81). All three subscales of informal social control showed acceptable levels of internal consistency (Cronbach's α >0.70). A reliable scale examining PA-related informal social control relevant to Chinese parents/caregivers of pre-schoolers living in Hong Kong was developed. Further studies should examine the factorial validity of the scale, its associations with Chinese children's PA and its appropriateness for other populations of parents of young children.
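Test-retest reliability via an intraclass correlation, as used above, can be computed from two-way ANOVA mean squares. The sketch below implements the Shrout and Fleiss ICC(2,1) form on simulated week-1/week-2 responses; it is an illustration, not the authors' analysis.

```python
import numpy as np

def icc_2_1(ratings: np.ndarray) -> float:
    """Shrout & Fleiss ICC(2,1): two-way random effects, absolute agreement, single rating.

    `ratings` is a subjects-by-occasions (or raters) matrix.
    """
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)
    col_means = ratings.mean(axis=0)
    ss_total = ((ratings - grand) ** 2).sum()
    ss_rows = k * ((row_means - grand) ** 2).sum()   # between-subject
    ss_cols = n * ((col_means - grand) ** 2).sum()   # between-occasion
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical item scores from 61 parents answering the same item one week apart
rng = np.random.default_rng(9)
week1 = rng.integers(1, 6, size=61).astype(float)
week2 = np.clip(week1 + rng.integers(-1, 2, size=61), 1, 5)
print(f"ICC(2,1) = {icc_2_1(np.column_stack([week1, week2])):.2f}")
```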
Shen, Linjun; Juul, Dorthea; Faulkner, Larry R
2016-01-01
The development of recertification programs (now referred to as Maintenance of Certification or MOC) by the members of the American Board of Medical Specialties provides the opportunity to study knowledge base across the professional lifespan of physicians. Research results to date are mixed with some studies finding negative associations between age and various measures of competency and others finding no or minimal relationships. Four groups of multiple choice test items that were independently developed for certification and MOC examinations in psychiatry and neurology were administered to certification and MOC examinees within each specialty. Percent correct scores were calculated for each examinee. Differences between certification and MOC examinees were compared using unpaired t tests, and logistic regression was used to compare MOC and certification examinee performance on the common test items. Except for the neurology certification test items that addressed basic neurology concepts, the performance of the certification and MOC examinees was similar. The differences in performance on individual test items did not consistently favor one group or the other and could not be attributed to any distinguishable content or format characteristics of those items. The findings of this study are encouraging in that physicians who had recently completed residency training possessed clinical knowledge that was comparable to that of experienced physicians, and the experienced physicians' clinical knowledge was equivalent to that of recent residency graduates. The role testing can play in enhancing expertise is described.
Automatic item generation implemented for measuring artistic judgment aptitude.
Bezruczko, Nikolaus
2014-01-01
Automatic item generation (AIG) is a broad class of methods that are being developed to address psychometric issues arising from internet and computer-based testing. In general, issues emphasize efficiency, validity, and diagnostic usefulness of large scale mental testing. Rapid prominence of AIG methods and their implicit perspective on mental testing is bringing painful scrutiny to many sacred psychometric assumptions. This report reviews basic AIG ideas, then presents conceptual foundations, image model development, and operational application to artistic judgment aptitude testing.
Item analysis of three Spanish naming tests: a cross-cultural investigation.
Marquez de la Plata, Carlos; Arango-Lasprilla, Juan Carlos; Alegret, Montse; Moreno, Alexander; Tárraga, Luis; Lara, Mar; Hewlitt, Margaret; Hynan, Linda; Cullum, C Munro
2009-01-01
Neuropsychological evaluations conducted in the United States and abroad commonly include the use of tests translated from English to Spanish. The use of translated naming tests for evaluating predominantly Spanish speakers has recently been challenged on the grounds that translating test items may compromise a test's construct validity. The Texas Spanish Naming Test (TNT) has been developed in Spanish specifically for use with Spanish speakers; however, it is unlikely patients from diverse Spanish-speaking geographical regions will perform uniformly on a naming test. The present study evaluated and compared the internal consistency and patterns of item-difficulty and -discrimination for the TNT and two commonly used translated naming tests in three countries (i.e., United States, Colombia, Spain). Two hundred fifty-two subjects (136 demented, 116 nondemented) across three countries were administered the TNT, Modified Boston Naming Test-Spanish, and the naming subtest from the CERAD. The TNT demonstrated superior internal consistency to its counterparts, a superior item difficulty pattern than the CERAD naming test, and a superior item discrimination pattern than the MBNT-S across countries. Overall, all three Spanish naming tests differentiated nondemented and moderately demented individuals, but the results suggest the items of the TNT are most appropriate to use with Spanish speakers. Preliminary normative data for the three tests examined in each country are provided.
A Methodology for Zumbo's Third Generation DIF Analyses and the Ecology of Item Responding
ERIC Educational Resources Information Center
Zumbo, Bruno D.; Liu, Yan; Wu, Amery D.; Shear, Benjamin R.; Olvera Astivia, Oscar L.; Ark, Tavinder K.
2015-01-01
Methods for detecting differential item functioning (DIF) and item bias are typically used in the process of item analysis when developing new measures; adapting existing measures for different populations, languages, or cultures; or more generally validating test score inferences. In 2007 in "Language Assessment Quarterly," Zumbo…
Item Vector Plots for the Multidimensional Three-Parameter Logistic Model
ERIC Educational Resources Information Center
Bryant, Damon; Davis, Larry
2011-01-01
This brief technical note describes how to construct item vector plots for dichotomously scored items fitting the multidimensional three-parameter logistic model (M3PLM). As multidimensional item response theory (MIRT) shows promise of being a very useful framework in the test development life cycle, graphical tools that facilitate understanding…
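Item vector plots of the kind described above are typically built from Reckase's multidimensional discrimination (MDISC), multidimensional difficulty (MDIFF), and direction angles. Assuming that standard parameterization (the note's exact procedure is not given in this record), the plotted quantities can be computed as follows for a hypothetical two-dimensional item.

```python
import numpy as np

def item_vector(a: np.ndarray, d: float):
    """Reckase-style quantities an item vector plot is built from.

    a: discrimination parameters (one per dimension); d: scalar intercept.
    Returns multidimensional discrimination (MDISC), multidimensional difficulty (MDIFF),
    and the direction angles (degrees) of the item vector from each coordinate axis.
    """
    mdisc = np.sqrt((a ** 2).sum())
    mdiff = -d / mdisc
    angles = np.degrees(np.arccos(a / mdisc))
    return mdisc, mdiff, angles

# Hypothetical two-dimensional item
mdisc, mdiff, angles = item_vector(a=np.array([1.2, 0.5]), d=-0.4)
print(f"MDISC = {mdisc:.2f}, MDIFF = {mdiff:.2f}, angles = {np.round(angles, 1)} degrees")
```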
A Balance Sheet for Educational Item Banking.
ERIC Educational Resources Information Center
Hiscox, Michael D.
Educational item banking presents observers with a considerable paradox. The development of test items from scratch is viewed as wasteful, a luxury in times of declining resources. On the other hand, item banking has failed to become a mature technology despite large amounts of money and the efforts of talented professionals. The question of which…
Santos, Sandra; Viana, Fernanda Leopoldina; Ribeiro, Iolanda; Prieto, Gerardo; Brandão, Sara; Cadime, Irene
2015-03-03
This investigation aimed to develop and collect psychometric data for two tests assessing listening comprehension of Portuguese students in primary school: the Test of Listening Comprehension of Narrative Texts (TLC-n) and the Test of Listening Comprehension of Expository Texts (TLC-e). Two studies were conducted. The purpose of study 1 was to construct four test forms for each of the two tests to assess first-, second-, third- and fourth-grade students in primary school. The TLC-n was administered to 1042 students, and the TLC-e was administered to 848 students. The purpose of study 2 was to test the psychometric properties of new items for the TLC-n form for fourth graders, given that the results in study 1 indicated a severe lack of difficult items. The participants were 260 fourth graders. The data were analysed using the Rasch model. Thirty items were selected for each test form. The results provided support for the model assumptions: unidimensionality and local independence of the items. The reliability coefficients were higher than .70 for all test forms. The TLC-n and the TLC-e present good psychometric properties and represent an important contribution to the learning disabilities assessment field.
Language development and affecting factors in 3- to 6-year-old children.
Muluk, Nuray Bayar; Bayoğlu, Birgül; Anlar, Banu
2014-05-01
The aim of this study was to assess factors affecting language developmental screening test results in 33.0- to 75.0-month-old children. The study group consists of 402 children, 172 (42.8%) boys and 230 (57.2%) girls, aged 33.0-75.0 months who were examined in four age groups: 3 years (33.0-39.0 months), 4 years (45.0-51.0 months), 5 years (57.0-63.0 months) and 6 years (69.0-75.0 months). Demographic data and medical history obtained by a standard questionnaire and Denver II Developmental Test results were evaluated. Maternal factors such as mother's age, educational level, and socioeconomic status (SES) correlated with language items in all age groups. Linear regression analysis indicated a significant effect of mother's education and higher SES on certain expressive and receptive language items at 3 and 4 years. Fine motor items were closely related to language items at all ages examined, while in the younger (3- and 4-year-old) group gross motor items also were related to language development. Maternal and socioeconomic factors influence language development in children: these effects, already discernible with a screening test, can be potential targets for social and educational interventions. The interpretation of screening test results should take into account the interaction between fine motor and language development in preschool children.
Gillespie, Brigid M; Polit, Denise F; Hamlin, Lois; Chaboyer, Wendy
2012-01-01
This paper describes the development and validation of the Revised Perioperative Competence Scale (PPCS-R). There is a lack of psychometrically sound self-assessment tools to measure nurses' perceived competence in the operating room. Content validity was established by a panel of international experts and the original 98-item scale was pilot tested with 345 nurses in Queensland, Australia. Following the removal of several items, a national sample that included all 3209 nurses who were members of the Australian College of Operating Room Nurses was surveyed using the 94-item version. Psychometric testing assessed content validity using exploratory factor analysis, internal consistency using Cronbach's alpha, and construct validity using the "known groups" technique. During item reduction, several preliminary factor analyses were performed on two random halves of the sample (n=550). Usable data for psychometric assessment were obtained from 1122 nurses. The original 94-item scale was reduced to 40 items. The final factor analysis using the entire sample resulted in a 40-item, six-factor solution. Cronbach's alpha for the 40-item scale was .96. Construct validation demonstrated significant differences (p<.0001) in perceived competence scores relative to years of operating room experience and receipt of specialty education. On the basis of these results, the psychometric properties of the PPCS-R were considered encouraging. Further testing of the tool in different samples of operating room nurses is necessary to enable cross-cultural comparisons. Copyright © 2011 Elsevier Ltd. All rights reserved.
2014-01-01
Background Foot disease complications, such as foot ulcers and infection, contribute to considerable morbidity and mortality. These complications are typically precipitated by “high-risk factors”, such as peripheral neuropathy and peripheral arterial disease. High-risk factors are more prevalent in specific “at risk” populations such as diabetes, kidney disease and cardiovascular disease. To the best of the authors’ knowledge a tool capturing multiple high-risk factors and foot disease complications in multiple at risk populations has yet to be tested. This study aimed to develop and test the validity and reliability of a Queensland High Risk Foot Form (QHRFF) tool. Methods The study was conducted in two phases. Phase one developed a QHRFF using an existing diabetes foot disease tool, literature searches, stakeholder groups and expert panel. Phase two tested the QHRFF for validity and reliability. Four clinicians, representing different levels of expertise, were recruited to test validity and reliability. Three cohorts of patients were recruited; one tested criterion measure reliability (n = 32), another tested criterion validity and inter-rater reliability (n = 43), and another tested intra-rater reliability (n = 19). Validity was determined using sensitivity, specificity and positive predictive values (PPV). Reliability was determined using Kappa, weighted Kappa and intra-class correlation (ICC) statistics. Results A QHRFF tool containing 46 items across seven domains was developed. Criterion measure reliability of at least moderate categories of agreement (Kappa > 0.4; ICC > 0.75) was seen in 91% (29 of 32) tested items. Criterion validity of at least moderate categories (PPV > 0.7) was seen in 83% (60 of 72) tested items. Inter- and intra-rater reliability of at least moderate categories (Kappa > 0.4; ICC > 0.75) was seen in 88% (84 of 96) and 87% (20 of 23) tested items respectively. Conclusions The QHRFF had acceptable validity and reliability across the majority of items; particularly items identifying relevant co-morbidities, high-risk factors and foot disease complications. Recommendations have been made to improve or remove identified weaker items for future QHRFF versions. Overall, the QHRFF possesses suitable practicality, validity and reliability to assess and capture relevant foot disease items across multiple at risk populations. PMID:24468080
Assessment in Science Education
NASA Astrophysics Data System (ADS)
Rustaman, N. Y.
2017-09-01
An analytic study focusing on scientific reasoning literacy was conducted to strengthen the emphasis on assessment in science by combining the nature of science and assessment as reference points with higher-order thinking and scientific skills in assessing science learning. Given a background in developing science process skills test items, inquiry in its many forms, and scientific and STEM literacy, it is believed that inquiry-based learning should first be implemented among science educators and science learners before STEM education can be successfully developed among science teachers, prospective teachers, and students at all levels. After a thorough study of the work of a number of science researchers, a model of scientific reasoning is proposed, and simple rubrics and some examples of test items are introduced in this article. As this is only a beginning, further studies will be needed, involving prospective science teachers with an interest in assessment, whether in authentic assessment or in test item development. A balanced use of alternative assessment rubrics, together with valid and reliable (standardized) test items, will be needed to accelerate STEM education in Indonesia.
A Review of Guidelines on Home Drug Testing Websites for Parents
Washio, Yukiko; Fairfax-Columbo, Jaymes; Ball, Emily; Cassey, Heather; Arria, Amelia M.; Bresani, Elena; Curtis, Brenda L.; Kirby, Kimberly C.
2014-01-01
Purpose To update and extend prior work reviewing websites that discuss home drug testing for parents and assess the quality of information that the websites provide to assist them in deciding when and how to use home drug testing. Methods We conducted a world-wide web search that identified eight websites providing information for parents on home drug testing. We assessed the information on the sites using a checklist developed with field experts in adolescent substance abuse and psychosocial interventions that focus on urine testing. Results None of the websites covered all of the items on the 24-item checklist, and only three covered at least half of the items (12, 14, and 21 items, respectively). The five remaining websites covered less than half the checklist items. The mean number of items covered by the websites was 11. Conclusions Among the websites that we reviewed, few provided thorough information to parents regarding empirically-supported strategies to effectively use drug testing to intervene on adolescent substance use. Furthermore, most websites did not provide thorough information regarding the risks and benefits to inform parents’ decision to use home drug testing. Empirical evidence regarding efficacy, benefits, risks, and limitations of home drug testing is needed. PMID:25026103
Suen, Yi-Nam; Cerin, Ester; Barnett, Anthony; Huang, Wendy Y J; Mellecker, Robin R
2017-09-01
Valid instruments of parenting practices related to children's physical activity (PA) are essential to understand how parents affect preschoolers' PA. This study developed and validated a questionnaire of PA-related parenting practices for Chinese-speaking parents of preschoolers in Hong Kong. Parents (n = 394) completed a questionnaire developed using findings from formative qualitative research and literature searches. Test-retest reliability was determined on a subsample (n = 61). Factorial validity was assessed using confirmatory factor analysis. Subscale internal consistency was determined. The scale of parenting practices encouraging PA comprised 2 latent factors: Modeling, structure and participatory engagement in PA (23 items), and Provision of appropriate places for child's PA (4 items). The scale of parenting practices discouraging PA encompassed 4 latent factors: Safety concern/overprotection (6 items), Psychological/behavioral control (5 items), Promoting inactivity (4 items), and Promoting screen time (2 items). Test-retest reliabilities were moderate to excellent (0.58 to 0.82), and internal subscale reliabilities were acceptable (0.63 to 0.89). We developed a theory-based questionnaire for assessing PA-related parenting practices among Chinese-speaking parents of Hong Kong preschoolers. While some items were context and culture specific, many were similar to those previously found in other populations, indicating a degree of construct generalizability across cultures.
ERIC Educational Resources Information Center
Makransky, Guido; Dale, Philip S.; Havmose, Philip; Bleses, Dorthe
2016-01-01
Purpose: This study investigated the feasibility and potential validity of an item response theory (IRT)-based computerized adaptive testing (CAT) version of the MacArthur-Bates Communicative Development Inventory: Words & Sentences (CDI:WS; Fenson et al., 2007) vocabulary checklist, with the objective of reducing length while maintaining…
The Development of More Efficient Measures for Evaluating Language Impairments in Aphasic Patients.
ERIC Educational Resources Information Center
Phillips, Phyllis P.; Halpin, Gerald
Because it generally took over an hour to administer the Porch Index of Communicative Ability (PICA), a shorter but comparable version of the test was developed. The original test was designed to quantify aphasic patients' ability level on common communicative tasks and consisted of 18 ten-item subtests. Each item resulted in a proficiency rating,…
Herschbach, Peter; Berg, Petra; Dankert, Andrea; Duran, Gabriele; Engst-Hastreiter, Ursula; Waadt, Sabine; Keller, Monika; Ukat, Robert; Henrich, Gerhard
2005-06-01
The aim of this study was the development and psychometric testing of a new psychological questionnaire to measure the fear of progression (FoP) in chronically ill patients (cancer, diabetes mellitus and rheumatic diseases). The Fear of Progression Questionnaire (FoP-Q) was developed in four phases: (1) generation of items (65 interviews); (2) reduction of items--the initial version of the questionnaire (87 items) was presented to 411 patients, to construct subscales and test the reliability; (3) testing the convergent and discriminative validity of the reduced test version (43 items) within a new sample (n=439); (4) translation--German to English. The scale comprised five factors (Cronbach's alpha >.70): affective reactions (13 items), partnership/family (7), occupation (7), loss of autonomy (7) and coping with anxiety (9). The test-retest reliability coefficients varied between .77 and .94. There was only a medium relationship to traditional anxiety scales. This is an indication of the independence of the FoP. Significant relationships between the FoP-Q and the patient's illness behaviour indicate discriminative validity. The FoP-Q is a new and unique questionnaire developed for the chronically ill. A major problem and source of stress for this patient group has been measuring the FoP of an illness both specifically and economically. The FoP-Q was designed to resolve this problem, fulfill this need and reduce this stress.
DEVELOPMENT OF A HIGH ALTITUDE LOW OPENING HUMANITARIAN AIRDROP SYSTEM
2017-07-12
[Table of contents excerpt: 2.3 Aid Item Testing; 2.3.1 12 October 2010 …; 2.3.3 USAARL Aid Item Safety Evaluation; 2.3.4 USAARL Accelerated Impact Test; 3.1.3 System Testing; 3.2 Sling Load System]
Science Library of Test Items. Volume Four: Practical Testing Guide.
ERIC Educational Resources Information Center
New South Wales Dept. of Education, Sydney (Australia).
As one in a series of test items collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, the guide gives a wide range of questions and activities for the manipulation of scientific equipment to allow assessment of students' practical laboratory skills. Instructions are given to make norm-referenced or…
PE Metrics: Background, Testing Theory, and Methods
ERIC Educational Resources Information Center
Zhu, Weimo; Rink, Judy; Placek, Judith H.; Graber, Kim C.; Fox, Connie; Fisette, Jennifer L.; Dyson, Ben; Park, Youngsik; Avery, Marybell; Franck, Marian; Raynes, De
2011-01-01
New testing theories, concepts, and psychometric methods (e.g., item response theory, test equating, and item bank) developed during the past several decades have many advantages over previous theories and methods. In spite of their introduction to the field, they have not been fully accepted by physical educators. Further, the manner in which…
Criterion-Referenced Test (CRT) Items for Building Trades.
ERIC Educational Resources Information Center
Davis, Diane, Ed.
This test item bank is intended to help instructors construct criterion-referenced tests for secondary-level courses in building trades. The bank is keyed to the Missouri Building Trades Competency Profile, which was developed by industry and education professionals in Missouri, and is designed to be used in conjunction with the Vocational…
Martinková, Patrícia; Drabinová, Adéla; Liaw, Yuan-Ling; Sanders, Elizabeth A.; McFarland, Jenny L.; Price, Rebecca M.
2017-01-01
We provide a tutorial on differential item functioning (DIF) analysis, an analytic method useful for identifying potentially biased items in assessments. After explaining a number of methodological approaches, we test for gender bias in two scenarios that demonstrate why DIF analysis is crucial for developing assessments, particularly because simply comparing two groups’ total scores can lead to incorrect conclusions about test fairness. First, a significant difference between groups on total scores can exist even when items are not biased, as we illustrate with data collected during the validation of the Homeostasis Concept Inventory. Second, item bias can exist even when the two groups have exactly the same distribution of total scores, as we illustrate with a simulated data set. We also present a brief overview of how DIF analysis has been used in the biology education literature to illustrate the way DIF items need to be reevaluated by content experts to determine whether they should be revised or removed from the assessment. Finally, we conclude by arguing that DIF analysis should be used routinely to evaluate items in developing conceptual assessments. These steps will ensure more equitable—and therefore more valid—scores from conceptual assessments. PMID:28572182
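To make the logistic-regression approach to uniform DIF screening concrete (one of several methods a DIF tutorial typically covers), here is a minimal Python sketch. The simulated data, the variable names, and the use of a rest-score as the matching criterion are illustrative assumptions, not the authors' analysis of the Homeostasis Concept Inventory.

```python
# Illustrative uniform-DIF screen for one dichotomous item using logistic regression.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(0)
n = 400
group = rng.integers(0, 2, n)          # 0 = reference, 1 = focal (e.g., gender)
total = rng.normal(0, 1, n)            # matching variable (e.g., rest-score)
# simulate an item with some uniform DIF against the focal group
p = 1 / (1 + np.exp(-(0.8 * total - 0.5 * group)))
item = rng.binomial(1, p)

X0 = sm.add_constant(np.column_stack([total]))          # base model: ability only
X1 = sm.add_constant(np.column_stack([total, group]))   # adding group tests uniform DIF

m0 = sm.Logit(item, X0).fit(disp=0)
m1 = sm.Logit(item, X1).fit(disp=0)

lr = 2 * (m1.llf - m0.llf)                  # likelihood-ratio statistic, 1 df
p_value = chi2.sf(lr, df=1)
print(f"LR = {lr:.2f}, p = {p_value:.4f}")  # a small p suggests uniform DIF
```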
ERIC Educational Resources Information Center
Eignor, Daniel R.; Douglass, James B.
This paper attempts to provide some initial information about the use of a variety of item response theory (IRT) models in the item selection process; its purpose is to compare the information curves derived from the selection of items characterized by several different IRT models and their associated parameter estimation programs. These…
Doralp, Samantha; Bartlett, Doreen J
2013-09-01
This study describes the development and testing of a measure evaluating the quality and variability of the home environment as it relates to the motor development of infants during the first year of life. A sample of 112 boys and 95 girls with a mean age of 7.1 months (SD 1.8) and GA of 39.6 weeks (SD 1.5) participated in the study. The measurement development process was divided into three phases: measurement development (item generation or selection of items from existing measurement tools), pilot testing to determine acceptability and feasibility to parents, and exploratory factor analysis to organize items into meaningful concepts. Test-retest reliability and internal consistency were also determined. The environmental opportunities questionnaire (EOQ) is a feasible 21-item measure comprising three factors: opportunities in the play space, sensory variety and parental encouragement. Overall, test-retest reliability was 0.92 (CI 0.84-0.96) and internal consistency was 0.79. The EOQ emphasizes quality of the environment and access to equipment and toys that have the potential to facilitate early motor development. The preliminary analyses reported here suggest more work could be done on the EOQ to strengthen its use for research or clinical purposes; however, it is adequate for use in its current form. Implications for Rehabilitation: the EOQ is a new and feasible 21-item questionnaire that enables identification of malleable environmental factors serving as potential points of intervention for children who are not developing typically, and a therapeutic tool that therapists can use to inform and guide discussions with caregivers about potential influences of environmental, social and attitudinal factors on their child's early development.
Burkey, Matthew D.; Ghimire, Lajina; Adhikari, Ramesh P.; Kohrt, Brandon A.; Jordans, Mark J. D.; Haroz, Emily; Wissow, Lawrence
2017-01-01
Systematic processes are needed to develop valid measurement instruments for disruptive behavior disorders (DBDs) in cross-cultural settings. We employed a four-step process in Nepal to identify and select items for a culturally valid assessment instrument: 1) We extracted items from validated scales and local free-list interviews. 2) Parents, teachers, and peers (n=30) rated the perceived relevance and importance of behavior problems. 3) Highly rated items were piloted with children (n=60) in Nepal. 4) We evaluated internal consistency of the final scale. We identified 49 symptoms from 11 scales, and 39 behavior problems from free-list interviews (n=72). After dropping items for low ratings of relevance and severity and for poor item-test correlation, low frequency, and/or poor acceptability in pilot testing, 16 items remained for the Disruptive Behavior International Scale—Nepali version (DBIS-N). The final scale had good internal consistency (α=0.86). A 4-step systematic approach to scale development including local participation yielded an internally consistent scale that included culturally relevant behavior problems. PMID:28093575
Development of a deviance-type scale for the Korean elderly.
Cho, Gun-Sang; Yi, Eun-Surk; Hwang, Hee-Jeong
2015-12-01
This research aimed to develop a questionnaire of deviant behavior for Korean elderly people, which may contribute substantially to the study of deviant behavior in this population and provide a methodological basis for future work. To accomplish this purpose, the study proceeded in three stages: (a) drafting preliminary question items, (b) refining the items of the scale through a pilot study, and (c) finalizing the question items through a main survey. In the first stage, 43 question items were developed using an open-ended questionnaire and structured inquiry with 137 elderly people over 65 yr of age. In the second phase, pilot testing based on data collected from 200 elderly people was performed using exploratory factor analysis and reliability testing, yielding a 27-item self-report questionnaire. In the main survey of 184 elderly people, 21 items comprising four subfactors were finalized to measure deviant behaviors of the Korean elderly: social deviance (n=8), economic deviance (n=5), psychological deviance (n=5), and physical deviance (n=3).
Item Bank Development for a Revised Pediatric Evaluation of Disability Inventory (PEDI)
ERIC Educational Resources Information Center
Dumas, Helene; Fragala-Pinkham, Maria; Haley, Stephen; Coster, Wendy; Kramer, Jessica; Kao, Ying-Chia; Moed, Richard
2010-01-01
The Pediatric Evaluation of Disability Inventory (PEDI) is a useful clinical and research assessment, but it has limitations in content, age range, and efficiency. The purpose of this article is to describe the development of the item bank for a new computer adaptive testing version of the PEDI (PEDI-CAT). An expanded item set and response options…
Test blueprints for psychiatry residency in-training written examinations in Riyadh, Saudi Arabia
Gaffas, Eisha M; Sequeira, Reginald P; Namla, Riyadh A Al; Al-Harbi, Khalid S
2012-01-01
Background The postgraduate training program in psychiatry in Saudi Arabia, which was established in 1997, is a 4-year residency program. Written exams comprising multiple choice questions (MCQs) are used as a summative assessment of residents in order to determine their eligibility for promotion from one year to the next. Test blueprints are not used in preparing examinations. Objective To develop test blueprints for the written examinations used in the psychiatry residency program. Methods Based on the guidelines of four professional bodies, documentary analysis was used to develop global and detailed test blueprints for each year of the residency program. An expert panel participated during piloting and final modification of the test blueprints. Their opinions about the content, weightage for each content domain, and proportion of test items to be sampled in each cognitive category as defined by modified Bloom’s taxonomy were elicited. Results Eight global and detailed test blueprints, two for each year of the psychiatry residency program, were developed. The global test blueprints were reviewed by experts and piloted. Six experts participated in the final modification of test blueprints. Based on expert consensus, the content, total weightage for each content domain, and proportion of test items to be included in each cognitive category were determined for each global test blueprint. Experts also suggested progressively decreasing the weightage for recall test items and increasing problem-solving test items in examinations, from year 1 to year 4 of the psychiatry residency program. Conclusion A systematic approach using a documentary and content analysis technique was used to develop test blueprints with additional input from an expert panel as appropriate. Test blueprinting is an important step to ensure test validity in all residency programs. PMID:23762000
ERIC Educational Resources Information Center
Ramirez, Arnulfo G.; Politzer, Robert L.
A revised Spanish/English oral-proficiency test battery was administered to 40 Spanish-surnamed pupils equally divided by sex at grade levels 1, 3, 5, and 7. The test battery included parallel Spanish and English versions of: (1) a 12-item vocabulary pretest, (2) a 32-item vocabulary-by-domain test consisting of four sections--home, neighborhood,…
Jeong, Eunju; Lesiuk, Teresa L
2011-01-01
Impairments in attention are commonly seen in individuals with traumatic brain injury (TBI). While visual attention assessment measurements have been rigorously developed and frequently used in cognitive neurorehabilitation, there is a paucity of auditory attention assessment measurements for patients with TBI. The purpose of this study was to field test a researcher-developed Music-based Attention Assessment (MAA), a melodic contour identification test designed to assess three different types of attention (i.e., sustained attention, selective attention, and divided attention), for patients with TBI. Additionally, this study aimed to evaluate the readability and comprehensibility of the test items and to examine the preliminary psychometric properties of the scale and test items. Fifteen patients diagnosed with TBI completed 3 different series of tasks in which they were required to identify melodic contours. The resulting data showed that (a) test items in each of the 3 subtests had an easy to moderate level of item difficulty and an acceptable to high level of item discrimination; (b) the musical characteristics (i.e., contour, congruence, and pitch interference) were associated with the level of item difficulty; and (c) the internal consistency of the MAA as computed by Cronbach's alpha was .95. Subsequent studies using a larger sample of typical participants, along with individuals with TBI, are needed to confirm construct validity and internal consistency of the MAA. In addition, the authors recommend examination of criterion validity of the MAA as correlated with current neuropsychological attention assessment measurements.
Item response theory - A first approach
NASA Astrophysics Data System (ADS)
Nunes, Sandra; Oliveira, Teresa; Oliveira, Amílcar
2017-07-01
Item Response Theory (IRT) has become one of the most popular scoring frameworks for measurement data, frequently used in computerized adaptive testing, cognitively diagnostic assessment and test equating. According to Andrade et al. (2000), IRT can be defined as a set of mathematical models (Item Response Models - IRM) constructed to represent the probability of an individual giving the right answer to an item of a particular test. The number of Item Response Models available for measurement analysis has increased considerably in the last fifteen years due to increasing computer power and due to a demand for accuracy and more meaningful inferences grounded in complex data. The developments in modeling with Item Response Theory were related to developments in estimation theory, most remarkably Bayesian estimation with Markov chain Monte Carlo algorithms (Patz & Junker, 1999). The popularity of Item Response Theory has also implied numerous overviews in books and journals, and many connections between IRT and other statistical estimation procedures, such as factor analysis and structural equation modeling, have been made repeatedly (van der Linden & Hambleton, 1997). As stated before, Item Response Theory covers a variety of measurement models, ranging from basic one-dimensional models for dichotomously and polytomously scored items and their multidimensional analogues to models that incorporate information about cognitive sub-processes which influence the overall item response process. The aim of this work is to introduce the main concepts associated with one-dimensional models of Item Response Theory, to specify the logistic models with one, two and three parameters, to discuss some properties of these models and to present the main estimation procedures.
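As a minimal illustration of the one-, two- and three-parameter logistic models mentioned above, the Python sketch below computes the item response function; the parameter values are made up for illustration. Setting the pseudo-guessing parameter to zero recovers the 2PL, and additionally fixing the discrimination at 1 gives the 1PL/Rasch form.

```python
import numpy as np

def irf_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct response under the 3PL model.
    a = discrimination, b = difficulty, c = pseudo-guessing.
    c = 0 gives the 2PL; c = 0 and a = 1 gives the 1PL/Rasch form."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 7)
print(irf_3pl(theta, a=1.0, b=0.0, c=0.0))   # 1PL
print(irf_3pl(theta, a=1.7, b=0.5, c=0.0))   # 2PL
print(irf_3pl(theta, a=1.7, b=0.5, c=0.2))   # 3PL
```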
INTRODUCTION TO PATIENT-REPORTED OUTCOME ITEM BANKS: ISSUES IN MINORITY AGING RESEARCH
Templin, Thomas N; Hays, Ron D; Gershon, Richard C; Rothrock, Nan; Jones, Richard N; Teresi, Jeanne A; Stewart, Anita; Weech-Maldonado, Robert; Wallace, Steve
2014-01-01
In 2004 NIH awarded contracts to initiate the development of high quality psychological and neuropsychological outcome measures for improved assessment of health-related outcomes. The workshop introduced these measurement development initiatives, the measures created, and the NIH supported resource (Assessment Center) for internet or tablet-based test administration and scoring. Presentation covered: (a) item response theory (IRT) and assessment of test bias, (b) construction of item banks and computerized adaptive testing, and (c) the different ways in which qualitative analyses contribute to the definition of construct domains and the refinement of outcome constructs. The panel discussion included questions about representativeness of samples, and assessment of cultural bias. PMID:23570428
ERIC Educational Resources Information Center
Cheek, Jimmy G.; McGhee, Max B.
The central purpose of this study was to develop and field test written criterion-referenced tests for the ornamental horticulture component of applied principles of agribusiness and natural resources occupations programs. The test items were to be used by secondary agricultural education students in Florida. Based upon the objectives identified…
Development of the Attitudes to Domestic Violence Questionnaire for Children and Adolescents.
Fox, Claire L; Gadd, David; Sim, Julius
2015-09-01
To provide a more robust assessment of the effectiveness of a domestic abuse prevention education program, a questionnaire was developed to measure children's attitudes to domestic violence. The aim was to develop a short questionnaire that would be easy to use for practitioners but, at the same time, sensitive enough to pick up on subtle changes in young people's attitudes. We therefore chose to ask children about different situations in which they might be willing to condone domestic violence. In Study 1, we tested a set of 20 items, which we reduced by half to a set of 10 items. The factor structure of the scale was explored and its internal consistency was calculated. In Study 2, we tested the factor structure of the 10-item Attitudes to Domestic Violence (ADV) Scale in a separate calibration sample. Finally, in Study 3, we then assessed the test-retest reliability of the 10-item scale. The ADV Questionnaire is a promising tool to evaluate the effectiveness of domestic abuse education prevention programs. However, further development work is necessary. © The Author(s) 2014.
The development of the Pictorial Thai Quality of Life.
Phattharayuttawat, Sucheera; Ngamthipwatthana, Thienchai; Pitiyawaranun, Buncha
2005-11-01
"Quality of life" has become a main focus of interest in medicine. The Pictorial Thai Quality of Life (PTQL) was developed in order to measure the Thai mental illness both in a clinical setting and community. The purpose of this study was to develop the Pictorial Thai Quality of Life (PTQL), having adequate and sufficient construct validity, discriminant power, concurrent validity, and reliability. To develop the Pictorial Thai Quality of Life Test, two samples groups were used in the present study: (1) pilot study samples: 30 samples and (2) survey samples were 672 samples consisting of normal, and psychiatric patients. The developing tests items were collected from a review of the literature in which all the items were based on the WHO definition of Quality of Life. Then, experts judgment by the Delphi technique was used in the first stage. After that a pilot study was used to evaluate the testing administration, and wording of the tests items. The final stage was collected data from the survey samples. The results of the present study showed that the final test was composed 25 items. The construct validity of this test consists of six domains: Physical, Cognitive, Affective, Social Function, Economic and Self-Esteem. All the PTQL items have sufficient discriminant power It was found to be statistically significant different at the. 001 level between those people with mental disorders and normal people. There was a high level of concurrent validity association with WHOQOL-BREF, Pearson correlation coefficient and Area under ROC curve were 0.92 and 0.97 respectively. The reliability coefficients for the Alpha coefficients of the PTQL total test was 0.88. The values of the six scales were from 0.81 to 0:91. The present study was directed at developing an effective psychometric properties pictorial quality of life questionnaire. The result will be a more direct and meaningful application of an instrument to detect the mental health illness poor quality of life in Thai communities.
Akl, Elie A; Fadlallah, Racha; Ghandour, Lilian; Kdouh, Ola; Langlois, Etienne; Lavis, John N; Schünemann, Holger; El-Jardali, Fadi
2017-09-04
Groups or institutions funding or conducting systematic reviews in health policy and systems research (HPSR) should prioritise topics according to the needs of policymakers and stakeholders. The aim of this study was to develop and validate a tool to prioritise questions for systematic reviews in HPSR. We developed the tool following a four-step approach consisting of (1) the definition of the purpose and scope of the tool, (2) item generation and reduction, (3) testing for content and face validity, and (4) pilot testing of the tool. The research team involved international experts in HPSR, systematic review methodology and tool development, led by the Center for Systematic Reviews on Health Policy and Systems Research (SPARK). We followed an inclusive approach in determining the final selection of items to allow customisation to the user's needs. The purpose of the SPARK tool was to prioritise questions in HPSR in order to address them in systematic reviews. In the item generation and reduction phase, an extensive literature search yielded 40 relevant articles, which were reviewed by the research team to create a preliminary list of 19 candidate items for inclusion in the tool. As part of testing for content and face validity, input from international experts led to the refining, changing, merging and addition of new items, and to organisation of the tool into two modules. Following pilot testing, we finalised the tool, with 22 items organised in two modules - the first module including 13 items to be rated by policymakers and stakeholders, and the second including 9 items to be rated by systematic review teams. Users can customise the tool to their needs by omitting items that may not be applicable to their settings. We also developed a user manual that provides guidance on how to use the SPARK tool, along with signaling questions. We have developed and conducted initial validation of the SPARK tool to prioritise questions for systematic reviews in HPSR, along with a user manual. By aligning systematic review production to policy priorities, the tool will help support evidence-informed policymaking and reduce research waste. We invite others to contribute with additional real-life implementation of the tool.
Buchan, Jena; Janda, Monika; Box, Robyn; Rogers, Laura; Hayes, Sandi
2015-03-18
No tool exists to measure self-efficacy for overcoming lymphedema-related exercise barriers in individuals with cancer-related lymphedema. However, an existing scale measures confidence to overcome general exercise barriers in cancer survivors. Therefore, the purpose of this study was to develop, validate and assess the reliability of a subscale, to be used in conjunction with the general barriers scale, for determining exercise barriers self-efficacy in individuals facing lymphedema-related exercise barriers. A lymphedema-specific exercise barriers self-efficacy subscale was developed and validated using a cohort of 106 cancer survivors with cancer-related lymphedema, from Brisbane, Australia. An initial ten-item lymphedema-specific barrier subscale was developed and tested, with participant feedback and principal components analysis results used to guide development of the final version. Validity and test-retest reliability analyses were conducted on the final subscale. The final lymphedema-specific subscale contained five items. Principal components analysis revealed these items loaded highly (>0.75) on a separate factor when tested with a well-established nine-item general barriers scale. The final five-item subscale demonstrated good construct and criterion validity, high internal consistency (Cronbach's alpha = 0.93) and test-retest reliability (ICC = 0.67, p < 0.01). A valid and reliable lymphedema-specific subscale has been developed to assess exercise barriers self-efficacy in individuals with cancer-related lymphedema. This scale can be used in conjunction with an existing general exercise barriers scale to enhance exercise adherence in this understudied patient group.
Salsman, John M; Victorson, David; Choi, Seung W; Peterman, Amy H; Heinemann, Allen W; Nowinski, Cindy; Cella, David
2013-11-01
To develop and validate an item-response theory-based patient-reported outcomes assessment tool of positive affect and well-being (PAW). This is part of a larger NINDS-funded study to develop a health-related quality of life measurement system across major neurological disorders, called Neuro-QOL. Informed by a literature review and qualitative input from clinicians and patients, item pools were created to assess PAW concepts. Items were administered to a general population sample (N = 513) and a group of individuals with a variety of neurologic conditions (N = 581) for calibration and validation purposes, respectively. A 23-item calibrated bank and a 9-item short form of PAW was developed, reflecting components of positive affect, life satisfaction, or an overall sense of purpose and meaning. The Neuro-QOL PAW measure demonstrated sufficient unidimensionality and displayed good internal consistency, test-retest reliability, model fit, convergent and discriminant validity, and responsiveness. The Neuro-QOL PAW measure was designed to aid clinicians and researchers to better evaluate and understand the potential role of positive health processes for individuals with chronic neurological conditions. Further psychometric testing within and between neurological conditions, as well as testing in non-neurologic chronic diseases, will help evaluate the generalizability of this new tool.
Development of cultural belief scales for mammography screening.
Russell, Kathleen M; Champion, Victoria L; Perkins, Susan M
2003-01-01
To develop instruments to measure culturally related variables that may influence mammography screening behaviors in African American women. Instrumentation methodology. Community organizations and public housing in the Indianapolis, IN, area. 111 African American women with a mean age of 60.2 years and 64 Caucasian women with a mean age of 60 years. After item development, scales were administered. Data were analyzed by factor analysis, item analysis via internal consistency reliability using Cronbach's alpha, and independent t tests and logistic regression analysis to test theoretical relationships. Personal space preferences, health temporal orientation, and perceived personal control. Space items were factored into interpersonal and physical scales. Temporal orientation items were loaded on one factor, creating a one-dimensional scale. Control items were factored into internal and external control scales. Cronbach's alpha coefficients for the scales ranged from 0.76-0.88. Interpersonal space preference, health temporal orientation, and perceived internal control scales each were predictive of mammography screening adherence. The three tested scales were reliable and valid. Scales, on average, did not differ between African American and Caucasian populations. These scales may be useful in future investigations aimed at increasing mammography screening in African American and Caucasian women.
Medeiros, Lydia C; Hillers, Virginia N; Chen, Gang; Bergmann, Verna; Kendall, Patricia; Schroeder, Mary
2004-11-01
The objective of this study was to design and develop food safety knowledge and attitude scales based on food-handling guidelines developed by a national panel of food safety experts. Knowledge (n=43) and attitude (n=49) questions were developed and pilot-tested with a variety of consumer groups. Final questions were selected based on item analysis and on validity and reliability statistical tests. Knowledge questions were tested in Washington State with participants in low-income nutrition education programs (pretest/posttest n=58, test/retest n=19) and college students (pretest/posttest n=34). Attitude questions were tested in Ohio with nutrition education program participants (n=30) and college students (non-nutrition majors n=138, nutrition majors n=57). Item analysis, paired sample t tests, Pearson's correlation coefficients, and Cronbach's alpha were used. Reliability and validity tests of individual items and the question sets were used to reduce the scales to 18 knowledge questions and 10 attitude questions. The knowledge and attitude scales covered topics ranked as important by a national panel of experts and met most validity and reliability standards. The 18-item knowledge questionnaire had instructional sensitivity (mean score increase of more than three points after instruction), internal reliability (Cronbach's alpha >.75), and produced similar results in test-retest without intervention (coefficient of stability=.81). Knowledge of correct procedures for hand washing and avoiding cross-contamination was widespread before instruction. Knowledge was limited regarding avoiding food preparation while ill, cooking hamburgers, high-risk foods, and whether cooked rice and potatoes could be stored at room temperature. The 10-item attitude scale had an appropriate range of responses (item difficulty) and produced similar results in test-retest ( P =.01). Internal consistency ranged from alpha=.63 to .89. Students anticipating a career where food safety is valued had higher attitude scale scores than participants of extension education programs. Uses for the knowledge questionnaire include assessment of subject matter knowledge before instruction and knowledge gain after instruction. The attitude scale assesses an outcome variable that may predict food safety behavior.
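The reliability statistics reported above (Cronbach's alpha and the test-retest coefficient of stability) can be computed as in the following sketch; the response matrix and retest scores are simulated and purely hypothetical, not the study data.

```python
import numpy as np

def cronbach_alpha(items):
    """items: (n_respondents, n_items) array of scored responses."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def coefficient_of_stability(test, retest):
    """Pearson correlation between total scores at two administrations."""
    return np.corrcoef(test, retest)[0, 1]

rng = np.random.default_rng(1)
ability = rng.normal(size=58)
# hypothetical 18-item knowledge test; items correlate through a common ability
responses = (rng.normal(size=(58, 18)) < ability[:, None]).astype(int)
totals_t1 = responses.sum(axis=1)
totals_t2 = totals_t1 + rng.integers(-2, 3, size=58)   # hypothetical retest totals
print(round(cronbach_alpha(responses), 2))
print(round(coefficient_of_stability(totals_t1, totals_t2), 2))
```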
Automated Item Generation with Recurrent Neural Networks.
von Davier, Matthias
2018-03-12
Utilizing technology for automated item generation is not a new idea. However, test items used in commercial testing programs or in research are still predominantly written by humans, in most cases by content experts or professional item writers. Human experts are a limited resource and testing agencies incur high costs in the process of continuous renewal of item banks to sustain testing programs. Using algorithms instead holds the promise of providing unlimited resources for this crucial part of assessment development. The approach presented here deviates in several ways from previous attempts to solve this problem. In the past, automatic item generation relied either on generating clones of narrowly defined item types such as those found in language-free intelligence tests (e.g., Raven's progressive matrices) or on an extensive analysis of task components and derivation of schemata to produce items with pre-specified variability that are hoped to have predictable levels of difficulty. It is somewhat unlikely that researchers utilizing these previous approaches would look at the proposed approach with favor; however, recent applications of machine learning show success in solving tasks that seemed impossible for machines not too long ago. The proposed approach uses deep learning to implement probabilistic language models, not unlike what Google Brain and Amazon Alexa use for language processing and generation.
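A minimal character-level sketch of the kind of recurrent language model the abstract refers to is shown below, using PyTorch. The tiny "item pool", the model sizes, and the sampling temperature are illustrative assumptions; the network is left untrained here, and a real item-generation system would be trained on a large pool of existing items before sampling.

```python
import torch
import torch.nn as nn

# hypothetical tiny "item pool" used only to define the character vocabulary
corpus = "Which of the following best describes the main idea of the passage?"
chars = sorted(set(corpus))
stoi = {c: i for i, c in enumerate(chars)}
itos = {i: c for c, i in stoi.items()}

class CharLM(nn.Module):
    def __init__(self, vocab, emb=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, x, state=None):
        h, state = self.lstm(self.embed(x), state)
        return self.head(h), state

def sample(model, seed="Which ", n_chars=60, temperature=0.8):
    """Generate text one character at a time from the language model."""
    model.eval()
    out, state = list(seed), None
    idx = torch.tensor([[stoi[c] for c in seed]])
    with torch.no_grad():
        logits, state = model(idx, state)
        for _ in range(n_chars):
            probs = torch.softmax(logits[0, -1] / temperature, dim=-1)
            nxt = torch.multinomial(probs, 1).item()
            out.append(itos[nxt])
            logits, state = model(torch.tensor([[nxt]]), state)
    return "".join(out)

model = CharLM(len(chars))   # untrained here; training on a large item pool is assumed
print(sample(model))
```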
Pilkonis, Paul A; Yu, Lan; Dodds, Nathan E; Johnston, Kelly L; Lawrence, Suzanne M; Hilton, Thomas F; Daley, Dennis C; Patkar, Ashwin A; McCarty, Dennis
2017-08-01
There is a need to monitor patients receiving prescription opioids to detect possible signs of abuse. To address this need, we developed and calibrated an item bank for severity of abuse of prescription pain medication as part of the Patient-Reported Outcomes Measurement Information System (PROMIS ® ). Comprehensive literature searches yielded an initial bank of 5,310 items relevant to substance use and abuse, including abuse of prescription pain medication, from over 80 unique instruments. After qualitative item analysis (i.e., focus groups, cognitive interviewing, expert review, and item revision), 25 items for abuse of prescribed pain medication were included in field testing. Items were written in a first-person, past-tense format, with a three-month time frame and five response options reflecting frequency or severity. The calibration sample included 448 respondents, 367 from the general population (ascertained through an internet panel) and 81 from community treatment programs participating in the National Drug Abuse Treatment Clinical Trials Network. A final bank of 22 items was calibrated using the two-parameter graded response model from item response theory. A seven-item static short form was also developed. The test information curve showed that the PROMIS ® item bank for abuse of prescription pain medication provided substantial information in a broad range of severity. The initial psychometric characteristics of the item bank support its use as a computerized adaptive test or short form, with either version providing a brief, precise, and efficient measure relevant to both clinical and community samples. © 2016 American Academy of Pain Medicine. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
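The two-parameter graded response model used for calibration can be illustrated with the category-probability function below (Samejima's formulation); the discrimination and threshold values are hypothetical, not the PROMIS calibration results.

```python
import numpy as np

def grm_category_probs(theta, a, thresholds):
    """Category response probabilities under the graded response model.
    a: discrimination; thresholds: ordered boundary parameters (K - 1 of them)."""
    theta = np.atleast_1d(theta)
    b = np.asarray(thresholds)
    # P*(k): probability of responding in category k or higher
    p_star = 1 / (1 + np.exp(-a * (theta[:, None] - b[None, :])))
    ones = np.ones((theta.size, 1))
    zeros = np.zeros((theta.size, 1))
    upper = np.hstack([ones, p_star])
    lower = np.hstack([p_star, zeros])
    return upper - lower    # shape (n_theta, K categories); rows sum to 1

# an illustrative 5-category frequency/severity item
probs = grm_category_probs(theta=[-1.0, 0.0, 1.5], a=1.8,
                           thresholds=[-1.2, -0.3, 0.6, 1.4])
print(probs.round(3), probs.sum(axis=1))
```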
DiFilippo, Kristen Nicole; Huang, Wenhao; Chapman-Novakofski, Karen M
2017-10-27
The extensive availability and increasing use of mobile apps for nutrition-based health interventions makes evaluation of the quality of these apps crucial for integration of apps into nutritional counseling. The goal of this research was the development, validation, and reliability testing of the app quality evaluation (AQEL) tool, an instrument for evaluating apps' educational quality and technical functionality. Items for evaluating app quality were adapted from website evaluations, with additional items added to evaluate the specific characteristics of apps, resulting in 79 initial items. Expert panels of nutrition and technology professionals and app users reviewed items for face and content validation. After recommended revisions, nutrition experts completed a second AQEL review to ensure clarity. On the basis of 150 sets of responses using the revised AQEL, principal component analysis was completed, reducing AQEL into 5 factors that underwent reliability testing, including internal consistency, split-half reliability, test-retest reliability, and interrater reliability (IRR). Two additional modifiable constructs for evaluating apps based on the age and needs of the target audience as selected by the evaluator were also tested for construct reliability. IRR testing using intraclass correlations (ICC) with all 7 constructs was conducted, with 15 dietitians evaluating one app. Development and validation resulted in the 51-item AQEL. These were reduced to 25 items in 5 factors after principal component analysis, plus 9 modifiable items in two constructs that were not included in principal component analysis. Internal consistency and split-half reliability of the following constructs derived from principal component analysis were good (Cronbach alpha >.80, Spearman-Brown coefficient >.80): behavior change potential, support of knowledge acquisition, app function, and skill development. Split-half reliability for the app purpose construct was .65. Test-retest reliability showed no significant change over time (P>.05) for all but skill development (P=.001). Construct reliability was good for items assessing age appropriateness of apps for children, teens, and a general audience. In addition, construct reliability was acceptable for assessing app appropriateness for various target audiences (Cronbach alpha >.70). For the 5 main factors, ICC (1,k) was >.80, with a P value of <.05. When 15 nutrition professionals evaluated one app, ICC (2,15) was .98, with a P value of <.001 for all 7 constructs when the modifiable items were specified for adults seeking weight loss support. Our preliminary effort shows that AQEL is a valid, reliable instrument for evaluating nutrition apps' qualities for clinical interventions by nutrition clinicians, educators, and researchers. Further efforts in validating AQEL in various contexts are needed. ©Kristen Nicole DiFilippo, Wenhao Huang, Karen M. Chapman-Novakofski. Originally published in JMIR Mhealth and Uhealth (http://mhealth.jmir.org), 27.10.2017.
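The intraclass correlations used above for interrater reliability can be computed from a two-way layout of ratings. The sketch below implements the Shrout and Fleiss two-way random-effects forms ICC(2,1) and ICC(2,k), assuming that is the intended variant; the ratings are simulated stand-ins, not the dietitians' data.

```python
import numpy as np

def icc_two_way(ratings):
    """Two-way random-effects ICCs (Shrout & Fleiss): single-rater ICC(2,1)
    and average-of-k-raters ICC(2,k). ratings: (n_targets, k_raters)."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    bms = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)   # between-targets MS
    jms = n * ((x.mean(axis=0) - grand) ** 2).sum() / (k - 1)   # between-raters MS
    ss_total = ((x - grand) ** 2).sum()
    ems = (ss_total - bms * (n - 1) - jms * (k - 1)) / ((n - 1) * (k - 1))  # residual MS
    icc_single = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    icc_average = (bms - ems) / (bms + (jms - ems) / n)
    return icc_single, icc_average

rng = np.random.default_rng(5)
true_quality = rng.normal(size=20)                              # 20 hypothetical apps
ratings = true_quality[:, None] + rng.normal(0, 0.5, (20, 15))  # 15 hypothetical raters
print(tuple(round(v, 2) for v in icc_two_way(ratings)))
```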
Launch Deployment Assembly Extravehicular Activity Neutral Buoyancy Development Test Report
NASA Technical Reports Server (NTRS)
Loughead, T.
1996-01-01
This test evaluated the Launch Deployment Assembly (LDA) design for Extravehicular Activity (EVA) work sites (setup, ingress, egress), reach and visual access, and translation required for cargo item removal. As part of the LDA design effort, this document describes the method and results of the LDA EVA Neutral Buoyancy Development Test conducted to ensure that the LDA hardware supports the deployment of the cargo items from the pallet. This document includes the test objectives, flight and mockup hardware description, descriptions of procedures and data collection used in the testing, and the results of the development test at the National Aeronautics and Space Administration's (NASA) Marshall Space Flight Center (MSFC) Neutral Buoyancy Simulator (NBS).
ERIC Educational Resources Information Center
Chen, Pei-Hua; Chang, Hua-Hua; Wu, Haiyan
2012-01-01
Two sampling-and-classification-based procedures were developed for automated test assembly: the Cell Only and the Cell and Cube methods. A simulation study based on a 540-item bank was conducted to compare the performance of the procedures with the performance of a mixed-integer programming (MIP) method for assembling multiple parallel test…
Cho, Sun-Joo; Athay, Michele; Preacher, Kristopher J
2013-05-01
Even though many educational and psychological tests are known to be multidimensional, little research has been done to address how to measure individual differences in change within an item response theory framework. In this paper, we suggest a generalized explanatory longitudinal item response model to measure individual differences in change. New longitudinal models for multidimensional tests and existing models for unidimensional tests are presented within this framework and implemented with software developed for generalized linear models. In addition to the measurement of change, the longitudinal models we present can also be used to explain individual differences in change scores for person groups (e.g., learning disabled students versus non-learning disabled students) and to model differences in item difficulties across item groups (e.g., number operation, measurement, and representation item groups in a mathematics test). An empirical example illustrates the use of the various models for measuring individual differences in change when there are person groups and multiple skill domains which lead to multidimensionality at a time point. © 2012 The British Psychological Society.
DeGeest, David Scott; Schmidt, Frank
2015-01-01
Our objective was to apply the rigorous test developed by Browne (1992) to determine whether the circumplex model fits Big Five personality data. This test has yet to be applied to personality data. Another objective was to determine whether blended items explained correlations among the Big Five traits. We used two working adult samples, the Eugene-Springfield Community Sample and the Professional Worker Career Experience Survey. Fit to the circumplex was tested via Browne's (1992) procedure. Circumplexes were graphed to identify items with loadings on multiple traits (blended items), and to determine whether removing these items changed five-factor model (FFM) trait intercorrelations. In both samples, the circumplex structure fit the FFM traits well. Each sample had items with dual-factor loadings (8 items in the first sample, 21 in the second). Removing blended items had little effect on construct-level intercorrelations among FFM traits. We conclude that rigorous tests show that the fit of personality data to the circumplex model is good. This finding means the circumplex model is competitive with the factor model in understanding the organization of personality traits. The circumplex structure also provides a theoretically and empirically sound rationale for evaluating intercorrelations among FFM traits. Even after eliminating blended items, FFM personality traits remained correlated.
[A test to measure the degree of knowledge on food and nutrition at the onset of elementary school].
Ivanovic Marincovich, D; Castro Gómez, C G; Ivanovic Marincovich, R
1997-06-01
The objective of this work was to design a test to measure the degree of knowledge on food and nutrition in school-age children from elementary first and second grades. A graphic instrument was designed according to children's psychological development and based on the specific objectives pursued by the curriculum programs of the Ministry of Education. The test was developed around the following topics: Area 1: Basic Concepts on Food and Nutrition (9 items) and Area 2: Food, Personal and Environmental Hygiene (9 items). The test was pilot tested on 103 school-age children of both grades (1:1), of both sexes (1:1), belonging to Peñalolén and Las Condes counties from Chile's Metropolitan Region and from high and low socioeconomic status (SES) (1:1), measured through the Graffar's Modified Method. The final version of the test was applied in a representative sample of 1,482 school-age children from Chile's Metropolitan Region from elementary first and second grades during 1986-1987. Content validity was assured by a team of judges and by the curriculum programs. Reliability was assessed by the Spearman correlation with the Spearman-Brown correction. Item-test consistency was determined by the Pearson correlation coefficient. Data were processed with the Statistical Analysis System (SAS) package. Results showed that the reliability coefficient was 0.84 and item-test consistency was equal to or above 0.25 in all items. It can be concluded that this test can be useful to determine the degree of knowledge on food and nutrition at the onset of elementary school, both in Chile and in other countries.
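The Spearman-Brown-corrected reliability and item-test correlations described above can be obtained as in this sketch; the odd-even split and the simulated responses are illustrative choices, not the original Chilean data.

```python
import numpy as np

def split_half_spearman_brown(items):
    """Odd-even split-half reliability with the Spearman-Brown correction."""
    items = np.asarray(items, dtype=float)
    odd = items[:, 0::2].sum(axis=1)
    even = items[:, 1::2].sum(axis=1)
    r_half = np.corrcoef(odd, even)[0, 1]
    return 2 * r_half / (1 + r_half)

def item_test_correlations(items):
    """Pearson correlation of each item with the total score."""
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    return np.array([np.corrcoef(items[:, j], total)[0, 1]
                     for j in range(items.shape[1])])

rng = np.random.default_rng(7)
ability = rng.normal(size=103)                                    # hypothetical pilot sample
items = (rng.normal(size=(103, 18)) + ability[:, None] > 0).astype(int)
print(round(split_half_spearman_brown(items), 2))
print(item_test_correlations(items).round(2))
```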
ERIC Educational Resources Information Center
Hopley, Ken; And Others
The first of several planned volumes of Free Response Test Items contains geology questions developed by the Assessment and Evaluation Unit of the New South Wales Department of Education. Two additional geology volumes and biology and chemistry volumes are in preparation. The questions in this volume were written and reviewed by practicing…
Home Economics. Sample Test Items. Levels I and II.
ERIC Educational Resources Information Center
New York State Education Dept., Albany. Bureau of Elementary and Secondary Educational Testing.
A sample of behavioral objectives and related test items that could be developed for content modules in Home Economics levels I and II, this book is intended to enable teachers to construct more valid and reliable test materials. Forty-eight one-page modules are presented, and opposite each module are listed two to seven specific behavioral…
ERIC Educational Resources Information Center
Thomas, Ally
2016-01-01
With the advent of the newly developed Common Core State Standards and the Next Generation Science Standards, innovative assessments, including technology-enhanced items and tasks, will be needed to meet the challenges of developing valid and reliable assessments in a world of computer-based testing. In a recent critique of the next generation…
An Alternate Definition of the ETS Delta Scale of Item Difficulty. Program Statistics Research.
ERIC Educational Resources Information Center
Holland, Paul W.; Thayer, Dorothy T.
An alternative definition has been developed of the delta scale of item difficulty used at Educational Testing Service. The traditional delta scale uses an inverse normal transformation based on normal ogive models developed years ago. However, no use is made of this fact in typical uses of item deltas. It is simply one way to make the probability…
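Assuming the traditional delta scale takes the commonly cited form delta = 13 + 4·z, where z is the inverse-normal transform of one minus the proportion correct, the transformation looks like the sketch below; harder items (small p) map to larger deltas. This reproduces only the conventional definition, not the alternative definition proposed in the paper.

```python
from scipy.stats import norm

def delta_from_p(p, mean=13.0, sd=4.0):
    """Traditional ETS-style delta: an inverse-normal transform of proportion correct p."""
    return mean + sd * norm.ppf(1.0 - p)

for p in (0.90, 0.70, 0.50, 0.30, 0.10):
    print(p, round(delta_from_p(p), 2))
```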
2011-01-01
Background To develop a web-based computer adaptive testing (CAT) application for efficiently collecting data regarding workers' perceptions of job satisfaction, we examined whether a 37-item Job Content Questionnaire (JCQ-37) could evaluate the job satisfaction of individual employees as a single construct. Methods The JCQ-37 makes data collection via CAT on the internet easy, viable and fast. A Rasch rating scale model was applied to analyze data from 300 randomly selected hospital employees who participated in job-satisfaction surveys in 2008 and 2009 via non-adaptive and computer-adaptive testing, respectively. Results Of the 37 items on the questionnaire, 24 items fit the model fairly well. Person-separation reliability for the 2008 surveys was 0.88. Measures from both years and item-8 job satisfaction for groups were successfully evaluated through item-by-item analyses by using t-test. Workers aged 26 - 35 felt that job satisfaction was significantly worse in 2009 than in 2008. Conclusions A Web-CAT developed in the present paper was shown to be more efficient than traditional computer-based or pen-and-paper assessments at collecting data regarding workers' perceptions of job content. PMID:21496311
Item Response Theory and Health Outcomes Measurement in the 21st Century
Hays, Ron D.; Morales, Leo S.; Reise, Steve P.
2006-01-01
Item response theory (IRT) has a number of potential advantages over classical test theory in assessing self-reported health outcomes. IRT models yield invariant item and latent trait estimates (within a linear transformation), standard errors conditional on trait level, and trait estimates anchored to item content. IRT also facilitates evaluation of differential item functioning, inclusion of items with different response formats in the same scale, and assessment of person fit and is ideally suited for implementing computer adaptive testing. Finally, IRT methods can be helpful in developing better health outcome measures and in assessing change over time. These issues are reviewed, along with a discussion of some of the methodological and practical challenges in applying IRT methods. PMID:10982088
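The point that IRT yields standard errors conditional on trait level can be illustrated with the test information function: for the 2PL model, item information is a²P(1−P) and the conditional standard error is the reciprocal square root of the summed information. The item parameters below are made up for illustration.

```python
import numpy as np

def info_2pl(theta, a, b):
    """Item information for a 2PL item: I(theta) = a^2 * P * (1 - P)."""
    p = 1 / (1 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

theta = np.linspace(-3, 3, 13)
a = np.array([1.2, 0.8, 1.5, 1.0])      # illustrative discriminations
b = np.array([-1.0, 0.0, 0.5, 1.2])     # illustrative difficulties

test_info = sum(info_2pl(theta, ai, bi) for ai, bi in zip(a, b))
conditional_se = 1 / np.sqrt(test_info)  # standard error of theta, conditional on theta
print(np.column_stack([theta, test_info, conditional_se]).round(3))
```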
Electronic Quality of Life Assessment Using Computer-Adaptive Testing
2016-01-01
Background Quality of life (QoL) questionnaires are desirable for clinical practice but can be time-consuming to administer and interpret, making their widespread adoption difficult. Objective Our aim was to assess the performance of the World Health Organization Quality of Life (WHOQOL)-100 questionnaire as four item banks to facilitate adaptive testing using simulated computer adaptive tests (CATs) for physical, psychological, social, and environmental QoL. Methods We used data from the UK WHOQOL-100 questionnaire (N=320) to calibrate item banks using item response theory, which included psychometric assessments of differential item functioning, local dependency, unidimensionality, and reliability. We simulated CATs to assess the number of items administered before prespecified levels of reliability was met. Results The item banks (40 items) all displayed good model fit (P>.01) and were unidimensional (fewer than 5% of t tests significant), reliable (Person Separation Index>.70), and free from differential item functioning (no significant analysis of variance interaction) or local dependency (residual correlations < +.20). When matched for reliability, the item banks were between 45% and 75% shorter than paper-based WHOQOL measures. Across the four domains, a high standard of reliability (alpha>.90) could be gained with a median of 9 items. Conclusions Using CAT, simulated assessments were as reliable as paper-based forms of the WHOQOL with a fraction of the number of items. These properties suggest that these item banks are suitable for computerized adaptive assessment. These item banks have the potential for international development using existing alternative language versions of the WHOQOL items. PMID:27694100
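A simplified version of the kind of CAT simulation described (maximum-information item selection with a fixed-precision stopping rule) is sketched below for a 2PL bank; the simulated bank, the grid-based EAP estimator, and the stopping threshold are illustrative assumptions rather than the WHOQOL item-bank procedure.

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.uniform(0.8, 2.0, 100)            # hypothetical 2PL item bank
b = rng.normal(0.0, 1.0, 100)
grid = np.linspace(-4, 4, 161)
prior = np.exp(-0.5 * grid**2)            # standard normal prior (unnormalized)

def p2pl(theta, a_i, b_i):
    return 1 / (1 + np.exp(-a_i * (theta - b_i)))

def eap(administered, responses):
    """Posterior mean and SD of theta over a grid, given responses so far."""
    like = np.ones_like(grid)
    for j, u in zip(administered, responses):
        p = p2pl(grid, a[j], b[j])
        like *= p**u * (1 - p)**(1 - u)
    post = like * prior
    post /= post.sum()
    mean = (grid * post).sum()
    sd = np.sqrt(((grid - mean)**2 * post).sum())
    return mean, sd

def simulate_cat(true_theta, se_target=0.3, max_items=40):
    administered, responses = [], []
    theta_hat, se = 0.0, np.inf
    while se > se_target and len(administered) < max_items:
        info = a**2 * p2pl(theta_hat, a, b) * (1 - p2pl(theta_hat, a, b))
        if administered:
            info[administered] = -np.inf          # do not reuse items
        j = int(np.argmax(info))                  # maximum-information selection
        u = int(rng.random() < p2pl(true_theta, a[j], b[j]))
        administered.append(j)
        responses.append(u)
        theta_hat, se = eap(administered, responses)
    return theta_hat, se, len(administered)

print(simulate_cat(true_theta=0.8))   # estimate, SE, number of items administered
```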
Examination of the PROMIS upper extremity item bank.
Hung, Man; Voss, Maren W; Bounsanga, Jerry; Crum, Anthony B; Tyser, Andrew R
Clinical measurement. The psychometric properties of the PROMIS v1.2 UE item bank were tested on various samples prior to its release, but have not been fully evaluated among the orthopaedic population. This study assesses the performance of the UE item bank within the UE orthopaedic patient population. The UE item bank was administered to 1197 adult patients presenting to a tertiary orthopaedic clinic specializing in hand and UE conditions and was examined using traditional statistics and Rasch analysis. The UE item bank fits a unidimensional model (outfit MNSQ range from 0.64 to 1.70) and has adequate reliabilities (person = 0.84; item = 0.82) and local independence (item residual correlations range from -0.37 to 0.34). Only one item exhibits gender differential item functioning. Most items target low levels of function. The UE item bank is a useful clinical assessment tool. Additional items covering higher functions are needed to enhance validity. Supplemental testing is recommended for patients at higher levels of function until more high function UE items are developed. 2c. Copyright © 2016 Hanley & Belfus. Published by Elsevier Inc. All rights reserved.
Piredda, Michela; Ghezzi, Valerio; Fenizia, Elisa; Marchetti, Anna; Petitti, Tommasangelo; De Marinis, Maria Grazia; Sili, Alessandro
2017-12-01
To develop and psychometrically test the Italian-language Nurse Caring Behaviours Scale, a short measure of nurse caring behaviour as perceived by inpatients. Patient perceptions of nurses' caring behaviours are a predictor of care quality. Caring behaviours are culture-specific, but no measure of patient perceptions has previously been developed in Italy. Moreover, existing tools show unclear psychometric properties, are burdensome for respondents, or are not widely applicable. Instrument development and psychometric testing. Item generation included identifying and adapting items from existing measures of caring behaviours as perceived by patients. A pool of 28 items was evaluated for face validity. Content validity indexes were calculated for the resulting 15-item scale; acceptability and clarity were pilot tested with 50 patients. To assess construct validity, a sample of 2,001 consecutive adult patients admitted to a hospital in 2014 completed the scale and was split into two groups. Reliability was evaluated using nonlinear structural equation modelling coefficients. Measurement invariance was tested across subsamples. Item 15 loaded poorly in the exploratory factor analysis (n = 983) and was excluded from the final solution, positing a single latent variable with 14 indicators. This model fitted the data moderately. The confirmatory factor analysis (n = 1018) returned similar results. Internal consistency was excellent in both subsamples. Full scalar invariance was reached, and no significant latent mean differences were detected across subsamples. The new instrument shows reasonable psychometric properties and is a promising short and widely applicable measure of inpatient perceptions of nurse caring behaviours. © 2017 John Wiley & Sons Ltd.
Developing Multiple Choice Tests: Tips & Techniques
ERIC Educational Resources Information Center
McCowan, Richard J.
1999-01-01
Item writing is a major responsibility of trainers. Too often, qualified staff who prepare lessons carefully and teach conscientiously use inadequate tests that do not validly reflect the true level of trainee achievement. This monograph describes techniques for constructing multiple-choice items that measure student performance accurately. It…
Does the Position of Response Options in Multiple-Choice Tests Matter?
ERIC Educational Resources Information Center
Hohensinn, Christine; Baghaei, Purya
2017-01-01
In large scale multiple-choice (MC) tests alternate forms of a test may be developed to prevent cheating by changing the order of items or by changing the position of the response options. The assumption is that since the content of the test forms are the same the order of items or the positions of the response options do not have any effect on…
Farin, Erik; Nagl, Michaela; Gramm, Lukas; Heyduck, Katja; Glattacker, Manuela
2014-05-01
Study aim was to translate the PROMIS(®) pain interference (PI) item bank (41 items) into German, test its psychometric properties in patients with chronic low back pain and develop static subforms. We surveyed N = 262 patients undergoing rehabilitation who were asked to fill out questionnaires at the beginning and 2 weeks after the end of rehabilitation, applying the Oswestry Disability Index (ODI) and Pain Disability Index (PDI) in addition to the PROMIS(®) PI items. For psychometric testing, a 1-parameter item response theory (IRT) model was used. Exploratory and confirmatory factor analyses as well as reliability and construct validity analyses were conducted. The assumptions regarding IRT scaling of the translated PROMIS(®) PI item bank as a whole were not confirmed. However, we succeeded in devising three static subforms (PI-G scales: PI mental 13 items, PI functional 11 items, PI physical 4 items), revealing good psychometric properties. The PI-G scales in their static form can be recommended for use in German-speaking countries. Their strengths versus the ODI and PDI are that pain interference is assessed in a differentiated manner and that several psychometric values are somewhat better than those associated with the ODI and PDI (distribution properties, IRT model fit, reliability). To develop an IRT-scaled item bank of the German translations of the PROMIS(®) PI items, it would be useful to have additional studies (e.g., with larger sample sizes and using a 2-parameter IRT model).
Development of an instrument to measure self-efficacy in caregivers of people with advanced cancer.
Ugalde, Anna; Krishnasamy, Meinir; Schofield, Penelope
2013-06-01
Informal caregivers of people with advanced cancer experience many negative impacts as a result of their role. There is a lack of suitable measures specifically designed to assess their experience. This study aimed to develop a new measure to assess self-efficacy in caregivers of people with advanced cancer. The development and testing of the new measure consisted of four separate, sequential phases: generation of issues, development of issues into items, pilot testing and field testing. In the generation of issues, 17 caregivers were interviewed to generate data. These data were analysed to generate codes, which were then systematically developed into items to construct the instrument. The instrument was pilot tested with 14 health professionals and five caregivers. It was then administered to a large sample for field testing to establish the psychometric properties, with established measures including the Brief Cope and the Family Appraisals for Caregiving Questionnaire for Palliative Care. Ninety-four caregivers completed the questionnaire booklet to establish the factor structure, reliability and validity. The factor analysis resulted in a 21-item, four-factor instrument, with the subscales being termed Resilience, Self-Maintenance, Emotional Connectivity and Instrumental Caregiving. The test-retest reliability and internal consistency were both excellent, ranging from 0.73 to 0.85 and 0.81 to 0.94, respectively. Six convergent and divergent hypotheses were made, and five were supported. This study has developed a new instrument to assess self-efficacy in caregivers of people with advanced cancer. The result is a four-factor, 21-item instrument with demonstrated reliability and validity. Copyright © 2012 John Wiley & Sons, Ltd.
Item analysis of three Spanish naming tests: a cross-cultural investigation
de la Plata, Carlos Marquez; Arango-Lasprilla, Juan Carlos; Alegret, Montse; Moreno, Alexander; Tárraga, Luis; Lara, Mar; Hewlitt, Margaret; Hynan, Linda; Cullum, C. Munro
2009-01-01
Neuropsychological evaluations conducted in the United States and abroad commonly include the use of tests translated from English to Spanish. The use of translated naming tests for evaluating predominantly Spanish speakers has recently been challenged on the grounds that translating test items may compromise a test's construct validity. The Texas Spanish Naming Test (TNT) was developed in Spanish specifically for use with Spanish speakers; however, it is unlikely that patients from diverse Spanish-speaking geographical regions will perform uniformly on a naming test. The present study evaluated and compared the internal consistency and the patterns of item difficulty and item discrimination for the TNT and two commonly used translated naming tests in three countries (United States, Colombia, Spain). Two hundred fifty-two subjects (126 demented, 116 nondemented) across the three countries were administered the TNT, the Modified Boston Naming Test-Spanish (MBNT-S), and the naming subtest from the CERAD. The TNT demonstrated better internal consistency than its counterparts, a more favorable item-difficulty pattern than the CERAD naming test, and better item discrimination than the MBNT-S across countries. Overall, all three Spanish naming tests differentiated nondemented and moderately demented individuals, but the results suggest the TNT items are the most appropriate for use with Spanish speakers. Preliminary normative data for the three tests in each country are provided. PMID:19208960
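A small sketch of the classical item-analysis quantities discussed here (item difficulty as the proportion correct, and discrimination as the corrected point-biserial correlation) may help; the response matrix is hypothetical and the authors' exact procedures may differ.

```python
import numpy as np
from scipy.stats import pointbiserialr

# Toy example: 0/1 responses of 6 examinees to 4 naming items
# (illustrative data, not from the study).
responses = np.array([[1, 1, 1, 0],
                      [1, 1, 0, 0],
                      [1, 0, 1, 1],
                      [1, 1, 1, 1],
                      [0, 0, 0, 0],
                      [1, 1, 0, 1]])

total = responses.sum(axis=1)

for j in range(responses.shape[1]):
    difficulty = responses[:, j].mean()    # proportion correct ("p-value")
    # Classical discrimination: correlation between the item score and the
    # total score with the item itself removed (corrected item-total).
    rest = total - responses[:, j]
    r, _ = pointbiserialr(responses[:, j], rest)
    print(f"item {j + 1}: difficulty = {difficulty:.2f}, discrimination = {r:.2f}")
```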
Alexander, Dayna S; Alfonso, Moya L; Cao, Chunhua
2016-12-01
Currently, public health practitioners are analyzing the role that caregivers play in childhood obesity efforts. Assessing African American caregivers' perceptions of childhood obesity in rural communities is an important prevention effort. This article's objective is to describe the development and psychometric testing of a survey tool to assess childhood obesity perceptions among African American caregivers in a rural setting, which can be used for obesity prevention program development or evaluation. The Childhood Obesity Perceptions (COP) survey was developed to reflect the multidimensional nature of childhood obesity, including risk factors, health complications, weight status, built environment, and obesity prevention strategies. A 97-item survey was pretested and piloted with the priority population. After pretesting and piloting, the survey was reduced to 59 items and administered to 135 African American caregivers. Exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) were conducted to test how well the survey items represented the hypothesized Social Cognitive Theory constructs. Twenty items were removed from the original 59-item survey, and acceptable internal consistency of the six factors (α = 0.70-0.85) was documented for all scales in the final COP instrument. CFA resulted in a less than adequate fit; however, a multivariate Lagrange multiplier test identified modifications to improve the model fit. The COP survey represents a promising approach as a potentially comprehensive assessment for implementation or evaluation of childhood obesity programs. Copyright © 2016 Elsevier Ltd. All rights reserved.
Duncan, Mitch J; Rashid, Mahbub; Vandelanotte, Corneel; Cutumisu, Nicoleta; Plotnikoff, Ronald C
2013-02-04
Spatial configurations of office environments assessed by Space Syntax methodologies are related to employee movement patterns. These methods require analysis of floor plans, which are not readily available in large population-based studies or are otherwise unavailable. Therefore, a self-report instrument to assess spatial configurations of office environments using four scales was developed. The scales are: local connectivity (16 items), overall connectivity (11 items), visibility of co-workers (10 items), and proximity of co-workers (5 items). A panel cohort (N = 1154) completed an online survey; only data from individuals employed in office-based occupations (n = 307) were used to assess scale measurement properties. To assess test-retest reliability, a separate sample of 37 office-based workers completed the survey on two occasions 7.7 (±3.2) days apart. Redundant scale items were eliminated using factor analysis; Cronbach's α was used to evaluate internal consistency, and intraclass correlation coefficients (retest-ICC) were used to evaluate test-retest reliability. ANOVA was employed to examine differences between office types (Private, Shared, Open) as a measure of construct validity. Generalized Linear Models were used to examine relationships between the spatial configuration scales and the duration of, and frequency of breaks in, occupational sitting. The number of items on all scales was reduced. Cronbach's α and ICCs indicated good scale internal consistency and test-retest reliability: local connectivity (5 items; α = 0.70; retest-ICC = 0.84), overall connectivity (6 items; α = 0.86; retest-ICC = 0.87), visibility of co-workers (4 items; α = 0.78; retest-ICC = 0.86), and proximity of co-workers (3 items; α = 0.85; retest-ICC = 0.70). Significant (p ≤ 0.001) differences, in theoretically expected directions, were observed for all scales between office types, except overall connectivity. Significant associations were observed between all scales and occupational sitting behaviour (p ≤ 0.05). All scales have good measurement properties, indicating the instrument may be a useful alternative to Space Syntax for examining environmental correlates of occupational sitting in population surveys.
2013-01-01
Background Spatial configurations of office environments assessed by Space Syntax methodologies are related to employee movement patterns. These methods require analysis of floor plans, which are not readily available in large population-based studies or are otherwise unavailable. Therefore, a self-report instrument to assess spatial configurations of office environments using four scales was developed. Methods The scales are: local connectivity (16 items), overall connectivity (11 items), visibility of co-workers (10 items), and proximity of co-workers (5 items). A panel cohort (N = 1154) completed an online survey; only data from individuals employed in office-based occupations (n = 307) were used to assess scale measurement properties. To assess test-retest reliability, a separate sample of 37 office-based workers completed the survey on two occasions 7.7 (±3.2) days apart. Redundant scale items were eliminated using factor analysis; Cronbach's α was used to evaluate internal consistency, and intraclass correlation coefficients (retest-ICC) were used to evaluate test-retest reliability. ANOVA was employed to examine differences between office types (Private, Shared, Open) as a measure of construct validity. Generalized Linear Models were used to examine relationships between the spatial configuration scales and the duration of, and frequency of breaks in, occupational sitting. Results The number of items on all scales was reduced. Cronbach's α and ICCs indicated good scale internal consistency and test-retest reliability: local connectivity (5 items; α = 0.70; retest-ICC = 0.84), overall connectivity (6 items; α = 0.86; retest-ICC = 0.87), visibility of co-workers (4 items; α = 0.78; retest-ICC = 0.86), and proximity of co-workers (3 items; α = 0.85; retest-ICC = 0.70). Significant (p ≤ 0.001) differences, in theoretically expected directions, were observed for all scales between office types, except overall connectivity. Significant associations were observed between all scales and occupational sitting behaviour (p ≤ 0.05). Conclusion All scales have good measurement properties, indicating the instrument may be a useful alternative to Space Syntax for examining environmental correlates of occupational sitting in population surveys. PMID:23379485
A review of guidelines on home drug testing web sites for parents.
Washio, Yukiko; Fairfax-Columbo, Jaymes; Ball, Emily; Cassey, Heather; Arria, Amelia M; Bresani, Elena; Curtis, Brenda L; Kirby, Kimberly C
2014-01-01
To update and extend prior work reviewing Web sites that discuss home drug testing for parents, and assess the quality of information that the Web sites provide, to assist them in deciding when and how to use home drug testing. We conducted a worldwide Web search that identified 8 Web sites providing information for parents on home drug testing. We assessed the information on the sites using a checklist developed with field experts in adolescent substance abuse and psychosocial interventions that focus on urine testing. None of the Web sites covered all the items on the 24-item checklist, and only 3 covered at least half of the items (12, 14, and 21 items, respectively). The remaining 5 Web sites covered less than half of the checklist items. The mean number of items covered by the Web sites was 11. Among the Web sites that we reviewed, few provided thorough information to parents regarding empirically supported strategies to effectively use drug testing to intervene on adolescent substance use. Furthermore, most Web sites did not provide thorough information regarding the risks and benefits to inform parents' decision to use home drug testing. Empirical evidence regarding efficacy, benefits, risks, and limitations of home drug testing is needed.
ERIC Educational Resources Information Center
Grunert, Megan L.; Raker, Jeffrey R.; Murphy, Kristen L.; Holme, Thomas A.
2013-01-01
The concept of assigning partial credit on multiple-choice test items is considered for items from ACS Exams. Because the items on these exams, particularly the quantitative items, use common student errors to define incorrect answers, it is possible to assign partial credits to some of these incorrect responses. To do so, however, it becomes…
Development and Preliminary Validation of the Strategic Thinking Mindset Test (STMT)
2017-06-01
…reliability. The test's three subscales (intellectual flexibility, inclusiveness, and humility) each correlated significantly with alternative measures of…
[Development of competency to stand trial rating scale in offenders with mental disorders].
Chen, Xiao-Bing; Cai, Wei-Xiong
2013-04-01
To develop, in accordance with the Chinese legal system, a competency to stand trial rating scale for offenders with mental disorders. Proceeding from the juristic elements, 15 items were extracted to form a preliminary instrument named the competency to stand trial rating scale in offenders with mental disorders. The item analysis covered six aspects: critical ratio, item-total correlation, corrected item-total correlation, alpha value if item deleted, item communalities, and factor loadings. A logistic regression equation and the cut-off score from a ROC curve were used to explore diagnostic efficiency. Critical ratios for the extreme-group comparison were 18.390-46.763; item-total correlations, 0.639-0.952; corrected item-total correlations, 0.582-0.944; item communalities, 0.377-0.916; and factor loadings, 0.614-0.957. Seven items were included in the regression equation, and the accuracy of the back-substitution test was 96.0%. A score of 33 was established as the cut-off by ROC curve fitting; agreement with the expert determination was 95.8%. Sensitivity and specificity were 0.938 and 0.966, respectively, while the positive and negative likelihood ratios were 27.67 and 0.06, respectively. With all items satisfying the homogeneity requirement, the rating scale has a reasonable structure and excellent diagnostic efficiency.
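The cut-off selection described here can be illustrated with a generic ROC analysis. The sketch below picks the threshold that maximizes Youden's J, which is a common criterion but not necessarily the one the authors used, and the data are invented.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical data: expert judgement of competency (1 = competent) and
# total scale scores; values are illustrative, not the study's.
competent = np.array([1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0])
scale_score = np.array([40, 37, 35, 33, 30, 25, 28, 36, 31, 22, 34, 29])

fpr, tpr, thresholds = roc_curve(competent, scale_score)
j = tpr - fpr                       # Youden's J statistic at each threshold
best = np.argmax(j)
print("cut-off score:", thresholds[best])
print("sensitivity:", round(tpr[best], 3), "specificity:", round(1 - fpr[best], 3))
```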
NASA Astrophysics Data System (ADS)
Witzig, Stephen B.; Rebello, Carina M.; Siegel, Marcelle A.; Freyermuth, Sharyn K.; Izci, Kemal; McClure, Bruce
2014-10-01
Identifying students' conceptual scientific understanding is difficult if the appropriate tools are not available for educators. Concept inventories have become a popular tool to assess student understanding; however, traditionally, they are multiple choice tests. International science education standard documents advocate that assessments should be reform based, contain diverse question types, and should align with instructional approaches. To date, no instrument of this type targeting student conceptions in biotechnology has been developed. We report here the development, testing, and validation of a 35-item Biotechnology Instrument for Knowledge Elicitation (BIKE) that includes a mix of question types. The BIKE was designed to elicit student thinking and a variety of conceptual understandings, as opposed to testing closed-ended responses. The design phase contained nine steps including a literature search for content, student interviews, a pilot test, as well as expert review. Data from 175 students over two semesters, including 16 student interviews and six expert reviewers (professors from six different institutions), were used to validate the instrument. Cronbach's alpha on the pre/posttest was 0.664 and 0.668, respectively, indicating the BIKE has internal consistency. Cohen's kappa for inter-rater reliability among the 6,525 total items was 0.684 indicating substantial agreement among scorers. Item analysis demonstrated that the items were challenging, there was discrimination among the individual items, and there was alignment with research-based design principles for construct validity. This study provides a reliable and valid conceptual understanding instrument in the understudied area of biotechnology.
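Cohen's kappa for two raters can be computed as in the following sketch; the rating categories and values are hypothetical and only illustrate the inter-rater statistic reported above.

```python
from sklearn.metrics import cohen_kappa_score

# Two raters' scores on the same set of open-ended items
# (illustrative categories 0/1/2 for incorrect/partial/correct).
rater_a = [2, 1, 0, 2, 2, 1, 0, 1, 2, 0]
rater_b = [2, 1, 0, 2, 1, 1, 0, 2, 2, 0]

kappa = cohen_kappa_score(rater_a, rater_b)
print(round(kappa, 3))   # values near 0.61-0.80 are usually read as "substantial"
```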
Mao, Hui-Fen; Chen, Wan-Yin; Yao, Grace; Huang, Sheau-Ling; Lin, Chia-Chi; Huang, Wen-Ni Wennie
2010-05-01
To develop and validate a cross-cultural version of the Quebec User Evaluation of Satisfaction with Assistive Technology (QUEST 2.0) for users of assistive technology devices in Taiwan. A cross-sectional survey. The standard cultural adaptation procedure was used for questionnaire translation and cultural item design. A field test was then conducted for item selection and testing of psychometric properties. One hundred and five volunteer assistive device users in the community. A questionnaire comprising the 12 items of the QUEST 2.0 and 16 culture-specific items. One culture-specific item, 'Cost', was selected based on eight criteria and added to the QUEST 2.0 (12 items) to form the Taiwanese version of QUEST 2.0 (T-QUEST). The T-QUEST consisted of 13 items classified into two domains: device (8 items) and service (5 items). The internal consistencies of the device, service and total T-QUEST scores were 0.87, 0.84 and 0.90, respectively. The device, service and total T-QUEST scores achieved good test-retest stability (intraclass correlation coefficient (ICC) 0.90, 0.97, 0.95). Exploratory factor analysis revealed that the T-QUEST had a two-factor structure (device and service) within the construct of user satisfaction (53.42% of the variance explained). Users of assistive devices in different cultures may have different concerns regarding satisfaction. The T-QUEST is the first published version of QUEST with culture-specific items added to the original translated items of QUEST 2.0. The T-QUEST was a valid and reliable tool for measuring user satisfaction among Mandarin-speaking individuals using various kinds of assistive devices.
The relationship between memory and inductive reasoning: does it develop?
Hayes, Brett K; Fritz, Kristina; Heit, Evan
2013-05-01
In 2 studies, the authors examined the development of the relationship between inductive reasoning and visual recognition memory. In both studies, 5- to 6-year-old children and adults were shown instances of a basic-level category (dogs) followed by a test set containing old and new category members that varied in their similarity to study items. Participants were given either recognition instructions (memorize study items and discriminate between old and new test items) or induction instructions (learn about a novel property shared by the study items and decide whether it generalizes to test items). Across both tasks, children made a greater number of positive responses than did adults. Across both age groups, a greater number of positive responses were made in induction than in recognition. The application of a mathematical model, called GEN-EX for generalization from examples, showed that both memory and reasoning data could be explained by a single exemplar-based process that assumes task and age differences in generalization gradients. These results show considerable developmental continuity in the cognitive processes that underlie memory and inductive reasoning.
Huang, Chien-Yu; Tung, Li-Chen; Chou, Yeh-Tai; Chou, Willy; Chen, Kuan-Lin; Hsieh, Ching-Lin
2017-07-27
This study aimed at improving the utility of the fine motor subscale of the comprehensive developmental inventory for infants and toddlers (CDIIT) by developing a computerized adaptive test of fine motor skills. We built an item bank for the computerized adaptive test of fine motor skills using the fine motor subscale of the CDIIT items fitting the Rasch model. We also examined the psychometric properties and efficiency of the computerized adaptive test of fine motor skills with simulated computerized adaptive tests. Data from 1742 children with suspected developmental delays were retrieved. The mean scores of the fine motor subscale of the CDIIT increased along with age groups (mean scores = 1.36-36.97). The computerized adaptive test of fine motor skills contains 31 items meeting the Rasch model's assumptions (infit mean square = 0.57-1.21, outfit mean square = 0.11-1.17). For children of 6-71 months, the computerized adaptive test of fine motor skills had high Rasch person reliability (average reliability >0.90), high concurrent validity (rs = 0.67-0.99), adequate to excellent diagnostic accuracy (area under receiver operating characteristic = 0.71-1.00), and large responsiveness (effect size = 1.05-3.93). The computerized adaptive test of fine motor skills used 48-84% fewer items than the fine motor subscale of the CDIIT. The computerized adaptive test of fine motor skills used fewer items for assessment but was as reliable and valid as the fine motor subscale of the CDIIT. Implications for Rehabilitation We developed a computerized adaptive test based on the comprehensive developmental inventory for infants and toddlers (CDIIT) for assessing fine motor skills. The computerized adaptive test has been shown to be efficient because it uses fewer items than the original measure and automatically presents the results right after the test is completed. The computerized adaptive test is as reliable and valid as the CDIIT.
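A minimal sketch of one common CAT ingredient, maximum-information item selection under the Rasch model, is shown below. The item bank, function names, and selection rule are assumptions for illustration; they are not taken from the CDIIT-based test described here.

```python
import numpy as np

def rasch_p(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta."""
    p = rasch_p(theta, b)
    return p * (1.0 - p)

def next_item(theta_hat, difficulties, administered):
    """Pick the not-yet-administered item with maximum information at the
    current ability estimate (a common CAT item-selection rule)."""
    info = np.array([item_information(theta_hat, b) if i not in administered
                     else -np.inf for i, b in enumerate(difficulties)])
    return int(np.argmax(info))

# Hypothetical item bank of calibrated difficulties (logits).
bank = np.array([-2.0, -1.0, -0.3, 0.0, 0.4, 1.1, 2.0])
print(next_item(theta_hat=0.2, difficulties=bank, administered={3}))
```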
Analysis of Individual "Test Of Astronomy STandards" (TOAST) Item Responses
ERIC Educational Resources Information Center
Slater, Stephanie J.; Schleigh, Sharon Price; Stork, Debra J.
2015-01-01
The development of valid and reliable strategies to efficiently determine the knowledge landscape of introductory astronomy college students is an effort of great interest to the astronomy education community. This study examines individual item response rates from a widely used conceptual understanding survey, the Test Of Astronomy Standards…
Computerized Adaptive Testing: Some Issues in Development.
ERIC Educational Resources Information Center
Orcutt, Venetia L.
The emergence of enhanced capabilities in computer technology coupled with the growing body of knowledge regarding item response theory has resulted in the expansion of computerized adaptive test (CAT) utilization in a variety of venues. Newcomers to the field need a more thorough understanding of item response theory (IRT) principles, their…
Solari, A; Mattarozzi, K; Vignatelli, L; Giordano, A; Russo, P M; Uccelli, M Messmer; D'Alessandro, R
2010-10-01
We describe the development and clinical validation of a patient self-administered tool assessing the quality of multiple sclerosis diagnosis disclosure. A multiple sclerosis expert panel generated questionnaire items from the Doctor's Interpersonal Skills Questionnaire, literature review, and interviews with neurology inpatients. The resulting 19-item Comunicazione medico-paziente nella Sclerosi Multipla (COSM) was pilot tested/debriefed on seven patients with multiple sclerosis and administered to 80 patients newly diagnosed with multiple sclerosis. The resulting revised 20-item version (COSM-R) was debriefed on five patients with multiple sclerosis, field tested/debriefed on multiple sclerosis patients, and field tested on 105 patients newly diagnosed with multiple sclerosis participating in a clinical trial on an information aid. The hypothesized monofactorial structure of COSM-R section 2 was tested on the latter two groups. The questionnaire was well accepted. Scaling assumptions were satisfactory in terms of score distributions, item-total correlations and internal consistency. Factor analysis confirmed section 2's monofactorial structure, which was also test-retest reliable (intraclass correlation coefficient [ICC] 0.73; 95% CI 0.54-0.85). Section 1 had only fair test-retest reliability (ICC 0.45; 95% CI 0.12-0.69), and three items had 8-21% missed responses. COSM-R is a brief, easy-to-interpret MS-specific questionnaire for use as a health care indicator.
The value of item response theory in clinical assessment: a review.
Thomas, Michael L
2011-09-01
Item response theory (IRT) and related latent variable models represent modern psychometric theory, the successor to classical test theory in psychological assessment. Although IRT has become prevalent in the measurement of ability and achievement, its contributions to clinical domains have been less extensive. Applications of IRT to clinical assessment are reviewed to appraise its current and potential value. Benefits of IRT include comprehensive analyses and reduction of measurement error, creation of computer adaptive tests, meaningful scaling of latent variables, objective calibration and equating, evaluation of test and item bias, greater accuracy in the assessment of change due to therapeutic intervention, and evaluation of model and person fit. The theory may soon reinvent the manner in which tests are selected, developed, and scored. Although challenges remain to the widespread implementation of IRT, its application to clinical assessment holds great promise. Recommendations for research, test development, and clinical practice are provided.
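To make the point about measurement precision at the item level concrete, here is a small sketch of the 2-parameter logistic item response function and its item information; the parameter values are arbitrary examples, not drawn from the review.

```python
import numpy as np

def two_pl(theta, a, b):
    """2-parameter logistic IRT model: a = discrimination, b = difficulty."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def information(theta, a, b):
    """Item information; precision varies across the latent trait, which is
    what lets IRT quantify measurement error locally rather than globally."""
    p = two_pl(theta, a, b)
    return a ** 2 * p * (1.0 - p)

theta = np.linspace(-3, 3, 7)
print(np.round(information(theta, a=1.5, b=0.0), 3))
```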
Stochl, Jan; Böhnke, Jan R; Pickett, Kate E; Croudace, Tim J
2016-05-20
Recent developments in psychometric modeling and technology allow pooling well-validated items from existing instruments into larger item banks and their deployment through methods of computerized adaptive testing (CAT). Use of item response theory-based bifactor methods and integrative data analysis overcomes barriers in cross-instrument comparison. This paper presents the joint calibration of an item bank for researchers keen to investigate population variations in general psychological distress (GPD). Multidimensional item response theory was used on existing health survey data from the Scottish Health Education Population Survey (n = 766) to calibrate an item bank consisting of pooled items from the short common mental disorder screen (GHQ-12) and the Affectometer-2 (a measure of "general happiness"). Computer simulation was used to evaluate usefulness and efficacy of its adaptive administration. A bifactor model capturing variation across a continuum of population distress (while controlling for artefacts due to item wording) was supported. The numbers of items for different required reliabilities in adaptive administration demonstrated promising efficacy of the proposed item bank. Psychometric modeling of the common dimension captured by more than one instrument offers the potential of adaptive testing for GPD using individually sequenced combinations of existing survey items. The potential for linking other item sets with alternative candidate measures of positive mental health is discussed since an optimal item bank may require even more items than these.
Development and reliability testing of the Worksite and Energy Balance Survey.
Hoehner, Christine M; Budd, Elizabeth L; Marx, Christine M; Dodson, Elizabeth A; Brownson, Ross C
2013-01-01
Worksites represent important venues for health promotion. Development of psychometrically sound measures of worksite environments and policy supports for physical activity and healthy eating are needed for use in public health research and practice. Assess the test-retest reliability of the Worksite and Energy Balance Survey (WEBS), a self-report instrument for assessing perceptions of worksite supports for physical activity and healthy eating. The WEBS included items adapted from existing surveys or new items on the basis of a review of the literature and expert review. Cognitive interviews among 12 individuals were used to test the clarity of items and further refine the instrument. A targeted random-digit-dial telephone survey was administered on 2 occasions to assess test-retest reliability (mean days between time periods = 8; minimum = 5; maximum = 14). Five Missouri census tracts that varied by racial-ethnic composition and walkability. Respondents included 104 employed adults (67% white, 64% women, mean age = 48.6 years). Sixty-three percent were employed at worksites with less than 100 employees, approximately one-third supervised other people, and the majority worked a regular daytime shift (75%). Test-retest reliability was assessed using Spearman correlations for continuous variables, Cohen's κ statistics for nonordinal categorical variables, and 1-way random intraclass correlation coefficients for ordinal categorical variables. Test-retest coefficients ranged from 0.41 to 0.97, with 80% of items having reliability coefficients of more than 0.6. Items that assessed participation in or use of worksite programs/facilities tended to have lower reliability. Reliability of some items varied by gender, obesity status, and worksite size. Test-retest reliability and internal consistency for the 5 scales ranged from 0.84 to 0.94 and 0.63 to 0.84, respectively. The WEBS items and scales exhibited sound test-retest reliability and may be useful for research and surveillance. Further evaluation is needed to document the validity of the WEBS and associations with energy balance outcomes.
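The one-way random-effects intraclass correlation used here for the ordinal items can be computed as sketched below; the formula is the standard ICC(1,1), and the test-retest scores are invented for illustration.

```python
import numpy as np

def icc_oneway(ratings):
    """One-way random-effects ICC(1,1) for an (n_subjects x k_occasions)
    matrix of scores, e.g. the same item administered twice."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    subj_means = ratings.mean(axis=1)
    ss_between = k * ((subj_means - grand) ** 2).sum()
    ss_within = ((ratings - subj_means[:, None]) ** 2).sum()
    ms_between = ss_between / (n - 1)          # between-subjects mean square
    ms_within = ss_within / (n * (k - 1))      # within-subjects mean square
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Illustrative test-retest scores for one ordinal item (time 1, time 2).
scores = np.array([[4, 4], [2, 3], [5, 5], [3, 3], [1, 2], [4, 5]])
print(round(icc_oneway(scores), 2))
```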
ERIC Educational Resources Information Center
Muratti, Jose E.; And Others
A parallel Spanish edition was developed of released objectives and objective-referenced items used in the National Assessment of Educational Progress (NAEP) in the field of Career and Occupational Development (COD). The Spanish edition was designed to assess the identical skills, attitudes, concepts, and knowledge of Spanish-dominant students…
NASA Astrophysics Data System (ADS)
Liu, Xiufeng; McKeough, Anne
2005-05-01
The aim of this study was to develop a model of students' energy concept development. Applying Case's (1985, 1992) structural theory of cognitive development, we hypothesized that students' concept of energy undergoes a series of transitions, corresponding to systematic increases in working memory capacity. The US national sample from the Third International Mathematics and Science Study (TIMSS) database was used to test our hypothesis. Items relevant to the energy concept in the TIMSS test booklets for three populations were identified. Item difficulty from Rasch modeling was used to test the hypothesized developmental sequence, and percentage of students' correct responses was used to test the correspondence between students' age/grade level and level of the energy concepts. The analysis supported our hypothesized sequence of energy concept development and suggested mixed effects of maturation and schooling on energy concept development. Further, the results suggest that curriculum and instruction design take into consideration the developmental progression of students' concept of energy.
ERIC Educational Resources Information Center
Zechner, Klaus; Chen, Lei; Davis, Larry; Evanini, Keelan; Lee, Chong Min; Leong, Chee Wee; Wang, Xinhao; Yoon, Su-Youn
2015-01-01
This research report presents a summary of research and development efforts devoted to creating scoring models for automatically scoring spoken item responses from a pilot administration of the Test of English-for-Teaching ("TEFT"™) within the "ELTeach"™ framework. The test consists of items for all four language modalities:…
Rodríguez, Daniela C; Hoe, Connie; Dale, Elina M; Rahman, M Hafizur; Akhter, Sadika; Hafeez, Assad; Irava, Wayne; Rajbangshi, Preety; Roman, Tamlyn; Ţîrdea, Marcela; Yamout, Rouham; Peters, David H
2017-08-01
The capacity to demand and use research is critical for governments if they are to develop policies that are informed by evidence. Existing tools designed to assess how government officials use evidence in decision-making have significant limitations for low- and middle-income countries (LMICs); they are rarely tested in LMICs and focus only on individual capacity. This paper introduces an instrument that was developed to assess Ministry of Health (MoH) capacity to demand and use research evidence for decision-making, which was tested for reliability and validity in eight LMICs (Bangladesh, Fiji, India, Lebanon, Moldova, Pakistan, South Africa, Zambia). Instrument development was based on a new conceptual framework that addresses individual, organisational and systems capacities, and items were drawn from existing instruments and a literature review. After initial item development and pre-testing to address face validity and item phrasing, the instrument was reduced to 54 items for further validation and item reduction. In-country study teams interviewed a systematic sample of 203 MoH officials. Exploratory factor analysis was used in addition to standard reliability and validity measures to further assess the items. Thirty items divided between two factors representing organisational and individual capacity constructs were identified. South Africa and Zambia demonstrated the highest level of organisational capacity to use research, whereas Pakistan and Bangladesh were the lowest two. In contrast, individual capacity was highest in Pakistan, followed by South Africa, whereas Bangladesh and Lebanon were the lowest. The framework and related instrument represent a new opportunity for MoHs to identify ways to understand and improve capacities to incorporate research evidence in decision-making, as well as to provide a basis for tracking change.
Development and testing of the Multidimensional Trust in Health Care Systems Scale.
Egede, Leonard E; Ellis, Charles
2008-06-01
To describe the development and psychometric testing of the Multidimensional Trust in Health Care Systems Scale (MTHCSS). Scale development occurred in 2 phases. In phase 1, a pilot instrument with 70 items was generated from the review of the trust literature, focus groups, and expert opinion. The 70 items were pilot tested in a sample of 256 students. Exploratory factor analysis was used to derive an orthogonal set of correlated factors. In phase 2, the final scale was administered to 301 primary care patients to assess reliability and validity. Phase 2 participants also completed validated measures of patient-centered care, health locus of control, medication nonadherence, social support, and patient satisfaction. In phase 1, a 17-item scale (MTHCSS) was developed with 10 items measuring trust in health care providers, 4 items measuring trust in health care payers, and 3 items measuring trust in health care institutions. In phase 2, the 17-item MTHCSS had a mean score of 63.0 (SD 8.8); the provider subscale had a mean of 40.0 (SD 6.2); the payers subscale had a mean of 12.8 (SD 3.0); and the institutions subscale had a mean of 10.3 (SD 2.1). Cronbach's alpha for the MTHCSS was 0.89 and 0.92, 0.74, and 0.64 for the 3 subscales. The MTHCSS was significantly correlated with patient-centered care (r = .22 to .62), locus of control-chance (r = .42), medication nonadherence (r = -.22), social support (r = .25), and patient satisfaction (r = .67). The MTHCSS is a valid and reliable instrument for measuring the 3 objects of trust in health care and is correlated with patient-level health outcomes.
Development of the multiple sclerosis (MS) early mobility impairment questionnaire (EMIQ).
Ziemssen, Tjalf; Phillips, Glenn; Shah, Ruchit; Mathias, Adam; Foley, Catherine; Coon, Cheryl; Sen, Rohini; Lee, Andrew; Agarwal, Sonalee
2016-10-01
The Early Mobility Impairment Questionnaire (EMIQ) was developed to facilitate early identification of mobility impairments in multiple sclerosis (MS) patients. We describe the initial development of the EMIQ with a focus on the psychometric evaluation of the questionnaire using classical and item response theory methods. The initial 20-item EMIQ was constructed by clinical specialists and qualitatively tested among people with MS and physicians via cognitive interviews. Data from an observational study was used to make additional updates to the instrument based on exploratory factor analysis (EFA) and item response theory (IRT) analysis, and psychometric analyses were performed to evaluate the reliability and validity of the final instrument's scores and screening properties (i.e., sensitivity and specificity). Based on qualitative interview analyses, a revised 15-item EMIQ was included in the observational study. EFA, IRT and item-to-item correlation analyses revealed redundant items which were removed leading to the final nine-item EMIQ. The nine-item EMIQ performed well with respect to: test-retest reliability (ICC = 0.858); internal consistency (α = 0.893); convergent validity; and known-groups methods for construct validity. A cut-point of 41 on the 0-to-100 scale resulted in sufficient sensitivity and specificity statistics for viably identifying patients with mobility impairment. The EMIQ is a content valid and psychometrically sound instrument for capturing MS patients' experience with mobility impairments in a clinical practice setting. Additional research is suggested to further confirm the EMIQ's screening properties over time.
Study deviance-type scale in the development of Korean elder
Cho, Gun-Sang; Yi, Eun-Surk; Hwang, Hee-Jeong
2015-01-01
This research aims to develop a questionnaire on deviant behavior for Korean elderly people, which may contribute substantially to the examination of deviant behavior in this population and provide a methodological basis for future work. To accomplish the purpose of this study, there were three stages: (a) drafting preliminary question items, (b) refining the scale items through a pilot study, and (c) finalizing the question items through a main survey. In the first stage, 43 question items were developed using an open-ended questionnaire and structured inquiry with 137 elderly people over 65 years of age. In the second stage, based on data collected from 200 elderly people, pilot testing was performed through exploratory factor analysis and reliability testing, yielding a 27-item self-report questionnaire. In the main survey of 184 elderly people, 21 items comprising four subfactors were finalized to measure deviant behaviors of Korean elderly people: social deviance (n=8), economic deviance (n=5), psychological deviance (n=5), and physical deviance (n=3). PMID:26730382
Operationalization of Burnout.
ERIC Educational Resources Information Center
Matthews, Doris B.
This study was designed to develop instruments to measure employee burnout. The Matthews Burnout Scale for Employees is a 50-item self-report measure. The Matthews Burnout Scale for Supervisors is a 50-item scale for use in evaluating employee burnout. Content-based items were tested for construct validity with a group of employees, and their…
Improving Cancer-Related Outcomes with Connected Health - Action Items at a Glance
Action Item 1.1: Health IT stakeholder groups should continue to collaborate to overcome policy and technical barriers to a nationwide, interoperable health IT system. Action Item 1.2: Technical standards for information related to cancer care across the continuum should be developed, tested, disseminated, and adopted.
Different Approaches to Covariate Inclusion in the Mixture Rasch Model
ERIC Educational Resources Information Center
Li, Tongyun; Jiao, Hong; Macready, George B.
2016-01-01
The present study investigates different approaches to adding covariates and the impact in fitting mixture item response theory models. Mixture item response theory models serve as an important methodology for tackling several psychometric issues in test development, including the detection of latent differential item functioning. A Monte Carlo…
Dückers, Michel L A; Wagner, Cordula; Groenewegen, Peter P
2008-08-11
In quality improvement collaboratives (QICs), teams of practitioners from different health care organizations are brought together to systematically improve an aspect of patient care. Teams take part in a series of meetings to learn about relevant best practices, quality methods and change ideas, and share experiences in making changes in their own local setting. The purpose of this study was to develop an instrument for measuring team organization, external change agent support and support from the team's home institution in a Dutch national improvement and dissemination programme for hospitals based on several QICs. The exploratory methodological design included two phases: a) content development and assessment, resulting in an instrument with 15 items, and b) field testing (N = 165). Internal consistency reliability was tested via Cronbach's alpha coefficient. Principal component analyses were used to identify underlying constructs. Tests of scaling assumptions according to the multitrait/multi-item matrix were used to confirm the component structure. Three components were revealed, explaining 65% of the variability. The components were labelled 'organizational support', 'team organization' and 'external change agent support'. One item not meeting item-scale criteria was removed. This resulted in a 14-item instrument. Scale reliability ranged from 0.77 to 0.91. Internal item consistency and divergent validity were satisfactory. On the whole, the instrument appears to be a promising tool for assessing team organization and internal and external support during QIC implementation. The psychometric properties were good and warrant application of the instrument for the evaluation of the national programme and similar improvement programmes.
Development of the Leadership Influence Self-Assessment (LISA©) instrument.
Shillam, Casey R; Adams, Jeffrey M; Bryant, Debbie Chatman; Deupree, Joy P; Miyamoto, Suzanne; Gregas, Matt
This study aims to describe the development and psychometric evaluation of the Leadership Influence Self-Assessment (LISA©) tool. LISA© was designed to help nurse leaders assess and enhance their influence capacity by measuring influence traits and practices and identifying areas of strength and weakness. Concepts identified in the Adams Influence Model and input from content experts guided the development of 145 items for testing. Administered to 165 nurse leaders, the assessment was subjected to exploratory factor analysis (EFA). EFA yielded a four-factor solution that comprised 80 items. Cronbach's alpha for factors ranged between 0.912 and 0.938. All factor loadings were >0.4; the smallest factor contained 14 items. Items grouped together in the theoretical model also clustered together in the EFA. Preliminary psychometric testing supports validity and reliability of the LISA© and its potential use as a tool to assess influence capacity for purposes of leadership development and research. Copyright © 2017 Elsevier Inc. All rights reserved.
Local Development of Subject Area Item Banks.
ERIC Educational Resources Information Center
Ward, Annie W.; Barlow, Gene
1984-01-01
It is feasible for school districts to develop and use subject area tests as reliable as those previously available only from commercial publishers. Three projects in local item development in a large school district are described. The first involved only Algebra 1. The second involved life science and career education at the elementary level; and…
Scales for assessing self-efficacy of nurses and assistants for preventing falls
Dykes, Patricia C.; Carroll, Diane; McColgan, Kerry; Hurley, Ann C.; Lipsitz, Stuart R.; Colombo, Lisa; Zuyev, Lyubov; Middleton, Blackford
2011-01-01
Aim This paper is a report of the development and testing of the Self-Efficacy for Preventing Falls Nurse and Assistant scales. Background Patient falls and fall-related injuries are traumatic ordeals for patients, family members and providers, and carry a toll for hospitals. Self-efficacy (SE) is an important factor in determining the actions persons take and the levels of performance they achieve. Performance of individual caregivers is linked to the overall performance of hospitals. Scales to assess nurses' and certified nursing assistants' self-efficacy to prevent patients from falling would allow for targeting resources to increase SE, resulting in improved individual performance and ultimately decreased numbers of patient falls. Method Four phases of instrument development were carried out to (1) generate individual items from eight focus groups (four each with nurses and with assistants, conducted in October 2007), (2) develop prototype scales, (3) determine content validity during a second series of four nurse and assistant focus groups (January 2008) and (4) conduct item analysis, paired t-tests, Student's t-tests and internal consistency reliability to refine and confirm the scales. Data were collected during February-December 2008. Results The 11-item Self-Efficacy for Preventing Falls Nurse scale had an alpha of 0.89, with all items meeting the range criterion of 0.3-0.7 for item-total correlation. The 8-item Self-Efficacy for Preventing Falls Assistant scale had an alpha of 0.74, and all items had item-total correlations in the 0.3-0.7 range. Conclusions The Self-Efficacy for Preventing Falls Nurse and Self-Efficacy for Preventing Falls Assistant scales demonstrated psychometric adequacy and are recommended to measure bedside staff's self-efficacy beliefs in preventing patient falls. PMID:21073506
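The corrected item-total correlations screened against the 0.3-0.7 criterion can be obtained as in this sketch; the ratings shown are hypothetical, not the study's data.

```python
import numpy as np

def corrected_item_total(items):
    """Correlation of each item with the sum of the remaining items, i.e.
    the corrected item-total correlation screened against 0.3-0.7."""
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    out = []
    for j in range(items.shape[1]):
        rest = total - items[:, j]
        out.append(np.corrcoef(items[:, j], rest)[0, 1])
    return np.array(out)

# Illustrative 10-point self-efficacy ratings from 6 nurses on 3 items.
ratings = np.array([[8, 7, 9],
                    [5, 6, 6],
                    [9, 9, 10],
                    [6, 5, 7],
                    [4, 5, 5],
                    [7, 8, 8]])
print(np.round(corrected_item_total(ratings), 2))
```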
Better assessment of physical function: item improvement is neglected but essential
2009-01-01
Introduction Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. Methods The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classic test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. Results We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important versus complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options. IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models having comparable model fits. Correlations between factors in the test data sets were > 0.90. Conclusions Item improvement must underlie attempts to improve outcome assessment. The clear, personally important and relevant, ability-framed items in the PROMIS Physical Function item bank perform well in PRO assessment. They will benefit from further study and application in a wider variety of rheumatic diseases in diverse clinical groups, including those at the extremes of physical functioning, and in different administration modes. PMID:20015354
Better assessment of physical function: item improvement is neglected but essential.
Bruce, Bonnie; Fries, James F; Ambrosini, Debbie; Lingala, Bharathi; Gandek, Barbara; Rose, Matthias; Ware, John E
2009-01-01
Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classic test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important versus complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options. IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models having comparable model fits. Correlations between factors in the test data sets were > 0.90. Item improvement must underlie attempts to improve outcome assessment. The clear, personally important and relevant, ability-framed items in the PROMIS Physical Function item bank perform well in PRO assessment. They will benefit from further study and application in a wider variety of rheumatic diseases in diverse clinical groups, including those at the extremes of physical functioning, and in different administration modes.
Development of the Assessment of Belief Conflict in Relationship-14 (ABCR-14).
Kyougoku, Makoto; Teraoka, Mutsumi; Masuda, Noriko; Ooura, Mariko; Abe, Yasushi
2015-01-01
Nurses and other healthcare workers frequently experience belief conflict, one of the most important new stress-related problems in both academic and clinical fields. In this study, using a sample of 1,683 nursing practitioners, we developed the Assessment of Belief Conflict in Relationship-14 (ABCR-14), a new scale that assesses belief conflict in the healthcare field. Standard psychometric procedures were used to develop and test the scale, including a qualitative framework concept and item-pool development, item reduction, and scale development. We analyzed the psychometric properties of the ABCR-14 according to entropy, polyserial correlation coefficients, exploratory factor analysis, confirmatory factor analysis, average variance extracted, Cronbach's alpha, Pearson product-moment correlation coefficients, and multidimensional item response theory (MIRT). The results of the analysis supported a three-factor model consisting of 14 items. The validity and reliability of the ABCR-14 were supported by evidence of high construct validity, structural validity, hypothesis testing, internal consistency reliability, and concurrent validity. The MIRT results offered strong support for good item functioning, with acceptable item slope and difficulty parameters. However, the ABCR-14 Likert scale might need to be explored further from the MIRT point of view. Yet, as mentioned above, there is sufficient evidence that the ABCR-14 has high validity and reliability. The ABCR-14 demonstrates good psychometric properties for nursing belief conflict. Further studies are recommended to confirm its application in clinical practice.
Hong, Quan Nha; Coutu, Marie-France; Berbiche, Djamal
2017-01-01
The Work Role Functioning Questionnaire (WRFQ) was developed to assess workers' perceived ability to perform job demands and is used to monitor presenteeism. Still, few studies of its validity can be found in the literature. The purpose of this study was to assess the items and factorial composition of the Canadian French version of the WRFQ (WRFQ-CF). Two measurement approaches were used to test the WRFQ-CF: Classical Test Theory (CTT) and non-parametric Item Response Theory (IRT). A total of 352 completed questionnaires were analyzed. Four-factor and three-factor models were tested and showed good fit with 14 items (Root Mean Square Error of Approximation (RMSEA) = 0.06, Standardized Root Mean Square Residual (SRMR) = 0.04, Bentler Comparative Fit Index (CFI) = 0.98) and 17 items (RMSEA = 0.059, SRMR = 0.048, CFI = 0.98), respectively. Using IRT, 13 problematic items were identified, of which 9 were also flagged by CTT. This study tested different models, with fewer problematic items found in the three-factor model. Using non-parametric IRT and CTT for item purification gave complementary results. IRT is still scarcely used and can be an interesting alternative method to enhance the quality of a measurement instrument. More studies are needed on the WRFQ-CF to refine its items and factorial composition.
Ye, Zeng Jie; Liang, Mu Zi; Zhang, Hao Wei; Li, Peng Fei; Ouyang, Xue Ren; Yu, Yuan Liang; Liu, Mei Ling; Qiu, Hong Zhong
2018-06-01
Classical test theory has been used to develop and validate the 25-item Resilience Scale Specific to Cancer (RS-SC) in Chinese patients with cancer. This study was designed to provide additional information about the discriminative value of the individual items, tested with an item response theory analysis. A two-parameter graded response model was fitted to examine whether any of the RS-SC items exhibited problems with the ordering and spacing of thresholds, as well as the ability of items to discriminate between patients with different resilience levels, using item characteristic curves. A sample of 214 Chinese patients with a cancer diagnosis was analyzed. The established three-dimension structure of the RS-SC was confirmed. Several items showed problematic thresholds or discrimination and require further revision. Some problematic items should be refined, and a short form of the RS-SC may be feasible in clinical settings to reduce the burden on patients. However, the generalizability of these findings warrants further investigation.
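As background for the two-parameter graded response model mentioned here, the sketch below computes category probabilities for a single polytomous item in Samejima's formulation; the discrimination and threshold values are illustrative assumptions, not estimates from the RS-SC.

```python
import numpy as np

def grm_category_probs(theta, a, thresholds):
    """Samejima graded response model: probability of each ordered response
    category, given discrimination a and ordered category thresholds."""
    thresholds = np.asarray(thresholds, dtype=float)
    # Cumulative probability of responding in category k or higher.
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - thresholds)))
    cum = np.concatenate(([1.0], p_star, [0.0]))
    return cum[:-1] - cum[1:]

# Illustrative 4-category item (thresholds must be increasing).
probs = grm_category_probs(theta=0.5, a=1.8, thresholds=[-1.0, 0.0, 1.2])
print(np.round(probs, 3), "sum =", probs.sum())   # probabilities sum to 1
```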
Assessment of the psychometrics of a PROMIS item bank: self-efficacy for managing daily activities
Hong, Ickpyo; Li, Chih-Ying; Romero, Sergio; Gruber-Baldini, Ann L.; Shulman, Lisa M.
2017-01-01
Purpose The aim of this study is to investigate the psychometrics of the Patient-Reported Outcomes Measurement Information System self-efficacy for managing daily activities item bank. Methods The item pool was field tested on a sample of 1087 participants via internet (n = 250) and in-clinic (n = 837) surveys. All participants reported having at least one chronic health condition. The 35 item pool was investigated for dimensionality (confirmatory factor analyses, CFA and exploratory factor analysis, EFA), item-total correlations, local independence, precision, and differential item functioning (DIF) across gender, race, ethnicity, age groups, data collection modes, and neurological chronic conditions (McFadden Pseudo R2 less than 10 %). Results The item pool met two of the four CFA fit criteria (CFI = 0.952 and SRMR = 0.07). EFA analysis found a dominant first factor (eigenvalue = 24.34) and the ratio of first to second eigenvalue was 12.4. The item pool demonstrated good item-total correlations (0.59–0.85) and acceptable internal consistency (Cronbach’s alpha = 0.97). The item pool maintained its precision (reliability over 0.90) across a wide range of theta (3.70), and there was no significant DIF. Conclusion The findings indicated the item pool has sound psychometric properties and the test items are eligible for development of computerized adaptive testing and short forms. PMID:27048495
Assessment of the psychometrics of a PROMIS item bank: self-efficacy for managing daily activities.
Hong, Ickpyo; Velozo, Craig A; Li, Chih-Ying; Romero, Sergio; Gruber-Baldini, Ann L; Shulman, Lisa M
2016-09-01
The aim of this study is to investigate the psychometrics of the Patient-Reported Outcomes Measurement Information System self-efficacy for managing daily activities item bank. The item pool was field tested on a sample of 1087 participants via internet (n = 250) and in-clinic (n = 837) surveys. All participants reported having at least one chronic health condition. The 35-item pool was investigated for dimensionality (confirmatory factor analyses, CFA, and exploratory factor analysis, EFA), item-total correlations, local independence, precision, and differential item functioning (DIF) across gender, race, ethnicity, age groups, data collection modes, and neurological chronic conditions (McFadden Pseudo R2 less than 10%). The item pool met two of the four CFA fit criteria (CFI = 0.952 and SRMR = 0.07). EFA found a dominant first factor (eigenvalue = 24.34), and the ratio of the first to second eigenvalue was 12.4. The item pool demonstrated good item-total correlations (0.59-0.85) and acceptable internal consistency (Cronbach's alpha = 0.97). The item pool maintained its precision (reliability over 0.90) across a wide range of theta (3.70), and there was no significant DIF. The findings indicated the item pool has sound psychometric properties and the test items are eligible for development of computerized adaptive testing and short forms.
Development of the PROMIS positive emotional and sensory expectancies of smoking item banks.
Tucker, Joan S; Shadel, William G; Edelen, Maria Orlando; Stucky, Brian D; Li, Zhen; Hansen, Mark; Cai, Li
2014-09-01
The positive emotional and sensory expectancies of cigarette smoking include improved cognitive abilities, positive affective states, and pleasurable sensorimotor sensations. This paper describes the development of the Positive Emotional and Sensory Expectancies of Smoking item banks that will serve to standardize the assessment of this construct among daily and nondaily cigarette smokers. Data came from daily (N = 4,201) and nondaily (N = 1,183) smokers who completed an online survey. To identify a unidimensional set of items, we conducted item factor analyses, item response theory analyses, and differential item functioning analyses. Additionally, we evaluated the performance of fixed-item short forms (SFs) and computer adaptive tests (CATs) to efficiently assess the construct. Eighteen items were included in the item banks (15 common across daily and nondaily smokers, 1 unique to daily, 2 unique to nondaily). The item banks are strongly unidimensional, highly reliable (reliability = 0.95 for both), and perform similarly across gender, age, and race/ethnicity groups. An SF common to daily and nondaily smokers consists of 6 items (reliability = 0.86). Results from simulated CATs indicated that, on average, fewer than 8 items are needed to assess the construct with adequate precision using the item banks. These analyses identified a new set of items that can assess the positive emotional and sensory expectancies of smoking in a reliable and standardized manner. Considerable efficiency in assessing this construct can be achieved by using the item bank SF, employing computer adaptive tests, or selecting subsets of items tailored to specific research or clinical purposes. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Linking Existing Instruments to Develop an Activity of Daily Living Item Bank.
Li, Chih-Ying; Romero, Sergio; Bonilha, Heather S; Simpson, Kit N; Simpson, Annie N; Hong, Ickpyo; Velozo, Craig A
2018-03-01
This study examined the dimensionality and item-level psychometric properties of an item bank measuring activities of daily living (ADL) across inpatient rehabilitation facilities and community living centers. A common-person equating method was used with a retrospective veterans data set. Dimensionality, model fit, local independence, and monotonicity were examined using factor analyses and fit statistics; principal component analysis (PCA) of residuals and differential item functioning (DIF) were examined using Rasch analysis. Following the elimination of invalid data, 371 veterans who completed both the Functional Independence Measure (FIM) and minimum data set (MDS) within 6 days were retained. The FIM-MDS item bank demonstrated good internal consistency (Cronbach's α = .98) and met three rating scale diagnostic criteria and three of the four model fit statistics (comparative fit index/Tucker-Lewis index = 0.98, root mean square error of approximation = 0.14, and standardized root mean residual = 0.07). PCA of Rasch residuals showed the item bank explained 94.2% of the variance. The item bank covered a θ range of -1.50 to 1.26 (items) and -3.57 to 4.21 (persons), with a person strata of 6.3. The findings indicated the ADL physical function item bank constructed from the FIM and MDS measured a single latent trait with overall acceptable item-level psychometric properties, suggesting that it is an appropriate source for developing efficient test forms such as short forms and computerized adaptive tests.
Methodology for Developing and Evaluating the PROMIS® Smoking Item Banks
Cai, Li; Stucky, Brian D.; Tucker, Joan S.; Shadel, William G.; Edelen, Maria Orlando
2014-01-01
Introduction: This article describes the procedures used in the PROMIS® Smoking Initiative for the development and evaluation of item banks, short forms (SFs), and computerized adaptive tests (CATs) for the assessment of 6 constructs related to cigarette smoking: nicotine dependence, coping expectancies, emotional and sensory expectancies, health expectancies, psychosocial expectancies, and social motivations for smoking. Methods: Analyses were conducted using response data from a large national sample of smokers. Items related to each construct were subjected to extensive item factor analyses and evaluation of differential item functioning (DIF). Final item banks were calibrated, and SF assessments were developed for each construct. The performance of the SFs and the potential use of the item banks for CAT administration were examined through simulation study. Results: Item selection based on dimensionality assessment and DIF analyses produced item banks that were essentially unidimensional in structure and free of bias. Simulation studies demonstrated that the constructs could be accurately measured with a relatively small number of carefully selected items, either through fixed SFs or CAT-based assessment. Illustrative results are presented, and subsequent articles provide detailed discussion of each item bank in turn. Conclusions: The development of the PROMIS smoking item banks provides researchers with new tools for measuring smoking-related constructs. The use of the calibrated item banks and suggested SF assessments will enhance the quality of score estimates, thus advancing smoking research. Moreover, the methods used in the current study, including innovative approaches to item selection and SF construction, may have general relevance to item bank development and evaluation. PMID:23943843
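The short-form and CAT simulations summarized above follow a common recipe: calibrate an item bank, then administer items adaptively to simulated examinees until a precision target or item limit is reached. The Python sketch below is a generic illustration of that recipe for a hypothetical 2-parameter logistic bank, with maximum-information item selection and expected a posteriori (EAP) scoring; the item parameters, the stopping rule (SE below 0.3 or 8 items), and all other details are assumptions for illustration, not the PROMIS procedure.

    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical calibrated 2PL bank: discriminations a and difficulties b.
    a = rng.uniform(0.8, 2.0, size=30)
    b = rng.normal(0.0, 1.0, size=30)

    grid = np.linspace(-4, 4, 81)          # quadrature grid for EAP scoring
    prior = np.exp(-0.5 * grid**2)
    prior /= prior.sum()

    def p_correct(theta, a_j, b_j):
        # 2PL probability of endorsing/answering an item at ability theta.
        return 1.0 / (1.0 + np.exp(-a_j * (theta - b_j)))

    def item_information(theta, a_j, b_j):
        p = p_correct(theta, a_j, b_j)
        return a_j**2 * p * (1 - p)

    def simulate_cat(true_theta, max_items=8, se_target=0.3):
        administered = []
        likelihood = np.ones_like(grid)
        theta_hat, se = 0.0, np.inf
        while len(administered) < max_items and se > se_target:
            # Choose the unused item with maximum Fisher information at theta_hat.
            info = [item_information(theta_hat, a[j], b[j]) if j not in administered
                    else -np.inf for j in range(len(a))]
            j = int(np.argmax(info))
            administered.append(j)
            # Simulate a response and update the posterior over the grid.
            u = rng.random() < p_correct(true_theta, a[j], b[j])
            p_grid = p_correct(grid, a[j], b[j])
            likelihood *= p_grid if u else (1 - p_grid)
            posterior = likelihood * prior
            posterior /= posterior.sum()
            theta_hat = float((grid * posterior).sum())
            se = float(np.sqrt(((grid - theta_hat) ** 2 * posterior).sum()))
        return theta_hat, se, len(administered)

    theta_hat, se, n_used = simulate_cat(true_theta=0.5)
    print(f"theta_hat = {theta_hat:.2f}, SE = {se:.2f}, items used = {n_used}")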
Evaluation of item candidates for a diabetic retinopathy quality of life item bank.
Fenwick, Eva K; Pesudovs, Konrad; Khadka, Jyoti; Rees, Gwyn; Wong, Tien Y; Lamoureux, Ecosse L
2013-09-01
We are developing an item bank assessing the impact of diabetic retinopathy (DR) on quality of life (QoL) using a rigorous multi-staged process combining qualitative and quantitative methods. We describe here the first two qualitative phases: content development and item evaluation. After a comprehensive literature review, items were generated from four sources: (1) 34 previously validated patient-reported outcome measures; (2) five published qualitative articles; (3) eight focus groups and 18 semi-structured interviews with 57 DR patients; and (4) seven semi-structured interviews with diabetes or ophthalmic experts. Items were then evaluated during 3 stages, namely binning (grouping) and winnowing (reduction) based on key criteria and panel consensus; development of item stems and response options; and pre-testing of items via cognitive interviews with patients. The content development phase yielded 1,165 unique items across 7 QoL domains. After 3 sessions of binning and winnowing, items were reduced to a minimally representative set (n = 312) across 9 domains of QoL: visual symptoms; ocular surface symptoms; activity limitation; mobility; emotional; health concerns; social; convenience; and economic. After 8 cognitive interviews, 42 items were amended resulting in a final set of 314 items. We have employed a systematic approach to develop items for a DR-specific QoL item bank. The psychometric properties of the nine QoL subscales will be assessed using Rasch analysis. The resulting validated item bank will allow clinicians and researchers to better understand the QoL impact of DR and DR therapies from the patient's perspective.
Jelínek, Martin; Květon, Petr; Vobořil, Dalibor
2015-02-01
Despite the expectations that emerged with the advancement of computer technology over the last decade of the twentieth century, the scientific literature contains few relevant references regarding the development and use of innovative items in psychological testing. Our study presents and evaluates two novel item types. One item type is derived from a standard schematic test item used for the assessment of the spatial perception aspect of spatial ability, enhanced by an interactive response module. Performance on this item type correlated with performance on its paper-and-pencil counterpart. The other innovative item type used complex stimuli in the form of a short video of a ride through a city presented from an on-route perspective, intended to measure navigation skills and the ability to keep oneself oriented in space. In this case, the scores were related to the capacity of visuo-spatial working memory and also to the overall score on the paper-and-pencil test of spatial ability. The second relationship was moderated by gender.
Promoting the hydrostatic conceptual change test (HCCT) with four-tier diagnostic test item
NASA Astrophysics Data System (ADS)
Purwanto, M. G.; Nurliani, R.; Kaniawati, I.; Samsudin, A.
2018-05-01
The Hydrostatic Conceptual Change Test (HCCT) is a diagnostic test instrument for identifying students' conceptions of hydrostatics, which is important for supporting the learning process in the classroom. On this basis, the researchers developed the HCCT into four-tier diagnostic test items. This research is planned as the first step of developing the four-tier-formatted HCCT as a diagnostic test instrument for hydrostatics. The research method used the 4D model, which has four comprehensive steps: 1) defining, 2) designing, 3) developing, and 4) disseminating. The developed instrument was tried out with 30 students in a senior high school. The data showed that the four-tier-formatted HCCT is able to identify students' conception levels of hydrostatics. In conclusion, the four-tier-formatted HCCT is a promising diagnostic test instrument that can classify students into categories of misconception, no understanding, understanding, partial understanding, and not codeable with respect to hydrostatic concepts.
[Development and validation of the Korean patient safety culture scale for nursing homes].
Yoon, Sook Hee; Kim, Byungsoo; Kim, Se Young
2013-06-01
The purpose of this study was to develop a tool to evaluate patient safety culture in nursing homes and to test its validity and reliability. A preliminary tool was developed through focus group interviews, content validity tests, and a pilot study. A nationwide survey was conducted from February to April 2011, using self-report questionnaires. Participants were 982 employees in nursing homes. Data were analyzed using Cronbach's alpha, item analysis, factor analysis, and multitrait/multi-item analysis. From the results of the analysis, 27 final items were selected from the 49 items on the preliminary tool. Items with low correlations with the total scale were excluded. The 4 factors identified by factor analysis accounted for 63.4% of the variance in the total scale. The factors were labeled leadership, organizational system, working attitude, and management practice. Cronbach's alpha for internal consistency was .95, and the range for the 4 factors was from .86 to .93. The results of this study indicate that the Korean Patient Safety Culture Scale has reliability and validity and is suitable for evaluation of patient safety culture in Korean nursing homes.
Development and validation of an asthma first aid knowledge questionnaire.
Luckie, Kate; Pang, Tsz Chun; Kritikos, Vicky; Saini, Bandana; Moles, Rebekah Jane
2018-05-01
There is no gold standard outcome assessment for asthma first-aid knowledge. We therefore aimed to develop and validate an asthma first aid knowledge questionnaire (AFAKQ) to be used before and after educational interventions. The AFAKQ was developed based on a content analysis of existing asthma knowledge questionnaires and current asthma management guidelines. Content and face validity were assessed by a review panel consisting of expert respiratory physicians, researchers, and parents of school-aged children. A 21-item questionnaire was then pilot tested among a sample of caregivers, health professionals, and pharmacy students. Exploratory factor analysis was performed and internal consistency determined. The initial 46-item version of the AFAKQ was reduced to 21 items after revision by the expert panel. This version was then pilot tested among 161 participants and further reduced to 14 items. The exploratory factor analysis revealed a parsimonious one-factor solution with a Cronbach's alpha of 0.77 for the 14-item AFAKQ. The AFAKQ is a valid tool ready for application in evaluating the impact of educational interventions on asthma first-aid knowledge.
Perez, Samara; Shapiro, Gilla K; Tatar, Ovidiu; Joyal-Desmarais, Keven; Rosberger, Zeev
2016-10-01
Parents' human papillomavirus (HPV) vaccination decision-making is strongly influenced by their attitudes and beliefs toward vaccination. To date, psychometrically evaluated HPV vaccination attitudes scales have been narrow in their range of measured beliefs and often limited to attitudes surrounding female HPV vaccination. The study aimed to develop a comprehensive, validated and reliable HPV vaccination attitudes and beliefs scale among parents of boys. Data were collected from Canadian parents of 9- to 16-year-old boys using an online questionnaire completed in 2 waves with a 7-month interval. Based on existing vaccination attitudes scales, a set of 61 attitude and belief items was developed. Exploratory and confirmatory factor analyses were conducted. Internal consistency was evaluated with Cronbach's α and stability over time with intraclass correlations. The HPV Attitudes and Beliefs Scale (HABS) was informed by 3117 responses at time 1 and 1427 at time 2. The HABS contains 46 items organized in 9 factors: Benefits (10 items), Threat (3 items), Influence (8 items), Harms (6 items), Risk (3 items), Affordability (3 items), Communication (5 items), Accessibility (4 items), and General Vaccination Attitudes (4 items). Model fit indices at time 2 were: χ²/df = 3.13, standardized root mean square residual = 0.056, root mean square error of approximation (confidence interval) = 0.039 (0.037-0.040), comparative fit index = 0.962, and Tucker-Lewis index = 0.957. Cronbach's αs were greater than 0.8, and intraclass correlations of factors were greater than 0.6. The HABS is the first psychometrically tested scale of HPV attitudes and beliefs among parents of boys available for use in English and French. Further testing among parents of girls and young adults and assessing predictive validity are warranted.
48 CFR Appendix F to Chapter 2 - Material Inspection and Receiving Report
Code of Federal Regulations, 2014 CFR
2014-10-01
... items are maintenance, repair, alteration, rehabilitation, engineering, research, development, training... slashes. Show the descriptive noun of the item nomenclature and if provided, the Government assigned..., engineering, research, development, training, and testing. Do not complete Blocks 4, 13, and 14 when there is...
Morris, Scott; Bass, Mike; Lee, Mirinae; Neapolitan, Richard E
2017-09-01
The Patient Reported Outcomes Measurement Information System (PROMIS) initiative developed an array of patient reported outcome (PRO) measures. To reduce the number of questions administered, PROMIS utilizes unidimensional item response theory and unidimensional computer adaptive testing (UCAT), which means a separate set of questions is administered for each measured trait. Multidimensional item response theory (MIRT) and multidimensional computer adaptive testing (MCAT) simultaneously assess correlated traits. The objective was to investigate the extent to which MCAT reduces patient burden relative to UCAT in the case of PROs. One MIRT and 3 unidimensional item response theory models were developed using the related traits anxiety, depression, and anger. Using these models, MCAT and UCAT performance was compared with simulated individuals. Surprisingly, the root mean squared error for both methods increased with the number of items. These results were driven by large errors for individuals with low trait levels. A second analysis focused on individuals aligned with item content. For these individuals, both MCAT and UCAT accuracies improved with additional items. Furthermore, MCAT reduced the test length by 50%. For the PROMIS Emotional Distress banks, neither UCAT nor MCAT provided accurate estimates for individuals at low trait levels. Because the items in these banks were designed to detect clinical levels of distress, there is little information for individuals with low trait values. However, trait estimates for individuals targeted by the banks were accurate and MCAT asked substantially fewer questions. By reducing the number of items administered, MCAT can allow clinicians and researchers to assess a wider range of PROs with less patient burden.
Development and validation of a nutrition knowledge questionnaire for a Canadian population.
Bradette-Laplante, Maude; Carbonneau, Élise; Provencher, Véronique; Bégin, Catherine; Robitaille, Julie; Desroches, Sophie; Vohl, Marie-Claude; Corneau, Louise; Lemieux, Simone
2017-05-01
The present study aimed to develop and validate a nutrition knowledge questionnaire in a sample of French Canadians from the province of Quebec, taking into account dietary guidelines. A thirty-eight-item questionnaire was developed by the research team and evaluated for content validity by an expert panel, and then administered to respondents. Face validity and construct validity were measured in a pre-test. Exploratory factor analysis and covariance structure analysis were performed to verify the structure of the questionnaire and identify problematic items. Internal consistency and test-retest reliability were evaluated through a validation study. Online survey. Six nutrition and psychology experts, fifteen registered dietitians (RD) and 180 lay people participated. Content validity evaluation resulted in the removal of two items and reformulation of one item. Following face validity, one item was reformulated. Construct validity was found to be adequate, with higher scores for RD v. non-RD (21.5 (SD 2.1) v. 15.7 (SD 3.0) out of 24, P < 0.001). Exploratory factor analysis revealed that the questionnaire contained only one factor. Covariance structure analysis led to removal of sixteen items. Internal consistency for the overall questionnaire was adequate (Cronbach's α = 0.73). Assessment of test-retest reliability resulted in significant associations for the total knowledge score (r = 0.59, P < 0.001). This nutrition knowledge questionnaire was found to be a suitable instrument which can be used to measure levels of nutrition knowledge in a Canadian population. It could also serve as a model for the development of similar instruments in other populations.
Development and validation of the Cancer Exercise Stereotypes Scale.
Falzon, Charlène; Sabiston, Catherine; Bergamaschi, Alessandro; Corrion, Karine; Chalabaev, Aïna; D'Arripe-Longueville, Fabienne
2014-01-01
The objective of this study was to develop and validate a French-language questionnaire measuring stereotypes related to exercise in cancer patients: The Cancer Exercise Stereotypes Scale (CESS). Four successive steps were carried out with 806 participants. First, a preliminary version was developed on the basis of the relevant literature and qualitative interviews. A test of clarity then led to the reformulation of six of the 30 items. Second, based on the modification indices of the first confirmatory factorial analysis, 11 of the 30 initial items were deleted. A new factorial structure analysis showed a good fit and validated a 19-item instrument with five subscales. Third, the stability of the instrument was tested over time. Last, tests of construct validity were conducted to examine convergent validity and discriminant validity. The French-language CESS appears to have good psychometric qualities and can be used to test theoretical tenets and inform intervention strategies on ways to foster exercise in cancer patients.
Learning to Think Spatially: What Do Students "See" in Numeracy Test Items?
ERIC Educational Resources Information Center
Diezmann, Carmel M.; Lowrie, Tom
2012-01-01
Learning to think spatially in mathematics involves developing proficiency with graphics. This paper reports on 2 investigations of spatial thinking and graphics. The first investigation explored the importance of graphics as 1 of 3 communication systems (i.e. text, symbols, graphics) used to provide information in numeracy test items. The results…
Emotional Intelligence in Applicant Selection for Care-Related Academic Programs
ERIC Educational Resources Information Center
Zysberg, Leehu; Levy, Anat; Zisberg, Anna
2011-01-01
Two studies describe the development of the Audiovisual Test of Emotional Intelligence (AVEI), aimed at candidate selection in educational settings. Study I depicts the construction of the test and the preliminary examination of its psychometric properties in a sample of 92 college students. Item analysis allowed the modification of problem items,…
Interpretation of the Rasch Ability and Difficulty Scales for Educational Purposes.
ERIC Educational Resources Information Center
Woodcock, Richard W.
Though many test developers have utilized item response theory in their work, few have taken advantage of the potential of item response theory for providing new interpretation procedures that accentuate the educational implications to be drawn from test scores. This paper describes several features, based upon the Rasch difficulty and ability…
Expectations for Visual Function: An Initial Evaluation of a New Clinical Instrument.
ERIC Educational Resources Information Center
Corn, Anne L.; Webne, Steve L.
2001-01-01
A study explored the internal consistency of items in a visual screening instrument developed by Project PAVE: Expectations for Visual Functioning (EVF). The test includes 20 items that evaluate a child's functional use of vision. A pilot test involving 129 teachers indicates the EVF is internally consistent. (Contains three references.) (CR)
ERIC Educational Resources Information Center
Choi, Seung W.; Podrabsky, Tracy; McKinney, Natalie
2012-01-01
Computerized adaptive testing (CAT) enables efficient and flexible measurement of latent constructs. The majority of educational and cognitive measurement constructs are based on dichotomous item response theory (IRT) models. An integral part of developing various components of a CAT system is conducting simulations using both known and empirical…
Outlier Detection in High-Stakes Certification Testing. Research Report.
ERIC Educational Resources Information Center
Meijer, Rob R.
Recent developments of person-fit analysis in computerized adaptive testing (CAT) are discussed. Methods from statistical process control are presented that have been proposed to classify an item score pattern as fitting or misfitting the underlying item response theory (IRT) model in a CAT. Most person-fit research in CAT is restricted to…
Dual-Objective Item Selection Criteria in Cognitive Diagnostic Computerized Adaptive Testing
ERIC Educational Resources Information Center
Kang, Hyeon-Ah; Zhang, Susu; Chang, Hua-Hua
2017-01-01
The development of cognitive diagnostic-computerized adaptive testing (CD-CAT) has provided a new perspective for gaining information about examinees' mastery on a set of cognitive attributes. This study proposes a new item selection method within the framework of dual-objective CD-CAT that simultaneously addresses examinees' attribute mastery…
Kertesz, Stefan. G.; Pollio, David E.; Jones, Richard N.; Steward, Jocelyn; Stringfellow, Erin J.; Gordon, Adam J.; Johnson, Nancy K.; Kim, Theresa A.; Granstaff, Unita; Austin, Erika L.; Young, Alexander S.; Golden, Joya; Davis, Lori L.; Roth, David L.; Holt, Cheryl L.
2015-01-01
Background Homeless patients face unique challenges in obtaining primary care responsive to their needs and context. Patient experience questionnaires could permit assessment of patient-centered medical homes for this population, but standard instruments may not reflect homeless patients' priorities and concerns. Objectives This report describes (a) the content and psychometric properties of a new primary care questionnaire for homeless patients and (b) the methods utilized in its development. Methods Starting with quality-related constructs from the Institute of Medicine, we identified relevant themes by interviewing homeless patients and experts in their care. A multidisciplinary team drafted a preliminary set of 78 items. This was administered to homeless-experienced clients (n=563) across 3 VA facilities and 1 non-VA Health Care for the Homeless Program. Using Item Response Theory, we examined Test Information Function curves to eliminate less informative items and devise plausibly distinct subscales. Results The resulting 33-item instrument (Primary Care Quality-Homeless, PCQ-H) has four subscales: Patient-Clinician Relationship (15 items), Cooperation among Clinicians (3 items), Access/Coordination (11 items) and Homeless-Specific Needs (4 items). Evidence for divergent and convergent validity is provided. Test Information Function (TIF) graphs showed adequate informational value to permit inferences about groups for 3 subscales (Relationship, Cooperation and Access/Coordination). The 3-item Cooperation subscale had lower informational value (TIF<5) but had good internal consistency (alpha=0.75) and patients frequently reported problems in this aspect of care. Conclusions Systematic application of qualitative and quantitative methods supported the development of a brief patient-reported questionnaire focused on the primary care of homeless patients and offers guidance for future population-specific instrument development. PMID:25023918
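The adequacy criterion mentioned above (Test Information Function below 5 for the Cooperation subscale) refers to the scale information function, the sum of the item information functions over a subscale's items; an information value of 5 corresponds to a standard error of measurement of roughly 1/sqrt(5) ≈ 0.45. A brief sketch, assuming a 2-parameter logistic calibration with invented parameters (the PCQ-H items are polytomous and were calibrated separately), shows the computation.

    import numpy as np

    def test_information(theta, a, b):
        """Sum of 2PL item information functions evaluated at each theta."""
        theta = np.asarray(theta, dtype=float)[:, None]
        p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
        return (a**2 * p * (1 - p)).sum(axis=1)

    # Hypothetical 3-item subscale with modest discriminations.
    a = np.array([1.2, 1.0, 1.4])
    b = np.array([-0.5, 0.0, 0.8])

    for t in np.linspace(-3, 3, 7):
        info = test_information([t], a, b)[0]
        print(f"theta = {t:+.1f}  information = {info:.2f}  SEM = {1 / np.sqrt(info):.2f}")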
Sandilos, Lia E.; Lewis, Kandia; Komaroff, Eugene; Hammer, Carol Scheffner; Scarpino, Shelley E.; Lopez, Lisa; Rodriguez, Barbara; Goldstein, Brian
2015-01-01
The purpose of this study was to investigate the way in which items on the Woodcock-Muñoz Language Survey Revised (WMLS-R) Spanish and English versions function for bilingual children from different ethnic subgroups who speak different dialects of Spanish. Using data from a sample of 324 bilingual Hispanic families and their children living on the United States mainland, differential item functioning (DIF) was conducted to determine if test items in English and Spanish functioned differently for Mexican, Cuban, and Puerto Rican bilingual children. Data on child and parent language characteristics and children’s scores on Picture Vocabulary and Story Recall subtests in English and Spanish were collected. DIF was not detected for items on the Spanish subtests. Results revealed that some items on English subtests displayed statistically and practically significant DIF. The findings indicate that there are differences in the difficulty level of WMLS-R English-form test items depending on the examinees’ ethnic subgroup membership. This outcome suggests that test developers need to be mindful of potential differences in performance based on ethnic subgroup and dialect when developing standardized language assessments that may be administered to bilingual students. PMID:26705400
ERIC Educational Resources Information Center
Attali, Yigal; Powers, Don; Freedman, Marshall; Harrison, Marissa; Obetz, Susan
2008-01-01
This report describes the development, administration, and scoring of open-ended variants of GRE® Subject Test items in biology and psychology. These questions were administered in a Web-based experiment to registered examinees of the respective Subject Tests. The questions required a short answer of 1-3 sentences, and responses were automatically…
Jo, Min-Woo; Lee, Hyeon-Jeong; Kim, Soo Young; Kim, Seon-Ha; Chang, Hyejung; Ahn, Jeonghoon; Ock, Minsu
2017-01-01
Few attempts have been made to develop a generic health-related quality of life (HRQoL) instrument and to examine its validity and reliability in Korea. We aimed to do this in our present study. After a literature review of existing generic HRQoL instruments, a focus group discussion, in-depth interviews, and expert consultations, we selected 30 tentative items for a new HRQoL measure. These items were evaluated by assessing their ceiling effects, difficulty, and redundancy in the first survey. To validate the HRQoL instrument that was developed, known-groups validity and convergent/discriminant validity were evaluated and its test-retest reliability was examined in the second survey. Of the 30 items originally assessed for the HRQoL instrument, four were excluded due to high ceiling effects and six were removed due to redundancy. We ultimately developed a 20-item HRQoL instrument, the Health-related Quality of Life Instrument with 20 items (HINT-20), incorporating physical, mental, social, and positive health dimensions. In known-groups validity testing, HINT-20 scores were poorer among women, the elderly, and those with a low income. For convergent/discriminant validity, the correlation coefficients of items (except vitality) in the physical health dimension with the physical component summary of the Short Form 36 version 2 (SF-36v2) were generally higher than the correlations of those items with the mental component summary of the SF-36v2, and vice versa. Regarding test-retest reliability, the intraclass correlation coefficient of the total HINT-20 score was 0.813 (p < 0.001). A novel generic HRQoL instrument, the HINT-20, was developed for the Korean general population and showed acceptable validity and reliability.
Development of a refractive error quality of life scale for Thai adults (the REQ-Thai).
Sukhawarn, Roongthip; Wiratchai, Nonglak; Tatsanavivat, Pyatat; Pitiyanuwat, Somwung; Kanato, Manop; Srivannaboon, Sabong; Guyatt, Gordon H
2011-08-01
To develop a scale for measuring refractive error quality of life (QOL) for Thai adults. The full survey comprised 424 respondents from 5 medical centers in Bangkok and from 3 medical centers in Chiangmai, Songkla and KhonKaen provinces. Participants were emmetropes and persons with refractive correction with visual acuity of 20/30 or better. An item reduction process was employed combining 3 methods: expert opinion, the impact method, and item-total correlation. Classical reliability testing and validity testing, including convergent, discriminative and construct validity, were performed. The developed questionnaire comprised 87 items in 6 dimensions: 1) quality of vision, 2) visual function, 3) social function, 4) psychological function, 5) symptoms and 6) refractive correction problems. Items use a 5-level Likert-type scale. The Cronbach's alpha coefficients of its dimensions ranged from 0.756 to 0.979. All validity tests supported the instrument's validity; construct validity was confirmed by confirmatory factor analysis. A short-version questionnaire comprising 48 items with good reliability and validity was also developed. This is the first validated instrument for measuring refractive error quality of life for Thai adults, developed with strong research methodology and a large sample size.
Karnoe, Astrid; Furstrand, Dorthe; Batterham, Roy; Christensen, Karl Bang; Elsworth, Gerald; Osborne, Richard H
2018-01-01
Background For people to be able to access, understand, and benefit from the increasing digitalization of health services, it is critical that services are provided in a way that meets the user’s needs, resources, and competence. Objective The objective of the study was to develop a questionnaire that captures the 7-dimensional eHealth Literacy Framework (eHLF). Methods Draft items were created in parallel in English and Danish. The items were generated from 450 statements collected during the conceptual development of eHLF. In all, 57 items (7 to 9 items per scale) were generated and adjusted after cognitive testing. Items were tested in 475 people recruited from settings in which the scale was intended to be used (community and health care settings) and including people with a range of chronic conditions. Measurement properties were assessed using approaches from item response theory (IRT) and classical test theory (CTT) such as confirmatory factor analysis (CFA) and reliability using composite scale reliability (CSR); potential bias due to age and sex was evaluated using differential item functioning (DIF). Results CFA confirmed the presence of the 7 a priori dimensions of eHLF. Following item analysis, a 35-item 7-scale questionnaire was constructed, covering (1) using technology to process health information (5 items, CSR=.84), (2) understanding of health concepts and language (5 items, CSR=.75), (3) ability to actively engage with digital services (5 items, CSR=.86), (4) feel safe and in control (5 items, CSR=.87), (5) motivated to engage with digital services (5 items, CSR=.84), (6) access to digital services that work (6 items, CSR=.77), and (7) digital services that suit individual needs (4 items, CSR=.85). A 7-factor CFA model, using small-variance priors for cross-loadings and residual correlations, had a satisfactory fit (posterior predictive P value: .27, 95% CI for the difference between the observed and replicated chi-square values: −63.7 to 133.8). The CFA showed that all items loaded strongly on their respective factors. The IRT analysis showed that no items were found to have disordered thresholds. For most scales, discriminant validity was acceptable; however, 2 pairs of dimensions were highly correlated: dimensions 1 and 5 (r=.95) and dimensions 6 and 7 (r=.96). All dimensions were retained because of strong content differentiation and potential causal relationships between these dimensions. There is no evidence of DIF. Conclusions The eHealth Literacy Questionnaire (eHLQ) is a multidimensional tool based on a well-defined a priori eHLF framework with robust properties. It has satisfactory evidence of construct validity and reliable measurement across a broad range of concepts (using both CTT and IRT traditions) in various groups. It is designed to be used to understand and evaluate people’s interaction with digital health services. PMID:29434011
The Nature of Objectivity with the Rasch Model.
ERIC Educational Resources Information Center
Whitely, Susan E.; Dawis, Rene V.
Although it has been claimed that the Rasch model leads to a higher degree of objectivity in measurement than has been previously possible, this model has had little impact on test development. Population-invariant item and ability calibrations along with the statistical equivalency of any two item subsets are supposedly possible if the item pool…
Peterson, Alexander C; Sutherland, Jason M; Liu, Guiping; Crump, R Trafford; Karimuddin, Ahmer A
2018-06-01
The Fecal Incontinence Quality of Life Scale (FIQL) is a commonly used patient-reported outcome measure for fecal incontinence, often used in clinical trials, yet has not been validated in English since its initial development. This study uses modern methods to thoroughly evaluate the psychometric characteristics of the FIQL and its potential for differential functioning by gender. This study analyzed prospectively collected patient-reported outcome data from a sample of patients prior to colorectal surgery. Patients were recruited from 14 general and colorectal surgeons in Vancouver Coastal Health hospitals in Vancouver, Canada. Confirmatory factor analysis was used to assess construct validity. Item response theory was used to evaluate test reliability, describe item-level characteristics, identify local item dependence, and test for differential functioning by gender. 236 patients were included for analysis, with mean age 58 and approximately half female. Factor analysis failed to identify the lifestyle, coping, depression, and embarrassment domains, suggesting lack of construct validity. Items demonstrated low difficulty, indicating that the test has the highest reliability among individuals who have low quality of life. Five items are suggested for removal or replacement. Differential test functioning was minimal. This study has identified specific improvements that can be made to each domain of the Fecal Incontinence Quality of Life Scale and to the instrument overall. Formatting, scoring, and instructions may be simplified, and items with higher difficulty developed. The lifestyle domain can be used as is. The embarrassment domain should be significantly revised before use.
Chang, Li-Chun; Chen, Yu-Chi; Liao, Li-Ling; Wu, Fei Ling; Hsieh, Pei-Lin; Chen, Hsiao-Jung
2017-01-01
The study aimed to illustrate the constructs and test the psychometric properties of an instrument of health literacy competencies (IOHLC) for health professionals. A multi-phase questionnaire development method was used to develop the scale. The categorization of the knowledge and practice domains achieved consensus through a modified Delphi process. To reduce the number of items, the 92-item IOHLC was psychometrically evaluated through internal consistency, Rasch modeling, and two-stage factor analysis. In total, 736 practitioners, including nurses, nurse practitioners, health educators, case managers, and dieticians, completed the 92-item IOHLC online from May 2012 to January 2013. The final version of the IOHLC comprised 9 knowledge items and 40 skill items across 9 dimensions, with good model fit, explaining 72% of the total variance. All domains had acceptable internal consistency and discriminant validity. The tool in this study is the first to verify health literacy competencies rigorously. Moreover, through psychometric testing, the 49-item IOHLC demonstrated adequate reliability and validity. The IOHLC may serve as a reference for theoretical and in-service training in health literacy competencies for Chinese-speaking health professionals.
Design Patterns for Digital Item Types in Higher Education
ERIC Educational Resources Information Center
Draaijer, S.; Hartog, R. J. M.
2007-01-01
A set of design patterns for digital item types has been developed in response to challenges identified in various projects by teachers in higher education. The goal of the projects in question was to design and develop formative and summative tests, and to develop interactive learning material in the form of quizzes. The subject domains involved…
Vilaro, Melissa J; Zhou, Wenjun; Colby, Sarah E; Byrd-Bredbenner, Carol; Riggsbee, Kristin; Olfert, Melissa D; Barnett, Tracey E; Mathews, Anne E
2017-12-01
Understanding factors that influence food choice may help improve diet quality. Factors that commonly affect adults' food choices have been described, but measures that identify and assess food choice factors specific to college students are lacking. This study developed and tested the Food Choice Priorities Survey (FCPS) among college students. Thirty-seven undergraduates participated in two focus groups (n = 19; 11 in the male-only group, 8 in the female-only group) and interviews (n = 18) regarding typical influences on food choice. Qualitative data informed the development of survey items with a 5-point Likert-type scale (1 = not important, 5 = extremely important). An expert panel rated FCPS items for clarity, relevance, representativeness, and coverage using a content validity form. To establish test-retest reliability, 109 first-year college students completed the 14-item FCPS at two time points, 0-48 days apart (M = 13.99, SD = 7.44). Using Cohen's weighted κ for responses within 20 days, 11 items demonstrated moderate agreement and 3 items had substantial agreement. Factor analysis revealed a three-factor structure (9 items). The FCPS is designed for college students and provides a way to determine the factors of greatest importance regarding food choices among this population. From a public health perspective, practical applications include using the FCPS to tailor health communications and behavior change interventions to factors most salient for food choices of college students.
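Cohen's weighted kappa, used above to judge item-level test-retest agreement, penalizes disagreements according to how far apart the two ratings fall on the ordinal scale. The self-contained sketch below, with hypothetical 5-point ratings rather than FCPS data, shows the computation with linear (or quadratic) disagreement weights.

    import numpy as np

    def weighted_kappa(r1, r2, n_categories, weights="linear"):
        """Cohen's weighted kappa for two ordinal ratings coded 1..n_categories."""
        r1 = np.asarray(r1) - 1
        r2 = np.asarray(r2) - 1
        observed = np.zeros((n_categories, n_categories))
        for i, j in zip(r1, r2):
            observed[i, j] += 1
        observed /= observed.sum()
        expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
        idx = np.arange(n_categories)
        d = np.abs(idx[:, None] - idx[None, :])
        w = d if weights == "linear" else d**2   # disagreement weights
        return 1.0 - (w * observed).sum() / (w * expected).sum()

    # Hypothetical time-1 and time-2 responses to one 5-point item.
    time1 = [5, 4, 4, 3, 5, 2, 4, 3, 5, 4, 1, 3]
    time2 = [5, 4, 3, 3, 4, 2, 4, 4, 5, 4, 2, 3]
    print("weighted kappa:", round(weighted_kappa(time1, time2, n_categories=5), 2))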
Development of a Culturally Valid Counselor Burnout Inventory for Korean Counselors
ERIC Educational Resources Information Center
Yu, Kumlan; Lee, Sang Min; Nesbit, Elisabeth A.
2008-01-01
This article describes the development of the culturally valid Counselor Burnout Inventory. A multistage approach including item translation; item refinement; and evaluation of factorial validity, reliability, and score validity was used to test constructs and validation. Implications for practice and future research are discussed. (Contains 3…
Development of autonomous grasping and navigating robot
NASA Astrophysics Data System (ADS)
Kudoh, Hiroyuki; Fujimoto, Keisuke; Nakayama, Yasuichi
2015-01-01
The ability to find and grasp target items in an unknown environment is important for working robots. We developed an autonomous navigating and grasping robot. The operations are locating a requested item, moving to where the item is placed, finding the item on a shelf or table, and picking the item up from the shelf or table. To achieve these operations, we designed the robot with three functions: an autonomous navigating function that generates a map and a route in an unknown environment, an item position recognizing function, and a grasping function. We tested this robot in an unknown environment. It achieved a series of operations: moving to a destination, recognizing the positions of items on a shelf, picking up an item, placing it on a cart with its hand, and returning to the starting location. The results of this experiment show the potential of such robots to reduce workforce requirements.
Development and Initial Validation of Military Deployment-Related TBI Quality-of-Life Item Banks.
Toyinbo, Peter A; Vanderploeg, Rodney D; Donnell, Alison J; Mutolo, Sandra A; Cook, Karon F; Kisala, Pamela A; Tulsky, David S
2016-01-01
To investigate unique factors that affect health-related quality of life (QOL) in individuals with military deployment-related traumatic brain injury (MDR-TBI) and to develop appropriate assessment tools, consistent with the TBI-QOL/PROMIS/Neuro-QOL systems. Three focus groups from each of the 4 Veterans Administration (VA) Polytrauma Rehabilitation Centers, consisting of 20 veterans with mild to severe MDR-TBI and 36 VA providers, were involved in the early stage of developing the new item banks. The item banks were field tested in a sample (N = 485) of veterans enrolled in VA and diagnosed with an MDR-TBI. Focus groups and survey. Developed item banks and short forms for Guilt, Posttraumatic Stress Disorder/Trauma, and Military-Related Loss. Three new item banks representing unique domains of MDR-TBI health outcomes were created: 15 new Posttraumatic Stress Disorder items plus 16 SCI-QOL legacy Trauma items, 37 new Military-Related Loss items plus 18 TBI-QOL legacy Grief/Loss items, and 33 new Guilt items. Exploratory and confirmatory factor analyses plus bifactor analysis of the items supported sufficient unidimensionality of the new item pools. Convergent and discriminant analyses results, as well as known-group comparisons, provided initial support for the validity and clinical utility of the new item response theory-calibrated item banks and their short forms. This work provides a unique opportunity to identify issues specific to individuals with MDR-TBI and ensure that they are captured in QOL assessment, thus extending the existing TBI-QOL measurement system.
Poghosyan, Lusine; Nannini, Angela; Finkelstein, Stacey R; Mason, Emanuel; Shaffer, Jonathan A
2013-01-01
Policy makers and healthcare organizations are calling for expansion of the nurse practitioner (NP) workforce in primary care settings to assure timely access and high-quality care for the American public. However, many barriers, including those at the organizational level, exist that may undermine NP workforce expansion and their optimal utilization in primary care. This study developed a new NP-specific survey instrument, the Nurse Practitioner Primary Care Organizational Climate Questionnaire (NP-PCOCQ), to measure organizational climate in primary care settings and conducted its psychometric testing. Using an instrument development design, the organizational climate domain pertinent to primary care NPs was identified. Items were generated from the evidence and qualitative data. Face and content validity were established through two expert meetings. A content validity index was computed. The 86-item pool was reduced to 55 items, which were pilot tested with 81 NPs using mailed surveys and then field-tested with 278 NPs in New York State. SPSS 18 and Mplus software were used for item analysis, reliability testing, and maximum likelihood exploratory factor analysis. The NP-PCOCQ had face and content validity. The content validity index was .90. Twenty-nine items loaded on four subscale factors: professional visibility, NP-administration relations, NP-physician relations, and independent practice and support. The subscales had high internal consistency reliability; Cronbach's alphas ranged from .87 to .95. Having a strong instrument is important to promote future research. Administrators can also use it to assess organizational climate in their clinics and propose interventions to improve it, thus promoting NP practice and the expansion of the NP workforce.
Identifying predictors of physics item difficulty: A linear regression approach
NASA Astrophysics Data System (ADS)
Mesic, Vanes; Muratovic, Hasnija
2011-06-01
Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinarily difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model the item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained, quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. First, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables which reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created. It has been shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal physics knowledge structures. Identified predictors point out the fundamental cognitive dimensions of student physics achievement at the end of compulsory education in Bosnia and Herzegovina, whose level of development influenced the test results within the conducted assessments.
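A regression model of the kind described, in which Rasch item difficulties are regressed on coded item features, can be fitted with ordinary least squares. The sketch below uses invented feature codes and difficulties, not the Bosnian assessment data, and reports the proportion of difficulty variance explained (R²), the quantity corresponding to the 61.2% figure quoted above.

    import numpy as np

    rng = np.random.default_rng(42)
    n_items = 60

    # Hypothetical item-feature codes from a content analysis.
    complexity   = rng.integers(1, 4, n_items)      # 1 = low ... 3 = high
    modality     = rng.integers(0, 2, n_items)      # 0 = verbal, 1 = graphical
    interference = rng.integers(0, 2, n_items)      # intuitive/formal conflict present

    # Hypothetical Rasch item difficulties partly driven by those features.
    difficulty = (0.6 * complexity + 0.4 * interference - 0.3 * modality
                  + rng.normal(scale=0.5, size=n_items))

    X = np.column_stack([np.ones(n_items), complexity, modality, interference])
    beta, *_ = np.linalg.lstsq(X, difficulty, rcond=None)

    fitted = X @ beta
    ss_res = ((difficulty - fitted) ** 2).sum()
    ss_tot = ((difficulty - difficulty.mean()) ** 2).sum()
    print("coefficients:", np.round(beta, 2))
    print("R^2 =", round(1 - ss_res / ss_tot, 3))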
de Pinho, Lucinéia; Moura, Paulo Henrique Tolentino; Silveira, Marise Fagundes; de Botelho, Ana Cristina Carvalho; Caldeira, Antônio Prates
2013-07-18
In light of its epidemic proportions in developed and developing countries, obesity is considered a serious public health issue. In order to increase knowledge concerning the ability of health care professionals to care for obese adolescents and to adopt more efficient preventive and control measures, a questionnaire was developed and validated to assess non-dietitian health professionals regarding their Knowledge of Nutrition in Obese Adolescents (KNOA). The development and evaluation of a questionnaire to assess the knowledge of primary care practitioners with respect to nutrition in obese adolescents were carried out in five phases, as follows: 1) definition of study dimensions; 2) development of 42 questions and preliminary evaluation of the questionnaire by a panel of experts; 3) characterization and selection of primary care practitioners (35 dietitians and 265 non-dietitians) and measurement of questionnaire criteria by contrasting the responses of dietitians and non-dietitians; 4) reliability assessment by question exclusion based on item difficulty (too easy and too difficult for non-dietitian practitioners), item discrimination, internal consistency and reproducibility index determination; and 5) scoring the completed questionnaires. Dietitians obtained higher scores than non-dietitians (Mann-Whitney U test, P < 0.05), confirming the validity of the questionnaire criteria. Items were discriminated by correlating the score for each item with the total score, using a minimum of 0.2 as a correlation coefficient cutoff value. Item difficulty was controlled by excluding questions answered correctly by more than 90% of the non-dietitian subjects (too easy) or by less than 10% of them (too difficult). The final questionnaire contained 26 of the original 42 questions, increasing Cronbach's α value from 0.788 to 0.807. Test-retest agreement between respondents was classified as good to very good (Kappa test, >0.60). The KNOA questionnaire developed for primary care practitioners is a valid, consistent and suitable instrument that can be applied over time, making it a promising tool for developing and guiding public health policies.
Ability evaluation by binary tests: Problems, challenges & recent advances
NASA Astrophysics Data System (ADS)
Bashkansky, E.; Turetsky, V.
2016-11-01
Binary tests designed to measure abilities of objects under test (OUTs) are widely used in different fields of measurement theory and practice. The number of test items in such tests is usually very limited. The response to each test item provides only one bit of information per OUT. The problem of correct ability assessment is even more complicated when the levels of difficulty of the test items are unknown beforehand. This fact makes the search for effective ways of planning and processing the results of such tests highly relevant. In recent years, there has been some progress in this direction, generated by both the development of computational tools and the emergence of new ideas. The latter are associated with the use of so-called “scale invariant item response models”. Together with the maximum likelihood estimation (MLE) approach, they have helped to solve some problems of engineering and proficiency testing. However, several issues related to the assessment of uncertainties, replication scheduling, the use of placebo, as well as the evaluation of multidimensional abilities still present a challenge for researchers. The authors attempt to outline ways to solve the above problems.
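When item difficulties are known or assumed, the maximum likelihood ability estimate for a binary response pattern can be obtained with a few Newton-Raphson steps on the Rasch log-likelihood. The sketch below is a generic illustration under assumed difficulties, not an implementation of the scale-invariant models cited above, and it sidesteps the perfect-score case, for which the MLE is unbounded.

    import numpy as np

    def rasch_mle(responses, difficulties, n_iter=20):
        """Newton-Raphson MLE of ability for 0/1 responses under the Rasch model."""
        responses = np.asarray(responses, dtype=float)
        b = np.asarray(difficulties, dtype=float)
        if responses.sum() in (0, len(responses)):
            raise ValueError("MLE is unbounded for all-correct or all-wrong patterns")
        theta = 0.0
        for _ in range(n_iter):
            p = 1.0 / (1.0 + np.exp(-(theta - b)))
            grad = (responses - p).sum()          # d logL / d theta
            hess = -(p * (1 - p)).sum()           # d2 logL / d theta2
            theta -= grad / hess
        se = 1.0 / np.sqrt((p * (1 - p)).sum())   # asymptotic standard error
        return theta, se

    # Five hypothetical items of increasing difficulty and one response pattern.
    theta, se = rasch_mle([1, 1, 1, 0, 0], difficulties=[-1.5, -0.5, 0.0, 0.5, 1.5])
    print(f"theta = {theta:.2f} +/- {se:.2f}")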
Walker, Gemma M; Carter, Tim; Aubeeluck, Aimee; Witchell, Miranda; Coad, Jane
2018-01-01
Introduction Currently, no standardised, evidence-based assessment tool for assessing immediate self-harm and suicide in acute paediatric inpatient settings exists. Aim The aim of this study is to develop and test the psychometric properties of an assessment tool that identifies immediate risk of self-harm and suicide in children and young people (10–19 years) in acute paediatric hospital settings. Methods and analysis Development phase: This phase involved a scoping review of the literature to identify and extract items from previously published suicide and self-harm risk assessment scales. Using a modified electronic Delphi approach, these items will then be rated according to their relevance for assessment of immediate suicide or self-harm risk by expert professionals. Inclusion of items will be determined by 65%–70% consensus between raters. Subsequently, a panel of expert members will convene to determine the face validity, appropriate phrasing, item order and response format for the finalised items. Psychometric testing phase: The finalised items will be tested for validity and reliability through a multicentre, psychometric evaluation. Psychometric testing will be undertaken to determine the following: internal consistency, inter-rater reliability, convergent, divergent validity and concurrent validity. Ethics and dissemination Ethical approval was provided by the National Health Service East Midlands—Derby Research Ethics Committee (17/EM/0347) and full governance clearance received by the Health Research Authority and local participating sites. Findings from this study will be disseminated to professionals and the public via peer-reviewed journal publications, popular social media and conference presentations. PMID:29654046
Cappelleri, Joseph C.; Lundy, J. Jason; Hays, Ron D.
2014-01-01
Introduction The U.S. Food and Drug Administration’s patient-reported outcome (PRO) guidance document defines content validity as “the extent to which the instrument measures the concept of interest” (FDA, 2009, p. 12). “Construct validity is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity” (Strauss & Smith, 2009, p. 7). Hence both qualitative and quantitative information are essential in evaluating the validity of measures. Methods We review classical test theory and item response theory approaches to evaluating PRO measures including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized “difficulty” (severity) order of items is represented by observed responses. Conclusion Classical test theory and item response theory can be useful in providing a quantitative assessment of items and scales during the content validity phase of patient-reported outcome measures. Depending on the particular type of measure and the specific circumstances, either one or both approaches should be considered to help maximize the content validity of PRO measures. PMID:24811753
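Several of the item- and scale-level checks listed above, such as response-category frequencies, floor and ceiling effects, and the observed ordering of item severities, reduce to simple tabulations. A short sketch with hypothetical 5-point responses (no specific PRO instrument implied) illustrates them; the 15% rule of thumb noted in a comment is a common convention, not a requirement stated in the article.

    import numpy as np

    rng = np.random.default_rng(7)
    # Hypothetical responses: 300 persons, 6 items scored 1-5, with item 1 the
    # least severe (highest endorsement) and item 6 the most severe.
    locs = np.linspace(4.4, 2.6, 6)
    items = np.clip(np.round(rng.normal(loc=locs, scale=1.0, size=(300, 6))), 1, 5)
    total = items.sum(axis=1)

    # Response-category frequencies per item.
    for j in range(items.shape[1]):
        freqs = [(items[:, j] == c).mean() for c in range(1, 6)]
        print(f"item {j + 1}: " + " ".join(f"{f:.2f}" for f in freqs))

    # Floor and ceiling effects on the total score (a common rule of thumb
    # flags proportions above 15% as problematic).
    print("floor:", round((total == 6).mean(), 3), "ceiling:", round((total == 30).mean(), 3))

    # Is the hypothesized severity order reflected in observed item means?
    means = items.mean(axis=0)
    print("item means:", np.round(means, 2))
    print("monotone with hypothesized order:", bool(np.all(np.diff(means) <= 0)))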
Methodology for developing and evaluating the PROMIS smoking item banks.
Hansen, Mark; Cai, Li; Stucky, Brian D; Tucker, Joan S; Shadel, William G; Edelen, Maria Orlando
2014-09-01
This article describes the procedures used in the PROMIS Smoking Initiative for the development and evaluation of item banks, short forms (SFs), and computerized adaptive tests (CATs) for the assessment of 6 constructs related to cigarette smoking: nicotine dependence, coping expectancies, emotional and sensory expectancies, health expectancies, psychosocial expectancies, and social motivations for smoking. Analyses were conducted using response data from a large national sample of smokers. Items related to each construct were subjected to extensive item factor analyses and evaluation of differential item functioning (DIF). Final item banks were calibrated, and SF assessments were developed for each construct. The performance of the SFs and the potential use of the item banks for CAT administration were examined through simulation study. Item selection based on dimensionality assessment and DIF analyses produced item banks that were essentially unidimensional in structure and free of bias. Simulation studies demonstrated that the constructs could be accurately measured with a relatively small number of carefully selected items, either through fixed SFs or CAT-based assessment. Illustrative results are presented, and subsequent articles provide detailed discussion of each item bank in turn. The development of the PROMIS smoking item banks provides researchers with new tools for measuring smoking-related constructs. The use of the calibrated item banks and suggested SF assessments will enhance the quality of score estimates, thus advancing smoking research. Moreover, the methods used in the current study, including innovative approaches to item selection and SF construction, may have general relevance to item bank development and evaluation.
Chen, Pei-Hua
2017-05-01
This rejoinder responds to the commentary by van der Linden and Li entitled "Comment on Three-Element Item Selection Procedures for Multiple Forms Assembly: An Item Matching Approach" on the article "Three-Element Item Selection Procedures for Multiple Forms Assembly: An Item Matching Approach" by Chen. Van der Linden and Li made a strong statement calling for the cessation of test assembly heuristics development and instead encouraged embracing mixed integer programming (MIP). This article points out the nondeterministic polynomial (NP)-hard nature of MIP problems and how solutions found using heuristics could be useful in an MIP context. Although van der Linden and Li provided several practical examples of test assembly supporting their view, the examples ignore cases in which a slight change of constraints or item pool data might make it impossible to obtain solutions as quickly as before. The article illustrates the use of heuristic solutions to improve both the performance of MIP solvers and the quality of solutions. Additional responses to the commentary by van der Linden and Li are included.
Wiklander, Maria; Brännström, Johanna; Svedhem, Veronica; Eriksson, Lars E
2015-11-19
Barriers to HIV testing experienced by individuals at risk for HIV can result in treatment delay and further transmission of the disease. Instruments to systematically measure barriers are scarce but could contribute to improved strategies for HIV testing. The aims of this study were to develop and test a barriers to HIV testing scale in a Swedish context. An 18-item scale was developed, based on an existing scale with the addition of six new items related to fear of the disease or negative consequences of being diagnosed as HIV-infected. Items were phrased as statements about potential barriers with a three-point response format representing not important, somewhat important, and very important. The scale was evaluated regarding missing values, floor and ceiling effects, exploratory factor analysis, and internal consistency. The questionnaire was completed by 292 adults recently diagnosed with HIV infection, of whom 7 were excluded (≥9 items missing) and 285 were included (≥12 items completed) in the analyses. The participants were 18-70 years old (mean 40.5, SD 11.5), 39% were female, and 77% were born outside Sweden. Routes of transmission were heterosexual transmission 63%, male-to-male sex 20%, intravenous drug use 5%, blood product/transfusion 2%, and unknown 9%. All scale items had <3% missing values. The data were suitable for factor analysis (KMO = 0.92), and a four-factor solution was chosen based on the level of explained common variance (58.64%) and the interpretability of the factor structure. The factors were interpreted as personal consequences, structural barriers, social and economic security, and confidentiality. Ratings at the minimum level (suggested barrier not important) were common, resulting in substantial floor effects on the scales. The scales were internally consistent (Cronbach's α 0.78-0.91). This study gives preliminary evidence that the scale is feasible, reliable, and valid for identifying different types of barriers to HIV testing.
Validation of the HIV/AIDS Stigma Instrument - PLWA (HASI-P).
Holzemer, William L; Uys, Leana R; Chirwa, Maureen L; Greeff, Minrie; Makoae, Lucia N; Kohi, Thecla W; Dlamini, Priscilla S; Stewart, Anita L; Mullan, Joseph; Phetlhu, René D; Wantland, Dean; Durrheim, Kevin
2007-09-01
This article describes the development and testing of a quantitative measure of HIV/AIDS stigma as experienced by people living with HIV/AIDS. This instrument is designed to measure perceived stigma, create a baseline from which to measure changes in stigma over time, and track potential progress towards reducing stigma. It was developed in three phases from 2003-2006: generating items based on results of focus group discussions; pilot testing and reducing the original list of items; and validating the instrument. Data for all phases were collected from five African countries: Lesotho, Malawi, South Africa, Swaziland and Tanzania. The instrument was validated with a sample of 1,477 persons living with HIV/AIDS from all five countries. The sample had a mean age of 36.1 years and 74.1% was female. The participants reported they had known they were HIV positive for an average of 3.4 years, and 46% of the sample was taking antiretroviral medications. A six-factor solution with 33 items explained 60.72% of the variance. Scale alpha reliabilities were examined and items that did not contribute to scale reliability were dropped. The factors included: Verbal Abuse (8 items, alpha=0.886); Negative Self-Perception (5 items, alpha=0.906); Health Care Neglect (7 items, alpha=0.832); Social Isolation (5 items, alpha=0.890); Fear of Contagion (6 items, alpha=0.795); and Workplace Stigma (2 items, alpha=0.758). This article reports on the development and validation of a new measure of stigma, the HIV/AIDS Stigma Instrument - PLWA (HASI-P), providing evidence that supports adequate content and construct validity, modest concurrent validity, and acceptable internal consistency reliability for each of the six subscales and the total score. The scale is available in several African languages.
Validity and Reliability of General Nutrition Knowledge Questionnaire for Adults in Uganda
Bukenya, Richard; Ahmed, Abhiya; Andrade, Jeanette M.; Grigsby-Toussaint, Diana S.; Muyonga, John; Andrade, Juan E.
2017-01-01
This study sought to develop and validate a general nutrition knowledge questionnaire (GNKQ) for Ugandan adults. The initial draft consisted of 133 items on five constructs associated with nutrition knowledge: expert recommendations (16 items), food groups (70 items), selecting food (10 items), nutrition and disease relationship (23 items), and food fortification in Uganda (14 items). The questionnaire's validity was evaluated in three studies. For content validity (study 1), a panel of five content-matter nutrition experts reviewed the GNKQ draft before and after face validity testing. For face validity (study 2), head teachers and health workers (n = 27) completed the questionnaire before attending one of three focus groups to review the clarity of the items. For construct validity and test-retest reliability (study 3), head teachers (n = 40) from private and public primary schools and nutrition (n = 52) and engineering (n = 49) students from Makerere University took the questionnaire twice (two weeks apart). Experts agreed (content validity index, CVI > 0.9; reliability, Gwet's AC1 > 0.85) that all constructs were relevant for evaluating nutrition knowledge. After the focus groups, 29 items were identified as unclear, requiring major (n = 5) or minor (n = 24) revision. The final questionnaire had acceptable internal consistency (Cronbach α > 0.95) and test-retest reliability (r = 0.89), and differentiated (p < 0.001) nutrition knowledge scores between nutrition (67 ± 5) and engineering (39 ± 11) students. Only the construct on nutrition recommendations was unreliable (Cronbach α = 0.51, test-retest r = 0.55) and requires further optimization. The final questionnaire included topics on food groups (41 items), selecting food (2 items), nutrition and disease relationship (14 items), and food fortification in Uganda (22 items) and showed good content validity, construct validity, and test-retest reliability for evaluating nutrition knowledge among Ugandan adults. PMID:28230779
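The content validity index (CVI) reported above is conventionally computed as the proportion of experts rating an item as relevant. The sketch below, with hypothetical ratings, shows that convention; the exact rating scale and cutoffs used in the study are not specified here, so treat the details as assumptions.

```python
def item_cvi(ratings, relevant_levels=(3, 4)):
    """Item-level content validity index: proportion of experts rating the
    item as relevant (conventionally 3 or 4 on a 4-point relevance scale)."""
    return sum(r in relevant_levels for r in ratings) / len(ratings)

def scale_cvi(ratings_by_item):
    """Scale-level CVI (average approach): mean of the item-level CVIs."""
    cvis = [item_cvi(r) for r in ratings_by_item]
    return sum(cvis) / len(cvis)

if __name__ == "__main__":
    # 5 experts rating 4 hypothetical items on a 1-4 relevance scale
    ratings = [
        [4, 4, 3, 4, 4],
        [3, 4, 4, 4, 3],
        [4, 3, 4, 2, 4],
        [4, 4, 4, 4, 4],
    ]
    print([round(item_cvi(r), 2) for r in ratings])
    print(round(scale_cvi(ratings), 2))
```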
The Pieper-Zulkowski pressure ulcer knowledge test.
Pieper, Barbara; Zulkowski, Karen
2014-09-01
To describe the development and initial testing of the Pieper-Zulkowski Pressure Ulcer Knowledge Test (PZ-PUKT). Cross-sectional, instrument-testing design. Hospital association pressure ulcer educational program conference. Pressure ulcer research and guidelines from the last 5 years were examined for test item content. The initial PZ-PUKT had 115 items; response options were "true," "false," and "don't know." Registered nurses (N = 108) were randomly divided into 2 groups to take either the 60 prevention/risk and staging items or the 55 wound description items. Analyses of these responses resulted in 72 items, which were administered in their entirety to a second cohort of 98 nurses for reliability testing. Cronbach's α was .80 for the 72-item PZ-PUKT. Cronbach's α values for the subscales were as follows: staging, .67; wound description, .64; and prevention/risk, .56. The mean correct scores were as follows: total, 80%; prevention, 77%; staging, 86%; and wound description, 77%. Nurses with wound care certification scored significantly higher on the PZ-PUKT than nurses with other clinical certifications or nurses who lacked certification. The PZ-PUKT has updated content about pressure ulcer prevention/risk, staging, and wound description. Reliability values are highest for the total test. Further use of the instrument in diverse settings will add to reliability testing and may provide direction for determining a passing cutoff score.
The Effects of Item Format and Cognitive Domain on Students' Science Performance in TIMSS 2011
NASA Astrophysics Data System (ADS)
Liou, Pey-Yan; Bulut, Okan
2017-12-01
The purpose of this study was to examine eighth-grade students' science performance in terms of two test design components: item format and cognitive domain. The Taiwanese data came from the 2011 administration of the Trends in International Mathematics and Science Study (TIMSS), one of the major international large-scale assessments in science. Item difficulty analysis was first applied to obtain the proportion correct for each item. A regression-based cumulative link mixed modeling (CLMM) approach was then used to estimate the impact of item format, cognitive domain, and their interaction on students' science scores. The proportion-correct statistics showed that constructed-response items were more difficult than multiple-choice items, and that reasoning-domain items were more difficult than items in the applying and knowing domains. In terms of the CLMM results, students tended to obtain higher scores when answering constructed-response items as well as items in the applying cognitive domain. When the two predictors and the interaction term were included together, the directions and magnitudes of the predictors' effects on student science performance changed substantially. Plausible explanations for the complex nature of the effects of the two test-design predictors on student science performance are discussed. The results provide practical, empirically based evidence for test developers, teachers, and stakeholders to be aware of the differential functioning of item format, cognitive domain, and their interaction in students' science performance.
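Classical proportion-correct difficulty, mentioned above, is straightforward to compute per item and to average by item format or cognitive domain. The following hedged Python sketch uses simulated responses and invented format labels; it does not reproduce the TIMSS analysis or the CLMM.

```python
import numpy as np

def proportion_correct(responses):
    """Classical item difficulty: proportion of examinees answering correctly.
    responses: 2-D 0/1 array, rows = students, columns = items."""
    return np.asarray(responses, dtype=float).mean(axis=0)

def difficulty_by_group(responses, item_groups):
    """Average proportion correct for each item group (e.g. item format
    or cognitive domain). item_groups: list of labels, one per item."""
    p = proportion_correct(responses)
    groups = {}
    for label, value in zip(item_groups, p):
        groups.setdefault(label, []).append(value)
    return {label: float(np.mean(vals)) for label, vals in groups.items()}

if __name__ == "__main__":
    rng = np.random.default_rng(2011)
    resp = (rng.random((500, 6)) < [0.8, 0.7, 0.6, 0.5, 0.4, 0.3]).astype(int)
    fmt = ["MC", "MC", "MC", "CR", "CR", "CR"]   # hypothetical format labels
    print(difficulty_by_group(resp, fmt))
```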
Personality Measurement with Mentally Retarded and Other Sub-Cultural Adults. Final Report.
ERIC Educational Resources Information Center
Eber, Herbert W.
Two 160-item experimental forms of a multidimensional personality test to assess the vocational potential of clients of limited literacy (third-grade reading level) were developed and administered to clients at rehabilitation centers and at centers for the retarded. Using the 16 Personality Factors Test as a model, items were constructed to do the…
Managing a Test Item Bank on a Microcomputer: Can It Help You and Your Students?
ERIC Educational Resources Information Center
Peterson, Julian A.; Meister, Lynn L.
1983-01-01
Describes a test item bank developed by the Association for Medical School Departments of Biochemistry (Texas). Programs (written in Pascal) allow self-evaluation by interactive student access to questions randomly selected from a chosen category. Potential users of the system (having student, manager, and instructor modes) are invited to contact…
ERIC Educational Resources Information Center
Hendrickson, Amy; Patterson, Brian; Ewing, Maureen
2010-01-01
The psychometric considerations and challenges associated with including constructed response items on tests are discussed along with how these issues affect the form assembly specifications for mixed-format exams. Reliability and validity, security and fairness, pretesting, content and skills coverage, test length and timing, weights, statistical…
Shinya, Sugimoto; Masaru, Akimoto; Akira, Hayakawa; Eisaku, Hokazono; Susumu, Osawa
2012-01-18
Lifestyle-related diseases in Japan account for 30% of the country's entire medical expenditure and cause 60% of all deaths. For the prevention of lifestyle-related diseases, medical examination by laboratory tests for metabolic syndrome is important. To enable examination based on blood collected from a fingertip, we developed the "Well Kit". About 65 μl of blood collected from a fingertip was diluted with a buffer solution containing two internal standard materials. The kit also separated corpuscles from the diluted plasma with a special filter. The diluted plasma obtained was then measured with the JCA-BM2250. This measurement system was evaluated for the quantitative analysis of 8 items. The uncertainties of the tested items in this measurement system were 1.7% to 6.4%. The correlation coefficients between values from this system and venous plasma sample values were 0.876-0.991 for all tested items, and 0.958 for hematocrit. This system for testing blood collected from a fingertip is simple to use and can be applied in testing for metabolic syndrome. In addition, this testing system is useful for personal healthcare and for the medical examination of local inhabitants. Copyright © 2011 Elsevier B.V. All rights reserved.
Mirzaei, Ardalan; Carter, Stephen R; Chen, Jenny Yimin; Rittsteuer, Claudia; Schneider, Carl R
2018-06-11
Recent changes within community pharmacy have seen a shift towards some pharmacies providing "value-added" services. However, providing high levels of service is resource intensive, yet revenues from dispensing are declining. Of significance, therefore, is how consumers perceive service quality (SQ). At present there are no validated and reliable instruments to measure consumers' perceptions of SQ in Australian community pharmacies. The aim of this study was to build a theory-grounded model of SQ in community pharmacies and to create a valid survey instrument to measure consumers' perceptions of SQ. Stage 1 dealt with item generation using theory, prior research, and qualitative interviews with pharmacy consumers. Selected items were then subjected to content validity and face validity assessment. Stages 2 and 3 included psychometric testing among English-speaking adult consumers of Australian pharmacies. Exploratory factor analysis was used for item reduction and to explain the domains of SQ. In stage 1, item generation for SQ initially produced 113 items, which were then refined, through content and face validity, down to 61 items. In stage 2, after subjecting the questionnaire to psychometric testing on data from the first pharmacy (n = 374), the use of the primary dimensions of SQ was abandoned, leaving 32 items representing 5 domains of SQ. In stage 3, the questionnaire was subjected to further testing and item reduction in 3 other pharmacies (n = 320). SQ was best described using 23 items representing 6 domains: 'health and medicines advice', 'relationship quality', 'technical quality', 'environmental quality', 'non-prescription service', and 'health outcomes'. This research presents a theoretically grounded and robust measurement scale developed for consumer perceptions of SQ in a community pharmacy. Copyright © 2018. Published by Elsevier Inc.
Development of the Assessment of Belief Conflict in Relationship-14 (ABCR-14)
Kyougoku, Makoto; Teraoka, Mutsumi; Masuda, Noriko; Ooura, Mariko; Abe, Yasushi
2015-01-01
Purpose Nurses and other healthcare workers frequently experience belief conflict, one of the most important new stress-related problems in both academic and clinical fields. Methods In this study, using a sample of 1,683 nursing practitioners, we developed the Assessment of Belief Conflict in Relationship-14 (ABCR-14), a new scale that assesses belief conflict in the healthcare field. Standard psychometric procedures were used to develop and test the scale, including qualitative framework conceptualization and item-pool development, item reduction, and scale development. We analyzed the psychometric properties of the ABCR-14 using entropy, polyserial correlation coefficients, exploratory factor analysis, confirmatory factor analysis, average variance extracted, Cronbach's alpha, Pearson product-moment correlation coefficients, and multidimensional item response theory (MIRT). Results The results of the analysis supported a three-factor model consisting of 14 items. The validity and reliability of the ABCR-14 were supported by evidence of high construct validity, structural validity, hypothesis testing, internal consistency reliability, and concurrent validity. The MIRT results offered strong support for the item slope and difficulty parameters. However, the ABCR-14 Likert scale might need further exploration from the MIRT point of view. Nevertheless, as mentioned above, there is sufficient evidence to support that the ABCR-14 has high validity and reliability. Conclusion The ABCR-14 demonstrates good psychometric properties for assessing belief conflict in nursing. Further studies are recommended to confirm its application in clinical practice. PMID:26247356
Developing an item bank to measure economic quality of life for individuals with disabilities.
Tulsky, David S; Kisala, Pamela A; Lai, Jin-Shei; Carlozzi, Noelle; Hammel, Joy; Heinemann, Allen W
2015-04-01
To develop and evaluate the psychometric properties of an item set measuring economic quality of life (QOL) for use by individuals with disabilities. Survey. Community settings. Individuals with disabilities completed individual interviews (n=64), participated in focus groups (n=172), and completed cognitive interviews (n=15). Inclusion criteria included the following: traumatic brain injury, spinal cord injury, or stroke; age ≥18 years; and ability to read and speak English. We calibrated the items with 305 former rehabilitation inpatients. None. Economic QOL. Confirmatory factor analysis showed acceptable fit indices (comparative fit index=.939, root mean square error of approximation=.089) for the 37 items. However, 3 items demonstrated local item dependence. Dropping 9 items improved fit and obviated local dependence. Rasch analysis of the remaining 28 items yielded a person reliability of .92, suggesting that these items discriminate about 4 economic QOL levels. We developed a 28-item bank that measures economic aspects of QOL. Preliminary confirmatory factor analysis and Rasch analysis results support the psychometric properties of this new measure. It fills a gap in health-related QOL measurement by describing the economic barriers and facilitators of community participation. Future development will make the item bank available as a computer adaptive test. Copyright © 2015 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
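The Rasch person reliability and the "about 4 levels" interpretation above follow standard Rasch conventions. The sketch below shows one common way to compute person separation reliability and the implied number of distinct strata from person measures and their standard errors; the numbers are hypothetical and this is not the authors' calibration code.

```python
import math

def rasch_probability(theta, b):
    """Rasch model: probability of a correct/affirmative response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def person_reliability(thetas, standard_errors):
    """Rasch-style person separation reliability: 'true' variance of the
    person measures divided by their observed variance."""
    n = len(thetas)
    mean_t = sum(thetas) / n
    obs_var = sum((t - mean_t) ** 2 for t in thetas) / (n - 1)
    error_var = sum(se ** 2 for se in standard_errors) / n
    return max(0.0, obs_var - error_var) / obs_var

def separation_strata(reliability):
    """Number of statistically distinct person levels implied by reliability."""
    g = math.sqrt(reliability / (1.0 - reliability))   # separation index
    return (4.0 * g + 1.0) / 3.0

if __name__ == "__main__":
    thetas = [-1.2, -0.4, 0.1, 0.8, 1.5, 2.0]          # hypothetical person measures
    ses = [0.35, 0.30, 0.28, 0.30, 0.33, 0.40]         # hypothetical standard errors
    print(round(rasch_probability(0.5, 0.0), 3))
    r = person_reliability(thetas, ses)
    print(round(r, 2), round(separation_strata(r), 1))
```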
An Examination of Two Procedures for Identifying Consequential Item Parameter Drift
ERIC Educational Resources Information Center
Wells, Craig S.; Hambleton, Ronald K.; Kirkpatrick, Robert; Meng, Yu
2014-01-01
The purpose of the present study was to develop and evaluate two procedures for flagging consequential item parameter drift (IPD) in an operational testing program. The first procedure was based on flagging items that exhibit a meaningful magnitude of IPD, using a critical value defined to represent barely tolerable IPD. The second procedure…
ERIC Educational Resources Information Center
Lorié, William A.
2013-01-01
A reverse engineering approach to automatic item generation (AIG) was applied to a figure-based publicly released test item from the Organisation for Economic Cooperation and Development (OECD) Programme for International Student Assessment (PISA) mathematical literacy cognitive instrument as part of a proof of concept. The author created an item…
The Construction of a Long Variable of Conceptual Development in Social Education.
ERIC Educational Resources Information Center
Doig, Brian
This paper demonstrates a method for constructing long variables using items that elicit partially correct responses across ages. Long variables may be defined by students at different ages (year levels) attempting common items within a test containing other items considered to be appropriate for each age or year level. A developmental model of…
Cognitive-Developmental Hierarchies: A Search for Structure Using Item-Level Data.
ERIC Educational Resources Information Center
Martinez, Michael E.; Simpson, R. Scott
Item-level statistics from ability and achievement tests have been underutilized as sources of data for building models of cognitive development. How item data can be used to build a cognitive-developmental map of proportional reasoning is demonstrated. The product of the analysis is a cognitive hierarchy with levels corresponding to categories of…
NASA Astrophysics Data System (ADS)
Qin, Huaili; Yang, Guang; Kuang, Shan; Wang, Qiang; Liu, Jingjing; Zhang, Xiaomin; Li, Cancan; Han, Zhiwei; Li, Yuanjing
2018-02-01
The present project adopts the principles and technology of X-ray imaging to quickly measure the mass thickness (mass thickness of the item = density of the item × thickness of the item) of irradiated items and thus to determine whether the packaging size and the location of items inside the package meet the requirements for treatment thickness in electron beam irradiation processing. The development of the X-ray mass thickness detection algorithm, as well as the prediction of dose distribution, has been completed. The algorithm is based on X-ray attenuation. Four standard modules (an Al sheet, Al ladders, a PMMA sheet, and PMMA ladders) were selected for algorithm development. The algorithm was optimized until the error between the tested mass thickness and the standard mass thickness was less than 5%. The dose distribution for all energies (1-10 MeV) at each mass thickness was obtained using the Monte Carlo method and used for the analysis of dose distribution, which provides information on whether the item will be penetrated or not, as well as the maximum dose, minimum dose, and DUR of the whole item.
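The attenuation-based algorithm described above can be illustrated with the Beer-Lambert law, under which mass thickness is recovered from the measured transmission ratio and an effective mass attenuation coefficient. The sketch below is a simplified single-energy illustration with an invented coefficient; the actual detector algorithm and calibration against the Al/PMMA standards are not reproduced here.

```python
import math

def mass_thickness(i_transmitted, i_incident, mass_atten_coeff):
    """Invert the Beer-Lambert law I = I0 * exp(-(mu/rho) * x) to recover the
    mass thickness x (g/cm^2) from the measured transmission ratio.
    mass_atten_coeff: effective mass attenuation coefficient (cm^2/g) of the
    material at the X-ray energy used (in practice calibrated with standards
    such as the Al/PMMA ladders mentioned above)."""
    return math.log(i_incident / i_transmitted) / mass_atten_coeff

def relative_error(measured, reference):
    """Relative error against a standard, as in the <5% optimization target."""
    return abs(measured - reference) / reference

if __name__ == "__main__":
    mu_rho = 0.2          # hypothetical effective coefficient, cm^2/g
    x_true = 3.0          # g/cm^2 reference mass thickness
    i0 = 1000.0
    i = i0 * math.exp(-mu_rho * x_true)       # simulated transmitted intensity
    x_est = mass_thickness(i, i0, mu_rho)
    print(round(x_est, 3), round(100 * relative_error(x_est, x_true), 2), "%")
```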
Job Specific Tests and an Overview of Research on Alternatives.
ERIC Educational Resources Information Center
MacLane, Charles N.; O'Leary, Brian S.
The development of job-specific tests (JSTs) for two occupations is discussed. A reading comprehension test and a mathematical reasoning test were developed for Customs Inspectors, and a reading comprehension test was developed for Social Security Claims workers. JST items incorporated reading samples or math problems from those found on the job.…
Developing energy and momentum conceptual survey (EMCS) with four-tier diagnostic test items
NASA Astrophysics Data System (ADS)
Afif, Nur Faadhilah; Nugraha, Muhammad Gina; Samsudin, Achmad
2017-05-01
Students' conceptions of work and energy are important for supporting the learning process in the classroom. For that reason, a diagnostic test instrument is needed to diagnose students' conceptions of work and energy. The researchers therefore developed the Energy and Momentum Conceptual Survey (EMCS) into a set of four-tier diagnostic test items. This research constitutes the first step of the four-tier-formatted EMCS development as a diagnostic test instrument on work and energy. The research method used the 4D model (Defining, Designing, Developing and Disseminating). The developed instrument was tested with 39 students in a senior high school. The results showed that the four-tier-formatted EMCS is able to diagnose students' level of conceptual understanding of work and energy. It can be concluded that the four-tier-formatted EMCS is a potential diagnostic test instrument able to categorize students as understanding the concepts, holding misconceptions, or not understanding the work and energy concepts at all.
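Four-tier diagnostic items combine an answer tier, a reason tier, and a confidence rating for each. The classification sketch below uses one commonly cited coding scheme (assumed here; the EMCS study's exact rules may differ) to sort a response into "understands", "misconception", or "does not understand".

```python
def classify_four_tier(answer_correct, answer_confident, reason_correct, reason_confident):
    """Classify a single four-tier response.
    One commonly used scheme (assumed here; coding rules vary between studies):
      - correct answer + correct reason, both held confidently -> understands
      - incorrect answer and/or reason, both held confidently  -> misconception
      - any tier answered without confidence                   -> does not understand / guessing
    """
    if not (answer_confident and reason_confident):
        return "does not understand"
    if answer_correct and reason_correct:
        return "understands"
    return "misconception"

if __name__ == "__main__":
    responses = [
        (True, True, True, True),     # confident and fully correct
        (False, True, False, True),   # confident but wrong
        (True, False, True, True),    # correct but unsure about the answer
    ]
    for r in responses:
        print(r, "->", classify_four_tier(*r))
```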
Development of the Online Assessment of Athletic Training Education (OAATE) Instrument
ERIC Educational Resources Information Center
Carr, W. David; Frey, Bruce B.; Swann, Elizabeth
2009-01-01
Objective: To establish the validity and reliability of an online assessment instrument's items developed to track educational outcomes over time. Design and Setting: A descriptive study of the validation arguments and reliability testing of the assessment items. The instrument is available to graduating students enrolled in entry-level Athletic…
Identifying Promising Items: The Use of Crowdsourcing in the Development of Assessment Instruments
ERIC Educational Resources Information Center
Sadler, Philip M.; Sonnert, Gerhard; Coyle, Harold P.; Miller, Kelly A.
2016-01-01
The psychometrically sound development of assessment instruments requires pilot testing of candidate items as a first step in gauging their quality, typically a time-consuming and costly effort. Crowdsourcing offers the opportunity for gathering data much more quickly and inexpensively than from most targeted populations. In a simulation of a…
Indicators of Family Care for Development for Use in Multicountry Surveys
Kariger, Patricia; Engle, Patrice; Britto, Pia M. Rebello; Sywulka, Sara M.; Menon, Purnima
2012-01-01
Indicators of family care for development are essential for ascertaining whether families are providing their children with an environment that leads to positive developmental outcomes. This project aimed to develop indicators, from a set of items measuring family care practices and resources important for caregiving, for use in epidemiologic surveys in developing countries. A mixed-methods (quantitative and qualitative) design was used for item selection and evaluation. Qualitative and quantitative analyses were conducted to examine the validity of candidate items in several country samples. Qualitative methods included the use of global expert panels to identify and evaluate the performance of each candidate item as well as in-country focus groups to test the content validity of the items. The quantitative methods included analyses of item-response distributions using bivariate techniques. The selected items measured two family care practices (support for learning/stimulating environment and limit-setting techniques) and caregiving resources (adequacy of the alternate caregiver when the mother worked). Six play-activity items, indicative of support for a learning/stimulating environment, were included in the core module of UNICEF's Multiple Indicator Cluster Survey 3. The other items were included in optional modules. This project provided, for the first time, a globally relevant set of items for assessing family care practices and resources in epidemiological surveys. These items have multiple uses, including national monitoring and cross-country comparisons of the status of family care for development. The information obtained will reinforce efforts to improve support for children's development. PMID:23304914
[Development of a questionnaire to measure family stress among married working women].
Kim, Gwang Suk; Cho, Won Jung
2006-08-01
Even though a number of studies have suggested that appropriate instruments measuring family stress for working women need to be developed, the validity and reliability of the instruments used have not been consistently examined. The purpose of the present study was to develop a sensitive instrument to measure family stress in married working women, and to test the validity and reliability of the instrument. The items generated for this instrument were drawn from a comprehensive literature review. Twenty-four items were developed through evaluation by 10 experts, and twenty-one items were finally confirmed through item analysis. Psychometric testing was performed with a convenience sample of 240 women employed in the industrial sector. Four factors emerged from factor analysis, explaining 50.5% of the total variance. The first factor, 'Cooperation', explained 28.1% of the variance; the second factor, 'Satisfaction with relationships', 10.6%; the third factor, 'Democratic and comfortable environment', 6.3%; and the fourth factor, 'Disturbance of own living', 5.5%. Cronbach's α for the instrument was 0.86. The study supports the validity and reliability of the instrument.
Biomarker development targeting unmet clinical needs.
Monaghan, Phillip J; Lord, Sarah J; St John, Andrew; Sandberg, Sverre; Cobbaert, Christa M; Lennartz, Lieselotte; Verhagen-Kamerbeek, Wilma D J; Ebert, Christoph; Bossuyt, Patrick M M; Horvath, Andrea R
2016-09-01
The introduction of new biomarkers can lead to inappropriate utilization of tests if they do not fill in existing gaps in clinical care. We aimed to define a strategy and checklist for identifying unmet needs for biomarkers. A multidisciplinary working group used a 4-step process: 1/ scoping literature review; 2/ face-to-face meetings to discuss scope, strategy and checklist items; 3/ iterative process of feedback and consensus to develop the checklist; 4/ testing and refinement of checklist items using case scenarios. We used clinical pathway mapping to identify clinical management decisions linking biomarker testing to health outcomes and developed a 14-item checklist organized into 4 domains: 1/ identifying and 2/ verifying the unmet need; 3/ validating the intended use; and 4/ assessing the feasibility of the new biomarker to influence clinical practice and health outcome. We present an outcome-focused approach that can be used by multiple stakeholders for any medical test, irrespective of the purpose and role of testing. The checklist intends to achieve more efficient biomarker development and translation into practice. We propose the checklist is field tested by stakeholders, and advocate the role of the clinical laboratory professional to foster trans-sector collaboration in this regard. Copyright © 2016 Elsevier B.V. All rights reserved.
Development and preliminary testing of a computerized adaptive assessment of chronic pain.
Anatchkova, Milena D; Saris-Baglama, Renee N; Kosinski, Mark; Bjorner, Jakob B
2009-09-01
The aim of this article is to report the development and preliminary testing of a prototype computerized adaptive test of chronic pain (CHRONIC PAIN-CAT), conducted in 2 stages: (1) evaluation of various item selection and stopping rules through real-data simulated administrations of the CHRONIC PAIN-CAT; (2) a feasibility study of the actual prototype CHRONIC PAIN-CAT assessment system conducted in a pilot sample. Item calibrations developed from a US general population sample (N = 782) were used to program a pain severity and impact item bank (k = 45), and real-data simulations were conducted to determine a CAT stopping rule. The CHRONIC PAIN-CAT was programmed on a tablet PC using QualityMetric's Dynamic Health Assessment (DYHNA) software and administered to a clinical sample of pain sufferers (n = 100). The CAT was completed in significantly less time than the static (full item bank) assessment (P < .001). On average, 5.6 items were dynamically administered by the CAT to achieve a precise score. Scores estimated from the 2 assessments were highly correlated (r = .89), and both assessments discriminated across pain severity levels (P < .001, RV = .95). Patients' evaluations of the CHRONIC PAIN-CAT were favorable. This report demonstrates that the CHRONIC PAIN-CAT is feasible for administration in a clinic. The application has the potential to improve pain assessment and help clinicians manage chronic pain.
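A CAT stopping rule of the kind evaluated above typically ends administration once the provisional standard error of the score falls below a target, since SE ≈ 1/√(test information). The toy sketch below illustrates that logic with fixed, hypothetical item information values; it is not the DYHNA implementation.

```python
import math

def administer_cat(item_informations, se_target=0.30, max_items=20):
    """Toy illustration of a CAT stopping rule: keep administering the most
    informative remaining item until the provisional standard error of the
    score falls below se_target or max_items is reached. Item information is
    treated as fixed here for simplicity; a real CAT recomputes it at the
    current ability estimate after every response."""
    remaining = sorted(item_informations, reverse=True)
    administered, total_info = [], 0.0
    while remaining and len(administered) < max_items:
        info = remaining.pop(0)                 # next most informative item
        administered.append(info)
        total_info += info
        se = 1.0 / math.sqrt(total_info)        # SE(theta) from test information
        if se <= se_target:
            break
    return len(administered), 1.0 / math.sqrt(total_info)

if __name__ == "__main__":
    bank = [2.1, 1.8, 1.6, 1.4, 1.2, 1.0, 0.9, 0.8, 0.7, 0.6] * 5  # 50 hypothetical items
    n_used, se = administer_cat(bank)
    print(f"items administered: {n_used}, final SE: {se:.3f}")
```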
De Silva Weliange, Shreenika H; Fernando, Dulitha; Gunatilake, Jagath
2014-05-03
Environmental characteristics are known to be associated with patterns of physical activity (PA). Although several validated tools exist to measure environmental characteristics, these instruments are not necessarily suitable for application in all settings, especially in a developing country. This study was carried out to develop and validate an instrument named the "Physical And Social Environment Scale--PASES" to assess the physical and social environmental factors associated with PA. This will enable identification of the various physical and social environmental factors affecting PA in Sri Lanka, which will help in the development of more tailored intervention strategies for promoting higher PA levels in Sri Lanka. The PASES was developed using a scientific approach of defining the construct, item generation, analysis of item content, and item reduction. Both qualitative and quantitative methods were used, including key informant interviews, in-depth interviews, and expert rating of the generated items. A cross-sectional survey among 180 adults was carried out to assess the factor structure through principal component analysis. Another cross-sectional survey among a different group of 180 adults was carried out to assess construct validity through confirmatory factor analysis. Reliability was assessed with test-retest reliability and internal consistency, using Spearman's r and Cronbach's alpha respectively. Thirty-six items were selected after the expert ratings and were developed into interviewer-administered questions. Exploration of the factor structure of the 34 factorable items through principal component analysis with Quartimax rotation extracted 8 factors. The 34-item instrument was assessed for construct validity with confirmatory factor analysis, which confirmed an 8-factor model (χ2 = 339.9, GFI = 0.90). The identified factors were infrastructure for walking, aesthetics and facilities for cycling, vehicular traffic safety, access and connectivity, recreational facilities for PA, safety, social cohesion, and social acceptance of PA, together with the two non-factorable items, residential density and land use mix. The PASES also showed good test-retest reliability and a moderate level of internal consistency. The PASES is a valid and reliable tool that could be used to assess the physical and social environment associated with PA in Sri Lanka.
Item response theory, computerized adaptive testing, and PROMIS: assessment of physical function.
Fries, James F; Witter, James; Rose, Matthias; Cella, David; Khanna, Dinesh; Morgan-DeWitt, Esi
2014-01-01
Patient-reported outcome (PRO) questionnaires record health information directly from research participants because observers may not accurately represent the patient perspective. Patient-reported Outcomes Measurement Information System (PROMIS) is a US National Institutes of Health cooperative group charged with bringing PRO to a new level of precision and standardization across diseases by item development and use of item response theory (IRT). With IRT methods, improved items are calibrated on an underlying concept to form an item bank for a "domain" such as physical function (PF). The most informative items can be combined to construct efficient "instruments" such as 10-item or 20-item PF static forms. Each item is calibrated on the basis of the probability that a given person will respond at a given level, and the ability of the item to discriminate people from one another. Tailored forms may cover any desired level of the domain being measured. Computerized adaptive testing (CAT) selects the best items to sharpen the estimate of a person's functional ability, based on prior responses to earlier questions. PROMIS item banks have been improved with experience from several thousand items, and are calibrated on over 21,000 respondents. In areas tested to date, PROMIS PF instruments are superior or equal to Health Assessment Questionnaire and Medical Outcome Study Short Form-36 Survey legacy instruments in clarity, translatability, patient importance, reliability, and sensitivity to change. Precise measures, such as PROMIS, efficiently incorporate patient self-report of health into research, potentially reducing research cost by lowering sample size requirements. The advent of routine IRT applications has the potential to transform PRO measurement.
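The IRT machinery summarized above can be made concrete with a two-parameter logistic (2PL) model: each item has a discrimination and a difficulty parameter, and a CAT picks the unused item with the largest Fisher information at the current ability estimate. The sketch below uses an invented four-item bank and is a generic illustration, not PROMIS code.

```python
import math

def p_2pl(theta, a, b):
    """Two-parameter logistic IRT model: probability of endorsing/answering
    correctly given ability theta, discrimination a, and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def next_item(theta_hat, items, already_used):
    """Maximum-information CAT selection: pick the unused item that is most
    informative at the current ability estimate."""
    candidates = [(item_information(theta_hat, a, b), idx)
                  for idx, (a, b) in enumerate(items) if idx not in already_used]
    return max(candidates)[1]

if __name__ == "__main__":
    bank = [(1.8, -1.0), (1.2, 0.0), (2.0, 0.5), (0.9, 1.5)]  # hypothetical (a, b) pairs
    used = set()
    theta = 0.0                          # provisional ability estimate
    idx = next_item(theta, bank, used)
    print("first item administered:", idx, "info:",
          round(item_information(theta, *bank[idx]), 3))
```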
Prins, Martin H; Marrel, Alexia; Carita, Paulo; Anderson, David; Bousser, Marie-Germaine; Crijns, Harry; Consoli, Silla; Arnould, Benoit
2009-01-01
Background The side effects and burden of anticoagulant treatments may contribute to poor compliance and consequently to treatment failure. A specific questionnaire is necessary to assess patients' needs and their perceptions of anticoagulant treatment. Methods A conceptual model of expectation and satisfaction with anticoagulant treatment was designed by an advisory board and used to guide patient (n = 31) and clinician (n = 17) interviews in French, US English and Dutch. Patients had either atrial fibrillation (AF), deep venous thrombosis (DVT), or pulmonary embolism (PE). Following interviews, three PACT-Q language versions were developed simultaneously and further pilot-tested by 19 patients. Linguistic validations were performed for additional language versions. Results Initial concepts were developed to cover three areas of interest: 'Treatment', 'Disease and Complications' and 'Information about disease and anticoagulant treatment'. After clinician and patient interviews, concepts were further refined into four domains and 17 concepts; test versions of the PACT-Q were then created simultaneously in three languages, each containing 27 items grouped into four domains: "Treatment Expectations" (7 items), "Convenience" (11 items), "Burden of Disease and Treatment" (2 items) and "Anticoagulant Treatment Satisfaction" (7 items). No item was deleted or added after pilot testing as patients found the PACT-Q easy to understand and appropriate in length in all languages. The PACT-Q was divided into two parts: the first part to measure the expectations and the second to measure the convenience, burden and treatment satisfaction, for evaluation prior to and after anticoagulant treatment, respectively. Eleven additional language versions were linguistically validated. Conclusion The PACT-Q has been rigorously developed and linguistically validated. It is available in 14 languages for use with thromboembolic patients, including AF, PE and DVT patients. Its validation and psychometric properties have been tested and are presented in a separate manuscript. PMID:19196486
Yordanova, Ralitsa; Ivanov, Ivan
2018-04-25
Developmental testing is essential for the early recognition of various developmental impairments. The tools used should be composed of items that are age specific, adapted, and standardized for the population to which they are applied. The achievements of neuroscience, medicine, psychology, pedagogy, and related fields are applied in the elaboration of a comprehensive examination tool that should screen all major areas of development. The key age of 5 years permits identification of almost all major developmental disabilities, leaving time for therapeutic intervention before school entrance. The aim of the research was to evaluate the developmental performance of 5-year-old Bulgarian children using a translational neuroscience approach. A comprehensive test program was developed, composed of 89 items grouped in the following domains: fine and gross motor development, coordination and balance, central motor neuron disturbances, language development and articulation, perception, attention and behavior, visual acuity, and strabismus. The overall sample comprised 434 children with a mean age of 63.5 months (SD 3.7). The male to female ratio was 1:1.02. Of this group, 390 children were between 60 and 71 months of age. The children were examined in 51 kindergartens in 21 villages and 18 cities randomly chosen in southern Bulgaria. Eight children were excluded from the final analysis because they completed less than 50% of the test items (7 children did not cooperate and 1 child had autism spectrum disorder). There were 43 items with abnormal responses in less than 5% of the children, 37 items with abnormal responses in 6% to 35% of the children, and only 9 items with a high rate of abnormal responses (more than 35%). The test is an example of a translational approach in neuroscience. On one hand, it is based on the results of several sciences studying growth and development from different perspectives. On the other hand, the results of the present research may be implemented in other fields of child development: education, psychology, speech and language therapy, and intervention programs. © 2018 John Wiley & Sons, Ltd.
Development and psychometric characteristics of the SCI-QOL Pressure Ulcers scale and short form.
Kisala, Pamela A; Tulsky, David S; Choi, Seung W; Kirshblum, Steven C
2015-05-01
To develop a self-reported measure of the subjective impact of pressure ulcers on health-related quality of life (HRQOL) in individuals with spinal cord injury (SCI) as part of the SCI quality of life (SCI-QOL) measurement system. Grounded-theory based qualitative item development methods, large-scale item calibration testing, confirmatory factor analysis (CFA), and item response theory-based psychometric analysis. Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. SCI-QOL Pressure Ulcers scale. 189 individuals with traumatic SCI who experienced a pressure ulcer within the past 7 days completed 30 items related to pressure ulcers. CFA confirmed a unidimensional pool of items. IRT analyses were conducted. A constrained Graded Response Model with a constant slope parameter was used to estimate item thresholds for the 12 retained items. The 12-item SCI-QOL Pressure Ulcers scale is unique in that it is specifically targeted to individuals with spinal cord injury and at every stage of development has included input from individuals with SCI. Furthermore, use of CFA and IRT methods provide flexibility and precision of measurement. The scale may be administered in its entirety or as a 7-item "short form" and is available for both research and clinical practice.
Development of an instrument for the evaluation of advanced life support performance.
Peltonen, L-M; Peltonen, V; Salanterä, S; Tommila, M
2017-10-01
Assessing advanced life support (ALS) competence requires validated instruments. Existing instruments include aspects of technical skills (TS), non-technical skills (NTS) or both, but one instrument for detailed assessment that suits all resuscitation situations is lacking. This study aimed to develop an instrument for the evaluation of the overall ALS performance of the whole team. This instrument development study had four phases. First, we reviewed literature and resuscitation guidelines to explore items to include in the instrument. Thereafter, we interviewed resuscitation team professionals (n = 66), using the critical incident technique, to determine possible additional aspects associated with the performance of ALS. Second, we developed an instrument based on the findings. Third, we used an expert panel (n = 20) to assess the validity of the developed instrument. Finally, we revised the instrument based on the experts' comments and tested it with six experts who evaluated 22 video recorded resuscitations. The final version of the developed instrument had 69 items divided into adherence to guidelines (28 items), clinical decision-making (5 items), workload management (12 items), team behaviour (8 items), information management (6 items), patient integrity and consideration of laymen (4 items) and work routines (6 items). The Cronbach's α values were good, and strong correlations between the overall performance and the instrument were observed. The instrument may be useful for detailed assessment of the team's overall performance, but the numerous items make the use demanding. The instrument is still under development, and more research is needed to determine its psychometric properties. © 2017 The Acta Anaesthesiologica Scandinavica Foundation. Published by John Wiley & Sons Ltd.
Odukoya, Jonathan A; Adekeye, Olajide; Igbinoba, Angie O; Afolabi, A
2018-01-01
Teachers and students worldwide often dance to the tune of tests and examinations. Assessments are powerful tools for catalyzing the achievement of educational goals, especially if done right. One of the tools for 'doing it right' is item analysis. The core objectives of this study, therefore, were to ascertain the item difficulty and distractive indices of the tests used in university-wide courses. Between 112 and 1956 undergraduate students participated in this study, depending on the course. Using secondary data, an ex-post-facto design was adopted for this project. In virtually all cases, the majority of the items (between 65% and 97% of the 70 items fielded in each course) did not meet psychometric standards in terms of difficulty and distractive indices and consequently needed to be moderated or deleted. Considering the importance of these courses, the need to apply item analysis when developing these tests was emphasized.
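Item difficulty and distractor ("distractive") indices of the kind analyzed above are simple proportions: the share of examinees choosing the keyed option and the share choosing each wrong option, with rarely chosen distractors flagged as non-functioning. The Python sketch below uses simulated choices and a commonly used 5% cutoff, both assumptions rather than the study's exact criteria.

```python
import numpy as np

def item_analysis(responses, key, n_options=4):
    """Classical item analysis for one multiple-choice item.
    responses: 1-D array of chosen options (0..n_options-1)
    key: index of the correct option
    Returns the difficulty (proportion choosing the key), the proportion
    choosing each distractor, and distractors chosen by fewer than 5% of
    examinees (a common flag for non-functioning options)."""
    responses = np.asarray(responses)
    difficulty = float(np.mean(responses == key))
    distractors = {opt: float(np.mean(responses == opt))
                   for opt in range(n_options) if opt != key}
    flagged = [opt for opt, p in distractors.items() if p < 0.05]
    return difficulty, distractors, flagged

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    resp = rng.choice([0, 1, 2, 3], size=300, p=[0.55, 0.25, 0.17, 0.03])
    d, dist, bad = item_analysis(resp, key=0)
    print("difficulty:", round(d, 2), "distractors:", dist, "non-functioning:", bad)
```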
Development of the NIH PROMIS ® Sexual Function and Satisfaction measures in patients with cancer.
Flynn, Kathryn E; Lin, Li; Cyranowski, Jill M; Reeve, Bryce B; Reese, Jennifer Barsky; Jeffery, Diana D; Smith, Ashley Wilder; Porter, Laura S; Dombeck, Carrie B; Bruner, Deborah Watkins; Keefe, Francis J; Weinfurt, Kevin P
2013-02-01
We describe the development and validation of the Patient-Reported Outcomes Measurement Information System® Sexual Function and Satisfaction (PROMIS® SexFS; National Institutes of Health) measures, version 1.0, for cancer populations. To develop a customizable self-report measure of sexual function and satisfaction as part of the U.S. National Institutes of Health PROMIS Network. Our multidisciplinary working group followed a comprehensive protocol for developing psychometrically robust patient-reported outcome measures including qualitative (scale development) and quantitative (psychometric evaluation) development. We performed an extensive literature review, conducted 16 focus groups with cancer patients and multiple discussions with clinicians, and evaluated candidate items in cognitive testing with patients. We administered items to 819 cancer patients. Items were calibrated using item-response theory and evaluated for reliability and validity. The PROMIS SexFS measures, version 1.0, include 81 items in 11 domains: Interest in Sexual Activity, Lubrication, Vaginal Discomfort, Erectile Function, Global Satisfaction with Sex Life, Orgasm, Anal Discomfort, Therapeutic Aids, Sexual Activities, Interfering Factors, and Screener Questions. In addition to content validity (patients indicate that items cover important aspects of their experiences) and face validity (patients indicate that items measure sexual function and satisfaction), the measure shows evidence for discriminant validity (domains discriminate between groups expected to be different) and convergent validity (strong correlations between scores on PROMIS and scores on conceptually similar older measures of sexual function), as well as favorable test-retest reliability among people not expected to change (interclass correlations from two administrations of the instrument, 1 month apart). The PROMIS SexFS offers researchers a reliable and valid set of tools to measure self-reported sexual function and satisfaction among diverse men and women. The measures are customizable; researchers can select the relevant domains and items comprising those domains for their study. © 2013 International Society for Sexual Medicine.
Development of the NIH PROMIS® Sexual Function and Satisfaction Measures in Patients with Cancer
Flynn, Kathryn E.; Lin, Li; Cyranowski, Jill M.; Reeve, Bryce B.; Reese, Jennifer Barsky; Jeffery, Diana D.; Smith, Ashley Wilder; Porter, Laura S.; Dombeck, Carrie B.; Bruner, Deborah Watkins; Keefe, Francis J.; Weinfurt, Kevin P.
2013-01-01
Introduction We describe the development and validation of the PROMIS Sexual Function and Satisfaction (PROMIS SexFS) measures version 1.0 for cancer populations. Aim To develop a customizable self-report measure of sexual function and satisfaction as part of the U.S. National Institutes of Health PROMIS® Network. Methods Our multidisciplinary working group followed a comprehensive protocol for developing psychometrically robust patient reported outcome (PRO) measures including qualitative (scale development) and quantitative (psychometric evaluation) development. We performed an extensive literature review, conducted 16 focus groups with cancer patients and multiple discussions with clinicians, and evaluated candidate items in cognitive testing with patients. We administered items to 819 cancer patients. Items were calibrated using item response theory and evaluated for reliability and validity. Main Outcome Measures The PROMIS Sexual Function and Satisfaction (PROMIS SexFS) measures version 1.0 include 79 items in 11 domains: interest in sexual activity, lubrication, vaginal discomfort, erectile function, global satisfaction with sex life, orgasm, anal discomfort, therapeutic aids, sexual activities, interfering factors, and screener questions. Results In addition to content validity (patients indicate that items cover important aspects of their experiences) and face validity (patients indicate that items measure sexual function and satisfaction), the measure shows evidence for discriminant validity (domains discriminate between groups expected to be different), convergent validity (strong correlations between scores on PROMIS and scores on conceptually-similar older measures of sexual function), as well as favorable test-retest reliability among people not expected to change (inter-class correlations from 2 administrations of the instrument, 1 month apart). Conclusions The PROMIS SexFS offers researchers a reliable and valid set of tools to measure self-reported sexual function and satisfaction among diverse men and women. The measures are customizable; researchers can select the relevant domains and items comprising those domains for their study. PMID:23387911
[Development of skill scale for communication skill measurement of pharmacist].
Teramachi, Hitomi; Komada, Natsuki; Tanizawa, Katsuya; Kuzuya, Yumi; Tsuchiya, Teruo
2011-04-01
The purpose of this study was to develop a pharmacist communication skill scale. A 38-item scale was constructed, and 283 pharmacists responded. The original questionnaire consisted of 38 items rated on a 5-point (1-5) Likert scale. Complete responses from 228 pharmacists were used to test the reliability and validity of the scale. All 38 original items were examined for content validity, correlation coefficients, and communality. From factor analysis, four factors were identified among 31 items: patient respect reception skill, problem discovery and solution skill, positive approach skill, and feelings processing skill. The correlation coefficient between this scale and the KiSS-18 (Social Skill) was high (r=0.694). The scale showed high internal consistency (Cronbach α coefficient=0.951), and the validity testing supported high content validity. We therefore propose that the pharmacist communication skill scale be adopted under the brief eponymous name TePSS-31. These findings indicate that the developed scale possesses adequate validity and reliability for practical use.
Yun, Young Ho; Sim, Jin Ah; Lim, Ye Jin; Lim, Cheol Il; Kang, Sung-Choon; Kang, Joon-Ho; Park, Jun Dong; Noh, Dong Young
2016-06-01
The objective of this study was to develop the Worksite Health Index (WHI) and validate its psychometric properties. The development of the WHI questionnaire included item generation, item construction, and field testing. To assess the instrument's reliability and validity, we recruited 30 different Korean worksites. We developed the WHI questionnaire of 136 items categorized into five domains, namely Governance and Infrastructure, Need Assessment and Planning, Health Prevention and Promotion Program, Occupational Safety, and Monitoring and Feedback. All WHI domains demonstrated a high reliability with good internal consistency. The total WHI scores differentiated worksite groups effectively according to firm size. Each domain was associated significantly with employees' health status, absence, and financial outcome. The WHI can assess comprehensive worksite health programs. This tool is publicly available for addressing the growing need for worksite health programs.
Kayser, Lars; Karnoe, Astrid; Furstrand, Dorthe; Batterham, Roy; Christensen, Karl Bang; Elsworth, Gerald; Osborne, Richard H
2018-02-12
For people to be able to access, understand, and benefit from the increasing digitalization of health services, it is critical that services are provided in a way that meets the user's needs, resources, and competence. The objective of the study was to develop a questionnaire that captures the 7-dimensional eHealth Literacy Framework (eHLF). Draft items were created in parallel in English and Danish. The items were generated from 450 statements collected during the conceptual development of eHLF. In all, 57 items (7 to 9 items per scale) were generated and adjusted after cognitive testing. Items were tested in 475 people recruited from settings in which the scale was intended to be used (community and health care settings) and including people with a range of chronic conditions. Measurement properties were assessed using approaches from item response theory (IRT) and classical test theory (CTT) such as confirmatory factor analysis (CFA) and reliability using composite scale reliability (CSR); potential bias due to age and sex was evaluated using differential item functioning (DIF). CFA confirmed the presence of the 7 a priori dimensions of eHLF. Following item analysis, a 35-item 7-scale questionnaire was constructed, covering (1) using technology to process health information (5 items, CSR=.84), (2) understanding of health concepts and language (5 items, CSR=.75), (3) ability to actively engage with digital services (5 items, CSR=.86), (4) feel safe and in control (5 items, CSR=.87), (5) motivated to engage with digital services (5 items, CSR=.84), (6) access to digital services that work (6 items, CSR=.77), and (7) digital services that suit individual needs (4 items, CSR=.85). A 7-factor CFA model, using small-variance priors for cross-loadings and residual correlations, had a satisfactory fit (posterior predictive P value: .27, 95% CI for the difference between the observed and replicated chi-square values: -63.7 to 133.8). The CFA showed that all items loaded strongly on their respective factors. The IRT analysis showed that no items were found to have disordered thresholds. For most scales, discriminant validity was acceptable; however, 2 pairs of dimensions were highly correlated: dimensions 1 and 5 (r=.95), and dimensions 6 and 7 (r=.96). All dimensions were retained because of strong content differentiation and potential causal relationships between these dimensions. There is no evidence of DIF. The eHealth Literacy Questionnaire (eHLQ) is a multidimensional tool based on a well-defined a priori eHLF framework with robust properties. It has satisfactory evidence of construct validity and reliable measurement across a broad range of concepts (using both CTT and IRT traditions) in various groups. It is designed to be used to understand and evaluate people's interaction with digital health services. ©Lars Kayser, Astrid Karnoe, Dorthe Furstrand, Roy Batterham, Karl Bang Christensen, Gerald Elsworth, Richard H Osborne. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 12.02.2018.
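Composite scale reliability (CSR), reported above for each eHLQ scale, is typically computed from standardized factor loadings under the assumption of uncorrelated errors. The sketch below shows that common formula with hypothetical loadings; it may differ in detail from the estimator used in the study.

```python
def composite_scale_reliability(loadings):
    """Composite (congeneric) scale reliability computed from standardized
    factor loadings, assuming uncorrelated errors:
    (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    lam_sum = sum(loadings)
    error_var = sum(1.0 - l ** 2 for l in loadings)
    return lam_sum ** 2 / (lam_sum ** 2 + error_var)

if __name__ == "__main__":
    # hypothetical standardized loadings for a 5-item scale
    loadings = [0.78, 0.72, 0.70, 0.66, 0.61]
    print(round(composite_scale_reliability(loadings), 3))
```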
Evaluation of five guidelines for option development in multiple-choice item-writing.
Martínez, Rafael J; Moreno, Rafael; Martín, Irene; Trigo, M Eva
2009-05-01
This paper evaluates certain guidelines for writing multiple-choice test items. The analysis of the responses of 5013 subjects to 630 items from 21 university classroom achievement tests suggests that an option should not differ in terms of heterogeneous content because such error has a slight but harmful effect on item discrimination. This also occurs with the "None of the above" option when it is the correct one. In contrast, results do not show the supposedly negative effects of a different-length option, the use of specific determiners, or the use of the "All of the above" option, which not only decreases difficulty but also improves discrimination when it is the correct option.
Cigarette dependence questionnaire: development and psychometric testing with male smokers.
Huang, Chih-Ling; Lin, Hsi-Hui; Wang, Hsiu-Hung
2010-10-01
This paper is a report of a study conducted to develop and test a theoretically derived Cigarette Dependence Questionnaire for adult male smokers. Fagerstrom questionnaires have been used worldwide to assess cigarette dependence. However, these assessments lack any theoretical perspective. A theory-based approach is needed to ensure valid assessment. In 2007, an initial pool of 103 Cigarette Dependence Questionnaire items was distributed to 109 adult smokers in Taiwan. Item analysis was conducted to select items for inclusion in the refined scale. The psychometric properties of the Cigarette Dependence Questionnaire were further evaluated in 2007-08, when it was administered to 256 respondents and their saliva was collected and analysed for cotinine levels. Criterion validity was established through the Pearson correlation between the scale and saliva cotinine levels. Exploratory factor analysis was used to test construct validity. Reliability was determined with Cronbach's alpha coefficient and a 2-week test-retest coefficient. The selection of 30 items for seven perspectives was based on item analysis. One factor accounting for 44.9% of the variance emerged from the factor analysis. The factor was named cigarette dependence. Cigarette Dependence Questionnaire scores were statistically significantly correlated with saliva cotinine levels (r = 0.21, P = 0.01). Cronbach's alpha was 0.95 and test-retest reliability using an intra-class correlation was 0.92. The Cigarette Dependence Questionnaire showed sound reliability and validity and could be used by nurses to set up smoking cessation interventions based on assessment of cigarette dependence. © 2010 Blackwell Publishing Ltd.
Paleologou, Victoria; Kontodimopoulos, Nick; Stamouli, Aggeliki; Aletras, Vassilis; Niakas, Dimitris
2006-09-13
In the era of cost containment, managers are constantly pursuing increased organizational performance and productivity by aiming at the obvious target, i.e. the workforce. The health care sector, in which production processes are more complicated compared to other industries, is not an exception. In light of recent legislation in Greece in which efficiency improvement and achievement of specific performance targets are identified as undisputable health system goals, the purpose of this study was to develop a reliable and valid instrument for investigating the attitudes of Greek physicians, nurses and administrative personnel towards job-related aspects, and the extent to which these motivate them to improve performance and increase productivity. A methodological exploratory design was employed in three phases: a) content development and assessment, which resulted in a 28-item instrument, b) pilot testing (N = 74) and c) field testing (N = 353). Internal consistency reliability was tested via Cronbach's alpha coefficient and factor analysis was used to identify the underlying constructs. Tests of scaling assumptions, according to the Multitrait-Multimethod Matrix, were used to confirm the hypothesized component structure. Four components, referring to intrinsic individual needs and external job-related aspects, were revealed and explain 59.61% of the variability. They were subsequently labeled: job attributes, remuneration, co-workers and achievement. Nine items not meeting item-scale criteria were removed, resulting in a 19-item instrument. Scale reliability ranged from 0.782 to 0.901 and internal item consistency and discriminant validity criteria were satisfied. Overall, the instrument appears to be a promising tool for hospital administrations in their attempt to identify job-related factors, which motivate their employees. The psychometric properties were good and warrant administration to a larger sample of employees in the Greek healthcare system.
Paleologou, Victoria; Kontodimopoulos, Nick; Stamouli, Aggeliki; Aletras, Vassilis; Niakas, Dimitris
2006-01-01
Background In the era of cost containment, managers are constantly pursuing increased organizational performance and productivity by aiming at the obvious target, i.e. the workforce. The health care sector, in which production processes are more complicated compared to other industries, is not an exception. In light of recent legislation in Greece in which efficiency improvement and achievement of specific performance targets are identified as undisputable health system goals, the purpose of this study was to develop a reliable and valid instrument for investigating the attitudes of Greek physicians, nurses and administrative personnel towards job-related aspects, and the extent to which these motivate them to improve performance and increase productivity. Methods A methodological exploratory design was employed in three phases: a) content development and assessment, which resulted in a 28-item instrument, b) pilot testing (N = 74) and c) field testing (N = 353). Internal consistency reliability was tested via Cronbach's alpha coefficient and factor analysis was used to identify the underlying constructs. Tests of scaling assumptions, according to the Multitrait-Multimethod Matrix, were used to confirm the hypothesized component structure. Results Four components, referring to intrinsic individual needs and external job-related aspects, were revealed and explain 59.61% of the variability. They were subsequently labeled: job attributes, remuneration, co-workers and achievement. Nine items not meeting item-scale criteria were removed, resulting in a 19-item instrument. Scale reliability ranged from 0.782 to 0.901 and internal item consistency and discriminant validity criteria were satisfied. Conclusion Overall, the instrument appears to be a promising tool for hospital administrations in their attempt to identify job-related factors, which motivate their employees. The psychometric properties were good and warrant administration to a larger sample of employees in the Greek healthcare system. PMID:16970823
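The percentage of variance attributed to the retained components (59.61% for the four components above) comes from the eigenvalues of the item correlation matrix. A minimal sketch under simulated data follows; the 28-item response matrix and sample size are assumptions for illustration, not the study's field-test data.

```python
# Variance explained by the first k principal components of an item correlation matrix.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(353, 28))            # hypothetical field-test responses, 28 items
R = np.corrcoef(X, rowvar=False)          # item correlation matrix
eigvals = np.linalg.eigvalsh(R)[::-1]     # eigenvalues, largest first

n_components = 4
explained = eigvals[:n_components].sum() / eigvals.sum() * 100
print(f"Variance explained by {n_components} components: {explained:.2f}%")
```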
Instructional Sensitivity Statistics Appropriate for Objectives-Based Test Items. CSE Report No. 91.
ERIC Educational Resources Information Center
Kosecoff, Jacqueline B.; Klein, Stephen P.
Two types of sensitivity indices were developed in this paper, one internal to the total test and the second external. To evaluate the success of these statistics the three criteria suggested for a satisfactory index of item quality were considered. The Internal Sensitivity Index appears to meet these demands. Certainly it is easily computed. In…
A Diagnostic Study of Pre-Service Teachers' Competency in Multiple-Choice Item Development
ERIC Educational Resources Information Center
Asim, Alice E.; Ekuri, Emmanuel E.; Eni, Eni I.
2013-01-01
Large class size is an issue in testing at all levels of education. As a remedy, multiple-choice test formats have become very popular. This case study was designed to diagnose pre-service teachers' competency in constructing questions (IQT); direct questions (DQT); and best answer (BAT) varieties of multiple-choice items. Subjects were 88…
ERIC Educational Resources Information Center
Harnisch, Delwyn L.
The major emphasis of this paper is in the examination of test item response patterns. Tatsuoka and Tatsuoka (1980) have developed two indices of response consistency: the norm-conformity index (NCI) and the individual consistency index (ICI). The NCI provides a measure of the degree of consistency between the response pattern of an individual and…
ERIC Educational Resources Information Center
Ryan, Joseph; Brockmann, Frank
2009-01-01
Equating is an essential tool in educational assessment due to the critical role it plays in several key areas: establishing validity across forms and years; fairness; test security; and, increasingly, continuity in programs that release items or require ongoing development. Although the practice of equating is rooted in long-standing practices that…
Development of new selection tests for air traffic controllers.
DOT National Transportation Integrated Search
1977-12-01
This report describes the development of a new Multiplex Controller Aptitude Test for initial screening of FAA Air Traffic Controller applicants. Its content includes the traditional types of aptitude test items used for today's screening. In additio...
Faber, Irene R; Nijhuis-Van Der Sanden, Maria W G; Elferink-Gemser, Marije T; Oosterveld, Frits G J
2015-01-01
A motor skills assessment could be helpful in talent development by estimating essential perceptuo-motor skills of young players, which are considered requisite to develop excellent technical and tactical qualities. The Netherlands Table Tennis Association uses a motor skills assessment in their talent development programme consisting of eight items measuring perceptuo-motor skills specific to table tennis under varying conditions. This study aimed to investigate this assessment regarding its reproducibility, internal consistency, underlying dimensions and concurrent validity in 113 young table tennis players (6-10 years). Intraclass correlation coefficients of six test items met the criterion of 0.7, with coefficients of variation between 3% and 8%. Cronbach's alpha for internal consistency was 0.853. The principal components analysis distinguished two conceptually meaningful factors: "ball control" and "gross motor function." Concurrent validity analyses demonstrated moderate associations between the motor skills assessment's results and national ranking; boys r = -0.53 (P < 0.001) and girls r = -0.45 (P = 0.015). In conclusion, this evaluation demonstrated six test items with acceptable reproducibility, good internal consistency and good prospects for validity. Two test items need revision to improve reproducibility. Since the motor skills assessment seems to be a reproducible, objective part of a talent development programme, more longitudinal studies are required to investigate its predictive validity.
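The reproducibility criterion used here (an intraclass correlation of at least 0.7 across repeated administrations) can be checked with a two-way random-effects intraclass correlation. The sketch below implements ICC(2,1) on simulated test-retest scores; the data, sample size, and error structure are invented for illustration.

```python
# ICC(2,1): two-way random effects, absolute agreement, single measurement (Shrout & Fleiss).
import numpy as np

def icc_2_1(scores: np.ndarray) -> float:
    """scores: subjects x repeated-measurement matrix."""
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)
    col_means = scores.mean(axis=0)
    ss_total = ((scores - grand) ** 2).sum()
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols
    ms_r = ss_rows / (n - 1)                      # between-subjects mean square
    ms_c = ss_cols / (k - 1)                      # between-measurements mean square
    ms_e = ss_err / ((n - 1) * (k - 1))           # residual mean square
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

rng = np.random.default_rng(2)
true_skill = rng.normal(50, 10, size=113)
test_retest = true_skill[:, None] + rng.normal(0, 4, size=(113, 2))  # hypothetical scores
print(f"ICC(2,1) = {icc_2_1(test_retest):.2f}")
```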
Questionnaire Adapting: Little Changes Mean a Lot.
Sousa, Vanessa E C; Matson, Jeffrey; Dunn Lopez, Karen
2017-09-01
Questionnaire development involves rigorous testing to ensure reliability and validity. Due to time and cost constraints of developing new questionnaires, researchers often adapt existing questionnaires to better fit the purpose of their study. However, the effect of such adaptations is unclear. We conducted cognitive interviews as a method to evaluate the understanding of original and adapted questionnaire items to be applied in a future study. The findings revealed that all subjects (a) comprehended the original and adapted items differently, (b) changed their scores after comparing the original to the adapted items, and (c) were unanimous in stating that the adapted items were easier to understand. Cognitive interviewing allowed us to assess the interpretation of adapted items in a useful and efficient manner before use in data collection.
What drives sleep-dependent memory consolidation: greater gain or less loss?
Fenn, Kimberly M; Hambrick, David Z
2013-06-01
When memory is tested after a delay, performance is typically better if the retention interval includes sleep. However, it is unclear what accounts for this well-established effect. It is possible that sleep enhances the retrieval of information, but it is also possible that sleep protects against memory loss that normally occurs during waking activity. We developed a new research approach to investigate these possibilities. Participants learned a list of paired-associate items and were tested on the items after a 12-h interval that included waking or sleep. We analyzed the number of items gained versus the number of items lost across time. The sleep condition showed more items gained and fewer items lost than did the wake condition. Furthermore, the difference between the conditions (favoring sleep) in lost items was greater than the difference in gain, suggesting that loss prevention may primarily account for the effect of sleep on declarative memory consolidation. This finding may serve as an empirical constraint on theories of memory consolidation.
ERIC Educational Resources Information Center
Phillips, Linda M.
The design and development of a test of inference ability in reading comprehension for grades 6, 7, and 8 (the Phillips-Patterson Test of Inference Ability in Reading Comprehension) are described. After development of a contemporary theoretical framework for the test of inference ability in reading comprehension, the design, item development, and…
A signal detection-item response theory model for evaluating neuropsychological measures.
Thomas, Michael L; Brown, Gregory G; Gur, Ruben C; Moore, Tyler M; Patt, Virginie M; Risbrough, Victoria B; Baker, Dewleen G
2018-02-05
Models from signal detection theory are commonly used to score neuropsychological test data, especially tests of recognition memory. Here we show that certain item response theory models can be formulated as signal detection theory models, thus linking two complementary but distinct methodologies. We then use the approach to evaluate the validity (construct representation) of commonly used research measures, demonstrate the impact of conditional error on neuropsychological outcomes, and evaluate measurement bias. Signal detection-item response theory (SD-IRT) models were fitted to recognition memory data for words, faces, and objects. The sample consisted of U.S. Infantry Marines and Navy Corpsmen participating in the Marine Resiliency Study. Data comprised item responses to the Penn Face Memory Test (PFMT; N = 1,338), Penn Word Memory Test (PWMT; N = 1,331), and Visual Object Learning Test (VOLT; N = 1,249), and self-report of past head injury with loss of consciousness. SD-IRT models adequately fitted recognition memory item data across all modalities. Error varied systematically with ability estimates, and distributions of residuals from the regression of memory discrimination onto self-report of past head injury were positively skewed towards regions of larger measurement error. Analyses of differential item functioning revealed little evidence of systematic bias by level of education. SD-IRT models benefit from the measurement rigor of item response theory-which permits the modeling of item difficulty and examinee ability-and from signal detection theory-which provides an interpretive framework encompassing the experimentally validated constructs of memory discrimination and response bias. We used this approach to validate the construct representation of commonly used research measures and to demonstrate how nonoptimized item parameters can lead to erroneous conclusions when interpreting neuropsychological test data. Future work might include the development of computerized adaptive tests and integration with mixture and random-effects models.
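For readers unfamiliar with the signal detection quantities the SD-IRT framework builds on, the following sketch computes the classical discrimination (d') and response-bias (criterion) indices from hit and false-alarm counts. It is a conventional equal-variance SDT calculation on invented counts, not the authors' item-level model.

```python
# Classical signal detection indices from a recognition memory test.
import numpy as np
from scipy.stats import norm

def sdt_indices(hits, misses, false_alarms, correct_rejections):
    # Log-linear correction guards against hit/false-alarm rates of exactly 0 or 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    d_prime = norm.ppf(hit_rate) - norm.ppf(fa_rate)        # memory discrimination
    criterion = -0.5 * (norm.ppf(hit_rate) + norm.ppf(fa_rate))  # response bias
    return d_prime, criterion

d, c = sdt_indices(hits=18, misses=2, false_alarms=5, correct_rejections=15)  # invented counts
print(f"d' = {d:.2f}, c = {c:.2f}")
```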
Computer-adaptive test to measure community reintegration of Veterans.
Resnik, Linda; Tian, Feng; Ni, Pengsheng; Jette, Alan
2012-01-01
The Community Reintegration of Injured Service Members (CRIS) measure consists of three scales measuring extent of, perceived limitations in, and satisfaction with community reintegration. Length of the CRIS may be a barrier to its widespread use. Using item response theory (IRT) and computer-adaptive test (CAT) methodologies, this study developed and evaluated a briefer community reintegration measure called the CRIS-CAT. Large item banks for each CRIS scale were constructed. A convenience sample of 517 Veterans responded to all items. Exploratory and confirmatory factor analyses (CFAs) were used to identify the dimensionality within each domain, and IRT methods were used to calibrate items. Accuracy and precision of CATs of different lengths were compared with the full-item bank, and data were examined for differential item functioning (DIF). CFAs supported unidimensionality of scales. Acceptable item fit statistics were found for final models. Accuracy of 10-, 15-, 20-, and variable-item CATs for all three scales was 0.88 or above. CAT precision increased with number of items administered and decreased at the upper ranges of each scale. Three items exhibited moderate DIF by sex. The CRIS-CAT demonstrated promising measurement properties and is recommended for use in community reintegration assessment.
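The core loop of a CAT of the kind described, under a two-parameter logistic model, alternates between selecting the unadministered item with maximum Fisher information at the current ability estimate and re-estimating ability from the responses collected so far. The sketch below uses simulated item parameters and a fixed 10-item test length; none of it reflects the actual CRIS-CAT item bank, calibration, or stopping rules.

```python
# One simplified CAT loop for 2PL items: maximum-information selection plus grid MLE.
import numpy as np

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    p = p_2pl(theta, a, b)
    return a ** 2 * p * (1 - p)

def select_next_item(theta, a, b, administered):
    info = item_information(theta, a, b)
    info[list(administered)] = -np.inf            # exclude items already given
    return int(np.argmax(info))

def estimate_theta(responses, a, b, grid=np.linspace(-4, 4, 161)):
    p = p_2pl(grid[:, None], a[None, :], b[None, :])
    loglik = (responses * np.log(p) + (1 - responses) * np.log(1 - p)).sum(axis=1)
    return grid[np.argmax(loglik)]                # maximum-likelihood estimate on a grid

rng = np.random.default_rng(3)
a = rng.uniform(0.8, 2.0, size=50)   # hypothetical discriminations
b = rng.normal(0.0, 1.0, size=50)    # hypothetical difficulties

theta_hat, administered, responses = 0.0, [], []
for _ in range(10):
    j = select_next_item(theta_hat, a, b, administered)
    administered.append(j)
    responses.append(int(rng.random() < p_2pl(0.5, a[j], b[j])))  # simulate a true theta of 0.5
    theta_hat = estimate_theta(np.array(responses), a[administered], b[administered])
print(f"Estimated ability after 10 items: {theta_hat:.2f}")
```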
A large-scale, long-term study of scale drift: The micro view and the macro view
NASA Astrophysics Data System (ADS)
He, W.; Li, S.; Kingsbury, G. G.
2016-11-01
The development of measurement scales for use across years and grades in educational settings provides unique challenges, as instructional approaches, instructional materials, and content standards all change periodically. This study examined the measurement stability of a set of Rasch measurement scales that have been in place for almost 40 years. In order to investigate the stability of these scales, item responses were collected from a large set of students who took operational adaptive tests using items calibrated to the measurement scales. For the four scales that were examined, item samples ranged from 2183 to 7923 items. Each item was administered to at least 500 students in each grade level, resulting in approximately 3000 responses per item. Stability was examined at the micro level analysing change in item parameter estimates that have occurred since the items were first calibrated. It was also examined at the macro level, involving groups of items and overall test scores for students. Results indicated that individual items had changes in their parameter estimates, which require further analysis and possible recalibration. At the same time, the results at the total score level indicate substantial stability in the measurement scales over the span of their use.
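A simple version of the "micro view" analysis is to difference each item's original and recalibrated difficulty and flag items whose drift exceeds a practical threshold. The sketch below uses simulated calibrations and an arbitrary 0.3-logit cut-off, which is an illustrative assumption rather than the study's flagging criterion.

```python
# Item parameter drift check between an original and a current Rasch calibration.
import numpy as np

rng = np.random.default_rng(4)
b_original = rng.normal(0, 1, size=200)                 # difficulties at first calibration
b_current = b_original + rng.normal(0, 0.1, size=200)   # recalibrated difficulties
b_current[:5] += 0.6                                    # simulate five drifting items

drift = b_current - b_original
flagged = np.where(np.abs(drift) > 0.3)[0]              # arbitrary practical threshold
print(f"Flagged {flagged.size} items for review: {flagged.tolist()}")
print(f"Mean absolute drift: {np.abs(drift).mean():.3f} logits")
```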
Oliveira, Lanuza Borges; Soares, Fernanda Amaral; Silveira, Marise Fagundes; de Pinho, Lucinéia; Caldeira, Antônio Prates; Leite, Maísa Tavares de Souza
2016-01-01
ABSTRACT Objective: to develop and validate an instrument to evaluate the knowledge of health professionals about domestic violence against children. Method: this was a study conducted with 194 physicians, nurses and dentists. A literature review was performed for preparation of the items and identification of the dimensions. Apparent and content validation was performed using analysis by three experts and 27 professors of the pediatric health discipline. For construct validation, Cronbach's alpha was used, and the Kappa test was applied to verify reproducibility. The criterion validation was conducted using the Student's t-test. Results: the final instrument included 56 items; the Cronbach alpha was 0.734, the Kappa test showed a correlation greater than 0.6 for most items, and the Student's t-test showed statistical significance at the 5% level for the two selected variables: years of education and using the Family Health Strategy. Conclusion: the instrument is valid and can be used as a promising tool to develop or direct actions in public health and evaluate knowledge about domestic violence against children. PMID:27556878
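Per-item reproducibility of the kind reported (Kappa above 0.6 for most items) can be checked with Cohen's kappa on test-retest responses. The following sketch uses simulated dichotomous answers and a hand-rolled kappa; it is illustrative and does not reproduce the study's data or analysis.

```python
# Cohen's kappa for test-retest agreement on a single item.
import numpy as np

def cohen_kappa(x: np.ndarray, y: np.ndarray) -> float:
    categories = np.union1d(x, y)
    p_o = np.mean(x == y)                                               # observed agreement
    p_e = sum(np.mean(x == c) * np.mean(y == c) for c in categories)    # chance agreement
    return (p_o - p_e) / (1 - p_e)

rng = np.random.default_rng(5)
test = rng.integers(0, 2, size=194)                        # hypothetical first administration
retest = np.where(rng.random(194) < 0.85, test, 1 - test)  # ~85% of answers repeated
print(f"kappa = {cohen_kappa(test, retest):.2f}")
```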
USDA-ARS?s Scientific Manuscript database
Little research has been conducted on the psychometrics of the very short scale (36 items) of the Children’s Behavior Questionnaire, and no one-item temperament scale has been tested for use in applied work. In this study, 237 United States caregivers completed a survey to define their child’s behav...
ERIC Educational Resources Information Center
Domyancich, John M.
2014-01-01
Multiple-choice questions are an important part of large-scale summative assessments, such as the advanced placement (AP) chemistry exam. However, past AP chemistry exam items often lacked the ability to test conceptual understanding and higher-order cognitive skills. The redesigned AP chemistry exam shows a distinctive shift in item types toward…
The Long-Term Conditions Questionnaire: conceptual framework and item development.
Peters, Michele; Potter, Caroline M; Kelly, Laura; Hunter, Cheryl; Gibbons, Elizabeth; Jenkinson, Crispin; Coulter, Angela; Forder, Julien; Towers, Ann-Marie; A'Court, Christine; Fitzpatrick, Ray
2016-01-01
To identify the main issues of importance when living with long-term conditions to refine a conceptual framework for informing the item development of a patient-reported outcome measure for long-term conditions. Semi-structured qualitative interviews (n=48) were conducted with people living with at least one long-term condition. Participants were recruited through primary care. The interviews were transcribed verbatim and analyzed by thematic analysis. The analysis served to refine the conceptual framework, based on reviews of the literature and stakeholder consultations, for developing candidate items for a new measure for long-term conditions. Three main organizing concepts were identified: impact of long-term conditions, experience of services and support, and self-care. The findings helped to refine a conceptual framework, leading to the development of 23 items that represent issues of importance in long-term conditions. The 23 candidate items formed the first draft of the measure, currently named the Long-Term Conditions Questionnaire. The aim of this study was to refine the conceptual framework and develop items for a patient-reported outcome measure for long-term conditions, including single and multiple morbidities and physical and mental health conditions. Qualitative interviews identified the key themes for assessing outcomes in long-term conditions, and these underpinned the development of the initial draft of the measure. These initial items will undergo cognitive testing to refine the items prior to further validation in a survey.
Gabriel, Adel; Violato, Claudio
2009-01-01
Background To develop and psychometrically assess a multiple choice question (MCQ) instrument to test knowledge of depression and its treatments in patients suffering from depression. Methods A total of 63 depressed patients and twelve psychiatric experts participated. Based on empirical evidence from an extensive review, theoretical knowledge and consultations with experts, a 27-item MCQ test of knowledge of depression and its treatments was constructed. Data collected from the psychiatry experts were used to assess evidence of content validity for the instrument. Results Cronbach's alpha of the instrument was 0.68, and there was an overall 87.8% agreement (items are highly relevant) between experts about the relevance of the MCQs for testing patient knowledge of depression and its treatments. Patients' performance on the MCQs was satisfactory overall, with 78.7% correct answers. Results of an item analysis indicated that most items had adequate difficulties and discriminations. Conclusion There was adequate reliability and evidence for content and convergent validity for the instrument. Future research should employ a larger and more heterogeneous sample, drawn from both psychiatric and community settings, than did the present study. Meanwhile, the present study has resulted in a psychometrically tested instrument for measuring depressed patients' knowledge of depression and its treatment. PMID:19754944
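The item analysis referred to (difficulty and discrimination) is typically computed as the proportion of examinees answering each item correctly and the corrected point-biserial correlation between the item and the rest of the test. A hedged sketch on simulated 0/1 responses, not the study's data, follows.

```python
# Classical item analysis: difficulty (proportion correct) and corrected point-biserial discrimination.
import numpy as np

def item_analysis(scores: np.ndarray):
    """scores: examinees x items matrix of 0/1 responses."""
    difficulty = scores.mean(axis=0)
    discrimination = np.empty(scores.shape[1])
    for j in range(scores.shape[1]):
        rest = scores.sum(axis=1) - scores[:, j]       # total score excluding item j
        discrimination[j] = np.corrcoef(scores[:, j], rest)[0, 1]
    return difficulty, discrimination

rng = np.random.default_rng(6)
ability = rng.normal(size=63)
item_difficulty = rng.uniform(-1.5, 1.5, size=27)      # hypothetical latent difficulties
prob_correct = 1 / (1 + np.exp(-(ability[:, None] - item_difficulty[None, :])))
scores = (rng.random((63, 27)) < prob_correct).astype(float)

p, r_pb = item_analysis(scores)
print(f"Mean difficulty = {p.mean():.2f}, mean corrected point-biserial = {r_pb.mean():.2f}")
```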
Development of the outcome expectancy scale for self-care among periodontal disease patients.
Kakudate, Naoki; Morita, Manabu; Fukuhara, Shunichi; Sugai, Makoto; Nagayama, Masato; Isogai, Emiko; Kawanami, Masamitsu; Chiba, Itsuo
2011-12-01
The theory of self-efficacy states that specific efficacy expectations affect behaviour. Two types of efficacy expectations are described within the theory. Self-efficacy expectations are the beliefs in the capacity to perform a specific behaviour. Outcome expectations are the beliefs that carrying out a specific behaviour will lead to a desired outcome. To develop and examine the reliability and validity of an outcome expectancy scale for self-care (OESS) among periodontal disease patients. A 34-item scale was tested on 101 patients at a dental clinic. Accuracy was improved by item analysis, and internal consistency and test-retest stability were investigated. Concurrent validity was tested by examining associations of the OESS score with the self-efficacy scale for self-care (SESS) score and plaque index score. Construct validity was examined by comparing OESS scores between periodontal patients at initial visit (group 1) and those continuing maintenance care (group 2). Item analysis identified 13 items for the OESS. Factor analysis extracted three factors: social-, oral- and self-evaluative outcome expectancy. Cronbach's alpha coefficient for the OESS was 0.90. A significant association was observed between test and retest scores, and between the OESS and SESS and plaque index scores. Further, group 2 had a significantly higher mean OESS score than group 1. We developed a 13-item OESS with high reliability and validity which may be used to assess outcome expectancy for self-care. A patient's psychological condition with regard to behaviour and affective status can be accurately evaluated using the OESS with SESS. © 2011 Blackwell Publishing Ltd.
The ontogeny of serial-order behavior in humans (Homo sapiens): representation of a list.
Guyla, Michelle; Colombo, Michael
2004-03-01
The authors trained 3-, 4-, 7-, and 10-year-old children and adults (Homo sapiens) on a nonverbal serial-order task to respond to 5 items in a specific order. Knowledge of each item's sequential position was then examined using pairwise and triplet tests. Adults and 7- and 10-year-olds performed at high levels on both tests, whereas 3- and 4-year-olds did not. The latency to respond to the first item of a test pair or triplet was linearly related to that item's position in the training series for the 7- and 10-year-olds and adults, but not for the 3- and 4-year-olds. These data suggest that older children and adults, but not younger children, developed a well-integrated internal representation of the serial list. ((c) 2004 APA, all rights reserved)
Selecting Items for Criterion-Referenced Tests.
ERIC Educational Resources Information Center
Mellenbergh, Gideon J.; van der Linden, Wim J.
1982-01-01
Three item selection methods for criterion-referenced tests are examined: the classical theory of item difficulty and item-test correlation; the latent trait theory of item characteristic curves; and a decision-theoretic approach for optimal item selection. Item contribution to the standardized expected utility of mastery testing is discussed. (CM)
Assessing psychological well-being: self-report instruments for the NIH Toolbox.
Salsman, John M; Lai, Jin-Shei; Hendrie, Hugh C; Butt, Zeeshan; Zill, Nicholas; Pilkonis, Paul A; Peterson, Christopher; Stoney, Catherine M; Brouwers, Pim; Cella, David
2014-02-01
Psychological well-being (PWB) has a significant relationship with physical and mental health. As a part of the NIH Toolbox for the Assessment of Neurological and Behavioral Function, we developed self-report item banks and short forms to assess PWB. Expert feedback and literature review informed the selection of PWB concepts and the development of item pools for positive affect, life satisfaction, and meaning and purpose. Items were tested with a community-dwelling US Internet panel sample of adults aged 18 and above (N = 552). Classical and item response theory (IRT) approaches were used to evaluate unidimensionality, fit of items to the overall measure, and calibrations of those items, including differential item function (DIF). IRT-calibrated item banks were produced for positive affect (34 items), life satisfaction (16 items), and meaning and purpose (18 items). Their psychometric properties were supported based on the results of factor analysis, fit statistics, and DIF evaluation. All banks measured the concepts precisely (reliability ≥0.90) for more than 98% of participants. These adult scales and item banks for PWB provide the flexibility, efficiency, and precision necessary to promote future epidemiological, observational, and intervention research on the relationship of PWB with physical and mental health.
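One widely used check for the differential item functioning (DIF) mentioned above is logistic regression: regress each item response on the matching score and a group indicator, and inspect the group coefficient after conditioning on the matching score. The sketch below simulates a no-DIF item; it illustrates the general method, not necessarily the procedure used for the NIH Toolbox banks.

```python
# Logistic-regression check for uniform DIF on a single dichotomous item.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 552
group = rng.integers(0, 2, size=n)                    # e.g., two demographic groups
matching = rng.normal(0, 1, size=n)                   # matching variable (rest score)
logit = 1.2 * matching + 0.0 * group                  # no true DIF in this simulation
item = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = sm.add_constant(np.column_stack([matching, group]))
fit = sm.Logit(item, X).fit(disp=False)
# Columns are [const, matching, group]; a significant group effect suggests uniform DIF.
print(f"group coefficient = {fit.params[2]:.3f}, p = {fit.pvalues[2]:.3f}")
```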
NASA Astrophysics Data System (ADS)
Arif, W.; Suhandi, A.; Kaniawati, I.; Setiawan, A.
2017-02-01
Scaffolding for a training program on constructing cognitive-domain evaluation instruments has been developed for senior high school physics teachers and teachers at an equivalent level, as specified in the test instrument. The development was motivated by the low ability of the majority of physics teachers to construct physics learning achievement tests. This situation is not in accordance with the demands of Permendiknas RI no. 16 tahun 2007 concerning the standard of academic qualifications and competence of teachers, which states that teachers should be able to develop instruments for the assessment and evaluation of learning processes and outcomes. Preliminary study results indicate that the main cause of teachers' inability to develop physics achievement tests is that they do not adequately understand the indicators for each aspect of the cognitive domain. The scaffolding was developed using the research and development method formulated by Thiagarajan, which includes define, design and develop steps. The develop step included building the scaffolding, validation of the scaffolding by experts and a limited pilot implementation in training activities. The building step produced scaffolding for the test instrument construction training program comprising the following process steps: describing the indicators, operationalizing the indicators, constructing the item framework (item scenarios), constructing the item stems, constructing the items and checking the items. Validation by three validators indicates that the scaffolding is suitable for use in a physics achievement test construction training program, especially for novices. The limited pilot implementation of the scaffolding was conducted in training activities attended by 10 senior high school physics teachers in Garut district. The results show that the scaffolding had a medium effectiveness in improving the ability of senior high school physics teachers to construct physics achievement test instruments, as indicated by more than 70% of trainees achieving test construction scores of about 80 or more.
Nakada, Koji; Matsuhashi, Nobuyuki; Iwakiri, Katsuhiko; Oshio, Atsushi; Joh, Takashi; Higuchi, Kazuhide; Haruma, Ken
2017-01-01
AIM To evaluate the psychometric properties of a newly developed questionnaire, known as the gastroesophageal reflux and dyspepsia therapeutic efficacy and satisfaction test (GERD-TEST), in patients with GERD. METHODS Japanese patients with predominant GERD symptoms recruited according to the Montreal definition were treated for 4 wk using a standard dose of proton pump inhibitor (PPI). The GERD-TEST and the Medical Outcome Study Short Form-8 Health Survey (SF-8) were administered at baseline and after 4 wk of treatment. The GERD-TEST contains three domains: the severity of GERD and functional dyspepsia (FD) symptoms (5 items), the level of dissatisfaction with daily life (DS) (4 items), and the therapeutic efficacy as assessed by the patients and medication compliance (4 items). RESULTS A total of 290 patients were eligible at baseline; 198 of these patients completed 4 wk of PPI therapy. The internal consistency reliability as evaluated using the Cronbach’s α values for the GERD, FD and DS subscales ranged from 0.75 to 0.82. The scores for the GERD, FD and DS items/subscales were significantly correlated with the physical and mental component summary scores of the SF-8. After 4 wk of PPI treatment, the scores for the GERD items/subscales were greatly reduced, ranging in value from 1.51 to 1.87 and with a large effect size (P < 0.0001, Cohen’s d; 1.29-1.63). Statistically significant differences in the changes in the scores for the GERD items/subscales were observed between treatment responders and non-responders (P < 0.0001). CONCLUSION The GERD-TEST has a good reliability, a good convergent and concurrent validity, and is responsive to the effects of treatment. The GERD-TEST is a simple, easy to understand, and multifaceted PRO instrument applicable to both clinical trials and the primary care of GERD patients. PMID:28811716
Nakada, Koji; Matsuhashi, Nobuyuki; Iwakiri, Katsuhiko; Oshio, Atsushi; Joh, Takashi; Higuchi, Kazuhide; Haruma, Ken
2017-07-28
To evaluate the psychometric properties of a newly developed questionnaire, known as the gastroesophageal reflux and dyspepsia therapeutic efficacy and satisfaction test (GERD-TEST), in patients with GERD. Japanese patients with predominant GERD symptoms recruited according to the Montreal definition were treated for 4 wk using a standard dose of proton pump inhibitor (PPI). The GERD-TEST and the Medical Outcome Study Short Form-8 Health Survey (SF-8) were administered at baseline and after 4 wk of treatment. The GERD-TEST contains three domains: the severity of GERD and functional dyspepsia (FD) symptoms (5 items), the level of dissatisfaction with daily life (DS) (4 items), and the therapeutic efficacy as assessed by the patients and medication compliance (4 items). A total of 290 patients were eligible at baseline; 198 of these patients completed 4 wk of PPI therapy. The internal consistency reliability as evaluated using the Cronbach's α values for the GERD, FD and DS subscales ranged from 0.75 to 0.82. The scores for the GERD, FD and DS items/subscales were significantly correlated with the physical and mental component summary scores of the SF-8. After 4 wk of PPI treatment, the scores for the GERD items/subscales were greatly reduced, ranging in value from 1.51 to 1.87 and with a large effect size ( P < 0.0001, Cohen's d ; 1.29-1.63). Statistically significant differences in the changes in the scores for the GERD items/subscales were observed between treatment responders and non-responders ( P < 0.0001). The GERD-TEST has a good reliability, a good convergent and concurrent validity, and is responsive to the effects of treatment. The GERD-TEST is a simple, easy to understand, and multifaceted PRO instrument applicable to both clinical trials and the primary care of GERD patients.
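The treatment-responsiveness statistics reported (paired comparison of baseline and week-4 scores with a Cohen's d effect size) can be reproduced in outline as follows. The data are simulated, and the d shown uses the standard deviation of the change scores, which may differ from the denominator the authors chose.

```python
# Paired t-test and a paired-samples Cohen's d for pre/post symptom change.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(8)
baseline = rng.normal(3.0, 1.0, size=198)            # hypothetical symptom severity at baseline
week4 = baseline - rng.normal(1.6, 0.9, size=198)    # hypothetical improvement after 4 weeks

t, p = ttest_rel(baseline, week4)
change = baseline - week4
d = change.mean() / change.std(ddof=1)                # d based on SD of change scores
print(f"t = {t:.2f}, p = {p:.2g}, Cohen's d (paired) = {d:.2f}")
```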
Bookbinder, Marilyn; Hugodot, Amandine; Freeman, Katherine; Homel, Peter; Santiago, Elisabeth; Riggs, Alexa; Gavin, Maggie; Chu, Alice; Brady, Ellen; Lesage, Pauline; Portenoy, Russell K
2018-02-01
Quality improvement in end-of-life care generally acquires data from charts or caregivers. "Tracer" methodology, which assesses real-time information from multiple sources, may provide complementary information. The objective of this study was to develop a valid brief audit tool that can guide assessment and rate care when used in a clinician tracer to evaluate the quality of care for the dying patient. To identify items for a brief audit tool, 248 items were created to evaluate overall quality, quality in specific content areas (e.g., symptom management), and specific practices. Collected into three instruments, these items were used to interview professional caregivers and evaluate the charts of hospitalized patients who died. Evidence that this information could be validly captured using a small number of items was obtained through factor analyses, canonical correlations, and group comparisons. A nurse manager field tested tracer methodology using candidate items to evaluate the care provided to other patients who died. The survey of 145 deaths provided chart data and data from 445 interviews (26 physicians, 108 nurses, 18 social workers, and nine chaplains). The analyses yielded evidence of construct validity for a small number of items, demonstrating significant correlations between these items and content areas identified as latent variables in factor analyses. Criterion validity was suggested by significant differences in the ratings on these items between the palliative care unit and other units. The field test evaluated 127 deaths, demonstrated the feasibility of tracer methodology, and informed reworking of the candidate items into the 14-item Tracer EoLC v1. The Tracer EoLC v1 can be used with tracer methodology to guide the assessment and rate the quality of end-of-life care. Copyright © 2017 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.
Jeyashree, Kathiresan; Shewade, Hemant Deepak; Kathirvel, Soundappan
2018-04-17
Dundee Ready Educational Environment Measure (DREEM) is a 50-item tool to assess the educational environment of medical institutions as perceived by the students. This cross-sectional study developed and validated an abridged version of the DREEM-50 with an aim to have a less resource-intensive (time, manpower), yet valid and reliable, version of DREEM-50 while also avoiding respondent fatigue. A methodology similar to that used in the development of WHO-BREF was adopted to develop the abridged version of DREEM. Medical students (n = 418) from a private teaching hospital in Madurai, India, were divided into two groups. Group I (n = 277) participated in the development of the abridged version. This was performed by domain-wise selection of items that had the highest item-total correlation. Group II (n = 141) participated in the testing of the abridged version for construct validity, internal consistency and test-retest reliability. Confirmatory factor analysis was performed to assess the construct validity of DREEM-12. The abridged version had 12 items (DREEM-12) spread over all five domains in DREEM-50. DREEM-12 explained 77.4% of the variance in DREEM-50 scores. Correlation between total scores of DREEM-50 and DREEM-12 was 0.88 (p < 0.001). Confirmatory factor analysis of DREEM-12 construct was statistically significant (LR test of model vs. saturated p = 0.0006). The internal consistency of DREEM-12 was 0.83. The test-retest reliability of DREEM-12 was 0.595, p < 0.001. DREEM-12 is a valid and reliable tool for use in educational research. Future research using DREEM-12 will establish its validity and reliability across different settings.
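The abridging step described (domain-wise selection of the items with the highest item-total correlations) can be sketched as below. Domain labels, responses, and the number of items retained per domain are assumptions for illustration, not the actual DREEM-12 selection.

```python
# Domain-wise item selection by corrected item-total correlation for an abridged scale.
import numpy as np

def corrected_item_total(scores: np.ndarray) -> np.ndarray:
    totals = scores.sum(axis=1)
    return np.array([np.corrcoef(scores[:, j], totals - scores[:, j])[0, 1]
                     for j in range(scores.shape[1])])

rng = np.random.default_rng(9)
responses = rng.integers(1, 6, size=(277, 50)).astype(float)   # hypothetical 50-item responses
domains = rng.integers(0, 5, size=50)                           # hypothetical domain labels

r_it = corrected_item_total(responses)
selected = []
for dom in range(5):
    idx = np.where(domains == dom)[0]
    best = idx[np.argsort(r_it[idx])[::-1][:3]]                 # e.g., keep top 3 items per domain
    selected.extend(best.tolist())
print(f"Abridged item set ({len(selected)} items): {sorted(selected)}")
```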
Challet-Bouju, Gaëlle; Perrot, Bastien; Romo, Lucia; Valleur, Marc; Magalon, David; Fatséas, Mélina; Chéreau-Boudet, Isabelle; Luquiens, Amandine; Grall-Bronnec, Marie; Hardouin, Jean-Benoit
2016-01-01
Background and aims The aim of this study was to test the screening properties of several combinations of items from gambling scales, in order to harmonize screening of gambling problems in epidemiological surveys. The objective was to propose two brief screening tools (three items or less) for a use in interviews and self-administered questionnaires. Methods We tested the screening properties of combinations of items from several gambling scales, in a sample of 425 gamblers (301 non-problem gamblers and 124 disordered gamblers). Items tested included interview-based items (Pathological Gambling section of the DSM-IV, lifetime history of problem gambling, monthly expenses in gambling, and abstinence of 1 month or more) and self-report items (South Oaks Gambling Screen, Gambling Attitudes, and Beliefs Survey). The gold standard used was the diagnosis of a gambling disorder according to the DSM-5. Results Two versions of the Rapid Screener for Problem Gambling (RSPG) were developed: the RSPG-Interview (RSPG-I), being composed of two interview items (increasing bets and loss of control), and the RSPG-Self-Assessment (RSPG-SA), being composed of three self-report items (chasing, guiltiness, and perceived inability to stop). Discussion and conclusions We recommend using the RSPG-SA/I for screening problem gambling in epidemiological surveys, with the version adapted for each purpose (RSPG-I for interview-based surveys and RSPG-SA for self-administered surveys). This first triage of potential problem gamblers must be supplemented by further assessment, as it may overestimate the proportion of problem gamblers. However, a first triage has the great advantage of saving time and energy in large-scale screening for problem gambling. PMID:27348558
Goldstein, Elizabeth; Farquhar, Marybeth; Crofton, Christine; Darby, Charles; Garfinkel, Steven
2005-12-01
To describe the developmental process for the CAHPS Hospital Survey. A pilot was conducted in three states with 19,720 hospital discharges. A rigorous, multi-step process was used to develop the CAHPS Hospital Survey. It included a public call for measures, multiple Federal Register notices soliciting public input, a review of the relevant literature, meetings with hospitals, consumers and survey vendors, cognitive interviews with consumers, a large-scale pilot test in three states, consumer testing and numerous small-scale field tests. The current version of the CAHPS Hospital Survey has survey items in seven domains, two overall ratings of the hospital and five items used for adjusting for the mix of patients across hospitals and for analytical purposes. The CAHPS Hospital Survey is a core set of questions that can be administered as a stand-alone questionnaire or combined with a broader set of hospital-specific items.
Preliminary development of an ultrabrief two-item bedside test for delirium.
Fick, Donna M; Inouye, Sharon K; Guess, Jamey; Ngo, Long H; Jones, Richard N; Saczynski, Jane S; Marcantonio, Edward R
2015-10-01
Delirium is common, morbid, and costly, yet is greatly under-recognized among hospitalized older adults. To identify the best single and pair of mental status test items that predict the presence of delirium. Diagnostic test evaluation study that enrolled medicine inpatients aged 75 years or older at an academic medical center. Patients underwent a clinical reference standard assessment involving a patient interview, medical record review, and interviews with family members and nurses to determine the presence or absence of Diagnostic and Statistical Manual of Mental Disorders, 4th Edition defined delirium. Participants also underwent the three-dimensional Confusion Assessment Method (3D-CAM), a brief, validated assessment for delirium. Individual items and pairs of items from the 3D-CAM were evaluated to determine sensitivity and specificity relative to the reference standard delirium diagnosis. Of the 201 participants (mean age 84 years, 62% female), 42 (21%) had delirium based on the clinical reference standard. The single item with the best test characteristics was "months of the year backwards" with a sensitivity of 83% (95% confidence interval [CI]: 69%-93%) and specificity of 69% (95% CI: 61%-76%). The best 2-item screen was the combination of "months of the year backwards" and "what is the day of the week?" with a sensitivity of 93% (95% CI: 81%-99%) and specificity of 64% (95% CI: 56%-70%). We identified a single item with >80% and pair of items with >90% sensitivity for delirium. If validated prospectively, these items will serve as an initial innovative screening step for delirium identification in hospitalized older adults. © 2015 Society of Hospital Medicine.
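The screening statistics quoted (sensitivity and specificity with 95% confidence intervals) follow directly from the 2x2 table of screen results against the reference standard. The counts in the sketch below are invented to roughly match the reported sample sizes, and Wilson score intervals are used, which may differ from the interval method in the paper.

```python
# Sensitivity and specificity with Wilson score confidence intervals.
import numpy as np
from scipy.stats import norm

def wilson_ci(successes, n, alpha=0.05):
    z = norm.ppf(1 - alpha / 2)
    p = successes / n
    center = (p + z**2 / (2 * n)) / (1 + z**2 / n)
    half = z * np.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / (1 + z**2 / n)
    return center - half, center + half

tp, fn = 39, 3     # screen-positive / screen-negative among delirious patients (invented counts)
tn, fp = 102, 57   # among non-delirious patients (invented counts)

sens_lo, sens_hi = wilson_ci(tp, tp + fn)
spec_lo, spec_hi = wilson_ci(tn, tn + fp)
print(f"Sensitivity = {tp / (tp + fn):.2f} (95% CI {sens_lo:.2f}-{sens_hi:.2f})")
print(f"Specificity = {tn / (tn + fp):.2f} (95% CI {spec_lo:.2f}-{spec_hi:.2f})")
```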
Tarrant, Marie; Knierim, Aimee; Hayes, Sasha K; Ware, James
2006-12-01
Multiple-choice questions are a common assessment method in nursing examinations. Few nurse educators, however, have formal preparation in constructing multiple-choice questions. Consequently, questions used in baccalaureate nursing assessments often contain item-writing flaws, or violations of accepted item-writing guidelines. In one nursing department, 2770 MCQs were collected from tests and examinations administered over a five-year period from 2001 to 2005. Questions were evaluated for 19 frequently occurring item-writing flaws, for cognitive level, for question source, and for the distribution of correct answers. Results show that almost half (46.2%) of the questions contained violations of item-writing guidelines and over 90% were written at low cognitive levels. Only a small proportion of questions were teacher-generated (14.1%), while 36.2% were taken from testbanks and almost half (49.4%) had no source identified. MCQs written at a lower cognitive level were significantly more likely to contain item-writing flaws. While there was no relationship between the source of the question and item-writing flaws, teacher-generated questions were more likely to be written at higher cognitive levels (p<0.001). Correct answers were evenly distributed across all four options and no bias was noted in the placement of correct options. Further training in item-writing is recommended for all faculty members who are responsible for developing tests. Pre-test review and quality assessment is also recommended to reduce the occurrence of item-writing flaws and to improve the quality of test questions.
NASA Astrophysics Data System (ADS)
Kachchaf, Rachel Rae
The purpose of this study was to compare how English language learners (ELLs) and monolingual English speakers solved multiple-choice items administered with and without a new form of testing accommodation---vignette illustration (VI). By incorporating theories from second language acquisition, bilingualism, and sociolinguistics, this study was able to gain more accurate and comprehensive input into the ways students interacted with items. This mixed methods study used verbal protocols to elicit the thinking processes of thirty-six native Spanish-speaking English language learners (ELLs), and 36 native-English speaking non-ELLs when solving multiple-choice science items. Results from both qualitative and quantitative analyses show that ELLs used a wider variety of actions oriented to making sense of the items than non-ELLs. In contrast, non-ELLs used more problem solving strategies than ELLs. There were no statistically significant differences in student performance based on the interaction of presence of illustration and linguistic status or the main effect of presence of illustration. However, there were significant differences based on the main effect of linguistic status. An interaction between the characteristics of the students, the items, and the illustrations indicates considerable heterogeneity in the ways in which students from both linguistic groups think about and respond to science test items. The results of this study speak to the need for more research involving ELLs in the process of test development to create test items that do not require ELLs to carry out significantly more actions to make sense of the item than monolingual students.
N’Diaye, Khadim; Evans, D. Gareth; Harris, Hilary; Tibben, Aad; van Asperen, Christi; Schmidtke, Joerg; Nippert, Irmgard; Mancini, Julien; Julian-Reynier, Claire
2017-01-01
Objective To develop a generic scale for assessing attitudes towards genetic testing and to psychometrically assess these attitudes in the context of BRCA1/2 among a sample of French general practitioners, breast specialists and gyneco-obstetricians. Study design and setting Nested within the questionnaire developed for the European InCRisC (International Cancer Risk Communication Study) project were 14 items assessing expected benefits (8 items) and drawbacks (6 items) of the process of breast/ovarian genetic cancer testing (BRCA1/2). Another item assessed agreement with the statement that, overall, the expected health benefits of BRCA1/2 testing exceeded its drawbacks, thereby justifying its prescription. The questionnaire was mailed to a sample of 1,852 French doctors. Of these, 182 breast specialists, 275 general practitioners and 294 gyneco-obstetricians completed and returned the questionnaire to the research team. Principal Component Analysis, Cronbach’s α coefficient, and Pearson’s correlation coefficients were used in the statistical analyses of collected data. Results Three dimensions emerged from the respondents’ responses, and were classified under the headings: “Anxiety, Conflict and Discrimination”, “Risk Information”, and “Prevention and Surveillance”. Cronbach’s α coefficient for the 3 dimensions was 0.79, 0.76 and 0.62, respectively, and each dimension exhibited strong correlation with the overall indicator of agreement (criterion validity). Conclusions The validation process of the 15 items regarding BRCA1/2 testing revealed satisfactory psychometric properties for the creation of a new scale entitled the Attitudes Towards Genetic Testing for BRCA1/2 (ATGT-BRCA1/2) Scale. Further testing is required to confirm the validity of this tool which could be used generically in other genetic contexts. PMID:28570656
2017-01-01
Objectives Few attempts have been made to develop a generic health-related quality of life (HRQoL) instrument and to examine its validity and reliability in Korea. We aimed to do this in our present study. Methods After a literature review of existing generic HRQoL instruments, a focus group discussion, in-depth interviews, and expert consultations, we selected 30 tentative items for a new HRQoL measure. These items were evaluated by assessing their ceiling effects, difficulty, and redundancy in the first survey. To validate the HRQoL instrument that was developed, known-groups validity and convergent/discriminant validity were evaluated and its test-retest reliability was examined in the second survey. Results Of the 30 items originally assessed for the HRQoL instrument, four were excluded due to high ceiling effects and six were removed due to redundancy. We ultimately developed a HRQoL instrument with a reduced number of 20 items, known as the Health-related Quality of Life Instrument with 20 items (HINT-20), incorporating physical, mental, social, and positive health dimensions. The results of the HINT-20 for known-groups validity were poorer in women, the elderly, and those with a low income. For convergent/discriminant validity, the correlation coefficients of items (except vitality) in the physical health dimension with the physical component summary of the Short Form 36 version 2 (SF-36v2) were generally higher than the correlations of those items with the mental component summary of the SF-36v2, and vice versa. Regarding test-retest reliability, the intraclass correlation coefficient of the total HINT-20 score was 0.813 (p<0.001). Conclusions A novel generic HRQoL instrument, the HINT-20, was developed for the Korean general population and showed acceptable validity and reliability. PMID:28173686
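The item-level convergent/discriminant test described (each item correlating more strongly with its own dimension's summary than with the other dimension's) can be sketched as follows on simulated two-dimensional data; the item counts and loadings are illustrative assumptions.

```python
# Item-level convergent/discriminant validity check against two summary scores.
import numpy as np

rng = np.random.default_rng(10)
n = 300
physical_latent = rng.normal(size=n)
mental_latent = rng.normal(size=n)
physical_items = physical_latent[:, None] + rng.normal(0, 1, size=(n, 6))  # hypothetical items
mental_items = mental_latent[:, None] + rng.normal(0, 1, size=(n, 6))      # hypothetical items

phys_summary = physical_items.sum(axis=1)
ment_summary = mental_items.sum(axis=1)

for j in range(physical_items.shape[1]):
    own = np.corrcoef(physical_items[:, j], phys_summary - physical_items[:, j])[0, 1]
    other = np.corrcoef(physical_items[:, j], ment_summary)[0, 1]
    print(f"physical item {j}: own-scale r = {own:.2f}, other-scale r = {other:.2f}")
```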
Stack, Rebecca J; Mallen, Christian D; Deighton, Chris; Kiely, Patrick; Shaw, Karen L; Booth, Alison; Kumar, Kanta; Thomas, Susan; Rowan, Ian; Horne, Rob; Nightingale, Peter; Herron-Marx, Sandy; Jinks, Clare; Raza, Karim
2015-12-01
Early treatment for rheumatoid arthritis (RA) is vital. However, people often delay in seeking help at symptom onset. An assessment of the reasons behind patient delay is necessary to develop interventions to promote rapid consultation. Using a mixed methods design, we aimed to develop and test a questionnaire to assess the barriers to help seeking at RA onset. Questionnaire items were extracted from previous qualitative studies. Fifteen people with a lived experience of arthritis participated in focus groups to enhance the questionnaire's face validity. The questionnaire was also reviewed by groups of multidisciplinary health-care professionals. A test-retest survey of 41 patients with newly presenting RA or unclassified arthritis assessed the questionnaire items' intraclass correlations. During focus groups, participants rephrased questions, added questions and deleted items not relevant to the questionnaire's aims. Participants organized items into themes: early symptom experience, initial reactions to symptoms, self-management behaviours, causal beliefs, involvement of significant others, pre-diagnosis knowledge about RA, direct barriers to seeking help and relationship with GP. The test-retest survey identified seven items (out of 79) with low intraclass correlations which were removed from the final questionnaire. The involvement of people with a lived experience of arthritis and multidisciplinary health-care professionals in the preliminary validation of the DELAY (delays in evaluating arthritis early) questionnaire has enriched its development. Preliminary assessment established its reliability. The DELAY questionnaire provides a tool for researchers to evaluate individual, cultural and health service barriers to help-seeking behaviour at RA onset. © 2014 John Wiley & Sons Ltd.
ERIC Educational Resources Information Center
Solano-Flores, Guillermo
2014-01-01
This article addresses validity and fairness in the testing of English language learners (ELLs)--students in the United States who are developing English as a second language. It discusses limitations of current approaches to examining the linguistic features of items and their effect on the performance of ELL students. The article submits that…
Ownby, Raymond L; Acevedo, Amarilis; Waldrop-Valverde, Drenna; Jacobs, Robin J; Caballero, Joshua; Davenport, Rosemary; Homs, Ana-Maria; Czaja, Sara J; Loewenstein, David
2013-01-01
Current measures of health literacy have been criticized on a number of grounds, including use of a limited range of content, development on small and atypical patient groups, and poor psychometric characteristics. In this paper, we report the development and preliminary validation of a new computer-administered and -scored health literacy measure addressing these limitations. Items in the measure reflect a wide range of content related to health promotion and maintenance as well as care for diseases. The development process has focused on creating a measure that will be useful in both Spanish and English, while not requiring substantial time for clinician training and individual administration and scoring. The items incorporate several formats, including questions based on brief videos, which allow for the assessment of listening comprehension and the skills related to obtaining information on the Internet. In this paper, we report the interim analyses detailing the initial development and pilot testing of the items (phase 1 of the project) in groups of Spanish and English speakers. We then describe phase 2, which included a second round of testing of the items, in new groups of Spanish and English speakers, and evaluation of the new measure's reliability and validity in relation to other measures. Data are presented that show that four scales (general health literacy, numeracy, conceptual knowledge, and listening comprehension), developed through a process of item and factor analyses, have significant relations to existing measures of health literacy.
Quality of prenatal care questionnaire: instrument development and testing.
Heaman, Maureen I; Sword, Wendy A; Akhtar-Danesh, Noori; Bradford, Amanda; Tough, Suzanne; Janssen, Patricia A; Young, David C; Kingston, Dawn A; Hutton, Eileen K; Helewa, Michael E
2014-06-03
Utilization indices exist to measure quantity of prenatal care, but currently there is no published instrument to assess quality of prenatal care. The purpose of this study was to develop and test a new instrument, the Quality of Prenatal Care Questionnaire (QPCQ). Data for this instrument development study were collected in five Canadian cities. Items for the QPCQ were generated through interviews with 40 pregnant women and 40 health care providers and a review of prenatal care guidelines, followed by assessment of content validity and rating of importance of items. The preliminary 100-item QPCQ was administered to 422 postpartum women to conduct item reduction using exploratory factor analysis. The final 46-item version of the QPCQ was then administered to another 422 postpartum women to establish its construct validity, and internal consistency and test-retest reliability. Exploratory factor analysis reduced the QPCQ to 46 items, factored into 6 subscales, which subsequently were validated by confirmatory factor analysis. Construct validity was also demonstrated using a hypothesis testing approach; there was a significant positive association between women's ratings of the quality of prenatal care and their satisfaction with care (r = 0.81). Convergent validity was demonstrated by a significant positive correlation (r = 0.63) between the "Support and Respect" subscale of the QPCQ and the "Respectfulness/Emotional Support" subscale of the Prenatal Interpersonal Processes of Care instrument. The overall QPCQ had acceptable internal consistency reliability (Cronbach's alpha = 0.96), as did each of the subscales. The test-retest reliability result (Intra-class correlation coefficient = 0.88) indicated stability of the instrument on repeat administration approximately one week later. Temporal stability testing confirmed that women's ratings of their quality of prenatal care did not change as a result of giving birth or between the early postpartum period and 4 to 6 weeks postpartum. The QPCQ is a valid and reliable instrument that will be useful in future research as an outcome measure to compare quality of care across geographic regions, populations, and service delivery models, and to assess the relationship between quality of care and maternal and infant health outcomes.
ERIC Educational Resources Information Center
Samejima, Fumiko
In latent trait theory the latent space, or space of the hypothetical construct, is usually represented by some unidimensional or multi-dimensional continuum of real numbers. Like the latent space, the item response can either be treated as a discrete variable or as a continuous variable. Latent trait theory relates the item response to the latent…
Developing an item bank to measure the coping strategies of people with hereditary retinal diseases.
Prem Senthil, Mallika; Khadka, Jyoti; De Roach, John; Lamey, Tina; McLaren, Terri; Campbell, Isabella; Fenwick, Eva K; Lamoureux, Ecosse L; Pesudovs, Konrad
2018-05-05
Our understanding of the coping strategies used by people with visual impairment to manage stress related to visual loss is limited. This study aims to develop a sophisticated coping instrument in the form of an item bank implemented via Computerised adaptive testing (CAT) for hereditary retinal diseases. Items on coping were extracted from qualitative interviews with patients which were supplemented by items from a literature review. A systematic multi-stage process of item refinement was carried out followed by expert panel discussion and cognitive interviews. The final coping item bank had 30 items. Rasch analysis was used to assess the psychometric properties. A CAT simulation was carried out to estimate an average number of items required to gain precise measurement of hereditary retinal disease-related coping. One hundred eighty-nine participants answered the coping item bank (median age = 58 years). The coping scale demonstrated good precision and targeting. The standardised residual loadings for items revealed six items grouped together. Removal of the six items reduced the precision of the main coping scale and worsened the variance explained by the measure. Therefore, the six items were retained within the main scale. Our CAT simulation indicated that, on average, less than 10 items are required to gain a precise measurement of coping. This is the first study to develop a psychometrically robust coping instrument for hereditary retinal diseases. CAT simulation indicated that on an average, only four and nine items were required to gain measurement at moderate and high precision, respectively.
Development and psychometric characteristics of the SCI-QOL Pressure Ulcers scale and short form
Kisala, Pamela A.; Tulsky, David S.; Choi, Seung W.; Kirshblum, Steven C.
2015-01-01
Objective To develop a self-reported measure of the subjective impact of pressure ulcers on health-related quality of life (HRQOL) in individuals with spinal cord injury (SCI) as part of the SCI quality of life (SCI-QOL) measurement system. Design Grounded-theory based qualitative item development methods, large-scale item calibration testing, confirmatory factor analysis (CFA), and item response theory-based psychometric analysis. Setting Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Participants Adults with traumatic SCI. Main Outcome Measures SCI-QOL Pressure Ulcers scale. Results 189 individuals with traumatic SCI who experienced a pressure ulcer within the past 7 days completed 30 items related to pressure ulcers. CFA confirmed a unidimensional pool of items. IRT analyses were conducted. A constrained Graded Response Model with a constant slope parameter was used to estimate item thresholds for the 12 retained items. Conclusions The 12-item SCI-QOL Pressure Ulcers scale is unique in that it is specifically targeted to individuals with spinal cord injury and at every stage of development has included input from individuals with SCI. Furthermore, use of CFA and IRT methods provide flexibility and precision of measurement. The scale may be administered in its entirety or as a 7-item “short form” and is available for both research and clinical practice. PMID:26010965
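For reference, Samejima's graded response model used here assigns each response category a probability equal to the difference between adjacent cumulative logistic curves; constraining a constant slope, as the authors did, means a single discrimination parameter is shared across items. The sketch below evaluates category probabilities for one hypothetical item; the parameter values are illustrative assumptions.

```python
# Category probabilities under the graded response model for one ordered item.
import numpy as np

def grm_category_probs(theta, a, thresholds):
    """P(X = k | theta) for an item with len(thresholds) + 1 ordered categories."""
    cum = 1 / (1 + np.exp(-a * (theta - np.asarray(thresholds))))  # P(X >= k), k = 1..m
    cum = np.concatenate(([1.0], cum, [0.0]))
    return cum[:-1] - cum[1:]

a = 1.4                               # common (constrained) slope, illustrative value
thresholds = [-1.5, -0.3, 0.8, 2.0]   # hypothetical ordered thresholds for a 5-category item
for theta in (-2.0, 0.0, 2.0):
    probs = grm_category_probs(theta, a, thresholds)
    print(f"theta = {theta:+.1f}: " + " ".join(f"{p:.2f}" for p in probs))
```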
Quality of life in patients with Parkinson's disease: development of a questionnaire.
de Boer, A G; Wijker, W; Speelman, J D; de Haes, J C
1996-01-01
OBJECTIVES--To develop and test a questionnaire for measuring quality of life in patients with Parkinson's disease. METHODS--An item pool was developed based on the experience of patients with Parkinson's disease and of neurologists; medical literature on the problems of patients with Parkinson's disease; and other quality of life questionnaires. To reduce the item pool, 13 patients identified items that were a problem to them and rated their importance. Items which were most often chosen and rated most important were included in the Parkinson's disease quality of life questionnaire (PDQL). The PDQL consists of 37 items. To evaluate the discriminant validity of the PDQL three groups of severity of disease were compared. To test for convergent validity, the scores of the PDQL were tested for correlation with standard indices of quality of life. RESULTS--The PDQL was filled out by 384 patients with Parkinson's disease. It consisted of four subscales: parkinsonian symptoms, systemic symptoms, emotional functioning, and social functioning. The internal-consistency reliability coefficients of the PDQL subscales were high (0.80-0.87). Patients with higher disease severity had significantly lower quality of life on all PDQL subscales (P < 0.05). Almost all PDQL subscales correlated highly (P < 0.001) with the corresponding scales of the standard quality of life indices. CONCLUSION--The PDQL is a relevant, reliable, and valid measure of the quality of life of patients with Parkinson's disease. PMID:8676165
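The internal-consistency figures quoted above (0.80-0.87 per subscale) are Cronbach's alpha coefficients. The snippet below is a small, self-contained Python illustration of that computation on simulated Likert-type responses; the data and subscale size are hypothetical, not the PDQL data.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical responses to a 5-item subscale scored 1-5
rng = np.random.default_rng(1)
base = rng.integers(1, 6, size=(100, 1))                       # shared "true" level
subscale = np.clip(base + rng.integers(-1, 2, size=(100, 5)), 1, 5)
print(round(cronbach_alpha(subscale), 2))
```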
Davies, Louise; Donnelly, Kyla Z; Goodman, Daisy J; Ogrinc, Greg
2016-04-01
The Standards for Quality Improvement Reporting Excellence (SQUIRE) Guideline was published in 2008 (SQUIRE 1.0) and was the first publication guideline specifically designed to advance the science of healthcare improvement. Advances in the discipline of improvement prompted us to revise it. We adopted a novel approach to the revision by asking end-users to 'road test' a draft version of SQUIRE 2.0. The aim was to determine whether they understood and implemented the guidelines as intended by the developers. Forty-four participants were assigned a manuscript section (ie, introduction, methods, results, discussion) and asked to use the draft Guidelines to guide their writing process. They indicated the text that corresponded to each SQUIRE item used and submitted it along with a confidential survey. The survey examined usability of the Guidelines using Likert-scaled questions and participants' interpretation of key concepts in SQUIRE using open-ended questions. On the submitted text, we evaluated concordance between participants' item usage/interpretation and the developers' intended application. For the survey, the Likert-scaled responses were summarised using descriptive statistics and the open-ended questions were analysed by content analysis. Consistent with the SQUIRE Guidelines' recommendation that not every item be included, less than one-third (n=14) of participants applied every item in their section in full. Of the 85 instances when an item was partially used or was omitted, only 7 (8.2%) of these instances were due to participants not understanding the item. Usage of Guideline items was highest for items most similar to standard scientific reporting (ie, 'Specific aim of the improvement' (introduction), 'Description of the improvement' (methods) and 'Implications for further studies' (discussion)) and lowest (<20% of the time) for those unique to healthcare improvement (ie, 'Assessment methods for context factors that contributed to success or failure' and 'Costs and strategic trade-offs'). Items unique to healthcare improvement, specifically 'Evolution of the improvement', 'Context elements that influenced the improvement', 'The logic on which the improvement was based', 'Process and outcome measures', demonstrated poor concordance between participants' interpretation and developers' intended application. User testing of a draft version of SQUIRE 2.0 revealed which items have poor concordance between developer intent and author usage, which will inform final editing of the Guideline and development of supporting supplementary materials. It also identified the items that require special attention when teaching about scholarly writing in healthcare improvement. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
Development of a Questionnaire in Order To Identify Test Anxiety in Nursing Students.
ERIC Educational Resources Information Center
Carraway, Cassandra Todd
It has been repeatedly demonstrated that persons who experience a high degree of test anxiety also experience decrements in performance in evaluative situations. A study was conducted to develop a test anxiety questionnaire for student nurses in order to identify test anxiety. A 40-item, self-report questionnaire was developed by two panels of…
2013-01-01
Background In light of its epidemic proportions in developed and developing countries, obesity is considered a serious public health issue. In order to increase knowledge concerning the ability of health care professionals in caring for obese adolescents and adopt more efficient preventive and control measures, a questionnaire was developed and validated to assess non-dietitian health professionals regarding their Knowledge of Nutrition in Obese Adolescents (KNOA). Methods The development and evaluation of a questionnaire to assess the knowledge of primary care practitioners with respect to nutrition in obese adolescents was carried out in five phases, as follows: 1) definition of study dimensions; 2) development of 42 questions and preliminary evaluation of the questionnaire by a panel of experts; 3) characterization and selection of primary care practitioners (35 dietitians and 265 non-dietitians) and measurement of questionnaire criteria by contrasting the responses of dietitians and non-dietitians; 4) reliability assessment by question exclusion based on item difficulty (too easy or too difficult for non-dietitian practitioners), item discrimination, internal consistency and reproducibility index determination; and 5) scoring the completed questionnaires. Results Dietitians obtained higher scores than non-dietitians (Mann–Whitney U test, P < 0.05), confirming the validity of the questionnaire criteria. Items were discriminated by correlating the score for each item with the total score, using a minimum of 0.2 as a correlation coefficient cutoff value. Item difficulty was controlled by excluding questions answered correctly by more than 90% of the non-dietitian subjects (too easy) or by less than 10% of them (too difficult). The final questionnaire contained 26 of the original 42 questions, increasing Cronbach’s α value from 0.788 to 0.807. Test-retest agreement between respondents was classified as good to very good (Kappa test, >0.60). Conclusion The KNOA questionnaire developed for primary care practitioners is a valid, consistent and suitable instrument that can be applied over time, making it a promising tool for developing and guiding public health policies. PMID:23865564
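The two screening rules described in this abstract (drop items answered correctly by more than 90% or fewer than 10% of respondents, and drop items whose item-total correlation is below 0.2) are easy to express in code. The Python sketch below applies both rules to a simulated 0/1 response matrix; the data, sample size and the screen_items helper are illustrative, not the KNOA data, and the item-rest correlation used here is a slightly conservative variant of the item-total correlation mentioned in the abstract.

```python
import numpy as np

def screen_items(scores, p_low=0.10, p_high=0.90, r_min=0.20):
    """Keep items answered correctly by 10%-90% of respondents whose
    item-rest correlation (item vs. total of the remaining items) is >= 0.2.
    `scores` is an (n_respondents x n_items) 0/1 matrix."""
    scores = np.asarray(scores, dtype=float)
    difficulty = scores.mean(axis=0)            # proportion answering correctly
    total = scores.sum(axis=1)
    keep = []
    for j in range(scores.shape[1]):
        rest = total - scores[:, j]             # total score excluding item j
        r = np.corrcoef(scores[:, j], rest)[0, 1]
        keep.append(bool(p_low <= difficulty[j] <= p_high and r >= r_min))
    return np.array(keep), difficulty

# Hypothetical unidimensional data: 200 respondents, 10 items of varying difficulty
rng = np.random.default_rng(2)
ability = rng.normal(size=(200, 1))
item_difficulty = np.linspace(-2.5, 2.5, 10)
prob_correct = 1 / (1 + np.exp(-(ability - item_difficulty)))
demo = (rng.random((200, 10)) < prob_correct).astype(int)
kept, difficulty = screen_items(demo)
print(kept)
print(np.round(difficulty, 2))
```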
Dür, Mona; Steiner, Günter; Fialka-Moser, Veronika; Kautzky-Willer, Alexandra; Dejaco, Clemens; Prodinger, Birgit; Stoffer, Michaela Alexandra; Binder, Alexa; Smolen, Josef; Stamm, Tanja Alexandra
2014-04-05
Self-reported outcome instruments in health research have become increasingly important over the last decades. Occupational therapy interventions often focus on occupational balance. However, instruments to measure occupational balance are scarce. The aim of the study was therefore to develop a generic self-reported outcome instrument to assess occupational balance based on the experiences of patients and healthy people, including an examination of its psychometric properties. We conducted a qualitative analysis of the life stories of 90 people with and without chronic autoimmune diseases to identify components of occupational balance. Based on these components, the Occupational Balance-Questionnaire (OB-Quest) was developed. Construct validity and internal consistency of the OB-Quest were examined in quantitative data. We used Rasch analyses to determine overall fit of the items to the Rasch model, person separation index and potential differential item functioning. Dimensionality testing was conducted by the use of t-tests and Cronbach's alpha. The following components emerged from the qualitative analyses: challenging and relaxing activities, activities with acknowledgement by the individual and by the sociocultural context, impact of health condition on activities, involvement in stressful activities and fewer stressing activities, rest and sleep, variety of activities, adaptation of activities according to changed living conditions and activities intended to care for oneself and for others. Based on these, the seven items of the questionnaire (OB-Quest) were developed. 251 people (132 with rheumatoid arthritis, 43 with systemic lupus erythematosus and 76 healthy) filled in the OB-Quest. Dimensionality testing indicated multidimensionality of the questionnaire (t = 0.58, and 1.66 after item reduction, non-significant). The item on the component rest and sleep showed differential item functioning (health condition and age). Person separation index was 0.51. Cronbach's alpha changed from 0.38 to 0.57 after deleting two items. This questionnaire includes new items addressing components of occupational balance meaningful to patients and healthy people which have not been measured so far. Deleting two items from the OB-Quest improved internal consistency. The multidimensionality of the questionnaire indicates the need to summarise several components into subscales.
2014-01-01
Background Self-reported outcome instruments in health research have become increasingly important over the last decades. Occupational therapy interventions often focus on occupational balance. However, instruments to measure occupational balance are scarce. The aim of the study was therefore to develop a generic self-reported outcome instrument to assess occupational balance based on the experiences of patients and healthy people, including an examination of its psychometric properties. Methods We conducted a qualitative analysis of the life stories of 90 people with and without chronic autoimmune diseases to identify components of occupational balance. Based on these components, the Occupational Balance-Questionnaire (OB-Quest) was developed. Construct validity and internal consistency of the OB-Quest were examined in quantitative data. We used Rasch analyses to determine overall fit of the items to the Rasch model, person separation index and potential differential item functioning. Dimensionality testing was conducted by the use of t-tests and Cronbach’s alpha. Results The following components emerged from the qualitative analyses: challenging and relaxing activities, activities with acknowledgement by the individual and by the sociocultural context, impact of health condition on activities, involvement in stressful activities and fewer stressing activities, rest and sleep, variety of activities, adaptation of activities according to changed living conditions and activities intended to care for oneself and for others. Based on these, the seven items of the questionnaire (OB-Quest) were developed. 251 people (132 with rheumatoid arthritis, 43 with systemic lupus erythematosus and 76 healthy) filled in the OB-Quest. Dimensionality testing indicated multidimensionality of the questionnaire (t = 0.58, and 1.66 after item reduction, non-significant). The item on the component rest and sleep showed differential item functioning (health condition and age). Person separation index was 0.51. Cronbach’s alpha changed from 0.38 to 0.57 after deleting two items. Conclusions This questionnaire includes new items addressing components of occupational balance meaningful to patients and healthy people which have not been measured so far. Deleting two items from the OB-Quest improved internal consistency. The multidimensionality of the questionnaire indicates the need to summarise several components into subscales. PMID:24708642
ERIC Educational Resources Information Center
Burns, Daniel J.; Martens, Nicholas J.; Bertoni, Alicia A.; Sweeney, Emily J.; Lividini, Michelle D.
2006-01-01
In a repeated testing paradigm, list items receiving item-specific processing are more likely to be recovered across successive tests (item gains), whereas items receiving relational processing are likely to be forgotten progressively less on successive tests. Moreover, analysis of cumulative-recall curves has shown that item-specific processing…
A New Clinical Pain Knowledge Test for Nurses: Development and Psychometric Evaluation.
Bernhofer, Esther I; St Marie, Barbara; Bena, James F
2017-08-01
All nurses care for patients with pain, and pain management knowledge and attitude surveys for nurses have been around since 1987. However, no validated knowledge test exists to measure postlicensure clinicians' knowledge of the core competencies of pain management in current complex patient populations. To develop and test the psychometric properties of an instrument designed to measure pain management knowledge of postlicensure nurses. Psychometric instrument validation. Four large Midwestern U.S. hospitals. Registered nurses employed full time and part time August 2015 to April 2016, aged M = 43.25 years; time as RN, M = 16.13 years. Prospective survey design using e-mail to invite nurses to take an electronic multiple choice pain knowledge test. Content validity of initial 36-item test "very good" (95.1% agreement). Completed tests that met analysis criteria, N = 747. Mean initial test score, 69.4% correct (range 27.8-97.2). After revision/removal of 13 unacceptable questions, mean test score was 50.4% correct (range 8.7-82.6). Initial test item percent difficulty range was 15.2%-98.1%; discrimination values range, 0.03-0.50; final test item percent difficulty range, 17.6%-91.1%, discrimination values range, -0.04 to 1.04. Split-half reliability final test was 0.66. A high decision consistency reliability was identified, with test cut-score of 75%. The final 23-item Clinical Pain Knowledge Test has acceptable discrimination, difficulty, decision consistency, reliability, and validity in the general clinical inpatient nurse population. This instrument will be useful in assessing pain management knowledge of clinical nurses to determine gaps in education, evaluate knowledge after pain management education, and measure research outcomes. Copyright © 2017 American Society for Pain Management Nursing. Published by Elsevier Inc. All rights reserved.
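Two of the item statistics reported above, point-biserial discrimination and split-half reliability, are straightforward to compute. The Python sketch below shows both on a simulated 0/1 response matrix; the 23-item size mirrors the final test, but the data and helper functions are hypothetical rather than the study's analysis code.

```python
import numpy as np

def point_biserial(scores):
    """Corrected point-biserial discrimination per item (item vs. rest score)."""
    scores = np.asarray(scores, dtype=float)
    total = scores.sum(axis=1)
    return np.array([np.corrcoef(scores[:, j], total - scores[:, j])[0, 1]
                     for j in range(scores.shape[1])])

def split_half_reliability(scores):
    """Odd-even split-half correlation with the Spearman-Brown correction."""
    scores = np.asarray(scores, dtype=float)
    odd = scores[:, 0::2].sum(axis=1)
    even = scores[:, 1::2].sum(axis=1)
    r = np.corrcoef(odd, even)[0, 1]
    return 2 * r / (1 + r)

# Hypothetical data: 300 nurses answering 23 dichotomously scored items
rng = np.random.default_rng(3)
ability = rng.normal(size=(300, 1))
b = np.linspace(-1.5, 1.5, 23)
demo = (rng.random((300, 23)) < 1 / (1 + np.exp(-(ability - b)))).astype(int)
print(np.round(point_biserial(demo), 2))
print(round(split_half_reliability(demo), 2))
```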
Validation of Physics Standardized Test Items
NASA Astrophysics Data System (ADS)
Marshall, Jill
2008-10-01
The Texas Physics Assessment Team (TPAT) examined the Texas Assessment of Knowledge and Skills (TAKS) to determine whether it is a valid indicator of physics preparation for future course work and employment, and of the knowledge and skills needed to act as an informed citizen in a technological society. We categorized science items from the 2003 and 2004 10th and 11th grade TAKS by content area(s) covered, knowledge and skills required to select the correct answer, and overall quality. We also analyzed a 5000 student sample of item-level results from the 2004 11th grade exam using standard statistical methods employed by test developers (factor analysis and Item Response Theory). Triangulation of our results revealed strengths and weaknesses of the different methods of analysis. The TAKS was found to be only weakly indicative of physics preparation and we make recommendations for increasing the validity of standardized physics testing.
The development of a quality appraisal tool for studies of diagnostic reliability (QAREL).
Lucas, Nicholas P; Macaskill, Petra; Irwig, Les; Bogduk, Nikolai
2010-08-01
In systematic reviews of the reliability of diagnostic tests, no quality assessment tool has been used consistently. The aim of this study was to develop a specific quality appraisal tool for studies of diagnostic reliability. Key principles for the quality of studies of diagnostic reliability were identified with reference to epidemiologic principles, existing quality appraisal checklists, and the Standards for Reporting of Diagnostic Accuracy (STARD) and Quality Assessment of Diagnostic Accuracy Studies (QUADAS) resources. Specific items that encompassed each of the principles were developed. Experts in diagnostic research provided feedback on the items that were to form the appraisal tool. This process was iterative and continued until consensus among experts was reached. The Quality Appraisal of Reliability Studies (QAREL) checklist includes 11 items that explore seven principles. Items cover the spectrum of subjects, spectrum of examiners, examiner blinding, order effects of examination, suitability of the time interval among repeated measurements, appropriate test application and interpretation, and appropriate statistical analysis. QAREL has been developed as a specific quality appraisal tool for studies of diagnostic reliability. The reliability of this tool in different contexts needs to be evaluated. Copyright (c) 2010 Elsevier Inc. All rights reserved.
Development of a health literacy assessment for young adult college students: a pilot study.
Harper, Raquel
2014-01-01
The purpose of this study was to develop a comprehensive health literacy assessment tool for young adult college students. Participants were 144 undergraduate students. Two hundred and twenty-nine questions were developed, which were based on concepts identified by the US Department of Health and Human Services, the World Health Organization, and health communication scholars. Four health education experts reviewed this pool of items and helped select 87 questions for testing. Students completed an online assessment consisting of these 87 questions in June and October of 2012. Item response theory and goodness-of-fit values were used to help eliminate nonperforming questions. Fifty-one questions were selected based on good item response theory discrimination parameter values. The instrument has 51 questions that look promising for measuring health literacy in college students, but needs additional testing with a larger student population to see how these questions continue to perform.
Development of the Systems Thinking Scale for Adolescent Behavior Change.
Moore, Shirley M; Komton, Vilailert; Adegbite-Adeniyi, Clara; Dolansky, Mary A; Hardin, Heather K; Borawski, Elaine A
2018-03-01
This report describes the development and psychometric testing of the Systems Thinking Scale for Adolescent Behavior Change (STS-AB). Following item development, initial assessments of understandability and stability of the STS-AB were conducted in a sample of nine adolescents enrolled in a weight management program. Exploratory factor analysis of the 16-item STS-AB and internal consistency assessments were then done with 359 adolescents enrolled in a weight management program. Test-retest reliability of the STS-AB was .71, p = .03; internal consistency reliability was .87. Factor analysis of the 16-item STS-AB indicated a one-factor solution with good factor loadings, ranging from .40 to .67. Evidence of construct validity was supported by significant correlations with established measures of variables associated with health behavior change. We provide beginning evidence of the reliability and validity of the STS-AB to measure systems thinking for health behavior change in young adolescents.
Ni, Pengsheng; McDonough, Christine M.; Jette, Alan M.; Bogusz, Kara; Marfeo, Elizabeth E.; Rasch, Elizabeth K.; Brandt, Diane E.; Meterko, Mark; Chan, Leighton
2014-01-01
Objectives To develop and test an instrument to assess physical function (PF) for Social Security Administration (SSA) disability programs, the SSA-PF. Item Response Theory (IRT) analyses were used to 1) create a calibrated item bank for each of the factors identified in prior factor analyses, 2) assess the fit of the items within each scale, 3) develop separate Computer-Adaptive Test (CAT) instruments for each scale, and 4) conduct initial psychometric testing. Design Cross-sectional data collection; IRT analyses; CAT simulation. Setting Telephone and internet survey. Participants Two samples: 1,017 SSA claimants, and 999 adults from the US general population. Interventions None. Main Outcome Measures Model fit statistics, correlation and reliability coefficients. Results IRT analyses resulted in five unidimensional SSA-PF scales: Changing & Maintaining Body Position, Whole Body Mobility, Upper Body Function, Upper Extremity Fine Motor, and Wheelchair Mobility, for a total of 102 items. High CAT accuracy was demonstrated by strong correlations between simulated CAT scores and those from the full item banks. Comparing the simulated CATs to the full item banks, very little loss of reliability or precision was noted, except at the lower and upper ranges of each scale. No difference in response patterns by age or sex was noted. The distributions of claimant scores were shifted to the lower end of each scale compared to those of a sample of US adults. Conclusions The SSA-PF instrument contributes important new methodology for measuring the physical function of adults applying to the SSA disability programs. Initial evaluation revealed that the SSA-PF instrument achieved considerable breadth of coverage in each content domain and demonstrated noteworthy psychometric properties. PMID:23578594
Development and validation of the German version of the Orofacial Esthetic Scale.
Reissmann, Daniel R; Benecke, Andreas W; Aarabi, Ghazal; Sierwald, Ira
2015-07-01
This study aimed to develop the German version of the Orofacial Esthetic Scale (OES-G) and to assess its psychometric properties. The OES is an eight-item instrument with seven items directly addressing esthetic impacts of the orofacial region and an eighth item for a global assessment. It applies an 11-point ordinal rating scale, with summary scores ranging from 0 (worst) to 70 (best). The original OES items were translated into German using a forward-backward method. A de novo development of German items (n = 21 patients) and a cross-cultural adaptation after pilot testing (n = 15 patients) established content validity. Internal consistency and construct validity (structural, convergent, known-groups) of the OES-G were assessed in a sample of 165 prosthodontic patients. The OES-G was administered to 42 patients on two occasions, 2-4 weeks apart, to determine test-retest reliability. Internal consistency of the OES-G was considered satisfactory (Cronbach's alpha 0.94; average inter-item correlation 0.64). An intraclass correlation coefficient of 0.95 (95% confidence interval 0.92-0.98) indicated excellent test-retest reliability. The correlation matrix and exploratory factor analysis provided support for unidimensionality of the measured construct. The OES-G summary score was correlated with the patients' global assessment of their esthetics (r = 0.87) and external ratings of the expert group (r = 0.55) and discriminated patients with treatment need (39.4 points) from patients without (58.4 points; p < 0.001) with a large effect size. The OES-G has good psychometric properties and is a valuable instrument for the assessment of self-perceived orofacial esthetics.
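The test-retest figure above is an intraclass correlation coefficient. As an illustration, the Python sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single measure) from a subjects-by-occasions score matrix. The simulated data only mimic the design (42 patients, two occasions) and are not the OES-G data, and the abstract does not state which ICC variant the authors used, so this is one common choice rather than their exact computation.

```python
import numpy as np

def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.
    `data` is an (n_subjects x k_occasions) matrix of summary scores."""
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand = data.mean()
    row_means = data.mean(axis=1)
    col_means = data.mean(axis=0)
    ss_rows = k * np.sum((row_means - grand) ** 2)     # between subjects
    ss_cols = n * np.sum((col_means - grand) ** 2)     # between occasions
    ss_total = np.sum((data - grand) ** 2)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical retest data: 42 patients measured on two occasions
rng = np.random.default_rng(4)
true_scores = rng.normal(50, 12, size=(42, 1))
occasions = true_scores + rng.normal(0, 3, size=(42, 2))
print(round(icc_2_1(occasions), 2))
```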
Ni, Pengsheng; McDonough, Christine M; Jette, Alan M; Bogusz, Kara; Marfeo, Elizabeth E; Rasch, Elizabeth K; Brandt, Diane E; Meterko, Mark; Haley, Stephen M; Chan, Leighton
2013-09-01
To develop and test an instrument to assess physical function for Social Security Administration (SSA) disability programs, the SSA-Physical Function (SSA-PF) instrument. Item response theory (IRT) analyses were used to (1) create a calibrated item bank for each of the factors identified in prior factor analyses, (2) assess the fit of the items within each scale, (3) develop separate computer-adaptive testing (CAT) instruments for each scale, and (4) conduct initial psychometric testing. Cross-sectional data collection; IRT analyses; CAT simulation. Telephone and Internet survey. Two samples: SSA claimants (n=1017) and adults from the U.S. general population (n=999). None. Model fit statistics, correlation, and reliability coefficients. IRT analyses resulted in 5 unidimensional SSA-PF scales: Changing & Maintaining Body Position, Whole Body Mobility, Upper Body Function, Upper Extremity Fine Motor, and Wheelchair Mobility for a total of 102 items. High CAT accuracy was demonstrated by strong correlations between simulated CAT scores and those from the full item banks. On comparing the simulated CATs with the full item banks, very little loss of reliability or precision was noted, except at the lower and upper ranges of each scale. No difference in response patterns by age or sex was noted. The distributions of claimant scores were shifted to the lower end of each scale compared with those of a sample of U.S. adults. The SSA-PF instrument contributes important new methodology for measuring the physical function of adults applying to the SSA disability programs. Initial evaluation revealed that the SSA-PF instrument achieved considerable breadth of coverage in each content domain and demonstrated noteworthy psychometric properties. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Development and Validation of the Spanish Numeracy Understanding in Medicine Instrument.
Jacobs, Elizabeth A; Walker, Cindy M; Miller, Tamara; Fletcher, Kathlyn E; Ganschow, Pamela S; Imbert, Diana; O'Connell, Maria; Neuner, Joan M; Schapira, Marilyn M
2016-11-01
The Spanish-speaking population in the U.S. is large and growing and is known to have lower health literacy than the English-speaking population. Less is known about the health numeracy of this population due to a lack of health numeracy measures in Spanish. We aimed to develop and validate a short and easy-to-use measure of health numeracy for Spanish-speaking adults: the Spanish Numeracy Understanding in Medicine Instrument (Spanish-NUMi). Items were generated based on qualitative studies in English- and Spanish-speaking adults and translated into Spanish using a group translation and consensus process. Candidate items for the Spanish NUMi were selected from an eight-item validated English Short NUMi. Differential item functioning (DIF) analysis was conducted to evaluate equivalence between English and Spanish items. Cronbach's alpha was computed as a measure of reliability, and a Pearson's correlation was used to evaluate the association between test scores and the Spanish Test of Functional Health Literacy (S-TOFHLA) and education level. Two hundred and thirty-two Spanish-speaking Chicago residents were included in the study. The study population was diverse in age, gender, and level of education, and 70% reported Mexico as their country of origin. Two items of the English eight-item Short NUMi demonstrated DIF and were dropped. The resulting six-item test had a Cronbach's alpha of 0.72, a range of difficulty using classical test statistics (percent correct: 0.48 to 0.86), and adequate discrimination (item-total score correlation: 0.34-0.49). Scores were positively correlated with print literacy as measured by the S-TOFHLA (r = 0.67; p < 0.001) and varied as predicted across grade level; mean scores for up to eighth grade, ninth through twelfth grade, and some college experience or more, respectively, were 2.48 (SD ± 1.64), 4.15 (SD ± 1.45), and 4.82 (SD ± 0.37). The Spanish NUMi is a reliable and valid measure of important numerical concepts used in communicating health information.
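The abstract reports DIF screening but does not say which procedure was used. One common approach for dichotomous items is the Mantel-Haenszel statistic, sketched below in Python: examinees are matched on total score, and the odds of a correct response are compared across groups within each score stratum. The data, the group coding and the mantel_haenszel_dif helper are hypothetical, not the Spanish-NUMi analysis.

```python
import numpy as np

def mantel_haenszel_dif(item, total, group):
    """Mantel-Haenszel DIF statistic for one dichotomous item.

    item  : 0/1 responses to the studied item
    total : total test score used as the matching (stratifying) variable
    group : 0 for the reference group, 1 for the focal group
    Returns the common odds ratio and the ETS delta statistic
    (|delta| >= 1.5 is a common flag for noticeable DIF)."""
    item, total, group = map(np.asarray, (item, total, group))
    num = den = 0.0
    for s in np.unique(total):
        m = total == s
        a = np.sum((group[m] == 0) & (item[m] == 1))   # reference correct
        b = np.sum((group[m] == 0) & (item[m] == 0))   # reference incorrect
        c = np.sum((group[m] == 1) & (item[m] == 1))   # focal correct
        d = np.sum((group[m] == 1) & (item[m] == 0))   # focal incorrect
        n = a + b + c + d
        if n > 0:
            num += a * d / n
            den += b * c / n
    if den == 0:
        return float("nan"), float("nan")
    odds_ratio = num / den
    delta = -2.35 * np.log(odds_ratio)
    return odds_ratio, delta

# Hypothetical data: one item that is harder for the focal group
rng = np.random.default_rng(5)
group = rng.integers(0, 2, 400)
ability = rng.normal(size=400)
total = rng.binomial(7, 1 / (1 + np.exp(-ability)))                  # matching score
item = rng.binomial(1, 1 / (1 + np.exp(-(ability - 0.4 * group))))   # biased item
print(mantel_haenszel_dif(item, total, group))
```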
Lohse, Barbara; Satter, Ellyn; Arnold, Kristen
2014-04-01
Accurate early assessment and targeted intervention with problematic parent/child feeding dynamics is critical for the prevention and treatment of child obesity. The division of responsibility in feeding (sDOR), articulated by the Satter Feeding Dynamics Model (fdSatter), has been demonstrated clinically as an effective approach to reduce child feeding problems, including those leading to obesity. Lack of a tested instrument to examine adherence to fdSatter stimulated initial construction of the Satter Feeding Dynamics Inventory (fdSI). The aim of this project was to refine the item pool to establish translational validity, making the fdSI suitable for advanced psychometric analysis. Cognitive interviews (n = 80) with caregivers of varied socioeconomic strata informed revisions that demonstrated face and content validity. fdSI responses were mapped to interviews using an iterative, multi-phase thematic approach to provide an instrument ready for construct validation. fdSI development required five interview phases over 32 months: Foundational; Refinement; Transitional; Assurance; and Launching. Each phase was associated with item reduction and revision. Thirteen items were removed from the 38-item Foundational phase and seven were revised in the Refinement phase. Revisions, deletions, and additions prompted by Transitional and Assurance phase interviews resulted in the 15-item Launching phase fdSI. Only one Foundational phase item was carried through all development phases, emphasizing the need to test for item comprehension and interpretation before psychometric analyses. Psychometric studies of item pools without encrypted meanings will facilitate progress toward a tool that accurately detects adherence to sDOR. Ability to measure sDOR will facilitate focus on feeding behaviors associated with reduced risk of childhood obesity.
Petrillo, Jennifer; Cano, Stefan J; McLeod, Lori D; Coon, Cheryl D
2015-01-01
To provide comparisons and a worked example of item- and scale-level evaluations based on three psychometric methods used in patient-reported outcome development (classical test theory [CTT], item response theory [IRT], and Rasch measurement theory [RMT]) in an analysis of the National Eye Institute Visual Functioning Questionnaire (VFQ-25). Baseline VFQ-25 data from 240 participants with diabetic macular edema from a randomized, double-masked, multicenter clinical trial were used to evaluate the VFQ at the total score level. CTT, RMT, and IRT evaluations were conducted, and results were assessed in a head-to-head comparison. Results were similar across the three methods, with IRT and RMT providing more detailed diagnostic information on how to improve the scale. CTT led to the identification of two problematic items that threaten the validity of the overall scale score, sets of redundant items, and skewed response categories. IRT and RMT additionally identified poor fit for one item, many locally dependent items, poor targeting, and disordering of over half the response categories. Selection of a psychometric approach depends on many factors. Researchers should justify their evaluation method and consider the intended audience. If the instrument is being developed for descriptive purposes and on a restricted budget, a cursory examination of the CTT-based psychometric properties may be all that is possible. In a high-stakes situation, such as the development of a patient-reported outcome instrument for consideration in pharmaceutical labeling, however, a thorough psychometric evaluation including IRT or RMT should be considered, with final item-level decisions made on the basis of both quantitative and qualitative results. Copyright © 2015. Published by Elsevier Inc.
Rosenfeld, Barry; Pessin, Hayley; Lewis, Charles; Abbey, Jennifer; Olden, Megan; Sachs, Emily; Amakawa, Lia; Kolva, Elissa; Brescia, Robert; Breitbart, William
2013-01-01
Hopelessness has become an increasingly important construct in palliative care research, yet concerns exist regarding the utility of existing measures when applied to patients with a terminal illness. This article describes a series of studies focused on the exploration, development, and analysis of a measure of hopelessness specifically intended for use with terminally ill cancer patients. The 1st stage of measure development involved interviews with 13 palliative care experts and 30 terminally ill patients. Qualitative analysis of the patient interviews culminated in the development of a set of potential questionnaire items. In the 2nd study phase, we evaluated these preliminary items with a sample of 314 participants, using item response theory and classical test theory to identify optimal items and response format. These analyses generated an 8-item measure that we tested in a final study phase, using a 3rd sample (n = 228) to assess reliability and concurrent validity. These analyses demonstrated strong support for the Hopelessness Assessment in Illness Questionnaire providing greater explanatory power than existing measures of hopelessness and found little evidence that this assessment was confounded by illness-related variables (e.g., prognosis). In summary, these 3 studies suggest that this brief measure of hopelessness is particularly useful for palliative care settings. Further research is needed to assess the applicability of the measure to other populations and contexts. PMID:21443366
Arraras, Juan Ignacio; Wintner, Lisa M; Sztankay, Monika; Tomaszewski, Krzysztof A; Hofmeister, Dirk; Costantini, Anna; Bredart, Anne; Young, Teresa; Kuljanic, Karin; Tomaszewska, Iwona M; Kontogianni, Meropi; Chie, Wei-Chu; Kulis, Dagmara; Greimel, Eva
2017-05-01
Communication between patients and professionals is one major aspect of the support offered to cancer patients. The European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Group (QLG) has developed a cancer-specific instrument for the measurement of different issues related to the communication between cancer patients and their health care professionals. Questionnaire development followed the EORTC QLG Module Development Guidelines. A provisional questionnaire was pre-tested (phase III) in a multicenter study within ten countries from five cultural areas (Northern and Southern Europe, the UK, Poland and Taiwan). Patients from seven subgroups (before, during and after treatment, for localized and advanced disease each, plus palliative patients) were recruited. Structured interviews were conducted. Qualitative and quantitative analyses were performed. One hundred forty patients were interviewed. Nine items were deleted and one shortened. Patients' comments had a key role in item selection. No item was deleted on quantitative criteria alone. Consistency was observed in patients' answers across cultural areas. The revised version of the module, the EORTC QLQ-COMU26, has 26 items, organized in 6 scales and 4 individual items. The EORTC QLQ-COMU26 questionnaire can be used in daily clinical practice and research, in various patient groups from different cultures. The next step will be an international field test with a large heterogeneous group of cancer patients.
Sajjad, Madiha; Khan, Rehan Ahmed; Yasmeen, Rahila
2018-01-01
To develop a tool to evaluate faculty perceptions of assessment quality in an undergraduate medical program. The Assessment Implementation Measure (AIM) tool was developed by a mixed method approach. A preliminary questionnaire developed through literature review was submitted to a panel of 10 medical education experts for a three-round 'Modified Delphi technique'. Panel agreement of > 75% was considered the criterion for inclusion of items in the questionnaire. Cognitive pre-testing of five faculty members was conducted. A pilot study was done with 30 randomly selected faculty members. Content validity index (CVI) was calculated for individual items (I-CVI) and the composite scale (S-CVI). Cronbach's alpha was calculated to determine the internal consistency reliability of the tool. The final AIM tool had 30 items after the Delphi process. S-CVI was 0.98 with the S-CVI/Avg method and 0.86 by the S-CVI/UA method, suggesting good content validity. A cut-off value of < 0.9 I-CVI was taken as the criterion for item deletion. Cognitive pre-testing revealed good item interpretation. Cronbach's alpha calculated for the AIM was 0.9, whereas Cronbach's alpha for the four domains ranged from 0.67 to 0.80. 'AIM' is a relevant and useful instrument with good content validity and reliability of results, and may be used to evaluate teachers' perceptions about assessment quality.
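The content validity indices reported here follow the usual definitions: I-CVI is the proportion of experts rating an item as relevant (3 or 4 on a 4-point scale), S-CVI/Avg is the mean of the I-CVIs, and S-CVI/UA is the proportion of items rated relevant by every expert. The Python sketch below computes all three on hypothetical ratings; the panel size and item count simply echo the abstract and are not the study data.

```python
import numpy as np

def content_validity_indices(ratings):
    """Content validity indices from expert relevance ratings.

    `ratings` is an (n_experts x n_items) matrix on a 1-4 relevance scale.
    I-CVI     : proportion of experts rating an item 3 or 4
    S-CVI/Avg : mean of the I-CVIs
    S-CVI/UA  : proportion of items rated relevant by every expert"""
    relevant = np.asarray(ratings) >= 3
    i_cvi = relevant.mean(axis=0)
    return i_cvi, i_cvi.mean(), (i_cvi == 1.0).mean()

# Hypothetical ratings: 10 experts, 30 items
rng = np.random.default_rng(6)
ratings = rng.choice([2, 3, 4], p=[0.1, 0.3, 0.6], size=(10, 30))
i_cvi, s_cvi_avg, s_cvi_ua = content_validity_indices(ratings)
print(np.round(i_cvi[:5], 2), round(s_cvi_avg, 2), round(s_cvi_ua, 2))
```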
Rose, Matthias; Bjorner, Jakob B; Gandek, Barbara; Bruce, Bonnie; Fries, James F; Ware, John E
2014-05-01
To document the development and psychometric evaluation of the Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function (PF) item bank and static instruments. The items were evaluated using qualitative and quantitative methods. A total of 16,065 adults answered item subsets (n>2,200/item) on the Internet, with oversampling of the chronically ill. Classical test and item response theory methods were used to evaluate 149 PROMIS PF items plus 10 Short Form-36 and 20 Health Assessment Questionnaire-Disability Index items. A graded response model was used to estimate item parameters, which were normed to a mean of 50 (standard deviation [SD]=10) in a US general population sample. The final bank consists of 124 PROMIS items covering upper, central, and lower extremity functions and instrumental activities of daily living. In simulations, a 10-item computerized adaptive test (CAT) eliminated floor and decreased ceiling effects, achieving higher measurement precision than any comparable-length static tool across four SDs of the measurement range. These improved psychometric properties translated into the CAT's superior ability to identify differences between age and disease groups. The item bank provides a common metric and can improve the measurement of PF by facilitating the standardization of patient-reported outcome measures and implementation of CATs for more efficient PF assessments over a larger range. Copyright © 2014. Published by Elsevier Inc.
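The "mean of 50, SD of 10" convention is a linear rescaling of the latent trait estimates onto a T-score metric anchored in the norming sample. The snippet below shows only that rescaling step in Python, not the graded response model calibration itself; the sample sizes and trait distributions are invented for illustration.

```python
import numpy as np

def to_t_scores(theta, reference_theta):
    """Rescale latent trait estimates to a T-score metric: mean 50 and
    SD 10 in the chosen reference (norming) sample."""
    reference_theta = np.asarray(reference_theta, dtype=float)
    z = (np.asarray(theta, dtype=float) - reference_theta.mean()) / reference_theta.std(ddof=1)
    return 50 + 10 * z

# Hypothetical norming sample and a clinical sample that scores lower
rng = np.random.default_rng(7)
general_population = rng.normal(0.0, 1.0, 5000)
clinical_sample = rng.normal(-0.8, 1.1, 200)
print(np.round(to_t_scores(clinical_sample, general_population)[:5], 1))
```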
The Long-Term Conditions Questionnaire: conceptual framework and item development
Peters, Michele; Potter, Caroline M; Kelly, Laura; Hunter, Cheryl; Gibbons, Elizabeth; Jenkinson, Crispin; Coulter, Angela; Forder, Julien; Towers, Ann-Marie; A’Court, Christine; Fitzpatrick, Ray
2016-01-01
Purpose To identify the main issues of importance when living with long-term conditions to refine a conceptual framework for informing the item development of a patient-reported outcome measure for long-term conditions. Materials and methods Semi-structured qualitative interviews (n=48) were conducted with people living with at least one long-term condition. Participants were recruited through primary care. The interviews were transcribed verbatim and analyzed by thematic analysis. The analysis served to refine the conceptual framework, based on reviews of the literature and stakeholder consultations, for developing candidate items for a new measure for long-term conditions. Results Three main organizing concepts were identified: impact of long-term conditions, experience of services and support, and self-care. The findings helped to refine a conceptual framework, leading to the development of 23 items that represent issues of importance in long-term conditions. The 23 candidate items formed the first draft of the measure, currently named the Long-Term Conditions Questionnaire. Conclusion The aim of this study was to refine the conceptual framework and develop items for a patient-reported outcome measure for long-term conditions, including single and multiple morbidities and physical and mental health conditions. Qualitative interviews identified the key themes for assessing outcomes in long-term conditions, and these underpinned the development of the initial draft of the measure. These initial items will undergo cognitive testing to refine the items prior to further validation in a survey. PMID:27621678
Joiner, Kevin L; Sternberg, Rosa Maria; Kennedy, Christine; Chen, Jyu-Lin; Fukuoka, Yoshimi; Janson, Susan L
2016-12-01
Create a Spanish-language version of the Risk Perception Survey for Developing Diabetes (RPS-DD) and assess its psychometric properties. The Spanish-language version was created through translation, harmonization, and presentation to the tool's original author. It was field tested in a foreign-born Latino sample, and its properties were evaluated with principal components analysis. Personal Control, Optimistic Bias, and Worry multi-item Likert subscale responses did not cluster together. A clean solution was obtained after removing two Personal Control subscale items. Neither the Personal Disease Risk scale nor the Environmental Health Risk scale responses loaded onto single factors. Reliabilities ranged from .54 to .88. Performance on the knowledge test varied by item. This study contributes to evidence of validation of a Spanish-language RPS-DD in foreign-born Latinos.
Pancreatitis Quality of Life Instrument: Development of a new instrument
Bova, Carol; Barton, Bruce; Hartigan, Celia
2014-01-01
Objectives: The goal of this project was to develop the first disease-specific instrument for the evaluation of quality of life in chronic pancreatitis. Methods: Focus groups and interview sessions were conducted with chronic pancreatitis patients to identify items felt to impact quality of life, which were subsequently formatted into a paper-and-pencil instrument. This instrument was then used in an online survey of an expert panel of pancreatologists to evaluate its content validity. Finally, the modified instrument was presented to patients during cognitive pretesting interviews to evaluate its clarity and appropriateness. Results: In total, 10 patients were enrolled in the focus groups and interview sessions, where they identified 50 items. Once redundant items were removed, the 40 remaining items were made into a paper-and-pencil instrument referred to as the Pancreatitis Quality of Life Instrument. Through the processes of content validation and cognitive pretesting, the number of items in the instrument was reduced to 24. Conclusions: This marks the development of the first disease-specific instrument to evaluate quality of life in chronic pancreatitis. It includes unique features not found in generic instruments (economic factors, stigma, and spiritual factors). Although this marks a giant step forward, psychometric evaluation is still needed prior to its clinical use. PMID:26770703
Ernstmann, Nicole; Halbach, Sarah; Kowalski, Christoph; Pfaff, Holger; Ansmann, Lena
2017-04-01
Studies addressing the organizational contexts of care that may help increase patients' ability to cope with a disease and to navigate the health care system are still rare. In particular, instruments allowing the assessment of such organizational efforts from the patients' perspective are missing. The aim of our study was to develop a survey instrument assessing organizational health literacy (HL) from the patients' perspective, i.e., health care organizations' responsiveness to patients' individual needs. A pool of 30 items was developed by a group of experts based on a literature review. The items were developed, tested and prioritized according to their importance in 11 semi-structured interviews and cognitive think-aloud interviews with cancer patients. The resulting 16 items were rated in a standardized postal survey involving a total of N=453 colon and breast cancer patients treated in cancer centers in Germany. An exploratory factor analysis, a confirmatory factor analysis and structural equation modelling were conducted. Item properties were analyzed. 83.2% of the patients were diagnosed with breast cancer, and 16.8% had a diagnosis of colon cancer. The patients' mean age was 61 (26-88), and 89.4% were female. The most common comorbidities were hypertension (34.0%) and cardiovascular disease (11.0%). The final prediction model included nine items measuring the degree of health literacy-sensitivity of communication. The model showed an acceptable fit. The nine items showed corrected item-total correlations between .622 and .762 and item difficulties between 0.77 and 0.87. Cronbach's α was .912. In a comprehensive development process, the original item pool comprising several aspects of organizational HL was reduced to a one-dimensional scale. The instrument measures an important aspect of organizational HL, namely the degree of health literacy-sensitivity of communication (HL-COM). HL-COM was found to impact patient enablement, mediated by physician support. Future research will have to test these associations in the context of other diseases or institutions. Copyright © 2017. Published by Elsevier GmbH.
Manning, Joseph C; Walker, Gemma M; Carter, Tim; Aubeeluck, Aimee; Witchell, Miranda; Coad, Jane
2018-04-12
Currently, no standardised, evidence-based tool exists for assessing immediate self-harm and suicide risk in acute paediatric inpatient settings. The aim of this study is to develop and test the psychometric properties of an assessment tool that identifies immediate risk of self-harm and suicide in children and young people (10-19 years) in acute paediatric hospital settings. Development phase: This phase involved a scoping review of the literature to identify and extract items from previously published suicide and self-harm risk assessment scales. Using a modified electronic Delphi approach, these items will then be rated according to their relevance for assessment of immediate suicide or self-harm risk by expert professionals. Inclusion of items will be determined by 65%-70% consensus between raters. Subsequently, a panel of expert members will convene to determine the face validity, appropriate phrasing, item order and response format for the finalised items. Psychometric testing phase: The finalised items will be tested for validity and reliability through a multicentre psychometric evaluation. Psychometric testing will be undertaken to determine the following: internal consistency, inter-rater reliability, and convergent, divergent and concurrent validity. Ethical approval was provided by the National Health Service East Midlands-Derby Research Ethics Committee (17/EM/0347), and full governance clearance was received from the Health Research Authority and local participating sites. Findings from this study will be disseminated to professionals and the public via peer-reviewed journal publications, popular social media and conference presentations. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Massey, Kevin; Barnes, Marilyn J D; Villines, Dana; Goldstein, Julie D; Pierson, Anna Lee Hisey; Scherer, Cheryl; Vander Laan, Betty; Summerfelt, Wm Thomas
2015-01-01
Chaplains are increasingly seen as key members of interdisciplinary palliative care teams, yet the specific interventions and hoped-for outcomes of their work are poorly understood. This project served to develop a standard terminology inventory for the chaplaincy field, to be called the chaplaincy taxonomy. The research team used a mixed methods approach to generate, evaluate and validate items for the taxonomy. We conducted a literature review, retrospective chart review, focus groups, self-observation, experience sampling, concept mapping, and reliability testing. Chaplaincy activities focused primarily on palliative care in an intensive care unit setting in order to capture a broad cross-section of chaplaincy activities. Literature and chart review resulted in 438 taxonomy items for testing. Chaplain focus groups generated an additional 100 items and removed 421 items as duplications. Self-observation, experience sampling and concept mapping provided evidence that the taxonomy items were actual activities that chaplains perform in their spiritual care. Inter-rater reliability for chaplains identifying taxonomy items from vignettes was 0.903. The 100-item chaplaincy taxonomy provides a strong foundation for a normative inventory of chaplaincy activities and outcomes. A deliberative process is proposed to further expand and refine the taxonomy to create a standard terminological inventory for the field of chaplaincy. A standard terminology could improve the ways interdisciplinary palliative care teams communicate about chaplaincy activities and outcomes.
Development and evaluation of the Korean Health Literacy Instrument.
Kang, Soo Jin; Lee, Tae Wha; Paasche-Orlow, Michael K; Kim, Gwang Suk; Won, Hee Kwan
2014-01-01
The purpose of this study is to develop and validate the Korean Health Literacy Instrument, which measures the capacity to understand and use health-related information and make informed health decisions in Korean adults. In Phase 1, 33 initial items were generated to measure functional, interactive, and critical health literacy with prose, document, and numeracy tasks. These items included content from health promotion, disease management, and health navigation contexts. Content validity assessment was conducted by an expert panel, and 11 items were excluded. In Phase 2, the 22 remaining items were administered to a convenience sample of 292 adults from community and clinical settings. Exploratory factor and item difficulty and discrimination analyses were conducted and four items with low discrimination were deleted. In Phase 3, the remaining 18 items were administered to a convenience sample of 315 adults 40-64 years of age from community and clinical settings. A confirmatory factor analysis was performed to test the construct validity of the instrument. The Korean Health Literacy Instrument has a range of 0 to 18. The mean score in our validation study was 11.98. The instrument exhibited an internal consistency reliability coefficient of 0.82, and a test-retest reliability of 0.89. The instrument is suitable for screening individuals who have limited health literacy skills. Future studies are needed to further define the psychometric properties and predictive validity of the Korean Health Literacy Instrument.
Kelly, Jacinta; Watson, Roger
2014-12-01
To report a pilot study for the development and validation of an instrument to measure quality in historical research papers. There are no set criteria to assess historical papers published in nursing journals. A three-phase, mixed-method, sequential confirmatory design. In 2012, we used a three-phase approach to item generation and content evaluation. In phase 1, we consulted nursing historians using an online survey comprising three open-ended questions and revised the items. In phase 2, we evaluated the revised items for relevance with expert historians using a 4-point Likert scale and Content Validity Index calculation. In phase 3, we conducted reliability testing of the instrument using a 3-point Likert scale. In phase 1, 121 responses were generated via the online survey and revised to 40 interrogatively phrased items. In phase 2, five items with an Item Content Validity Index score of ≥0·7 remained. In phase 3, responses from historians resulted in 100% agreement on questions 1, 2 and 4 and 89% and 78%, respectively, on questions 3 and 5. Items for the QSHRP have been identified, content validated and reliability tested. This scale improves on previous scales, which over-emphasized source criticism. However, a full-scale study with nursing historians is needed to increase its robustness. © 2014 John Wiley & Sons Ltd.
Pedagogy of Science Teaching Tests: Formative assessments of science teaching orientations
NASA Astrophysics Data System (ADS)
Cobern, William W.; Schuster, David; Adams, Betty; Skjold, Brandy Ann; Zeynep Muğaloğlu, Ebru; Bentz, Amy; Sparks, Kelly
2014-09-01
A critical aspect of teacher education is gaining pedagogical content knowledge of how to teach science for conceptual understanding. Given the time limitations of college methods courses, it is difficult to touch on more than a fraction of the science topics potentially taught across grades K-8, particularly in the context of relevant pedagogies. This research and development work centers on constructing a formative assessment resource to help expose pre-service teachers to a greater number of science topics within teaching episodes using various modes of instruction. To this end, 100 problem-based, science pedagogy assessment items were developed via expert group discussions and pilot testing. Each item contains a classroom vignette followed by response choices carefully crafted to include four basic pedagogies (didactic direct, active direct, guided inquiry, and open inquiry). The brief but numerous items allow a substantial increase in the number of science topics that pre-service students may consider. The intention is that students and teachers will be able to share and discuss particular responses to individual items, or else record their responses to collections of items and thereby create a snapshot profile of their teaching orientations. Subsets of items were piloted with students in pre-service science methods courses, and the quantitative results of student responses were spread sufficiently to suggest that the items can be effective for their intended purpose.
Latimer, Shane; Meade, Tanya; Tennant, Alan
2014-07-30
The purpose of this study was to investigate the application of item banking to questionnaire items intended to measure Deliberate Self-Harm (DSH) behaviours. The Rasch measurement model was used to evaluate behavioural items extracted from seven published DSH scales administered to 568 Australians aged 18-30 years (62% university students, 21% mental health patients, and 17% community members). Ninety-four items were calibrated in the item bank (including 12 items with differential item functioning for gender and age). Tailored scale construction was demonstrated by extracting scales covering different combinations of DSH methods but with the same raw score for each person location on the latent DSH construct. A simulated computer adaptive test (starting with common self-harm methods to minimise presentation of extreme behaviours) demonstrated that 11 items (on average) were needed to achieve a standard error of measurement of 0.387 (corresponding to a Cronbach's alpha of 0.85). This study lays the groundwork for advancing DSH measurement to an item bank approach with the flexibility to measure a specific definitional orientation (e.g., non-suicidal self-injury) or a broad continuum of self-harmful acts, as appropriate to a particular research/clinical purpose. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
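The link drawn here between a standard error of measurement of 0.387 and a reliability of 0.85 follows from the classical relation SEM = SD * sqrt(1 - reliability). The check below assumes a person standard deviation of 1 on the logit scale, which the abstract does not state explicitly, so treat it as an illustrative reading rather than the authors' calculation.

```python
# Classical relation: SEM = SD * sqrt(1 - reliability)
#                 <=> reliability = 1 - SEM**2 / SD**2
sem = 0.387
person_sd = 1.0          # assumption: unit person SD on the logit scale
reliability = 1 - sem ** 2 / person_sd ** 2
print(round(reliability, 2))   # ~0.85, matching the figure quoted in the abstract
```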
Pollard, Beth; Dixon, Diane; Dieppe, Paul; Johnston, Marie
2009-01-01
Background The International Classification of Functioning, Disability and Health (ICF) proposes three main health outcomes, Impairment (I), Activity Limitation (A) and Participation Restriction (P), but good measures of these constructs are needed. The aim of this study was to use both Classical Test Theory (CTT) and Item Response Theory (IRT) methods to carry out an item analysis to improve measurement of these three components in patients having joint replacement surgery, mainly for osteoarthritis (OA). Methods A geographical cohort of patients about to undergo lower limb joint replacement was invited to participate. Five hundred and twenty-four patients completed ICF items that had been previously identified as measuring only a single ICF construct in patients with osteoarthritis. There were 13 I, 26 A and 20 P items. The SF-36 was used to explore the construct validity of the resultant I, A and P measures. The CTT and IRT analyses were run separately to identify items for inclusion or exclusion in the measurement of each construct. The results from both analyses were compared and contrasted. Results Overall, the item analysis resulted in the removal of 4 I items, 9 A items and 11 P items. CTT and IRT identified the same 14 items for removal, with CTT additionally excluding 3 items, and IRT a further 7 items. In a preliminary exploration of reliability and validity, the new measures appeared acceptable. Conclusion New measures were developed that reflect the ICF components of Impairment, Activity Limitation and Participation Restriction for patients with advanced arthritis. The resulting Aberdeen IAP measures (Ab-IAP), comprising I (Ab-I, 9 items), A (Ab-A, 17 items), and P (Ab-P, 9 items), met the criteria of conventional psychometric (CTT) analyses and the additional criteria (information and discrimination) of IRT. The use of both methods was more informative than the use of only one of these methods. Thus combining CTT and IRT appears to be a valuable tool in the development of measures. PMID:19422677
Development of a State-Wide Competency Test for Marketing Education. Final Report.
ERIC Educational Resources Information Center
Smith, Clifton L.
A project was conducted to develop a valid, competency-referenced test on the core competencies identified for the Missouri Fundamentals of Marketing curriculum. During the project: (1) multiple-choice test items based on the core competencies in the Fundamentals of Marketing curriculum were developed; (2) instructions for onsite administration of…
Fu, Sau Nga; Chin, Weng Yee; Wong, Carlos King Ho; Yeung, Vincent Tok Fai; Yiu, Ming Pong; Tsui, Hoi Yee; Chan, Ka Hung
2013-01-01
To develop and evaluate the psychometric properties of a Chinese questionnaire which assesses the barriers and enablers to commencing insulin in primary care patients with poorly controlled Type 2 diabetes. Questionnaire items were identified using literature review. Content validation was performed and items were further refined using an expert panel. Following translation, back translation and cognitive debriefing, the translated Chinese questionnaire was piloted on target patients. Exploratory factor analysis and item-scale correlations were performed to test the construct validity of the subscales and items. Internal reliability was tested by Cronbach's alpha. Twenty-seven identified items underwent content validation, translation and cognitive debriefing. The translated questionnaire was piloted on 303 insulin-naïve (had never taken insulin) Type 2 diabetes patients recruited from 10 government-funded primary care clinics across Hong Kong. Sufficient variability in the dataset for factor analysis was confirmed by Bartlett's Test of Sphericity (P<0.001). Using exploratory factor analysis with varimax rotation, 10 factors were generated onto which 26 items loaded with loading scores > 0.4 and eigenvalues > 1. Total variance for the 10 factors was 66.22%. The Kaiser-Meyer-Olkin measure was 0.725. Cronbach's alpha coefficients for the first four factors were ≥ 0.6, identifying four sub-scales to which 13 items correlated. Remaining sub-scales and items with poor internal reliability were deleted. The final 13-item instrument had a four-subscale structure addressing: 'Self-image and stigmatization'; 'Factors promoting self-efficacy'; 'Fear of pain or needles'; and 'Time and family support'. The Chinese Attitudes to Starting Insulin Questionnaire (Ch-ASIQ) appears to be a reliable and valid measure for assessing barriers to starting insulin. This short instrument is easy to administer and may be used by healthcare providers and researchers as an assessment tool for Chinese diabetic primary care patients, including the elderly, who are unwilling to start insulin.
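The retention rules described here (eigenvalues greater than 1, varimax rotation, salient loadings above 0.4) can be sketched in a few lines. The code below assumes a recent scikit-learn whose FactorAnalysis supports a rotation argument; it illustrates the criteria only and is not the software actually used for the Ch-ASIQ.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis   # rotation arg assumes scikit-learn >= 0.24

def efa_varimax(X, loading_cutoff=0.4):
    """Sketch of the reported retention rules: Kaiser criterion to choose the
    number of factors, varimax rotation, then flag loadings above the cutoff.
    Illustrative only."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)        # standardise items
    eigvals = np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False))
    n_factors = int((eigvals > 1).sum())                     # eigenvalues > 1
    fa = FactorAnalysis(n_components=n_factors, rotation="varimax").fit(Z)
    loadings = fa.components_.T                              # items x factors
    salient = np.abs(loadings) > loading_cutoff              # loadings > 0.4
    return n_factors, loadings, salient
```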
Development and Testing of the Nurse Manager EBP Competency Scale.
Shuman, Clayton J; Ploutz-Snyder, Robert J; Titler, Marita G
2018-02-01
The purpose of this study was to develop and evaluate the validity and reliability of an instrument to measure nurse manager competencies regarding evidence-based practice (EBP). The Nurse Manager EBP Competency Scale consists of 16 items for respondents to indicate their perceived level of competency on a 0 to 3 Likert-type scale. Content validity was demonstrated through expert panel review and pilot testing. Principal axis factoring and Cronbach's alpha evaluated construct validity and internal consistency reliability, respectively. Eighty-three nurse managers completed the scale. Exploratory factor analysis resulted in a 16-item scale with two subscales, EBP Knowledge (n = 6 items, α = .90) and EBP Activity (n = 10 items, α = .94). Cronbach's alpha for the entire scale was .95. The Nurse Manager EBP Competency Scale is a brief measure of nurse manager EBP competency with evidence of validity and reliability. The scale can be used in future studies to enhance our understanding of how nurse manager EBP competency affects implementation.
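For readers unfamiliar with principal axis factoring, a bare-bones NumPy sketch of the extraction step (iterating communalities on a reduced correlation matrix, without rotation) is given below. It is illustrative only and is not the authors' implementation.

```python
import numpy as np

def principal_axis_factoring(R, n_factors, n_iter=100, tol=1e-6):
    """Minimal principal axis factoring of a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    # Initial communalities: squared multiple correlations of each item.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        Rr = R.copy()
        np.fill_diagonal(Rr, h2)                  # reduced correlation matrix
        eigvals, eigvecs = np.linalg.eigh(Rr)
        idx = np.argsort(eigvals)[::-1][:n_factors]
        lam = np.clip(eigvals[idx], 0.0, None)
        loadings = eigvecs[:, idx] * np.sqrt(lam)
        h2_new = (loadings ** 2).sum(axis=1)      # updated communalities
        if np.max(np.abs(h2_new - h2)) < tol:
            h2 = h2_new
            break
        h2 = h2_new
    return loadings, h2
```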
Electronic test instrumentation and techniques: A compilation
NASA Technical Reports Server (NTRS)
1974-01-01
Test equipment and techniques used in space research and development programs are discussed. Modifications and adaptations that enlarge the scope of usefulness or redirect the basic equipment to alternate applications are analyzed. Items of equipment that have helped professional personnel enlarge and improve quality control capabilities are identified. Items that have been simplified or made more accurate for conducting measurements are described.
Development and initial validation of the appropriate antibiotic use self-efficacy scale.
Hill, Erin M; Watkins, Kaitlin
2018-06-04
While various medication self-efficacy scales exist, none assesses self-efficacy for appropriate antibiotic use. The Appropriate Antibiotic Use Self-Efficacy Scale (AAUSES) was developed, pilot tested, and examined for its psychometric properties. Following pilot testing of the scale, a 28-item questionnaire was examined using a sample (n = 289) recruited through the Amazon Mechanical Turk platform. Participants also completed other scales and items, which were used in assessing discriminant, convergent, and criterion-related validity. Test-retest reliability was also examined. After examining the scale and removing items that did not assess appropriate antibiotic use, an exploratory factor analysis was conducted on 13 items from the original scale. Three factors were retained that explained 65.51% of the variance. The scale and its subscales had adequate internal consistency. The scale had excellent test-retest reliability and demonstrated convergent, discriminant, and criterion-related validity. The AAUSES is a valid and reliable scale that assesses three domains of appropriate antibiotic use self-efficacy. The AAUSES may have utility in clinical and research settings in understanding individuals' beliefs about appropriate antibiotic use and related behavioral correlates. Future research is needed to examine the scale's utility in these settings. Copyright © 2018 Elsevier B.V. All rights reserved.
Development of the competency scale for primary care managers in Thailand: Scale development.
Kitreerawutiwong, Keerati; Sriruecha, Chanaphol; Laohasiriwong, Wongsa
2015-12-09
The complexity of the primary care system requires a competent manager to achieve high-quality healthcare. The existing literature in the field offers few tools for assessing the competency of primary care administrators. This study aimed to develop and examine the psychometric properties of the competency scale for primary care managers in Thailand. The scale was developed using in-depth interviews and focus group discussions among policy makers, managers, practitioners, village health volunteers, and clients. The specific dimensions were extracted from 35 participants, and 123 items were generated from the evidence and qualitative data. Content validity was established through the evaluation of seven experts, and the original 123 items were reduced to 84 items. Pilot testing was conducted on a simple random sample of 487 primary care managers. Item analysis, reliability testing, and exploratory factor analysis were applied to establish the scale's reliability and construct validity. Exploratory factor analysis identified nine dimensions with 48 items using a five-point Likert scale. Each dimension accounted for greater than 58.61% of the total variance. The scale had strong content validity (indices = 0.85). Cronbach's alpha for each dimension ranged from 0.70 to 0.88. Based on these analyses, the instrument demonstrated sound psychometric properties and is therefore considered an effective tool for assessing primary care manager competencies. The results can be used to improve competency requirements of primary care managers, with implications for health service management workforce development.
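One common way to quantify content validity from expert ratings is the content validity index; a small sketch follows, assuming a conventional experts-by-items relevance matrix on a 1-4 scale. The reported index of 0.85 may have been computed differently, so treat this purely as an illustration of the kind of index involved.

```python
import numpy as np

def content_validity_index(ratings, relevant_threshold=3):
    """ratings: experts x items matrix of relevance ratings on a 1-4 scale
    (a common convention, assumed here rather than taken from the study).

    I-CVI: proportion of experts rating an item 3 or 4 ("relevant").
    S-CVI/Ave: mean of the item-level indices across the scale.
    """
    ratings = np.asarray(ratings)
    relevant = ratings >= relevant_threshold
    i_cvi = relevant.mean(axis=0)        # one value per item
    s_cvi_ave = float(i_cvi.mean())      # scale-level average
    return i_cvi, s_cvi_ave
```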
Paap, Muirne C S; Kroeze, Karel A; Terwee, Caroline B; van der Palen, Job; Veldkamp, Bernard P
2017-11-01
Examining item usage is an important step in evaluating the performance of a computerized adaptive test (CAT). We study item usage for a newly developed multidimensional CAT that draws items from three PROMIS domains as well as a disease-specific one. The multidimensional item bank used in the current study contained 194 items from four domains: the PROMIS domains fatigue, physical function, and ability to participate in social roles and activities, and a disease-specific domain (the COPD-SIB). The item bank was calibrated using the multidimensional graded response model and data of 795 patients with chronic obstructive pulmonary disease. To evaluate the item usage rates of all individual items in our item bank, CAT simulations were performed on responses generated based on a multivariate uniform distribution. The outcome variables included active bank size and item overuse (usage rate larger than the expected item usage rate). For average θ-values, the overall active bank size was 9-10%; this number quickly increased as θ-values became more extreme. For θ-values of -2 and +2, the overall active bank size equaled 39-40%. There was 78% overlap between overused items and active bank size for average θ-values. For more extreme θ-values, the overused items made up a much smaller part of the active bank size: here the overlap was only 35%. Our results strengthen the claim that relatively short item banks may suffice when using polytomous items (and no content constraints/exposure control mechanisms), especially when using multidimensional CAT (MCAT).
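The outcome definitions used here (active bank size, expected item usage rate, overuse) are straightforward to compute once simulated CAT administrations are available; the following is a generic sketch of those definitions, not the authors' code.

```python
import numpy as np

def usage_summary(administered, bank_size):
    """administered: one list of administered item indices per simulated examinee.

    Computes per-item usage rates, the active bank size (share of items used
    at least once), and flags overused items, i.e. items whose usage rate
    exceeds the expected rate (mean test length / bank size).
    """
    counts = np.zeros(bank_size)
    for items in administered:
        counts[list(items)] += 1
    usage_rate = counts / len(administered)
    expected_rate = np.mean([len(items) for items in administered]) / bank_size
    active = counts > 0
    overused = usage_rate > expected_rate
    return {"active_bank_size": active.mean(),
            "overused_items": np.flatnonzero(overused),
            "share_of_active_overused": overused.sum() / max(active.sum(), 1)}
```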
ERIC Educational Resources Information Center
Haberman, Shelby J.
2009-01-01
A regression procedure is developed to link simultaneously a very large number of item response theory (IRT) parameter estimates obtained from a large number of test forms, where each form has been separately calibrated and where forms can be linked on a pairwise basis by means of common items. An application is made to forms in which a…
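As a point of reference for the simultaneous regression approach, the classical pairwise mean-sigma transformation for two separately calibrated forms sharing common items is sketched below; it is the standard building block that such procedures generalise, not the procedure developed in the article.

```python
import numpy as np

def mean_sigma_link(b_common_on_x, b_common_on_y):
    """Classical mean-sigma linking between two separately calibrated forms.

    Inputs are difficulty estimates of the SAME common items on each form's
    scale. Returns (A, B) such that A * b_x + B places form-X parameters on
    form Y's scale (and theta_y = A * theta_x + B, a_y = a_x / A).
    """
    bx = np.asarray(b_common_on_x, dtype=float)
    by = np.asarray(b_common_on_y, dtype=float)
    A = by.std(ddof=1) / bx.std(ddof=1)
    B = by.mean() - A * bx.mean()
    return A, B
```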
Sadler, Philip M.; Coyle, Harold; Smith, Nancy Cook; Miller, Jaimie; Mintzes, Joel; Tanner, Kimberly; Murray, John
2013-01-01
We report on the development of an item test bank and associated instruments based on the National Research Council (NRC) K–8 life sciences content standards. Utilizing hundreds of studies in the science education research literature on student misconceptions, we constructed 476 unique multiple-choice items that measure the degree to which test takers hold either a misconception or an accepted scientific view. Tested nationally with 30,594 students following their study of life science, and with their 353 teachers, these items reveal a range of interesting results, particularly student difficulties in mastering the NRC standards. Teachers also answered test items, demonstrating a high level of subject matter knowledge reflecting the standards of the grade level at which they teach but exhibiting few misconceptions of their own. In addition, teachers predicted the difficulty of each item for their students and which of the wrong answers would be the most popular. Teachers were found to generally overestimate their own students’ performance and to have a high level of awareness of the particular misconceptions that their students hold on the K–4 standards, but a low level of awareness of misconceptions related to the 5–8 standards. PMID:24006402