ERIC Educational Resources Information Center
Banerjee, Jayanti; Papageorgiou, Spiros
2016-01-01
The research reported in this article investigates differential item functioning (DIF) in a listening comprehension test. The study explores the relationship between test-taker age and the items' language domains across multiple test forms. The data comprise test-taker responses (N = 2,861) to a total of 133 unique items, 46 items of which were…
Gender-Based Differential Item Performance in Mathematics Achievement Items.
ERIC Educational Resources Information Center
Doolittle, Allen E.; Cleary, T. Anne
1987-01-01
Eight randomly equivalent samples of high school seniors were each given a unique form of the ACT Assessment Mathematics Usage Test (ACTM). Signed measures of differential item performance (DIP) were obtained for each item in the eight ACTM forms. DIP estimates were analyzed and a significant item category effect was found. (Author/LMO)
What Does a Verbal Test Measure? A New Approach to Understanding Sources of Item Difficulty.
ERIC Educational Resources Information Center
Berk, Eric J. Vanden; Lohman, David F.; Cassata, Jennifer Coyne
Assessing the construct relevance of mental test results continues to present many challenges, and it has proven to be particularly difficult to assess the construct relevance of verbal items. This study was conducted to gain a better understanding of the conceptual sources of verbal item difficulty using a unique approach that integrates…
The cost of proactive interference is constant across presentation conditions.
Endress, Ansgar D; Siddique, Aneela
2016-10-01
Proactive interference (PI) severely constrains how many items people can remember. For example, Endress and Potter (2014a) presented participants with sequences of everyday objects at 250ms/picture, followed by a yes/no recognition test. They manipulated PI by either using new images on every trial in the unique condition (thus minimizing PI among items), or by re-using images from a limited pool for all trials in the repeated condition (thus maximizing PI among items). In the low-PI unique condition, the probability of remembering an item was essentially independent of the number of memory items, showing no clear memory limitations; more traditional working memory-like memory limitations appeared only in the high-PI repeated condition. Here, we ask whether the effects of PI are modulated by the availability of long-term memory (LTM) and verbal resources. Participants viewed sequences of 21 images, followed by a yes/no recognition test. Items were presented either quickly (250ms/image) or sufficiently slowly (1500ms/image) to produce LTM representations, either with or without verbal suppression. Across conditions, participants performed better in the unique than in the repeated condition, and better for slow than for fast presentations. In contrast, verbal suppression impaired performance only with slow presentations. The relative cost of PI was remarkably constant across conditions: relative to the unique condition, performance in the repeated condition was about 15% lower in all conditions. The cost of PI thus seems to be a function of the relative strength or recency of target items and interfering items, but relatively insensitive to other experimental manipulations. Copyright © 2016 Elsevier B.V. All rights reserved.
Development and Initial Validation of Military Deployment-Related TBI Quality-of-Life Item Banks.
Toyinbo, Peter A; Vanderploeg, Rodney D; Donnell, Alison J; Mutolo, Sandra A; Cook, Karon F; Kisala, Pamela A; Tulsky, David S
2016-01-01
To investigate unique factors that affect health-related quality of life (QOL) in individuals with military deployment-related traumatic brain injury (MDR-TBI) and to develop appropriate assessment tools, consistent with the TBI-QOL/PROMIS/Neuro-QOL systems. Three focus groups from each of the 4 Veterans Administration (VA) Polytrauma Rehabilitation Centers, consisting of 20 veterans with mild to severe MDR-TBI, and 36 VA providers were involved in the early stage of developing the new item banks. The item banks were field tested in a sample (N = 485) of veterans enrolled in VA and diagnosed with an MDR-TBI. Focus groups and survey. Developed item banks and short forms for Guilt, Posttraumatic Stress Disorder/Trauma, and Military-Related Loss. Three new item banks representing unique domains of MDR-TBI health outcomes were created: 15 new Posttraumatic Stress Disorder items plus 16 SCI-QOL legacy Trauma items, 37 new Military-Related Loss items plus 18 TBI-QOL legacy Grief/Loss items, and 33 new Guilt items. Exploratory and confirmatory factor analyses plus bifactor analysis of the items supported sufficient unidimensionality of the new item pools. Convergent and discriminant analysis results, as well as known group comparisons, provided initial support for the validity and clinical utility of the new item response theory-calibrated item banks and their short forms. This work provides a unique opportunity to identify issues specific to individuals with MDR-TBI and ensure that they are captured in QOL assessment, thus extending the existing TBI-QOL measurement system.
Development of the PROMIS positive emotional and sensory expectancies of smoking item banks.
Tucker, Joan S; Shadel, William G; Edelen, Maria Orlando; Stucky, Brian D; Li, Zhen; Hansen, Mark; Cai, Li
2014-09-01
The positive emotional and sensory expectancies of cigarette smoking include improved cognitive abilities, positive affective states, and pleasurable sensorimotor sensations. This paper describes development of Positive Emotional and Sensory Expectancies of Smoking item banks that will serve to standardize the assessment of this construct among daily and nondaily cigarette smokers. Data came from daily (N = 4,201) and nondaily (N = 1,183) smokers who completed an online survey. To identify a unidimensional set of items, we conducted item factor analyses, item response theory analyses, and differential item functioning analyses. Additionally, we evaluated the performance of fixed-item short forms (SFs) and computer adaptive tests (CATs) to efficiently assess the construct. Eighteen items were included in the item banks (15 common across daily and nondaily smokers, 1 unique to daily, 2 unique to nondaily). The item banks are strongly unidimensional, highly reliable (reliability = 0.95 for both), and perform similarly across gender, age, and race/ethnicity groups. A SF common to daily and nondaily smokers consists of 6 items (reliability = 0.86). Results from simulated CATs indicated that, on average, fewer than 8 items are needed to assess the construct with adequate precision using the item banks. These analyses identified a new set of items that can assess the positive emotional and sensory expectancies of smoking in a reliable and standardized manner. Considerable efficiency in assessing this construct can be achieved by using the item bank SF, employing computer adaptive tests, or selecting subsets of items tailored to specific research or clinical purposes. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Strategy Execution in Cognitive Skill Learning: An Item-Level Test of Candidate Models
ERIC Educational Resources Information Center
Rickard, Timothy C.
2004-01-01
This article investigates the transition to memory-based performance that commonly occurs with practice on tasks that initially require use of a multistep algorithm. In an alphabet arithmetic task, item response times exhibited pronounced step-function decreases after moderate practice that were uniquely predicted by T. C. Rickard's (1997)…
NASA Astrophysics Data System (ADS)
Ilich, Maria O.
Psychometricians and test developers evaluate standardized tests for potential bias against groups of test-takers by using differential item functioning (DIF). English language learners (ELLs) are a diverse group of students whose native language is not English. While they are still learning the English language, they must take their standardized tests for their school subjects, including science, in English. In this study, linguistic complexity was examined as a possible source of DIF that may result in test scores that confound science knowledge with a lack of English proficiency among ELLs. Two years of fifth-grade state science tests were analyzed for evidence of DIF using two DIF methods, Simultaneous Item Bias Test (SIBTest) and logistic regression. The tests presented a unique challenge in that the test items were grouped together into testlets, that is, groups of items that refer to a scientific scenario and measure knowledge of different science content or skills. Very large samples of 10,256 students in 2006 and 13,571 students in 2007 were examined. Half of each sample was composed of Spanish-speaking ELLs; the balance comprised native English speakers. The two DIF methods were in agreement about the items that favored non-ELLs and the items that favored ELLs. Logistic regression effect sizes were all negligible, while SIBTest flagged items with low to high DIF. A decrease in socioeconomic status and Spanish-speaking ELL diversity may have led to inconsistent SIBTest effect sizes for items used in both testing years. The DIF results for the testlets suggested that ELLs lacked sufficient opportunity to learn science content. The DIF results further suggest that those constructed response test items requiring the student to draw a conclusion about a scientific investigation or to plan a new investigation tended to favor ELLs.
Tat, Michelle J; Soonsawat, Anothai; Nagle, Corinne B; Deason, Rebecca G; O'Connor, Maureen K; Budson, Andrew E
2016-11-01
Patients with Alzheimer's disease (AD) dementia exhibit high rates of memory distortions in addition to their impairments in episodic memory. Several investigations have demonstrated that when healthy individuals (young and old) engaged in an encoding strategy that emphasized the uniqueness of study items (an item-specific encoding strategy), they were able to improve their discrimination between old items and unstudied critical lure items in a false memory task. In the present study we examined if patients with AD could also improve their memory discrimination when engaging in an item-specific encoding strategy. Healthy older adult controls, patients with mild cognitive impairment (MCI) due to AD, and patients with mild AD dementia were asked to study lists of categorized words. In the Item-Specific condition, participants were asked to provide a unique detail or personal experience with each study item. In the Relational condition, they were asked to determine how each item in the list was related to the others. To assess the influence of both strategies, recall and recognition memory tests were administered. Overall, both patient groups exhibited poorer memory in both recall and recognition tests compared to controls. In terms of recognition, healthy older controls and patients with MCI due to AD exhibited improved memory discrimination in the Item-Specific condition compared to the Relational condition, whereas patients with AD dementia did not. We speculate that patients with MCI due to AD use intact frontal networks to effectively engage in this strategy. Published by Elsevier Inc.
Tat, Michelle J.; Soonsawat, Anothai; Nagle, Corinne B.; Deason, Rebecca G.; O’Connor, Maureen K.; Budson, Andrew E.
2018-01-01
Patients with Alzheimer’s disease (AD) dementia exhibit high rates of memory distortions in addition to their impairments in episodic memory. Several investigations have demonstrated that when healthy individuals (young and old) engaged in an encoding strategy that emphasized the uniqueness of study items (an item-specific encoding strategy), they were able to improve their discrimination between old items and unstudied critical lure items in a false memory task. In the present study we examined if patients with AD could also improve their memory discrimination when engaging in an item-specific encoding strategy. Healthy older adult controls, patients with mild cognitive impairment (MCI) due to AD, and patients with mild AD dementia were asked to study lists of categorized words. In the Item-Specific condition, participants were asked to provide a unique detail or personal experience with each study item. In the Relational condition, they were asked to determine how each item in the list was related to the others. To assess the influence of both strategies, recall and recognition memory tests were administered. Overall, both patient groups exhibited poorer memory in both recall and recognition tests compared to controls. In terms of recognition, healthy older controls and patients with MCI due to AD exhibited improved memory discrimination in the Item-Specific condition compared to the Relational condition, whereas patients with AD dementia did not. We speculate that patients with MCI due to AD use intact frontal networks to effectively engage in this strategy. PMID:27643951
Department of Defense Logistics Roadmap 2008. Volume 1
2008-07-01
machine-readable identification mark on the Department’s tangible qualifying assets, and establishes the data management protocols needed to...uniquely identify items with a Unique Item Identifier (UII) via machine-readable information (MRI) marking represented by a two-dimensional data...property items with a machine-readable Unique Item Identifier (UII), which is a set of globally unique data elements. The UII is used in functional
ERIC Educational Resources Information Center
Sadler, Philip M.; Coyle, Harold; Cook Smith, Nancy; Miller, Jaimie; Mintzes, Joel; Tanner, Kimberly; Murray, John
2013-01-01
We report on the development of an item test bank and associated instruments based on the National Research Council (NRC) K-8 life sciences content standards. Utilizing hundreds of studies in the science education research literature on student misconceptions, we constructed 476 unique multiple-choice items that measure the degree to which test…
ERIC Educational Resources Information Center
Starns, Jeffrey J.; Rotello, Caren M.; Hautus, Michael J.
2014-01-01
We tested the dual process and unequal variance signal detection models by jointly modeling recognition and source confidence ratings. The 2 approaches make unique predictions for the slope of the recognition memory zROC function for items with correct versus incorrect source decisions. The standard bivariate Gaussian version of the unequal…
Automated Rocket Propulsion Test Management
NASA Technical Reports Server (NTRS)
Walters, Ian; Nelson, Cheryl; Jones, Helene
2007-01-01
The Rocket Propulsion Test-Automated Management System provides a central location for managing the business management activities of the Rocket Propulsion Test Management Board, the National Rocket Propulsion Test Alliance, and the Senior Steering Group. A set of authorized users, both on-site and off-site with regard to Stennis Space Center (SSC), can access the system through a Web interface. Web-based forms are used for user input, and reports are generated, distributed electronically, and easily accessible. Major functions managed by this software include meeting agenda management, meeting minutes, action requests, action items, directives, and recommendations. Additional functions include electronic review, approval, and signatures. A repository/library of documents is available for users, and all items are tracked in the system by unique identification numbers and status (open, closed, percent complete, etc.). The system also provides queries and version control for input of all items.
First State Fitness Test. A Measurement of Functional Health.
ERIC Educational Resources Information Center
Brown, Timothy; And Others
This test is designed to measure the functional health of young people. Functional health refers to those factors relating to personal health that can be improved with regular exercise. This test is unique in comparison to other physical fitness tests because of the absence of motor skill items which have no relationship to an individual's…
A large-scale, long-term study of scale drift: The micro view and the macro view
NASA Astrophysics Data System (ADS)
He, W.; Li, S.; Kingsbury, G. G.
2016-11-01
The development of measurement scales for use across years and grades in educational settings provides unique challenges, as instructional approaches, instructional materials, and content standards all change periodically. This study examined the measurement stability of a set of Rasch measurement scales that have been in place for almost 40 years. In order to investigate the stability of these scales, item responses were collected from a large set of students who took operational adaptive tests using items calibrated to the measurement scales. For the four scales that were examined, item samples ranged from 2183 to 7923 items. Each item was administered to at least 500 students in each grade level, resulting in approximately 3000 responses per item. Stability was examined at the micro level analysing change in item parameter estimates that have occurred since the items were first calibrated. It was also examined at the macro level, involving groups of items and overall test scores for students. Results indicated that individual items had changes in their parameter estimates, which require further analysis and possible recalibration. At the same time, the results at the total score level indicate substantial stability in the measurement scales over the span of their use.
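The micro-level check described in this abstract can be pictured with a minimal sketch, assuming the standard one-parameter Rasch form and treating drift as the difference between a re-estimated and an originally calibrated item difficulty. The item names, difficulty values, and the 0.1-logit flagging threshold below are hypothetical and are not taken from the study.

```python
import math

def rasch_probability(theta, difficulty):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def parameter_drift(original, current):
    """Per-item change in difficulty (current minus original), in logits."""
    return {item: current[item] - original[item]
            for item in original if item in current}

# Hypothetical difficulties at first calibration and at re-estimation.
original_b = {"item_a": -0.50, "item_b": 0.25}
current_b = {"item_a": -0.35, "item_b": 0.20}

drift = parameter_drift(original_b, current_b)

# Hypothetical rule: flag items whose difficulty moved by more than 0.1 logits.
flagged = [item for item, d in drift.items() if abs(d) > 0.1]
print(flagged)
```

A real drift analysis would also account for estimation error in both calibrations; this sketch only shows the quantity being compared.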
Kisala, Pamela A; Tulsky, David S; Pace, Natalie; Victorson, David; Choi, Seung W; Heinemann, Allen W
2015-05-01
To develop a calibrated item bank and computer adaptive test (CAT) to assess the effects of stigma on health-related quality of life in individuals with spinal cord injury (SCI). Grounded-theory based qualitative item development methods, large-scale item calibration field testing, confirmatory factor analysis, and item response theory (IRT)-based psychometric analyses. Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. SCI-QOL Stigma Item Bank. A sample of 611 individuals with traumatic SCI completed 30 items assessing SCI-related stigma. After 7 items were iteratively removed, factor analyses confirmed a unidimensional pool of items. Graded Response Model IRT analyses were used to estimate slopes and thresholds for the final 23 items. The SCI-QOL Stigma item bank is unique not only in the assessment of SCI-related stigma but also in the inclusion of individuals with SCI in all phases of its development. Use of confirmatory factor analytic and IRT methods provides flexibility and precision of measurement. The item bank may be administered as a CAT or as a 10-item fixed-length short form and can be used for research and clinical applications.
Kisala, Pamela A.; Tulsky, David S.; Pace, Natalie; Victorson, David; Choi, Seung W.; Heinemann, Allen W.
2015-01-01
Objective: To develop a calibrated item bank and computer adaptive test (CAT) to assess the effects of stigma on health-related quality of life in individuals with spinal cord injury (SCI). Design: Grounded-theory based qualitative item development methods, large-scale item calibration field testing, confirmatory factor analysis, and item response theory (IRT)-based psychometric analyses. Setting: Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Participants: Adults with traumatic SCI. Main Outcome Measures: SCI-QOL Stigma Item Bank. Results: A sample of 611 individuals with traumatic SCI completed 30 items assessing SCI-related stigma. After 7 items were iteratively removed, factor analyses confirmed a unidimensional pool of items. Graded Response Model IRT analyses were used to estimate slopes and thresholds for the final 23 items. Conclusions: The SCI-QOL Stigma item bank is unique not only in the assessment of SCI-related stigma but also in the inclusion of individuals with SCI in all phases of its development. Use of confirmatory factor analytic and IRT methods provides flexibility and precision of measurement. The item bank may be administered as a CAT or as a 10-item fixed-length short form and can be used for research and clinical applications. PMID:26010973
Development of the PROMIS health expectancies of smoking item banks.
Edelen, Maria Orlando; Tucker, Joan S; Shadel, William G; Stucky, Brian D; Cerully, Jennifer; Li, Zhen; Hansen, Mark; Cai, Li
2014-09-01
Smokers' health-related outcome expectancies are associated with a number of important constructs in smoking research, yet there are no measures currently available that focus exclusively on this domain. This paper describes the development and evaluation of item banks for assessing the health expectancies of smoking. Using data from a sample of daily (N = 4,201) and nondaily (N = 1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning analyses (according to gender, age, and race/ethnicity) to arrive at a unidimensional set of health expectancies items for daily and nondaily smokers. We also evaluated the performance of short forms (SFs) and computer adaptive tests (CATs) to efficiently assess health expectancies. A total of 24 items were included in the Health Expectancies item banks; 13 items are common across daily and nondaily smokers, 6 are unique to daily, and 5 are unique to nondaily. For both daily and nondaily smokers, the Health Expectancies item banks are unidimensional, reliable (reliability = 0.95 and 0.96, respectively), and perform similarly across gender, age, and race/ethnicity groups. A SF common to daily and nondaily smokers consists of 6 items (reliability = 0.87). Results from simulated CATs showed that health expectancies can be assessed with good precision with an average of 5-6 items adaptively selected from the item banks. Health expectancies of smoking can be assessed on the basis of these item banks via SFs, CATs, or through a tailored set of items selected for a specific research purpose. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Development of the PROMIS nicotine dependence item banks.
Shadel, William G; Edelen, Maria Orlando; Tucker, Joan S; Stucky, Brian D; Hansen, Mark; Cai, Li
2014-09-01
Nicotine dependence is a core construct important for understanding cigarette smoking and smoking cessation behavior. This article describes analyses conducted to develop and evaluate item banks for assessing nicotine dependence among daily and nondaily smokers. Using data from a sample of daily (N = 4,201) and nondaily (N = 1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning analyses (according to gender, age, and race/ethnicity) to arrive at a unidimensional set of nicotine dependence items for daily and nondaily smokers. We also evaluated performance of short forms (SFs) and computer adaptive tests (CATs) to efficiently assess dependence. A total of 32 items were included in the Nicotine Dependence item banks; 22 items are common across daily and nondaily smokers, 5 are unique to daily smokers, and 5 are unique to nondaily smokers. For both daily and nondaily smokers, the Nicotine Dependence item banks are strongly unidimensional, highly reliable (reliability = 0.97 and 0.97, respectively), and perform similarly across gender, age, and race/ethnicity groups. SFs common to daily and nondaily smokers consist of 8 and 4 items (reliability = 0.91 and 0.81, respectively). Results from simulated CATs showed that dependence can be assessed with very good precision for most respondents using fewer than 6 items adaptively selected from the item banks. Nicotine dependence on cigarettes can be assessed on the basis of these item banks via one of the SFs, by using CATs, or through a tailored set of items selected for a specific research purpose. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Development of the PROMIS negative psychosocial expectancies of smoking item banks.
Stucky, Brian D; Edelen, Maria Orlando; Tucker, Joan S; Shadel, William G; Cerully, Jennifer; Kuhfeld, Megan; Hansen, Mark; Cai, Li
2014-09-01
Negative psychosocial expectancies of smoking include aspects of social disapproval and disappointment in oneself. This paper describes analyses conducted to develop and evaluate item banks for assessing psychosocial expectancies among daily and nondaily smokers. Using data from a sample of daily (N = 4,201) and nondaily (N = 1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning analyses (according to gender, age, and race/ethnicity) to arrive at a unidimensional set of psychosocial expectancies items for daily and nondaily smokers. We also evaluated performance of short forms (SFs) and computer adaptive tests (CATs) to efficiently assess psychosocial expectancies. A total of 21 items were included in the Psychosocial Expectancies item banks: 14 items are common across daily and nondaily smokers, 6 are unique to daily, and 1 is unique to nondaily. For both daily and nondaily smokers, the Psychosocial Expectancies item banks are strongly unidimensional, highly reliable (reliability = 0.95 and 0.93, respectively), and perform similarly across gender, age, and race/ethnicity groups. A SF common to daily and nondaily smokers consists of 6 items (reliability = 0.85). Results from simulated CATs showed that, on average, fewer than 8 items are needed to assess psychosocial expectancies with adequate precision when using the item banks. Psychosocial expectancies of smoking can be assessed on the basis of these item banks via the SF, by using CAT, or through a tailored set of items selected for a specific research purpose. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Sadler, Philip M.; Coyle, Harold; Smith, Nancy Cook; Miller, Jaimie; Mintzes, Joel; Tanner, Kimberly; Murray, John
2013-01-01
We report on the development of an item test bank and associated instruments based on the National Research Council (NRC) K–8 life sciences content standards. Utilizing hundreds of studies in the science education research literature on student misconceptions, we constructed 476 unique multiple-choice items that measure the degree to which test takers hold either a misconception or an accepted scientific view. Tested nationally with 30,594 students, following their study of life science, and their 353 teachers, these items reveal a range of interesting results, particularly student difficulties in mastering the NRC standards. Teachers also answered test items and demonstrated a high level of subject matter knowledge reflecting the standards of the grade level at which they teach, but exhibiting few misconceptions of their own. In addition, teachers predicted the difficulty of each item for their students and which of the wrong answers would be the most popular. Teachers were found to generally overestimate their own students’ performance and to have a high level of awareness of the particular misconceptions that their students hold on the K–4 standards, but a low level of awareness of misconceptions related to the 5–8 standards. PMID:24006402
41 CFR 101-29.402 - Exceptions to mandatory use of Federal product descriptions.
Code of Federal Regulations, 2011 CFR
2011-07-01
... single manufacturer's design. (8) The product is unique to a single system. (9) The product (excluding...; (2) Items required for experiment, test, or research and development; or (3) Spare parts, components...
41 CFR 101-29.402 - Exceptions to mandatory use of Federal product descriptions.
Code of Federal Regulations, 2010 CFR
2010-07-01
... single manufacturer's design. (8) The product is unique to a single system. (9) The product (excluding...; (2) Items required for experiment, test, or research and development; or (3) Spare parts, components...
41 CFR 101-29.402 - Exceptions to mandatory use of Federal product descriptions.
Code of Federal Regulations, 2014 CFR
2014-07-01
... single manufacturer's design. (8) The product is unique to a single system. (9) The product (excluding...; (2) Items required for experiment, test, or research and development; or (3) Spare parts, components...
41 CFR 101-29.402 - Exceptions to mandatory use of Federal product descriptions.
Code of Federal Regulations, 2013 CFR
2013-07-01
... single manufacturer's design. (8) The product is unique to a single system. (9) The product (excluding...; (2) Items required for experiment, test, or research and development; or (3) Spare parts, components...
41 CFR 101-29.402 - Exceptions to mandatory use of Federal product descriptions.
Code of Federal Regulations, 2012 CFR
2012-07-01
... single manufacturer's design. (8) The product is unique to a single system. (9) The product (excluding...; (2) Items required for experiment, test, or research and development; or (3) Spare parts, components...
Acceleration Testing: A Better, Faster, Cheaper Alternative for Strength Qualification Testing
NASA Technical Reports Server (NTRS)
Mattiello, Carmine F.
1997-01-01
This paper addresses the advantages of utilizing a centrifuge test over the conventional static load test methods to structurally qualify aerospace structures. Three recent test cases are reviewed and used as examples to highlight these benefits. In addition, the overall capability of Goddard's High Capacity Centrifuge (HCC) is outlined along with some unique features that were designed specifically to reduce costs and test turnaround time and to increase test item safety.
Federal Register 2010, 2011, 2012, 2013, 2014
2010-08-30
....211-7003 by revising the clause date to read ``(XXX 2010)'' and, at paragraph (a), by revising the... following provision: Notice of Warranty Tracking of Serialized Items (XXX 2010) (a) Definition. ``Unique...: Warranty Tracking of Serialized Items (XXX 2010) (a) Definitions. As used in this clause-- DoD Item Unique...
Mokkink, Lidwine Brigitta; Galindo-Garre, Francisca; Uitdehaag, Bernard Mj
2016-12-01
The Multiple Sclerosis Walking Scale-12 (MSWS-12) measures walking ability from the patients' perspective. We examined the quality of the MSWS-12 using an item response theory model, the graded response model (GRM). A total of 625 unique Dutch multiple sclerosis (MS) patients were included. After testing for unidimensionality, monotonicity, and absence of local dependence, a GRM was fit and item characteristics were assessed. Differential item functioning (DIF) for the variables gender, age, duration of MS, type of MS and severity of MS, reliability, total test information, and standard error of the trait level (θ) were investigated. Confirmatory factor analysis showed a unidimensional structure of the 12 items of the scale, explaining 88% of the variance. Item 2 did not fit the GRM. Reliability was 0.93. Items 8 and 9 (of the 11- and 12-item versions, respectively) showed DIF on the variable severity, based on the Expanded Disability Status Scale (EDSS). However, the EDSS is strongly related to the content of both items. Our results confirm the good quality of the MSWS-12. The trait level (θ) scores and item parameters of both the 12- and 11-item versions were highly comparable, although we do not suggest changing the content of the MSWS-12. © The Author(s), 2016.
Rhebergen, Martijn D F; Visser, Maaike J; Verberk, Maarten M; Lenderink, Annet F; van Dijk, Frank J H; Kezic, Sanja; Hulshof, Carel T J
2012-10-01
We compared three common user involvement methods in revealing barriers and facilitators from intended users that might influence their use of a new genetic test. The study was part of the development of a new genetic test on the susceptibility to hand eczema for nurses. Eighty student nurses participated in five focus groups (n = 33), 15 interviews (n = 15) or questionnaires (n = 32). For each method, data were collected until saturation. We compared the mean number of items and relevant remarks that could influence the use of the genetic test obtained per method, divided by the number of participants in that method. Thematic content analysis was performed using MAXQDA software. The focus groups revealed 30 unique items compared to 29 in the interviews and 21 in the questionnaires. The interviews produced more items and relevant remarks per participant (1.9 and 8.4 pp) than focus groups (0.9 and 4.8 pp) or questionnaires (0.7 and 2.3 pp). All three involvement methods revealed relevant barriers and facilitators to use a new genetic test. Focus groups and interviews revealed substantially more items than questionnaires. Furthermore, this study suggests a preference for the use of interviews because the number of items per participant was higher than for focus groups and questionnaires. This conclusion may be valid for other genetic tests as well.
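The per-participant item rates quoted in the abstract above (1.9, 0.9 and 0.7) appear to follow from dividing the number of unique items by the number of participants in each method. The short sketch below reproduces that arithmetic from the reported counts; the dictionary layout and the function name are illustrative assumptions, not part of the study.

```python
# Counts taken from the abstract: (unique items revealed, participants) per method.
methods = {
    "focus groups": (30, 33),
    "interviews": (29, 15),
    "questionnaires": (21, 32),
}


def items_per_participant(unique_items, participants):
    """Unique items divided by participants, rounded to one decimal place."""
    return round(unique_items / participants, 1)


# Hypothetical helper structure: one rate per involvement method.
rates = {name: items_per_participant(*counts) for name, counts in methods.items()}

for name, rate in rates.items():
    print(f"{name}: {rate} items per participant")
```

Normalizing by group size is what lets the 15 interviews compare favorably with the 33 focus-group participants even though the raw item counts (29 and 30) are nearly identical.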
Improving the Validity and Reliability of a Health Promotion Survey for Physical Therapists
Stephens, Jaca L.; Lowman, John D.; Graham, Cecilia L.; Morris, David M.; Kohler, Connie L.; Waugh, Jonathan B.
2013-01-01
Purpose: Physical therapists (PTs) have a unique opportunity to intervene in the area of health promotion. However, no instrument has been validated to measure PTs’ views on health promotion in physical therapy practice. The purpose of this study was to evaluate the content validity and test-retest reliability of a health promotion survey designed for PTs. Methods: An expert panel of PTs assessed the content validity of “The Role of Health Promotion in Physical Therapy Survey” and provided suggestions for revision. Item content validity was assessed using the content validity ratio (CVR) as well as the modified kappa statistic. Therapists then participated in the test-retest reliability assessment of the revised health promotion survey, which was assessed using a weighted kappa statistic. Results: Based on feedback from the expert panelists, significant revisions were made to the original survey. The expert panel reached at least a majority consensus agreement for all items in the revised survey and the survey-CVR improved from 0.44 to 0.66. Only one item on the revised survey had substantial test-retest agreement, with 55% of the items having moderate agreement and 43% poor agreement. Conclusions: All items on the revised health promotion survey demonstrated at least fair validity, but few items had reasonable test-retest reliability. Further modifications should be made to strengthen the validity and improve the reliability of this survey. PMID:23754935
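The abstract above does not state which CVR formula was used. A minimal sketch, assuming the commonly used Lawshe definition, CVR = (n_e - N/2) / (N/2), where n_e is the number of panelists rating an item essential and N is the panel size, is given below. The panel size of 10 in the example is hypothetical and does not come from the study.

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe-style CVR: ranges from -1 (no one rates the item essential)
    to +1 (everyone does); 0 means exactly half the panel does."""
    if n_panelists <= 0:
        raise ValueError("n_panelists must be positive")
    if not 0 <= n_essential <= n_panelists:
        raise ValueError("n_essential must be between 0 and n_panelists")
    half = n_panelists / 2
    return (n_essential - half) / half


# Hypothetical example: 8 of 10 panelists rate an item essential.
example_cvr = content_validity_ratio(8, 10)
print(f"CVR = {example_cvr:.2f}")
```

Under this definition, a survey-level value such as the reported 0.66 would typically be a mean over the item-level ratios, although the abstract does not confirm how the survey-CVR was aggregated.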
Pilkonis, Paul A; Yu, Lan; Dodds, Nathan E; Johnston, Kelly L; Lawrence, Suzanne M; Hilton, Thomas F; Daley, Dennis C; Patkar, Ashwin A; McCarty, Dennis
2017-08-01
There is a need to monitor patients receiving prescription opioids to detect possible signs of abuse. To address this need, we developed and calibrated an item bank for severity of abuse of prescription pain medication as part of the Patient-Reported Outcomes Measurement Information System (PROMIS®). Comprehensive literature searches yielded an initial bank of 5,310 items relevant to substance use and abuse, including abuse of prescription pain medication, from over 80 unique instruments. After qualitative item analysis (i.e., focus groups, cognitive interviewing, expert review, and item revision), 25 items for abuse of prescribed pain medication were included in field testing. Items were written in a first-person, past-tense format, with a three-month time frame and five response options reflecting frequency or severity. The calibration sample included 448 respondents, 367 from the general population (ascertained through an internet panel) and 81 from community treatment programs participating in the National Drug Abuse Treatment Clinical Trials Network. A final bank of 22 items was calibrated using the two-parameter graded response model from item response theory. A seven-item static short form was also developed. The test information curve showed that the PROMIS® item bank for abuse of prescription pain medication provided substantial information in a broad range of severity. The initial psychometric characteristics of the item bank support its use as a computerized adaptive test or short form, with either version providing a brief, precise, and efficient measure relevant to both clinical and community samples. © 2016 American Academy of Pain Medicine. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
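For readers unfamiliar with the two-parameter graded response model named in the abstract above, the following is a minimal sketch of how category probabilities are obtained for a single item with five response options. It assumes the standard logistic form, in which the cumulative probability of responding in category k or higher is 1 / (1 + exp(-a(θ - b_k))). The discrimination and threshold values are hypothetical and are not the calibrated PROMIS parameters.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one graded-response item.

    theta      -- latent trait level of the respondent
    a          -- item discrimination (slope)
    thresholds -- increasing threshold parameters b_1 < ... < b_{K-1}
    Returns a list of K probabilities, one per response category.
    """
    # Cumulative probabilities P(X >= k); by definition 1 for the lowest
    # category and 0 beyond the highest.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulative values.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical item: five response options, so four thresholds.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.5, -0.5, 0.5, 1.5])
print([round(p, 3) for p in probs])
```

The thresholds must be in increasing order for every category probability to be non-negative; a calibration routine would estimate a and the thresholds for each of the 22 items from the response data.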
48 CFR 252.211-7003 - Item unique identification and valuation.
Code of Federal Regulations, 2014 CFR
2014-10-01
... reader or interrogator, used to retrieve data encoded on machine-readable media. Concatenated unique item... identifier. Item means a single hardware article or a single unit formed by a grouping of subassemblies... manufactured under identical conditions. Machine-readable means an automatic identification technology media...
Herschbach, Peter; Berg, Petra; Dankert, Andrea; Duran, Gabriele; Engst-Hastreiter, Ursula; Waadt, Sabine; Keller, Monika; Ukat, Robert; Henrich, Gerhard
2005-06-01
The aim of this study was the development and psychometric testing of a new psychological questionnaire to measure the fear of progression (FoP) in chronically ill patients (cancer, diabetes mellitus and rheumatic diseases). The Fear of Progression Questionnaire (FoP-Q) was developed in four phases: (1) generation of items (65 interviews); (2) reduction of items--the initial version of the questionnaire (87 items) was presented to 411 patients, to construct subscales and test the reliability; (3) testing the convergent and discriminative validity of the reduced test version (43 items) within a new sample (n=439); (4) translation--German to English. The scale comprised five factors (Cronbach's alpha >.70): affective reactions (13 items), partnership/family (7), occupation (7), loss of autonomy (7) and coping with anxiety (9). The test-retest reliability coefficients varied between .77 and .94. There was only a medium relationship to traditional anxiety scales. This is an indication of the independence of the FoP. Significant relationships between the FoP-Q and the patient's illness behaviour indicate discriminative validity. The FoP-Q is a new and unique questionnaire developed for the chronically ill. A major problem and source of stress for this patient group has been measuring both specifically and economically the FoP of an illness. The FoP-Q was designed to resolve this problem, fulfill this need and reduce this stress.
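Cronbach's alpha, reported above as exceeding .70 for each FoP-Q factor, is conventionally computed as k/(k-1) × (1 - Σ item variances / variance of the total score), where k is the number of items. The sketch below implements that standard formula with the Python standard library; the response matrix is made-up illustrative data, not data from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a response matrix.

    scores -- list of respondents, each a list of k item scores.
    Uses sample variances for both the items and the total score.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Transpose so that each entry holds all responses to one item.
    item_columns = list(zip(*scores))
    sum_item_variances = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


# Hypothetical responses: 5 respondents answering 4 items on a 1-5 scale.
responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
print(f"alpha = {cronbach_alpha(responses):.2f}")
```

Alpha rises as items covary more strongly relative to their individual variances, which is why it is read as an index of internal consistency for a subscale.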
Capturing specific abilities as a window into human individuality: the example of face recognition.
Wilmer, Jeremy B; Germine, Laura; Chabris, Christopher F; Chatterjee, Garga; Gerbasi, Margaret; Nakayama, Ken
2012-01-01
Proper characterization of each individual's unique pattern of strengths and weaknesses requires good measures of diverse abilities. Here, we advocate combining our growing understanding of neural and cognitive mechanisms with modern psychometric methods in a renewed effort to capture human individuality through a consideration of specific abilities. We articulate five criteria for the isolation and measurement of specific abilities, then apply these criteria to face recognition. We cleanly dissociate face recognition from more general visual and verbal recognition. This dissociation stretches across ability as well as disability, suggesting that specific developmental face recognition deficits are a special case of a broader specificity that spans the entire spectrum of human face recognition performance. Item-by-item results from 1,471 web-tested participants, included as supplementary information, fuel item analyses, validation, norming, and item response theory (IRT) analyses of our three tests: (a) the widely used Cambridge Face Memory Test (CFMT); (b) an Abstract Art Memory Test (AAMT), and (c) a Verbal Paired-Associates Memory Test (VPMT). The availability of this data set provides a solid foundation for interpreting future scores on these tests. We argue that the allied fields of experimental psychology, cognitive neuroscience, and vision science could fuel the discovery of additional specific abilities to add to face recognition, thereby providing new perspectives on human individuality.
Project Morpheus Main Engine Development and Preliminary Flight Testing
NASA Technical Reports Server (NTRS)
Morehead, Robert L.
2011-01-01
A LOX/Methane rocket engine was developed for a prototype terrestrial lander and then used to fly the lander at Johnson Space Center. The development path of this engine is outlined, including unique items such as variable acoustic damping and variable film cooling.
Using more different and more familiar targets improves the detection of concealed information.
Suchotzki, Kristina; De Houwer, Jan; Kleinberg, Bennett; Verschuere, Bruno
2018-04-01
When embedded among a number of plausible irrelevant options, the presentation of critical (e.g., crime-related or autobiographical) information is associated with a marked increase in response time (RT). This RT effect crucially depends on the inclusion of a target/non-target discrimination task with targets being a dedicated set of items that require a unique response (press YES; for all other items press NO). Targets may be essential because they share a feature - familiarity - with the critical items. Whereas irrelevant items have not been encountered before, critical items are known from the event or the facts of the investigation. Target items are usually learned before the test, and thereby made familiar to the participants. Hence, familiarity-based responding needs to be inhibited on the critical items and may therefore explain the RT increase on the critical items. This leads to the hypothesis that the more participants rely on familiarity, the more pronounced the RT increase on critical items may be. We explored two ways to increase familiarity-based responding: (1) Increasing the number of different target items, and (2) using familiar targets. In two web-based studies (n = 357 and n = 499), both the number of different targets and the use of familiar targets facilitated concealed information detection. The effect of the number of different targets was small yet consistent across both studies, the effect of target familiarity was large in both studies. Our results support the role of familiarity-based responding in the Concealed Information Test and point to ways on how to improve validity of the Concealed Information Test. Copyright © 2018 Elsevier B.V. All rights reserved.
Acquisition of generic memory in amnesia.
Verfaellie, M; Cermak, L S
1994-06-01
Amnesic patients' ability to acquire generic, semantic information was assessed relative to their own level of episodic memory. Patients studied a list of words in which some items were presented twice and others once. Upon each presentation, the words were tagged episodically by presenting them in a unique color. Recall of the colors in which words were presented suggested that individual presentations of repeated items were less likely to be recalled than presentations of nonrepeated items; however, actual recall of repeated items exceeded that of nonrepeated items. This outcome demonstrated that amnesics can recall some items generically without recalling either of their individual presentations. However, amnesics' recall of twice-presented items remained far below that of the control group, even when their recall of once-presented items was matched by testing the control group after a delay. This finding suggests that amnesic patients can acquire new generic knowledge but do so much less efficiently than do normal individuals. Furthermore, this deficit occurs independently of the amnesics' episodic memory impairments, reflecting instead a disruption in semantic learning per se.
The EORTC CAT Core-The computer adaptive version of the EORTC QLQ-C30 questionnaire.
Petersen, Morten Aa; Aaronson, Neil K; Arraras, Juan I; Chie, Wei-Chu; Conroy, Thierry; Costantini, Anna; Dirven, Linda; Fayers, Peter; Gamper, Eva-Maria; Giesinger, Johannes M; Habets, Esther J J; Hammerlid, Eva; Helbostad, Jorunn; Hjermstad, Marianne J; Holzner, Bernhard; Johnson, Colin; Kemmler, Georg; King, Madeleine T; Kaasa, Stein; Loge, Jon H; Reijneveld, Jaap C; Singer, Susanne; Taphoorn, Martin J B; Thamsborg, Lise H; Tomaszewski, Krzysztof A; Velikova, Galina; Verdonck-de Leeuw, Irma M; Young, Teresa; Groenvold, Mogens
2018-06-21
To optimise measurement precision, relevance to patients and flexibility, patient-reported outcome measures (PROMs) should ideally be adapted to the individual patient/study while retaining direct comparability of scores across patients/studies. This is achievable using item banks and computerised adaptive tests (CATs). The European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Questionnaire Core 30 (QLQ-C30) is one of the most widely used PROMs in cancer research and clinical practice. Here we provide an overview of the research program to develop CAT versions of the QLQ-C30's 14 functional and symptom domains. The EORTC Quality of Life Group's strategy for developing CAT item banks consists of: literature search to identify potential candidate items; formulation of new items compatible with the QLQ-C30 item style; expert evaluations and patient interviews; field-testing and psychometric analyses, including factor analysis, item response theory calibration and simulation of measurement properties. In addition, software for setting up, running and scoring CAT has been developed. Across eight rounds of data collections, 9782 patients were recruited from 12 countries for the field-testing. The four phases of development resulted in a total of 260 unique items across the 14 domains. Each item bank consists of 7-34 items. Psychometric evaluations indicated higher measurement precision and increased statistical power of the CAT measures compared to the QLQ-C30 scales. Using CAT, sample size requirements may be reduced by approximately 20-35% on average without loss of power. The EORTC CAT Core represents a more precise, powerful and flexible measurement system than the QLQ-C30. It is currently being validated in a large independent, international sample of cancer patients. Copyright © 2018 Elsevier Ltd. All rights reserved.
Capturing specific abilities as a window into human individuality: The example of face recognition
Wilmer, Jeremy B.; Germine, Laura; Chabris, Christopher F.; Chatterjee, Garga; Gerbasi, Margaret; Nakayama, Ken
2013-01-01
Proper characterization of each individual's unique pattern of strengths and weaknesses requires good measures of diverse abilities. Here, we advocate combining our growing understanding of neural and cognitive mechanisms with modern psychometric methods in a renewed effort to capture human individuality through a consideration of specific abilities. We articulate five criteria for the isolation and measurement of specific abilities, then apply these criteria to face recognition. We cleanly dissociate face recognition from more general visual and verbal recognition. This dissociation stretches across ability as well as disability, suggesting that specific developmental face recognition deficits are a special case of a broader specificity that spans the entire spectrum of human face recognition performance. Item-by-item results from 1,471 web-tested participants, included as supplementary information, fuel item analyses, validation, norming, and item response theory (IRT) analyses of our three tests: (a) the widely used Cambridge Face Memory Test (CFMT); (b) an Abstract Art Memory Test (AAMT), and (c) a Verbal Paired-Associates Memory Test (VPMT). The availability of this data set provides a solid foundation for interpreting future scores on these tests. We argue that the allied fields of experimental psychology, cognitive neuroscience, and vision science could fuel the discovery of additional specific abilities to add to face recognition, thereby providing new perspectives on human individuality. PMID:23428079
Davies, Louise; Donnelly, Kyla Z; Goodman, Daisy J; Ogrinc, Greg
2016-04-01
The Standards for Quality Improvement Reporting Excellence (SQUIRE) Guideline was published in 2008 (SQUIRE 1.0) and was the first publication guideline specifically designed to advance the science of healthcare improvement. Advances in the discipline of improvement prompted us to revise it. We adopted a novel approach to the revision by asking end-users to 'road test' a draft version of SQUIRE 2.0. The aim was to determine whether they understood and implemented the guidelines as intended by the developers. Forty-four participants were assigned a manuscript section (ie, introduction, methods, results, discussion) and asked to use the draft Guidelines to guide their writing process. They indicated the text that corresponded to each SQUIRE item used and submitted it along with a confidential survey. The survey examined usability of the Guidelines using Likert-scaled questions and participants' interpretation of key concepts in SQUIRE using open-ended questions. On the submitted text, we evaluated concordance between participants' item usage/interpretation and the developers' intended application. For the survey, the Likert-scaled responses were summarised using descriptive statistics and the open-ended questions were analysed by content analysis. Consistent with the SQUIRE Guidelines' recommendation that not every item be included, less than one-third (n=14) of participants applied every item in their section in full. Of the 85 instances when an item was partially used or was omitted, only 7 (8.2%) of these instances were due to participants not understanding the item. Usage of Guideline items was highest for items most similar to standard scientific reporting (ie, 'Specific aim of the improvement' (introduction), 'Description of the improvement' (methods) and 'Implications for further studies' (discussion)) and lowest (<20% of the time) for those unique to healthcare improvement (ie, 'Assessment methods for context factors that contributed to success or failure' and 'Costs and strategic trade-offs'). Items unique to healthcare improvement, specifically 'Evolution of the improvement', 'Context elements that influenced the improvement', 'The logic on which the improvement was based', 'Process and outcome measures', demonstrated poor concordance between participants' interpretation and developers' intended application. User testing of a draft version of SQUIRE 2.0 revealed which items have poor concordance between developer intent and author usage, which will inform final editing of the Guideline and development of supporting supplementary materials. It also identified the items that require special attention when teaching about scholarly writing in healthcare improvement. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
Predicting fatty acid profiles in blood based on food intake and the FADS1 rs174546 SNP.
Hallmann, Jacqueline; Kolossa, Silvia; Gedrich, Kurt; Celis-Morales, Carlos; Forster, Hannah; O'Donovan, Clare B; Woolhead, Clara; Macready, Anna L; Fallaize, Rosalind; Marsaux, Cyril F M; Lambrinou, Christina-Paulina; Mavrogianni, Christina; Moschonis, George; Navas-Carretero, Santiago; San-Cristobal, Rodrigo; Godlewska, Magdalena; Surwiłło, Agnieszka; Mathers, John C; Gibney, Eileen R; Brennan, Lorraine; Walsh, Marianne C; Lovegrove, Julie A; Saris, Wim H M; Manios, Yannis; Martinez, Jose Alfredo; Traczyk, Iwona; Gibney, Michael J; Daniel, Hannelore
2015-12-01
A high intake of n-3 PUFA provides health benefits via changes in the n-6/n-3 ratio in blood. In addition to such dietary PUFAs, variants in the fatty acid desaturase 1 (FADS1) gene are also associated with altered PUFA profiles. We used mathematical modeling to predict levels of PUFA in whole blood, based on multiple hypothesis testing and bootstrapped LASSO selected food items, anthropometric and lifestyle factors, and the rs174546 genotypes in FADS1 from 1607 participants (Food4Me Study). The models were developed using data from the first reported time point (training set) and their predictive power was evaluated using data from the last reported time point (test set). Among other food items, fish, pizza, chicken, and cereals were identified as being associated with the PUFA profiles. Using these food items and the rs174546 genotypes as predictors, models explained 26-43% of the variability in PUFA concentrations in the training set and 22-33% in the test set. Selecting food items using multiple hypothesis testing is a valuable contribution to determining predictors, as our models' predictive power is higher compared to analogous studies. As a unique feature, we additionally confirmed our models' power based on a test set. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Clinton-McHarg, Tara; Carey, Mariko; Sanson-Fisher, Rob; D'Este, Catherine; Shakeshaft, Anthony
2012-01-30
Adolescent and young adult (AYA) cancer survivors may have unique physical, psychological and social needs due to their cancer occurring at a critical phase of development. The aim of this study was to develop a psychometrically rigorous measure of unmet need to capture the specific needs of this group. Items were developed following a comprehensive literature review, focus groups with AYAs, and feedback from health care providers, researchers and other professionals. The measure was pilot tested with 32 AYA cancer survivors recruited through a state-based cancer registry to establish face and content validity. A main sample of 139 AYA cancer patients and survivors were recruited through seven treatment centres and invited to complete the questionnaire. To establish test-retest reliability, a sub-sample of 34 participants completed the measure a second time. Exploratory factor analysis was performed and the measure was assessed for internal consistency, discriminative validity, potential responsiveness and acceptability. The Cancer Needs Questionnaire - Young People (CNQ-YP) has established face and content validity, and acceptability. The final measure has 70 items and six factors: Treatment Environment and Care (33 items); Feelings and Relationships (14 items); Daily Life (12 items); Information and Activities (5 items); Education (3 items); and Work (3 items). All domains achieved Cronbach's alpha values greater than 0.80. Item-to-item test-retest reliability was also high, with all but four items reaching weighted kappa values above 0.60. The CNQ-YP is the first multi-dimensional measure of unmet need which has been developed specifically for AYA cancer patients and survivors. The measure displays a strong factor structure, and excellent internal consistency and test-retest reliability. However, the small sample size has implications for the reliability of the statistical analyses undertaken, particularly the exploratory factor analysis. Future studies with a larger sample are recommended to confirm the factor structure of the measure. Longitudinal studies to establish responsiveness and predictive validity should also be undertaken.
2012-01-01
Background: Adolescent and young adult (AYA) cancer survivors may have unique physical, psychological and social needs due to their cancer occurring at a critical phase of development. The aim of this study was to develop a psychometrically rigorous measure of unmet need to capture the specific needs of this group. Methods: Items were developed following a comprehensive literature review, focus groups with AYAs, and feedback from health care providers, researchers and other professionals. The measure was pilot tested with 32 AYA cancer survivors recruited through a state-based cancer registry to establish face and content validity. A main sample of 139 AYA cancer patients and survivors were recruited through seven treatment centres and invited to complete the questionnaire. To establish test-retest reliability, a sub-sample of 34 participants completed the measure a second time. Exploratory factor analysis was performed and the measure was assessed for internal consistency, discriminative validity, potential responsiveness and acceptability. Results: The Cancer Needs Questionnaire - Young People (CNQ-YP) has established face and content validity, and acceptability. The final measure has 70 items and six factors: Treatment Environment and Care (33 items); Feelings and Relationships (14 items); Daily Life (12 items); Information and Activities (5 items); Education (3 items); and Work (3 items). All domains achieved Cronbach's alpha values greater than 0.80. Item-to-item test-retest reliability was also high, with all but four items reaching weighted kappa values above 0.60. Conclusions: The CNQ-YP is the first multi-dimensional measure of unmet need which has been developed specifically for AYA cancer patients and survivors. The measure displays a strong factor structure, and excellent internal consistency and test-retest reliability. However, the small sample size has implications for the reliability of the statistical analyses undertaken, particularly the exploratory factor analysis. Future studies with a larger sample are recommended to confirm the factor structure of the measure. Longitudinal studies to establish responsiveness and predictive validity should also be undertaken. PMID:22284545
Davies, Louise; Donnelly, Kyla Z; Goodman, Daisy J; Ogrinc, Greg
2016-01-01
Background: The Standards for Quality Improvement Reporting Excellence (SQUIRE) Guideline was published in 2008 (SQUIRE 1.0) and was the first publication guideline specifically designed to advance the science of healthcare improvement. Advances in the discipline of improvement prompted us to revise it. We adopted a novel approach to the revision by asking end-users to ‘road test’ a draft version of SQUIRE 2.0. The aim was to determine whether they understood and implemented the guidelines as intended by the developers. Methods: Forty-four participants were assigned a manuscript section (ie, introduction, methods, results, discussion) and asked to use the draft Guidelines to guide their writing process. They indicated the text that corresponded to each SQUIRE item used and submitted it along with a confidential survey. The survey examined usability of the Guidelines using Likert-scaled questions and participants’ interpretation of key concepts in SQUIRE using open-ended questions. On the submitted text, we evaluated concordance between participants’ item usage/interpretation and the developers’ intended application. For the survey, the Likert-scaled responses were summarised using descriptive statistics and the open-ended questions were analysed by content analysis. Results: Consistent with the SQUIRE Guidelines’ recommendation that not every item be included, less than one-third (n=14) of participants applied every item in their section in full. Of the 85 instances when an item was partially used or was omitted, only 7 (8.2%) of these instances were due to participants not understanding the item. Usage of Guideline items was highest for items most similar to standard scientific reporting (ie, ‘Specific aim of the improvement’ (introduction), ‘Description of the improvement’ (methods) and ‘Implications for further studies’ (discussion)) and lowest (<20% of the time) for those unique to healthcare improvement (ie, ‘Assessment methods for context factors that contributed to success or failure’ and ‘Costs and strategic trade-offs’). Items unique to healthcare improvement, specifically ‘Evolution of the improvement’, ‘Context elements that influenced the improvement’, ‘The logic on which the improvement was based’, ‘Process and outcome measures’, demonstrated poor concordance between participants’ interpretation and developers’ intended application. Conclusions: User testing of a draft version of SQUIRE 2.0 revealed which items have poor concordance between developer intent and author usage, which will inform final editing of the Guideline and development of supporting supplementary materials. It also identified the items that require special attention when teaching about scholarly writing in healthcare improvement. PMID:26263916
Stepp, Stephanie D; Yu, Lan; Miller, Joshua D; Hallquist, Michael N; Trull, Timothy J; Pilkonis, Paul A
2012-04-01
Mounting evidence suggests that several inventories assessing both normal personality and personality disorders measure common dimensional personality traits (i.e., Antagonism, Constraint, Emotional Instability, Extraversion, and Unconventionality), albeit providing unique information along the underlying trait continuum. We used Widiger and Simonsen's (2005) pantheoretical integrative model of dimensional personality assessment as a guide to create item pools. We then used Item Response Theory (IRT) to compare the assessment of these five personality traits across three established dimensional measures of personality: the Schedule for Nonadaptive and Adaptive Personality (SNAP), the Temperament and Character Inventory (TCI), and the Revised NEO Personality Inventory (NEO PI-R). We found that items from each inventory map onto these five common personality traits in predictable ways. The IRT analyses, however, documented considerable variability in the item and test information derived from each inventory. Our findings support the notion that the integration of multiple perspectives will provide greater information about personality while minimizing the weaknesses of any single instrument.
Stepp, Stephanie D.; Yu, Lan; Miller, Joshua D.; Hallquist, Michael N.; Trull, Timothy J.; Pilkonis, Paul A.
2013-01-01
Mounting evidence suggests that several inventories assessing both normal personality and personality disorders measure common dimensional personality traits (i.e., Antagonism, Constraint, Emotional Instability, Extraversion, and Unconventionality), albeit providing unique information along the underlying trait continuum. We used Widiger and Simonsen’s (2005) pantheoretical integrative model of dimensional personality assessment as a guide to create item pools. We then used Item Response Theory (IRT) to compare the assessment of these five personality traits across three established dimensional measures of personality: the Schedule for Nonadaptive and Adaptive Personality (SNAP), the Temperament and Character Inventory (TCI), and the Revised NEO Personality Inventory (NEO PI-R). We found that items from each inventory map onto these five common personality traits in predictable ways. The IRT analyses, however, documented considerable variability in the item and test information derived from each inventory. Our findings support the notion that the integration of multiple perspectives will provide greater information about personality while minimizing the weaknesses of any single instrument. PMID:22452759
Rats Remember Items in Context Using Episodic Memory.
Panoz-Brown, Danielle; Corbin, Hannah E; Dalecki, Stefan J; Gentry, Meredith; Brotheridge, Sydney; Sluka, Christina M; Wu, Jie-En; Crystal, Jonathon D
2016-10-24
Vivid episodic memories in people have been characterized as the replay of unique events in sequential order [1-3]. Animal models of episodic memory have successfully documented episodic memory of a single event (e.g., [4-8]). However, a fundamental feature of episodic memory in people is that it involves multiple events, and notably, episodic memory impairments in human diseases are not limited to a single event. Critically, it is not known whether animals remember many unique events using episodic memory. Here, we show that rats remember many unique events and the contexts in which the events occurred using episodic memory. We used an olfactory memory assessment in which new (but not old) odors were rewarded using 32 items. Rats were presented with 16 odors in one context and the same odors in a second context. To attain high accuracy, the rats needed to remember item in context because each odor was rewarded as a new item in each context. The demands on item-in-context memory were varied by assessing memory with 2, 3, 5, or 15 unpredictable transitions between contexts, and item-in-context memory survived a 45 min retention interval challenge. When the memory of item in context was put in conflict with non-episodic familiarity cues, rats relied on item in context using episodic memory. Our findings suggest that rats remember multiple unique events and the contexts in which these events occurred using episodic memory and support the view that rats may be used to model fundamental aspects of human cognition. Copyright © 2016 Elsevier Ltd. All rights reserved.
48 CFR 211.274-2 - Policy for item unique identification.
Code of Federal Regulations, 2014 CFR
2014-10-01
... 48 Federal Acquisition Regulations System 3 2014-10-01 2014-10-01 false Policy for item unique identification. 211.274-2 Section 211.274-2 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS SYSTEM, DEPARTMENT OF DEFENSE ACQUISITION PLANNING DESCRIBING AGENCY NEEDS Using and Maintaining...
48 CFR 211.274-2 - Policy for unique item identification.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 48 Federal Acquisition Regulations System 3 2011-10-01 2011-10-01 false Policy for unique item identification. 211.274-2 Section 211.274-2 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS SYSTEM, DEPARTMENT OF DEFENSE ACQUISITION PLANNING DESCRIBING AGENCY NEEDS Using and Maintaining...
48 CFR 211.274-2 - Policy for unique item identification.
Code of Federal Regulations, 2010 CFR
2010-10-01
... 48 Federal Acquisition Regulations System 3 2010-10-01 2010-10-01 false Policy for unique item identification. 211.274-2 Section 211.274-2 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS SYSTEM, DEPARTMENT OF DEFENSE ACQUISITION PLANNING DESCRIBING AGENCY NEEDS Using and Maintaining...
48 CFR 211.274-2 - Policy for unique item identification.
Code of Federal Regulations, 2013 CFR
2013-10-01
... 48 Federal Acquisition Regulations System 3 2013-10-01 2013-10-01 false Policy for unique item identification. 211.274-2 Section 211.274-2 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS SYSTEM, DEPARTMENT OF DEFENSE ACQUISITION PLANNING DESCRIBING AGENCY NEEDS Using and Maintaining...
48 CFR 211.274-2 - Policy for unique item identification.
Code of Federal Regulations, 2012 CFR
2012-10-01
... 48 Federal Acquisition Regulations System 3 2012-10-01 2012-10-01 false Policy for unique item identification. 211.274-2 Section 211.274-2 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS SYSTEM, DEPARTMENT OF DEFENSE ACQUISITION PLANNING DESCRIBING AGENCY NEEDS Using and Maintaining...
Development and psychometric characteristics of the SCI-QOL Pressure Ulcers scale and short form.
Kisala, Pamela A; Tulsky, David S; Choi, Seung W; Kirshblum, Steven C
2015-05-01
To develop a self-reported measure of the subjective impact of pressure ulcers on health-related quality of life (HRQOL) in individuals with spinal cord injury (SCI) as part of the SCI quality of life (SCI-QOL) measurement system. Grounded-theory based qualitative item development methods, large-scale item calibration testing, confirmatory factor analysis (CFA), and item response theory-based psychometric analysis. Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. SCI-QOL Pressure Ulcers scale. 189 individuals with traumatic SCI who experienced a pressure ulcer within the past 7 days completed 30 items related to pressure ulcers. CFA confirmed a unidimensional pool of items. IRT analyses were conducted. A constrained Graded Response Model with a constant slope parameter was used to estimate item thresholds for the 12 retained items. The 12-item SCI-QOL Pressure Ulcers scale is unique in that it is specifically targeted to individuals with spinal cord injury and at every stage of development has included input from individuals with SCI. Furthermore, use of CFA and IRT methods provides flexibility and precision of measurement. The scale may be administered in its entirety or as a 7-item "short form" and is available for both research and clinical practice.
Federal Register 2010, 2011, 2012, 2013, 2014
2012-06-15
... DEPARTMENT OF DEFENSE Defense Acquisition Regulations System 48 CFR Parts 211, 212, 218, 246, 252 and Appendix F to Chapter 2 RIN 0750-AH64 Defense Federal Acquisition Regulation Supplement: Item Unique Identifier Update (DFARS Case 2011-D055) AGENCY: Defense Acquisition Regulations System...
Pancreatitis Quality of Life Instrument: Development of a new instrument
Bova, Carol; Barton, Bruce; Hartigan, Celia
2014-01-01
Objectives: The goal of this project was to develop the first disease-specific instrument for the evaluation of quality of life in chronic pancreatitis. Methods: Focus groups and interview sessions were conducted, with chronic pancreatitis patients, to identify items felt to impact quality of life which were subsequently formatted into a paper-and-pencil instrument. This instrument was used to conduct an online survey by an expert panel of pancreatologists to evaluate its content validity. Finally, the modified instrument was presented to patients during precognitive testing interviews to evaluate its clarity and appropriateness. Results: In total, 10 patients were enrolled in the focus groups and interview sessions where they identified 50 items. Once redundant items were removed, the 40 remaining items were made into a paper-and-pencil instrument referred to as the Pancreatitis Quality of Life Instrument. Through the processes of content validation and precognitive testing, the number of items in the instrument was reduced to 24. Conclusions: This marks the development of the first disease-specific instrument to evaluate quality of life in chronic pancreatitis. It includes unique features not found in generic instruments (economic factors, stigma, and spiritual factors). Although this marks a giant step forward, psychometric evaluation is still needed prior to its clinical use. PMID:26770703
Kertesz, Stefan G.; Pollio, David E.; Jones, Richard N.; Steward, Jocelyn; Stringfellow, Erin J.; Gordon, Adam J.; Johnson, Nancy K.; Kim, Theresa A.; Granstaff, Unita; Austin, Erika L.; Young, Alexander S.; Golden, Joya; Davis, Lori L.; Roth, David L.; Holt, Cheryl L.
2015-01-01
Background Homeless patients face unique challenges in obtaining primary care responsive to their needs and context. Patient experience questionnaires could permit assessment of patient-centered medical homes for this population, but standard instruments may not reflect homeless patients' priorities and concerns. Objectives This report describes (a) the content and psychometric properties of a new primary care questionnaire for homeless patients and (b) the methods utilized in its development. Methods Starting with quality-related constructs from the Institute of Medicine, we identified relevant themes by interviewing homeless patients and experts in their care. A multidisciplinary team drafted a preliminary set of 78 items. This was administered to homeless-experienced clients (n=563) across 3 VA facilities and 1 non-VA Health Care for the Homeless Program. Using Item Response Theory, we examined Test Information Function curves to eliminate less informative items and devise plausibly distinct subscales. Results The resulting 33-item instrument (Primary Care Quality-Homeless, PCQ-H) has four subscales: Patient-Clinician Relationship (15 items), Cooperation among Clinicians (3 items), Access/Coordination (11 items) and Homeless-Specific Needs (4 items). Evidence for divergent and convergent validity is provided. Test Information Function (TIF) graphs showed adequate informational value to permit inferences about groups for 3 subscales (Relationship, Cooperation and Access/Coordination). The 3-item Cooperation subscale had lower informational value (TIF<5) but had good internal consistency (alpha=0.75) and patients frequently reported problems in this aspect of care. Conclusions Systematic application of qualitative and quantitative methods supported the development of a brief patient-reported questionnaire focused on the primary care of homeless patients and offers guidance for future population-specific instrument development. PMID:25023918
Brockmole, James R; Boot, Walter R
2009-06-01
Distinctive aspects of a scene can capture attention even when they are irrelevant to one's goals. The authors address whether visually unique, unexpected, but task-irrelevant features also tend to hold attention. Observers searched through displays in which the color of each item was irrelevant. At the start of search, all objects changed color. Critically, the foveated item changed to an unexpected color (it was novel), became a color singleton (it was unique), or both. Saccade latency revealed the time required to disengage overt attention from this object. Singletons resulted in longer latencies, but only if they were unexpected. Conversely, unexpected items only delayed disengagement if they were singletons. Thus, the time spent overtly attending to an object is determined, at least in part, by task-irrelevant stimulus properties, but this depends on the confluence of expectation and visual salience. (c) 2009 APA, all rights reserved.
Measuring Sports Class Learning Climates: The Development of the Sports Class Environment Scale
ERIC Educational Resources Information Center
Dowdell, Trevor; Tomson, L. Mich; Davies, Michael
2011-01-01
The development and validation of a new and unique learning climate instrument, the Sports Class Environment Scale (SCES), was the focus of this study. We began with a consolidation of the dimensions and items of the Perceived Motivational Climate in Sport Questionnaire-2 and the Classroom Environment Scale. Field-testing of the SCES involved 204…
Testing effects of free recall on organization in whole/part and part/whole transfer.
Bacso, Sarah A; Marmurek, Harvey H C
2016-11-01
Testing of to-be-learned material facilitates subsequent learning of new material. We investigated this forward effect of testing in two experiments using the whole/part and part/whole transfer paradigms with categorized word lists. Learning was assessed for recall of individual words, higher order categories, and category clustering. In each experiment participants learned two lists in which the number of tests on the first list was varied. The first list contained either twice as many items as the second list (whole/part paradigm) or half as many items as the second list (part/whole paradigm). In the experimental condition, the part list contained half the items of the whole list. In the control condition, the two lists were unique. In the whole/part paradigm, learning of the part list was poorer in the experimental than in the control condition. Although testing during whole list learning facilitated learning of the part list, it did not moderate the negative transfer effect. In the part/whole paradigm, learning of the whole list was better in the experimental than in the control condition, and this positive transfer effect was strengthened by repeated testing of the part list. The findings are discussed in the context of discrimination and encoding explanations of the forward effect of testing. Copyright © 2016 Elsevier B.V. All rights reserved.
Evaluation of item candidates for a diabetic retinopathy quality of life item bank.
Fenwick, Eva K; Pesudovs, Konrad; Khadka, Jyoti; Rees, Gwyn; Wong, Tien Y; Lamoureux, Ecosse L
2013-09-01
We are developing an item bank assessing the impact of diabetic retinopathy (DR) on quality of life (QoL) using a rigorous multi-staged process combining qualitative and quantitative methods. We describe here the first two qualitative phases: content development and item evaluation. After a comprehensive literature review, items were generated from four sources: (1) 34 previously validated patient-reported outcome measures; (2) five published qualitative articles; (3) eight focus groups and 18 semi-structured interviews with 57 DR patients; and (4) seven semi-structured interviews with diabetes or ophthalmic experts. Items were then evaluated during 3 stages, namely binning (grouping) and winnowing (reduction) based on key criteria and panel consensus; development of item stems and response options; and pre-testing of items via cognitive interviews with patients. The content development phase yielded 1,165 unique items across 7 QoL domains. After 3 sessions of binning and winnowing, items were reduced to a minimally representative set (n = 312) across 9 domains of QoL: visual symptoms; ocular surface symptoms; activity limitation; mobility; emotional; health concerns; social; convenience; and economic. After 8 cognitive interviews, 42 items were amended resulting in a final set of 314 items. We have employed a systematic approach to develop items for a DR-specific QoL item bank. The psychometric properties of the nine QoL subscales will be assessed using Rasch analysis. The resulting validated item bank will allow clinicians and researchers to better understand the QoL impact of DR and DR therapies from the patient's perspective.
Development and psychometric characteristics of the SCI-QOL Pressure Ulcers scale and short form
Kisala, Pamela A.; Tulsky, David S.; Choi, Seung W.; Kirshblum, Steven C.
2015-01-01
Objective To develop a self-reported measure of the subjective impact of pressure ulcers on health-related quality of life (HRQOL) in individuals with spinal cord injury (SCI) as part of the SCI quality of life (SCI-QOL) measurement system. Design Grounded-theory based qualitative item development methods, large-scale item calibration testing, confirmatory factor analysis (CFA), and item response theory-based psychometric analysis. Setting Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Participants Adults with traumatic SCI. Main Outcome Measures SCI-QOL Pressure Ulcers scale. Results 189 individuals with traumatic SCI who experienced a pressure ulcer within the past 7 days completed 30 items related to pressure ulcers. CFA confirmed a unidimensional pool of items. IRT analyses were conducted. A constrained Graded Response Model with a constant slope parameter was used to estimate item thresholds for the 12 retained items. Conclusions The 12-item SCI-QOL Pressure Ulcers scale is unique in that it is specifically targeted to individuals with spinal cord injury and at every stage of development has included input from individuals with SCI. Furthermore, use of CFA and IRT methods provides flexibility and precision of measurement. The scale may be administered in its entirety or as a 7-item “short form” and is available for both research and clinical practice. PMID:26010965
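For readers unfamiliar with the model named in the abstract above, the following is a minimal sketch of how a graded response model turns a trait level into response-category probabilities when every item shares one slope. The function name, the slope, and the threshold values are hypothetical illustrations and are not taken from the SCI-QOL calibration.

```python
import math

def grm_category_probs(theta, slope, thresholds):
    """Category probabilities for one item under a graded response model.

    theta      -- respondent's trait level
    slope      -- discrimination; a "constant slope" model uses one value for all items
    thresholds -- increasing item thresholds (hypothetical values in this sketch)
    """
    # Cumulative probabilities P(X >= k), bracketed by 1 (lowest) and 0 (beyond highest).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-slope * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulative probabilities.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical item with four response categories (three thresholds).
probs = grm_category_probs(theta=0.0, slope=1.5, thresholds=[-1.0, 0.0, 1.0])
```

Because the slope is fixed across items, only the thresholds differ from item to item, which is what the abstract describes as estimating item thresholds under a constant slope parameter.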
Storbeck, Justin
2013-01-01
I investigated whether negative affective states enhance encoding of and memory for item-specific information reducing false memories. Positive, negative, and neutral moods were induced, and participants then completed a Deese-Roediger-McDermott (DRM) false-memory task. List items were presented in unique spatial locations or unique fonts to serve as measures for item-specific encoding. The negative mood conditions had more accurate memories for item-specific information, and they also had fewer false memories. The final experiment used a manipulation that drew attention to distinctive information, which aided learning for DRM words, but also promoted item-specific encoding. For the condition that promoted item-specific encoding, false memories were reduced for positive and neutral mood conditions to a rate similar to that of the negative mood condition. These experiments demonstrated that negative affective cues promote item-specific processing reducing false memories. People in positive and negative moods encode events differently creating different memories for the same event.
Benefits of flexible prioritization in working memory can arise without costs.
Myers, Nicholas E; Chekroud, Sammi R; Stokes, Mark G; Nobre, Anna C
2018-03-01
Most recent models conceptualize working memory (WM) as a continuous resource, divided up according to task demands. When an increasing number of items need to be remembered, each item receives a smaller chunk of the memory resource. These models predict that the allocation of attention to high-priority WM items during the retention interval should be a zero-sum game: improvements in remembering cued items come at the expense of uncued items because resources are dynamically transferred from uncued to cued representations. The current study provides empirical data challenging this model. Four precision retrocueing WM experiments assessed cued and uncued items on every trial. This permitted a test for trade-off of the memory resource. We found no evidence for trade-offs in memory across trials. Moreover, robust improvements in WM performance for cued items came at little or no cost to uncued items that were probed afterward, thereby increasing the net capacity of WM relative to neutral cueing conditions. An alternative mechanism of prioritization proposes that cued items are transferred into a privileged state within a response-gating bottleneck, in which an item uniquely controls upcoming behavior. We found evidence consistent with this alternative. When an uncued item was probed first, report of its orientation was biased away from the cued orientation to be subsequently reported. We interpret this bias as competition for behavioral control in the output-driving bottleneck. Other items in WM did not bias each other, making this result difficult to explain with a shared resource model. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Items Supporting the Hanford Internal Dosimetry Program Implementation of the IMBA Computer Code
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carbaugh, Eugene H.; Bihl, Donald E.
2008-01-07
The Hanford Internal Dosimetry Program has adopted the computer code IMBA (Integrated Modules for Bioassay Analysis) as its primary code for bioassay data evaluation and dose assessment using methodologies of ICRP Publications 60, 66, 67, 68, and 78. The adoption of this code was part of the implementation plan for the June 8, 2007 amendments to 10 CFR 835. This information release includes action items unique to IMBA that were required by PNNL quality assurance standards for implementation of safety software. Copies of the IMBA software verification test plan and the outline of the briefing given to new users are also included.
NASA Technical Reports Server (NTRS)
1999-01-01
The full complement of EDOMP investigations called for a broad spectrum of flight hardware ranging from commercial items, modified for spaceflight, to custom designed hardware made to meet the unique requirements of testing in the space environment. In addition, baseline data collection before and after spaceflight required numerous items of ground-based hardware. Two basic categories of ground-based hardware were used in EDOMP testing before and after flight: (1) hardware used for medical baseline testing and analysis, and (2) flight-like hardware used both for astronaut training and medical testing. To ensure post-landing data collection, hardware was required at both the Kennedy Space Center (KSC) and the Dryden Flight Research Center (DFRC) landing sites. Items that were very large or sensitive to the rigors of shipping were housed permanently at the landing site test facilities. Therefore, multiple sets of hardware were required to adequately support the prime and backup landing sites plus the Johnson Space Center (JSC) laboratories. Development of flight hardware was a major element of the EDOMP. The challenges included obtaining or developing equipment that met the following criteria: (1) compact (small size and light weight), (2) battery-operated or requiring minimal spacecraft power, (3) sturdy enough to survive the rigors of spaceflight, (4) quiet enough to pass acoustics limitations, (5) shielded and filtered adequately to assure electromagnetic compatibility with spacecraft systems, (6) user-friendly in a microgravity environment, and (7) accurate and efficient operation to meet medical investigative requirements.
Identifying content for the glaucoma-specific item bank to measure quality-of-life parameters.
Khadka, Jyoti; McAlinden, Colm; Craig, Jamie E; Fenwick, Eva K; Lamoureux, Ecosse L; Pesudovs, Konrad
2015-01-01
Patient-reported outcomes (PROs) have become essential clinical trial end points. However, a comprehensive, multidimensional, patient-relevant, and precise glaucoma-specific PRO instrument is not available. Therefore, the purpose of this study was to identify content for a new, glaucoma-specific, quality-of-life (QOL) item bank. Content identification was undertaken in 5 phases: (1) identification of extant items in glaucoma-specific instruments and the qualitative literature; (2) focus groups and interviews with glaucoma patients; (3) item classification and selection; (4) expert review and revision of items; and (5) cognitive interviews with patients. A total of 737 unique items (extant items from PRO instruments, 247; qualitative articles, 14 items; focus groups and semistructured interviews, 476 items) were identified. These items were classified into 10 QOL domains. Four criteria (item redundancy, item inconsistent with domain definition, item content too narrow to have wider applicability, and item clarity) were used to remove and refine the items. After the cognitive interviews, the final minimally representative item set had a total of 342 unique items belonging to 10 domains: activity limitation (88), mobility (20), visual symptoms (19), ocular surface symptoms (22), general symptoms (15), convenience (39), health concerns (45), emotional well-being (49), social issues (23), and economic issues (22). The systematic content identification process identified 10 QOL domains, which were important to patients with glaucoma. The majority of the items were identified from the patient-specific focus groups and semistructured interviews suggesting that the existing PRO instruments do not adequately address QOL issues relevant to individuals with glaucoma.
ERIC Educational Resources Information Center
Brockmole, James R.; Boot, Walter R.
2009-01-01
Distinctive aspects of a scene can capture attention even when they are irrelevant to one's goals. The authors address whether visually unique, unexpected, but task-irrelevant features also tend to hold attention. Observers searched through displays in which the color of each item was irrelevant. At the start of search, all objects changed color.…
48 CFR 252.211-7003 - Item identification and valuation.
Code of Federal Regulations, 2013 CFR
2013-10-01
..., used to retrieve data encoded on machine-readable media. Concatenated unique item identifier means— (1... (or controlling) authority for the enterprise identifier. Item means a single hardware article or a...-readable means an automatic identification technology media, such as bar codes, contact memory buttons...
48 CFR 252.211-7003 - Item identification and valuation.
Code of Federal Regulations, 2011 CFR
2011-10-01
..., used to retrieve data encoded on machine-readable media. Concatenated unique item identifier means— (1... (or controlling) authority for the enterprise identifier. Item means a single hardware article or a...-readable means an automatic identification technology media, such as bar codes, contact memory buttons...
48 CFR 252.211-7003 - Item identification and valuation.
Code of Federal Regulations, 2012 CFR
2012-10-01
..., used to retrieve data encoded on machine-readable media. Concatenated unique item identifier means— (1... (or controlling) authority for the enterprise identifier. Item means a single hardware article or a...-readable means an automatic identification technology media, such as bar codes, contact memory buttons...
Wang, Chun; Zheng, Yi; Chang, Hua-Hua
2014-01-01
With the advent of web-based technology, online testing is becoming a mainstream mode in large-scale educational assessments. Most online tests are administered continuously in a testing window, which may pose test security problems because examinees who take the test earlier may share information with those who take the test later. Researchers have proposed various statistical indices to assess the test security, and one of the most often used indices is the average test-overlap rate, which was further generalized to the item pooling index (Chang & Zhang, 2002, 2003). These indices, however, are all defined as the means (that is, the expected proportion of common items among examinees) and they were originally proposed for computerized adaptive testing (CAT). Recently, multistage testing (MST) has become a popular alternative to CAT. The unique features of MST make it important to report not only the mean, but also the standard deviation (SD) of test overlap rate, as we advocate in this paper. The standard deviation of test overlap rate adds important information to the test security profile, because for the same mean, a large SD reflects that certain groups of examinees share more common items than other groups. In this study, we analytically derived the lower bounds of the SD under MST, with the results under CAT as a benchmark. It is shown that when the mean overlap rate is the same between MST and CAT, the SD of test overlap tends to be larger in MST. A simulation study was conducted to provide empirical evidence. We also compared the security of MST under the single-pool versus the multiple-pool designs; both analytical and simulation studies show that the non-overlapping multiple-pool design will slightly increase the security risk.
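The mean and SD of the test overlap rate described in the abstract above can be illustrated with a minimal sketch. It assumes a fixed test length and defines the overlap rate for a pair of examinees as the number of shared items divided by that length; the item IDs and the helper name are hypothetical, and the use of the population SD is an assumption of this sketch rather than the authors' stated choice.

```python
from itertools import combinations
from statistics import mean, pstdev

def overlap_rates(forms, test_length):
    """Overlap rate (shared items / test length) for every pair of examinees."""
    return [len(a & b) / test_length for a, b in combinations(forms, 2)]

# Hypothetical item IDs administered to four examinees, four items each.
forms = [
    {1, 2, 3, 4},
    {1, 2, 5, 6},
    {1, 2, 3, 4},
    {7, 8, 9, 10},
]

rates = overlap_rates(forms, test_length=4)
mean_rate = mean(rates)   # average test overlap rate
sd_rate = pstdev(rates)   # spread of overlap across examinee pairs
```

In this toy data one pair shares every item while three pairs share none, so the SD is large relative to the mean, which is the kind of uneven exposure the mean alone would hide.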
ERIC Educational Resources Information Center
Carlson, James E.
2014-01-01
A little-known theorem, a generalization of Pythagoras's theorem, due to Pappus, is used to present a geometric explanation of various definitions of the contribution of component tests to their composite. I show that an unambiguous definition of the unique contribution of a component to the composite score variance is present if and only if the…
Studying Irony Detection Beyond Ironic Criticism: Let's Include Ironic Praise
Bruntsch, Richard; Ruch, Willibald
2017-01-01
Studies of irony detection have commonly used ironic criticisms (i.e., mock positive evaluation of negative circumstances) as stimulus materials. Another basic type of verbal irony, ironic praise (i.e., mock negative evaluation of positive circumstances) is largely absent from studies on individuals' aptitude to detect verbal irony. However, it can be argued that ironic praise needs to be considered in order to investigate the detection of irony in the variety of its facets. To explore whether the detection of ironic praise has a benefit beyond ironic criticism, three studies were conducted. In Study 1, an instrument (Test of Verbal Irony Detection Aptitude; TOVIDA) was constructed and its factorial structure was tested using N = 311 subjects. The TOVIDA contains 26 scenario-based items and contains two scales for the detection of ironic criticism vs. ironic praise. To validate the measurement method, the two scales of the TOVIDA were experimentally evaluated with N = 154 subjects in Study 2. In Study 3, N = 183 subjects were tested to explore personality and ability correlates of the two TOVIDA scales. Results indicate that the co-variance between the ironic TOVIDA items was organized by two inter-correlated but distinct factors: one representing ironic praise detection aptitude and one representing ironic criticism detection aptitude. Experimental validation showed that the TOVIDA items truly contain irony and that item scores reflect irony detection. Trait bad mood and benevolent humor (as a facet of the sense of humor) were found as joint correlates for both ironic criticism and ironic praise detection scores. In contrast, intelligence, trait cheerfulness, and corrective humor were found as unique correlates of ironic praise detection scores, even when statistically controlling for the aptitude to detect ironic criticism. Our results indicate that the aptitude to detect ironic praise can be seen as distinct from the aptitude to detect ironic criticism. Generating unique variance in irony detection, ironic praise can be postulated as worthwhile to include in future studies, especially when studying the role of mental ability, personality, and humor in irony detection. PMID:28484409
48 CFR 252.211-7003 - Item identification and valuation.
Code of Federal Regulations, 2010 CFR
2010-10-01
..., used to retrieve data encoded on machine-readable media. Concatenated unique item identifier means— (1... Defense Logistics Information System (DLIS) Commercial and Government Entity (CAGE) Code). Issuing agency... identifier. Item means a single hardware article or a single unit formed by a grouping of subassemblies...
Hippocampus is required for paired associate memory with neither delay nor trial uniqueness
Yoon, Jinah; Seo, Yeran; Kim, Jangjin; Lee, Inah
2012-01-01
Cued retrieval of memory is typically examined with delay when testing hippocampal functions, as in delayed matching-to-sample tasks. Equally emphasized in the literature, on the other hand, is the hippocampal involvement in making arbitrary associations. Paired associate memory tasks are widely used for examining this function. However, the two variables (i.e., delay and paired association) were often mixed in paired associate tasks, and this makes it difficult to localize the cognitive source of deficits with hippocampal perturbation. Specifically, a few studies have recently shown that rats can learn arbitrary paired associations between certain locations and nonspatial items (e.g., object or flavor) and later can retrieve the paired location when cued by the item remotely. Such tasks involve both (1) delay between sampling the cue and retrieving the target location and (2) arbitrary association between the cueing object and its paired location. Here, we tested whether delay was necessary in a cued paired associate task by using a task in which no delay existed between object cueing and the choice of its paired associate. Moreover, fixed associative relationships between the cueing objects and their paired locations were repeatedly used, thus involving no trial-unique association. Nevertheless, inactivations of the dorsal hippocampus with muscimol severely disrupted retrieval of paired associates, whereas the same manipulations did not affect discriminating individual objects or locations. The results powerfully demonstrate that the hippocampus is inherently required for retrieving paired associations between objects and places, and that delay and trial uniqueness of the paired associates are not necessarily required. PMID:22174309
Reward associations impact both iconic and visual working memory.
Infanti, Elisa; Hickey, Clayton; Turatto, Massimo
2015-02-01
Reward plays a fundamental role in human behavior. A growing number of studies have shown that stimuli associated with reward become salient and attract attention. The aim of the present study was to extend these results into the investigation of iconic memory and visual working memory. In two experiments we asked participants to perform a visual-search task where different colors of the target stimuli were paired with high or low reward. We then tested whether the pre-established feature-reward association affected performance on a subsequent visual memory task, in which no reward was provided. In this test phase participants viewed arrays of 8 objects, one of which had a unique color that could match the color associated with reward during the previous visual-search task. A probe appeared at varying intervals after stimulus offset to identify the to-be-reported item. Our results suggest that reward biases the encoding of visual information such that items characterized by a reward-associated feature interfere with mnemonic representations of other items in the test display. These results extend current knowledge regarding the influence of reward on early cognitive processes, suggesting that feature-reward associations automatically interact with the encoding and storage of visual information, both in iconic memory and visual working memory. Copyright © 2014 Elsevier Ltd. All rights reserved.
Mazefsky, Carla A; Yu, Lan; White, Susan W; Siegel, Matthew; Pilkonis, Paul A
2018-06-01
Individuals with autism spectrum disorder (ASD) often present with prominent emotion dysregulation that requires treatment but can be difficult to measure. The Emotion Dysregulation Inventory (EDI) was created using methods developed by the Patient-Reported Outcomes Measurement Information System (PROMIS®) to capture observable indicators of poor emotion regulation. Caregivers of 1,755 youth with ASD completed 66 candidate EDI items, and the final 30 items were selected based on classical test theory and item response theory (IRT) analyses. The analyses identified two factors: (a) Reactivity, characterized by intense, rapidly escalating, sustained, and poorly regulated negative emotional reactions, and (b) Dysphoria, characterized by anhedonia, sadness, and nervousness. The final items did not show differential item functioning (DIF) based on gender, age, intellectual ability, or verbal ability. Because the final items were calibrated using IRT, even a small number of items offers high precision, minimizing respondent burden. IRT co-calibration of the EDI with related measures demonstrated its superiority in assessing the severity of emotion dysregulation with as few as seven items. Validity of the EDI was supported by expert review, its association with related constructs (e.g., anxiety and depression symptoms, aggression), higher scores in psychiatric inpatients with ASD compared to a community ASD sample, and demonstration of test-retest stability and sensitivity to change. In sum, the EDI provides an efficient and sensitive method to measure emotion dysregulation for clinical assessment, monitoring, and research in youth with ASD of any level of cognitive or verbal ability. Autism Res 2018, 11: 928-941. © 2018 International Society for Autism Research, Wiley Periodicals, Inc. This paper describes a new measure of poor emotional control called the Emotion Dysregulation Inventory (EDI).
Caregivers of 1,755 youth with ASD completed candidate items, and advanced statistical techniques were applied to identify the best final items. The EDI is unique because it captures common emotional problems in ASD and is appropriate for both nonverbal and verbal youth. It is an efficient and sensitive measure for use in clinical assessments, monitoring, and research with youth with ASD.
Balsis, Steve; Choudhury, Tabina K; Geraci, Lisa; Benge, Jared F; Patrick, Christopher J
2018-04-01
Alzheimer's disease (AD) affects neurological, cognitive, and behavioral processes. Thus, to accurately assess this disease, researchers and clinicians need to combine and incorporate data across these domains. This presents not only distinct methodological and statistical challenges but also unique opportunities for the development and advancement of psychometric techniques. In this article, we describe relatively recent research using item response theory (IRT) that has been used to make progress in assessing the disease across its various symptomatic and pathological manifestations. We focus on applications of IRT to improve scoring, test development (including cross-validation and adaptation), and linking and calibration. We conclude by describing potential future multidimensional applications of IRT techniques that may improve the precision with which AD is measured.
Disposition of Chicago Pile 5 (CP-5) Converter Tubes in the 10-160B Cask
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pancake, Daniel C.; Rock, Cynthia
This paper will focus on the unique characterization, packaging, and transportation issues associated with the disposition of the two CP-5 Converter Tube assemblies from Argonne National Laboratory. The converter tubes were constructed of combinations of HEU and alloys of zirconium, and were part of the original research facilities attached to the CP-5 reactor during operating evolutions. These assemblies were heavily irradiated during their operational lifetime, and were segregated from the balance of irradiated test specimens when the reactor was deactivated and slated for Decontamination and Demolition (D&D). In addition, the substantial contribution of fissile material to the assemblies’ inventory made the potential disposition pathways extremely challenging. As a result, these items became part of Argonne’s legacy “nuclear footprint”, and were added to the Nuclear Footprint Reduction Project scope for disposition. The Project was responsible for the size reduction and characterization of these items, as well as the ultimate disposition. After negotiating a disposal pathway for these tubes, there were significant transportation issues that required a small team to overcome, in order to successfully ship these items to the Nevada National Security Site (NNSS). The Project team at Argonne, technical support from transportation specialists, licensing support from the 10-160B license owner, the Savannah River National Lab (SRNL) Packaging Certification Team (PCT), and the DOE EM-33 staff contributed to license and safety analysis report amendments that eventually authorized the shipment of the material. The paper will identify the organizations, and the specific actions, required to successfully make three “one of a kind” shipments of irradiated test specimen material.
This will include the unique packaging configurations, contents modification for the cask license (via the Amendment process), criticality evaluations, and associated review and approval processes.
O'Connor, A M; Sargeant, J M; Gardner, I A; Dickson, J S; Torrence, M E; Dewey, C E; Dohoo, I R; Evans, R B; Gray, J T; Greiner, M; Keefe, G; Lefebvre, S L; Morley, P S; Ramirez, A; Sischo, W; Smith, D R; Snedeker, K; Sofos, J; Ward, M P; Wills, R
2010-01-01
The conduct of randomized controlled trials in livestock with production, health, and food-safety outcomes presents unique challenges that may not be adequately reported in trial reports. The objective of this project was to modify the CONSORT (Consolidated Standards of Reporting Trials) statement to reflect the unique aspects of reporting these livestock trials. A two-day consensus meeting was held on November 18-19, 2008 in Chicago, IL, United States of America, to achieve the objective. Prior to the meeting, a Web-based survey was conducted to identify issues for discussion. The 24 attendees were biostatisticians, epidemiologists, food-safety researchers, livestock-production specialists, journal editors, assistant editors, and associate editors. Prior to the meeting, the attendees completed a Web-based survey indicating which CONSORT statement items may need to be modified to address unique issues for livestock trials. The consensus meeting resulted in the production of the REFLECT (Reporting Guidelines For Randomized Control Trials) statement for livestock and food safety (LFS) and 22-item checklist. Fourteen items were modified from the CONSORT checklist, and an additional sub-item was proposed to address challenge trials. The REFLECT statement proposes new terminology, more consistent with common usage in livestock production, to describe study subjects. Evidence was not always available to support modification to or inclusion of an item. The use of the REFLECT statement, which addresses issues unique to livestock trials, should improve the quality of reporting and design for trials reporting production, health, and food-safety outcomes.
O'Connor, A M; Sargeant, J M; Gardner, I A; Dickson, J S; Torrence, M E; Dewey, C E; Dohoo, I R; Evans, R B; Gray, J T; Greiner, M; Keefe, G; Lefebvre, S L; Morley, P S; Ramirez, A; Sischo, W; Smith, D R; Snedeker, K; Sofos, J; Ward, M P; Wills, R
2010-03-01
The conduct of randomized controlled trials in livestock with production, health and food-safety outcomes presents unique challenges that may not be adequately reported in trial reports. The objective of this project was to modify the CONSORT (Consolidated Standards of Reporting Trials) statement to reflect the unique aspects of reporting these livestock trials. A 2-day consensus meeting was held on 18-19 November 2008 in Chicago, IL, USA, to achieve the objective. Prior to the meeting, a Web-based survey was conducted to identify issues for discussion. The 24 attendees were biostatisticians, epidemiologists, food-safety researchers, livestock-production specialists, journal editors, assistant editors and associate editors. Prior to the meeting, the attendees completed a Web-based survey indicating which CONSORT statement items may need to be modified to address unique issues for livestock trials. The consensus meeting resulted in the production of the REFLECT (Reporting Guidelines for Randomized Control Trials) statement for livestock and food safety and 22-item checklist. Fourteen items were modified from the CONSORT checklist and an additional sub-item was proposed to address challenge trials. The REFLECT statement proposes new terminology, more consistent with common usage in livestock production, to describe study subjects. Evidence was not always available to support modification to or inclusion of an item. The use of the REFLECT statement, which addresses issues unique to livestock trials, should improve the quality of reporting and design for trials reporting production, health and food-safety outcomes.
O'Connor, A M; Sargeant, J M; Gardner, I A; Dickson, J S; Torrence, M E; Dewey, C E; Dohoo, I R; Evans, R B; Gray, J T; Greiner, M; Keefe, G; Lefebvre, S L; Morley, P S; Ramirez, A; Sischo, W; Smith, D R; Snedeker, K; Sofos, J N; Ward, M P; Wills, R
2010-01-01
The conduct of randomized controlled trials in livestock with production, health, and food-safety outcomes presents unique challenges that may not be adequately reported in trial reports. The objective of this project was to modify the CONSORT (Consolidated Standards of Reporting Trials) statement to reflect the unique aspects of reporting these livestock trials. A two-day consensus meeting was held on November 18-19, 2008 in Chicago, Ill, United States of America, to achieve the objective. Prior to the meeting, a Web-based survey was conducted to identify issues for discussion. The 24 attendees were biostatisticians, epidemiologists, food-safety researchers, livestock production specialists, journal editors, assistant editors, and associate editors. Prior to the meeting, the attendees completed a Web-based survey indicating which CONSORT statement items may need to be modified to address unique issues for livestock trials. The consensus meeting resulted in the production of the REFLECT (Reporting Guidelines for Randomized Control Trials) statement for livestock and food safety (LFS) and 22-item checklist. Fourteen items were modified from the CONSORT checklist, and an additional sub-item was proposed to address challenge trials. The REFLECT statement proposes new terminology, more consistent with common usage in livestock production, to describe study subjects. Evidence was not always available to support modification to or inclusion of an item. The use of the REFLECT statement, which addresses issues unique to livestock trials, should improve the quality of reporting and design for trials reporting production, health, and food-safety outcomes.
The focus of attention is similar to other memory systems rather than uniquely different
Beaudry, Olivia; Neath, Ian; Surprenant, Aimée M.; Tehan, Gerald
2014-01-01
According to some current theories, the focus of attention (FOA), part of working memory, represents items in a privileged state that is more accessible than items stored in other memory systems. One line of evidence supporting the distinction between the FOA and other memory systems is the finding that items in the FOA are immune to proactive interference (when something learned earlier impairs the ability to remember something learned more recently). The FOA, then, is held to be unique: it is the only memory system that is not susceptible to proactive interference. We review the literature used to support this claim, and although there are many studies in which proactive interference was not observed, we found more studies in which it was observed. We conclude that the FOA is not immune to proactive interference: items in the FOA are susceptible to proactive interference just like items in every other memory system. And, just as in all other memory systems, it is how the items are represented and processed that plays a critical role in determining whether proactive interference will be observed. PMID:24574996
Testing and Selection of Fire-Resistant Materials for Spacecraft Use
NASA Technical Reports Server (NTRS)
Friedman, Robert; Jackson, Brian; Olson, Sandra
2000-01-01
Spacecraft fire-safety strategy emphasizes prevention, mostly through the selection of onboard items classified according to their fire resistance. The principal NASA acceptance tests described in this paper assess the flammability of materials and components under "worst-case" normal-gravity conditions of upward flame spread in controlled-oxygen atmospheres. Tests conducted on the ground, however, cannot duplicate the unique fire characteristics in the nonbuoyant low-gravity environment of orbiting spacecraft. Research shows that flammability and fire-spread rates in low gravity are sensitive to forced convection (ventilation flows) and atmospheric-oxygen concentration. These research results are helping to define new material-screening test methods that will better evaluate material performance in spacecraft.
Weidmer, Beverly A; Brach, Cindy; Hays, Ron D
2012-09-01
The complexity of health information often exceeds patients' skills to understand and use it. To develop survey items assessing how well healthcare providers communicate health information. Domains and items for the Consumer Assessment of Healthcare Providers and Systems (CAHPS) Item Set for Addressing Health Literacy were identified through an environmental scan and input from stakeholders. The draft item set was translated into Spanish and pretested in both English and Spanish. The revised item set was field tested with a randomly selected sample of adult patients from 2 sites using mail and telephonic data collection. Item-scale correlations, confirmatory factor analysis, and internal consistency reliability estimates were estimated to assess how well the survey items performed and identify composite measures. Finally, we regressed the CAHPS global rating of the provider item on the CAHPS core communication composite and the new health literacy composites. A total of 601 completed surveys were obtained (52% response rate). Two composite measures were identified: (1) Communication to Improve Health Literacy (16 items); and (2) How Well Providers Communicate About Medicines (6 items). These 2 composites were significantly uniquely associated with the global rating of the provider (communication to improve health literacy: P<0.001, b=0.28; and communication about medicines composite: P=0.02, b=0.04). The 2 composites and the CAHPS core communication composite accounted for 51% of the variance in the global rating of the provider. A 5-item subset of the Communication to Improve Health Literacy composite accounted for 90% of the variance of the original 16-item composite. This study provides support for reliability and validity of the CAHPS Item Set for Addressing Health Literacy. These items can serve to assess whether healthcare providers have communicated effectively with their patients and as a tool for quality improvement.
Karstoft, Karen-Inge; Andersen, Søren B; Nielsen, Anni B S
2017-06-01
Since 1998, soldiers deployed to war zones with the Danish Defense (≈31,000) have been invited to fill out a questionnaire on post-mission reactions. This provides a unique data source for studying the psychological toll of war. Here, we validate a measure of PTSD-symptoms from the questionnaire. Soldiers from two cohorts deployed to Afghanistan with the International Security Assistance Force (ISAF) in 2009 (ISAF7, N = 334) and 2013 (ISAF15, N = 278) filled out a standard questionnaire (Psychological Reactions following International Missions, PRIM) concerning a range of post-deployment reactions including symptoms of PTSD (PRIM-PTSD). They also filled out a validated measure of PTSD-symptoms in DSM-IV, the PTSD-checklist (PCL). We tested reliability of PRIM-PTSD by estimating Cronbach's alpha, and tested validity by correlating items, clusters, and overall scale with corresponding items in the PCL. Furthermore, we conducted two confirmatory factor analytic models to test the factor structure of PRIM-PTSD, and tested measurement invariance of the selected model. Finally, we established a screening and a clinical cutoff score by application of ROC analysis. We found high internal consistency of the PRIM-PTSD (Cronbach's alpha = 0.88; both cohorts), strong item-item (0.48-0.83), item-cluster (0.43-0.72), cluster-cluster (0.71-0.82) and full-scale (0.86-0.88) correlations between PRIM-PTSD and PCL. The factor analyses showed adequate fit of a one-factor model, which was also found to display strong measurement invariance across cohorts. ROC curve analysis established cutoff scores for screening (sensitivity = 1, specificity = 0.93) and clinical use (sensitivity = 0.71, specificity = 0.98). In conclusion, we find that PRIM-PTSD is a valid measure for assessing PTSD-symptoms in Danish soldiers following deployment. © 2017 Scandinavian Psychological Associations and John Wiley & Sons Ltd.
Are Various Forms of Locomotion-Speed Diverse or Unique Performance Quality?
Cavar, Mile; Corluka, Marin; Cerkez, Ivana; Culjak, Zoran; Sekulic, Damir
2013-01-01
The forward-sprint is considered to be, and is regularly performed as, a unique measure of “on-ground” linear-speed performance. Thus far, no investigation has simultaneously studied different forms of linear-speed or investigated whether different forms of linear-speed should be observed as unique performance quality. The purpose of this study was to determine (I) the achievements (i.e. execution time), and (II) the reliability and inter-relationships between various linear-speed performances. The participants were 42 male physical education students with substantial sport-specific backgrounds. We applied a total of six tests: three quadrupedal (supine backward, supine forward, and pronate backward locomotion) and three bipedal-performances (forward sprinting, backward sprinting, lateral shuffling). All of the tests showed appropriate reliability parameters (Cronbach Alpha ranged from 0.91 to 0.97; Inter-Item-R 0.78–0.92; Coefficient-of-Variation 1.3–9.1). The tests used in this study shared between 9% and 50% of the common variance. Our results suggest that different activities require activity-specific tests of linear-speed. This is particularly significant in those sports and activities in which quadrupedal locomotion patterns are highly important (wrestling, physically trained military services, law enforcement, fire and rescue, protective services). PMID:24235984
Are Attitudes Toward Writing and Reading Separable Constructs? A Study With Primary Grade Children
Graham, Steve; Berninger, Virginia; Abbott, Robert
2012-01-01
This study examined whether or not attitude towards writing is a unique and separable construct from attitude towards reading for young, beginning writers. Participants were 128 first-grade children (70 girls and 58 boys) and 113 third-grade students (57 girls and 56 boys). Each child was individually administered a 24-item attitude measure, which contained 12 items assessing attitude towards writing and 12 parallel items for reading. Students also wrote a narrative about a personal event in their life. A factor analysis of the 24-item attitude measure provided evidence that generally supports the contention that writing and reading attitudes are separable constructs for young beginning writers, as it yielded three factors: a writing attitude factor with 9 items, a reading attitude factor with 9 parallel items, and an attitude about literacy interactions with others factor containing 4 items (2 items in writing and 2 parallel items in reading). Further validation that attitude towards writing is a separable construct from attitude towards reading was obtained at the third-grade level, where writing attitude made a unique and significant contribution, beyond the other two attitude measures, to the prediction of three measures of writing: quality, length, and longest correct word sequence. At the first-grade level, none of the 3 attitude measures predicted students’ writing performance. Finally, girls had more positive attitudes concerning reading and writing than boys. PMID:22736933
Experiments in materials science from household items
NASA Technical Reports Server (NTRS)
Spiegel, F. Xavier
1993-01-01
Everyday household items are used to demonstrate some unique properties of materials. A coat hanger, rubber band, balloon, and corn starch have typical properties which we often take for granted but can be truly amazing.
Phillips, Steven; Niki, Kazuhisa
2002-10-01
Working memory is affected by items stored and the relations between them. However, separating these factors has been difficult, because increased items usually accompany increased associations/relations. Hence, some have argued, relational effects are reducible to item effects. We overcome this problem by manipulating index length: the fewest number of item positions at which there is a unique item, or tuple of items (if length >1), for every instance in the relational (memory) set. Longer indexes imply greater similarity (number of shared items) between instances and higher load on encoding processes. Subjects were given lists of study pairs and asked to make a recognition judgement. The number of unique items and index length in the three list conditions were: (1) AB, CD: four/one; (2) AB, CD, EF: six/one; and (3) AB, AD, CB: four/two, respectively. Japanese letters were used in Experiments 1 (kanji-ideograms) and 2 (hiragana-phonograms); numbers in Experiment 3; and shapes generated from Fourier descriptors in Experiment 4. Across all materials, right dominant temporoparietal and middle frontal gyral activity was found with increased index length, but not items during study. In Experiment 5, a longer delay was used to isolate retention effects in the absence of visual stimuli. Increased left hemispheric activity was observed in the precuneus, middle frontal gyrus, and superior temporal gyrus with increased index length for the delay period. These results show that relational load is not reducible to item load.
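The index-length definition in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `index_length` and `unique_items` are hypothetical, and the sketch assumes each study pair is written as a string with one character per item position. It checks the three list conditions reported in the abstract (four/one, six/one, four/two).

```python
from itertools import combinations


def unique_items(instances):
    """Count the distinct items appearing anywhere in the study list."""
    return len({item for instance in instances for item in instance})


def index_length(instances):
    """Return the fewest item positions that uniquely identify every instance.

    Tries position subsets of increasing size; a subset works when the
    items at those positions differ for every instance in the list.
    """
    n_positions = len(instances[0])
    for size in range(1, n_positions + 1):
        for positions in combinations(range(n_positions), size):
            keys = {tuple(instance[p] for p in positions) for instance in instances}
            if len(keys) == len(instances):
                return size
    return None  # the list contains duplicate instances


# The three list conditions described in the abstract (illustrative encoding).
conditions = {
    "AB, CD": ["AB", "CD"],
    "AB, CD, EF": ["AB", "CD", "EF"],
    "AB, AD, CB": ["AB", "AD", "CB"],
}

for label, pairs in conditions.items():
    print(label, unique_items(pairs), index_length(pairs))
```

In the third condition neither position alone separates the pairs (A repeats in first position, B in second), so both positions are needed, which is why its index length is two while the number of unique items stays at four.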
Jørgensen, L; Garne, J P; Søgaard, M; Laursen, B S
2015-04-01
Women with breast cancer often experience significant distress. Currently, there are no questionnaires aimed at identifying women's unique and possible changing indicators for distress in surgical continuity of care for breast cancer. We developed and tested three questionnaires specifically for this use. We first searched PubMed, CINAHL and PsycINFO to retrieve information on previously described indicators. Next, we conducted a focus group interview with 6 specialised nurses, who have extensive experience about consequences of breast cancer for women in surgical continuity of care. The questionnaire was tested on 18 women scheduled for breast cancer surgery. Subsequently, the women were debriefed to gain knowledge about comprehensibility, readability and relevance of items, and the time needed to complete the questionnaire. After adjustment, the questionnaires were field-tested concomitantly with a clinical study, which both consisted of a survey and an interview study. Three multi-item questionnaires were developed specific to different time points in surgical continuity of care. The questionnaires share a core of statements divided into seven sub-scales: emotional and physical situation, social condition, sexuality, body image, religion and organisational factors. Besides the core of statements, each questionnaire has different statements depending on the time point of surgical continuity of care when it was to be responded to. The questionnaires contain comprehensive items that can identify indicators for distress in individual women taking part in surgical continuity of care. The items were understandable and the time used for filling in the questionnaires was reasonable. Copyright © 2014 Elsevier Ltd. All rights reserved.
History & implementation of Item Unique Identification (IUID) - Has it Improved Asset Visibility?
2012-03-27
figure 3) as of November 2011 on their progress in implementing IUID38: 10 Objectives • Policy Updates • Systems Updates (AIS and ERP ...Implement SAP Enhancement Pack – Nov 2013 • Design & build LMP IUID solution – Aug 2014 • Integrate & test with Trading Partners, Army IUID...issues Objectives • Policy Updates • Systems Updates (AIS and ERP ) • Contract Compliance Rate • Physical Marking • Use of IUID Registry IUID Scorecard
Knowledge of the ordinal position of list items in pigeons.
Scarf, Damian; Colombo, Michael
2011-10-01
Ordinal knowledge is a fundamental aspect of advanced cognition. It is self-evident that humans represent ordinal knowledge, and over the past 20 years it has become clear that nonhuman primates share this ability. In contrast, evidence that nonprimate species represent ordinal knowledge is missing from the comparative literature. To address this issue, in the present experiment we trained pigeons on three 4-item lists and then tested them with derived lists in which, relative to the training lists, the ordinal position of the items was either maintained or changed. Similar to the findings with human and nonhuman primates, our pigeons performed markedly better on the maintained lists compared to the changed lists, and displayed errors consistent with the view that they used their knowledge of ordinal position to guide responding on the derived lists. These findings demonstrate that the ability to acquire ordinal knowledge is not unique to the primate lineage. (PsycINFO Database Record (c) 2011 APA, all rights reserved).
Optimising mobility outcome measures in Huntington's disease.
Busse, Monica; Quinn, Lori; Khalil, Hanan; McEwan, Kirsten
2014-01-01
Many of the performance-based mobility measures that are currently used in Huntington's disease (HD) were developed for assessment in other neurological conditions such as stroke. We aimed to assess the individual item-response of commonly used performance-based mobility measures, with a view to optimizing the scales for specific application in Huntington's Disease (HD). Data from a larger multicentre, observational study were used. Seventy-five people with HD (11 pre-manifest & 64 manifest) were assessed on the Six-Minute Walk Test, 10-Meter Walk Test, Timed "Up & Go" Test (TUG), Berg Balance Scale (BBS), Physical Performance Test (PPT), Four Square Step Test, and Tinetti Mobility Test (TMT). The Unified Huntington's Disease Rating Scale (UHDRS) Total Motor Score, Functional Assessment Scale and Total Functional Capacity scores were recorded, alongside cognitive measures. Standard regression analysis was used to assess predictive validity. Individual item responses were investigated using a sequence of approaches to allow for gradual removal of items and the subsequent creation of shortened versions. Psychometric properties (reliability and discriminant ability) of the shortened scales were assessed. TUG (β 0.46, CI 0.20-3.47), BBS (β -0.35, CI -2.10-0.14), and TMT (β -0.45, CI -3.14-0.64) were good disease-specific mobility measures. PPT was the best measure of functional performance (β 0.42, CI 0.00-0.43 for TFC & β 0.57 CI 0.15-0.81 for FAS). Shortened versions of BBS and TMT were developed based on item analysis. The resultant BBS and TMT shortened scales were reliable for use in manifest HD. ROC analysis showed that shortened scales were able to discriminate between manifest and pre-manifest disease states. Our data suggests that the PPT is appropriate as a general measure of function in individuals with HD, and we have identified shortened versions of the BBS and TMT that measure the unique gait and balance impairments in HD. 
These scales, alongside the TUG, may therefore be important measures to consider in future clinical trials.
Evaluation of the reliability and validity for X16 balance testing scale for the elderly.
Ju, Jingjuan; Jiang, Yu; Zhou, Peng; Li, Lin; Ye, Xiaolei; Wu, Hongmei; Shen, Bin; Zhang, Jialei; He, Xiaoding; Niu, Chunjin; Xia, Qinghua
2018-05-10
Balance performance is considered an indicator of functional status in the elderly; large-scale population screening and evaluation in the community context, followed by proper interventions, would be of great significance at the public health level. However, there has been no suitable balance testing scale available for large-scale studies in the unique community context of urban China. A balance scale named the X16 balance testing scale was developed, which was composed of 3 domains and 16 items. The balance abilities of a total of 1985 functionally independent and active community-dwelling elderly adults were tested using the X16 scale. The internal consistency, split-half reliability, content validity, construct validity, and discriminant validity of the X16 balance testing scale were evaluated. Factor analysis was performed to identify an alternative factor structure. The eigenvalues of factors 1, 2, and 3 were 8.53, 1.79, and 1.21, respectively, and their cumulative contribution to the total variance reached 72.0%. These 3 factors mainly represented the domains of static balance, postural stability, and dynamic balance. The Cronbach alpha coefficient for the scale was 0.933. The Spearman correlation coefficients between items and their corresponding domains ranged from 0.538 to 0.964. The correlation coefficients between each item and its corresponding domain were higher than the coefficients between this item and other domains. With increasing age, the scores for overall balance performance and for the static balance, postural stability, and dynamic balance domains declined gradually (P < 0.001). With increasing age, the proportion of the elderly with intact balance performance decreased gradually (P < 0.001). The reliability and validity of the X16 balance testing scale are both adequate and acceptable. Because it is simple and quick to use, it is practical for repeated and routine use, especially in community settings and in large-scale screening.
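The cumulative variance figure in the abstract above can be cross-checked with a short Python sketch. It rests on an assumption the abstract does not state: that the factor analysis was run on the correlation matrix of the 16 items, so the total variance equals the number of items. The function name `cumulative_variance_share` is illustrative only.

```python
def cumulative_variance_share(eigenvalues, n_items):
    """Share of total variance explained by the retained factors.

    Assumes standardized items, so the total variance equals n_items.
    """
    return sum(eigenvalues) / n_items


# Eigenvalues of factors 1-3 as reported in the abstract; 16 items in the scale.
reported_eigenvalues = [8.53, 1.79, 1.21]
share = cumulative_variance_share(reported_eigenvalues, n_items=16)
print(f"Cumulative variance explained: {share:.1%}")
```

Under that assumption the three eigenvalues sum to 11.53 out of 16, or roughly 72% of the total variance, in line with the reported 72.0% once rounding of the published eigenvalues is allowed for.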
Convergent and Discriminant Validity of the Five Factor Form and the Sliderbar Inventory.
Rojas, Stephanie L; Widiger, Thomas A
2018-03-01
Existing measures of the five factor model (FFM) of personality are generally, if not exclusively, unipolar in their assessment of maladaptive variants of the FFM domains. However, two recently developed measures, the Five Factor Form (FFF) and the Sliderbar Inventory (SI), include items that assess for maladaptive variants at both poles of each item. This structure is unique among existing measures of personality and personality disorder, although there is a historical, infrequently used Stone Personality Trait Schema (SPTS) that had also included this item structure. To facilitate an exploration of their convergent and discriminant validity, the SI and SPTS items were reorganized into FFM scales. The convergent and discriminant validity of the FFF, SI-FFM, and SPTS-FFM scales was considered in a sample of 450 adults with current or a history of mental health treatment. The FFF, SI-FFM, and SPTS-FFM were also compared with respect to their relationship with FFM domains. Finally, the FFF items and SI-FFM scales were tested with respect to their relationship with measures of maladaptive variants of both high and low agreeableness and conscientiousness. The implications of the results are discussed with respect to the assessment of maladaptive personality functioning, and suggestions for future research are provided.
Measuring pain in the context of homelessness
Matter, Rebecca; Kline, Susan; Cook, Karon F.; Amtmann, Dagmar
2009-01-01
Purpose: The primary objective of this study was to inform the development of measures of pain impact appropriate for all respondents, including homeless individuals, so that they can be used in clinical research and practice. The secondary objective was to increase understanding about the unique experience of homeless people with pain. Methods: Seventeen homeless individuals with chronic health conditions (often associated with pain) participated in cognitive interviews to test the functioning of 56 pain measurement items and provided information about their experience living with and accessing treatment for pain. Results: The most common problems identified with items were that they lacked clarity or were irrelevant in the context of homelessness. Items that were unclear, irrelevant and/or had other identified problems made it difficult for participants to respond. Participants also described multiple ways in which their pain was exacerbated by conditions of homelessness and identified barriers to accessing appropriate treatment. Conclusions: Results suggested that the majority of items were problematic for the homeless and require substantial modifications to make the pain impact bank relevant to this population. Additional recommendations include involving homeless in future item bank development, conducting research on the topic of pain and homelessness, and using cognitive interviewing in other types of health disparities research. PMID:19582592
A conflict management scale for pharmacy.
Austin, Zubin; Gregory, Paul A; Martin, Craig
2009-11-12
To develop and establish the validity and reliability of a conflict management scale specific to pharmacy practice and education. A multistage inventory-item development process was undertaken involving 93 pharmacists and using a previously described explanatory model for conflict in pharmacy practice. A 19-item inventory was developed, field tested, and validated. The conflict management scale (CMS) demonstrated an acceptable degree of reliability and validity for use in educational or practice settings to promote self-reflection and self-awareness regarding individuals' conflict management styles. The CMS provides a unique, pharmacy-specific method for individuals to determine and reflect upon their own conflict management styles. As part of an educational program to facilitate self-reflection and heighten self-awareness, the CMS may be a useful tool to promote discussions related to an important part of pharmacy practice.
Crows spontaneously exhibit analogical reasoning.
Smirnova, Anna; Zorina, Zoya; Obozova, Tanya; Wasserman, Edward
2015-01-19
Analogical reasoning is vital to advanced cognition and behavioral adaptation. Many theorists deem analogical thinking to be uniquely human and to be foundational to categorization, creative problem solving, and scientific discovery. Comparative psychologists have long been interested in the species generality of analogical reasoning, but they initially found it difficult to obtain empirical support for such thinking in nonhuman animals (for pioneering efforts, see [2, 3]). Researchers have since mustered considerable evidence and argument that relational matching-to-sample (RMTS) effectively captures the essence of analogy, in which the relevant logical arguments are presented visually. In RMTS, choice of test pair BB would be correct if the sample pair were AA, whereas choice of test pair EF would be correct if the sample pair were CD. Critically, no items in the correct test pair physically match items in the sample pair, thus demanding that only relational sameness or differentness is available to support accurate choice responding. Initial evidence suggested that only humans and apes can successfully learn RMTS with pairs of sample and test items; however, monkeys have subsequently done so. Here, we report that crows too exhibit relational matching behavior. Even more importantly, crows spontaneously display relational responding without ever having been trained on RMTS; they had only been trained on identity matching-to-sample (IMTS). Such robust and uninstructed relational matching behavior represents the most convincing evidence yet of analogical reasoning in a nonprimate species, as apes alone have spontaneously exhibited RMTS behavior after only IMTS training. Copyright © 2015 Elsevier Ltd. All rights reserved.
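The RMTS rule described in this abstract (choose the test pair whose internal relation matches that of the sample pair, even though no items are shared) can be written as a small Python sketch. The function names are hypothetical, and pairs are represented as two-character strings purely for illustration.

```python
def relation(pair):
    """Return 'same' if the two items in the pair are identical, else 'different'."""
    first, second = pair
    return "same" if first == second else "different"


def correct_choice(sample_pair, test_pairs):
    """Return the single test pair whose relation matches the sample pair's relation."""
    target = relation(sample_pair)
    matches = [pair for pair in test_pairs if relation(pair) == target]
    if len(matches) != 1:
        raise ValueError("expected exactly one relationally matching test pair")
    return matches[0]


# The two cases given in the abstract: sample AA -> BB, sample CD -> EF.
print(correct_choice("AA", ["BB", "EF"]))
print(correct_choice("CD", ["BB", "EF"]))
```

Note that `correct_choice` never compares items across the sample and test pairs; only the same/different relation is used, which is the point of the task.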
Tagging insulin in microgravity
NASA Technical Reports Server (NTRS)
Dobeck, Michael; Nelson, Ronald S.
1992-01-01
Knowing the exact subcellular sites of action of insulin in the body has the potential to give basic science investigators a basis from which a cause and cure for this disease can be approached. The goal of this project is to create a test reagent that can be used to visualize these subcellular sites. The unique microgravity environment of the Shuttle will allow the creation of a reagent that has the possibility of elucidating the subcellular sites of action of insulin. Several techniques have been used in an attempt to isolate the sites of action of items such as insulin. One of these is autoradiography in which the test item is obtained from animals fed radioactive materials. What is clearly needed is to visualize individual insulin molecules at their sites of action. The insulin tagging process to be used on G-399 involves the conjugation of insulin molecules with ferritin molecules to create a reagent that will be used back on Earth in an attempt to elucidate the sites of action of insulin.
Evaluation of the automatic optical authentication technologies for control systems of objects
NASA Astrophysics Data System (ADS)
Averkin, Vladimir V.; Volegov, Peter L.; Podgornov, Vladimir A.
2000-03-01
The report considers the evaluation of automatic optical authentication technologies for the automated integrated system of physical protection, control and accounting of nuclear materials at RFNC-VNIITF, and for supporting the nuclear materials nonproliferation regime. The report presents the nuclear object authentication objectives and strategies, the methodology of automatic optical authentication, and the results of the development of pattern recognition techniques carried out under ISTC project #772 with the purpose of identifying unique features of the surface structure of a controlled object and the effects of its random treatment. The report describes the current solutions to the following functional control tasks: confirmation of item authenticity (proof that the item has not been substituted by an item of similar shape), control over unforeseen changes of item state, and control over unauthorized access to the item. The most important distinctive feature of all the techniques is not a comprehensive description of some properties of the controlled item, but unique identification of the item using the minimum necessary set of parameters, which together constitute the identification attribute of the item. The main emphasis in the technical approach is on the development of rather simple technological methods intended, for the first time, for use in systems of physical protection, control and accounting of nuclear materials. The developed authentication devices and system are described.
O'Connor, A M; Sargeant, J M; Gardner, I A; Dickson, J S; Torrence, M E; Dewey, C E; Dohoo, I R; Evans, R B; Gray, J T; Greiner, M; Keefe, G; Lefebvre, S L; Morley, P S; Ramirez, A; Sischo, W; Smith, D R; Snedeker, K; Sofos, J; Ward, M P; Wills, R
2010-01-01
The conduct of randomized controlled trials in livestock with production, health, and food-safety outcomes presents unique challenges that might not be adequately reported in trial reports. The objective of this project was to modify the CONSORT (Consolidated Standards of Reporting Trials) statement to reflect the unique aspects of reporting these livestock trials. A 2-day consensus meeting was held on November 18-19, 2008 in Chicago, IL, to achieve the objective. Before the meeting, a Web-based survey was conducted to identify issues for discussion. The 24 attendees were biostatisticians, epidemiologists, food-safety researchers, livestock production specialists, journal editors, assistant editors, and associate editors. Before the meeting, the attendees completed a Web-based survey indicating which CONSORT statement items would need to be modified to address unique issues for livestock trials. The consensus meeting resulted in the production of the REFLECT (Reporting Guidelines for Randomized Control Trials) statement for livestock and food safety and a 22-item checklist. Fourteen items were modified from the CONSORT checklist, and an additional subitem was proposed to address challenge trials. The REFLECT statement proposes new terminology, more consistent with common usage in livestock production, to describe study subjects. Evidence was not always available to support modification to or inclusion of an item. The use of the REFLECT statement, which addresses issues unique to livestock trials, should improve the quality of reporting and design for trials reporting production, health, and food-safety outcomes.
Hellemann, G S; Green, M F; Kern, R S; Sitarenios, G; Nuechterlein, K H
2017-10-01
Measures of social cognition are increasingly being applied to psychopathology, including studies of schizophrenia and other psychotic disorders. Tests of social cognition present unique challenges for international adaptations. The Mayer-Salovey-Caruso Emotional Intelligence Test, Managing Emotions Branch (MSCEIT-ME) is a commonly-used social cognition test that involves the evaluation of social scenarios presented in vignettes. This paper presents evaluations of translations of this test in six different languages based on representative samples from the relevant countries. The goal was to identify items from the MSCEIT-ME that show different response patterns across countries using indices of discrepancy and content validity criteria. An international version of the MSCEIT-ME scoring was developed that excludes items that showed undesirable properties across countries. We then confirmed that this new version had better performance (i.e. less discrepancy across regions) in international samples than the version based on the original norms. Additionally, it provides scores that are comparable to ratings based on local norms. This paper shows that it is possible to adapt complex social cognitive tasks so they can provide valid data across different cultural contexts.
Development and validation of the Body and Appearance Self-Conscious Emotions Scale (BASES).
Castonguay, Andrée L; Sabiston, Catherine M; Crocker, Peter R E; Mack, Diane E
2014-03-01
The purpose of these studies was to develop a psychometrically sound measure of shame, guilt, authentic pride, and hubristic pride for use in body and appearance contexts. In Study 1, 41 potential items were developed and assessed for item quality and comprehension. In Study 2, a panel of experts (N=8; M=11, SD=6.5 years of experience) reviewed the scale and items for evidence of content validity. Participants in Study 3 (n=135 males, n=300 females) completed the BASES and various body image, personality, and emotion scales. A separate sample (n=155; 35.5% male) in Study 3 completed the BASES twice using a two-week time interval. The BASES subscale scores demonstrated evidence for internal consistency, item-total correlations, concurrent, convergent, incremental, and discriminant validity, and 2-week test-retest reliability. The 4-factor solution was a good fit in confirmatory factor analysis, reflecting body-related shame, guilt, authentic and hubristic pride subscales of the BASES. The development and validation of the BASES may help advance body image and self-conscious emotion research by providing a foundation to examine the unique antecedents and outcomes of these specific emotional experiences. Copyright © 2014 Elsevier Ltd. All rights reserved.
Ragland, J. Daniel; Ranganath, Charan; Harms, Michael P.; Barch, Deanna M.; Gold, James M.; Layher, Evan; Lesh, Tyler A.; MacDonald, Angus W.; Niendam, Tara A.; Phillips, Joshua; Silverstein, Steven M.; Yonelinas, Andrew P.; Carter, Cameron S.
2015-01-01
Importance: Individuals with schizophrenia (SZ) can encode item-specific information to support familiarity-based recognition, but are disproportionately impaired encoding inter-item relationships (relational encoding) and recollecting information. The Relational and Item-Specific Encoding (RiSE) paradigm has been used to disentangle these encoding and retrieval processes, which may be dependent on specific medial temporal lobe (MTL) and prefrontal cortex (PFC) subregions. Functional imaging during RiSE task performance could help to specify dysfunctional neural circuits in SZ that can be targeted for interventions to improve memory and functioning in the illness. Objectives: To use functional magnetic resonance imaging (fMRI) to test the hypothesis that SZ disproportionately affects MTL and PFC subregions during relational encoding and retrieval, relative to item-specific memory processes. Imaging results from healthy comparison subjects (HC) will also be used to establish neural construct validity for RiSE. Design, Setting, and Participants: This multi-site, case-control, cross-sectional fMRI study was conducted at five CNTRACS sites. The final sample included 52 clinically stable outpatients with SZ, and 57 demographically matched HC. Main Outcomes and Measures: Behavioral performance speed and accuracy (d’) on item recognition and associative recognition tasks. Voxelwise statistical parametric maps for a priori MTL and PFC regions of interest (ROI), testing activation differences between relational and item-specific memory during encoding and retrieval. Results: Item recognition was disproportionately impaired in SZ patients relative to controls following relational encoding. The differential deficit was accompanied by reduced dorsolateral prefrontal cortex (DLPFC) activation during relational encoding in SZ, relative to HC. Retrieval success (hits > misses) was associated with hippocampal (HI) activation in HC during relational item recognition and associative recognition conditions, and HI activation was specifically reduced in SZ for recognition of relational but not item-specific information. Conclusions: In this unique, multi-site fMRI study, HC results supported RiSE construct validity by revealing expected memory effects in PFC and MTL subregions during encoding and retrieval. Comparison of SZ and HC revealed disproportionate memory deficits in SZ for relational versus item-specific information, accompanied by regionally and functionally specific deficits in DLPFC and HI activation. PMID:26200928
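The accuracy measure d' named among the outcome measures is conventionally computed as the difference between the z-transformed hit rate and false-alarm rate. The Python sketch below shows that standard signal-detection formula; the abstract does not say how the study handled extreme rates of 0 or 1, so no correction is applied here, and the function name is hypothetical.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Standard signal-detection sensitivity: z(hit rate) - z(false-alarm rate).

    Both rates must lie strictly between 0 and 1; NormalDist.inv_cdf raises
    StatisticsError otherwise, so extreme rates need a correction chosen by the analyst.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


# Illustrative rates only, not values from the study.
print(round(d_prime(0.9, 0.1), 3))
```

A d' of 0 means old and new items cannot be told apart (hit rate equals false-alarm rate); larger values indicate better discrimination.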
Code of Federal Regulations, 2011 CFR
2011-10-01
... 48 Federal Acquisition Regulations System 1 2011-10-01 2011-10-01 false Termination. 12.403 Section 12.403 Federal Acquisition Regulations System FEDERAL ACQUISITION REGULATION ACQUISITION PLANNING ACQUISITION OF COMMERCIAL ITEMS Unique Requirements Regarding Terms and Conditions for Commercial Items 12.403...
Code of Federal Regulations, 2010 CFR
2010-10-01
... means an item potentially dangerous to public safety or security if stolen, lost, or misplaced, or that shall be subject to exceptional physical security, protection, control, and accountability. Examples...
ERIC Educational Resources Information Center
Haberman, Shelby J.; von Davier, Matthias; Lee, Yi-Hsuan
2008-01-01
Multidimensional item response models can be based on multivariate normal ability distributions or on multivariate polytomous ability distributions. For the case of simple structure in which each item corresponds to a unique dimension of the ability vector, some applications of the two-parameter logistic model to empirical data are employed to…
Selecting Items for Criterion-Referenced Tests.
ERIC Educational Resources Information Center
Mellenbergh, Gideon J.; van der Linden, Wim J.
1982-01-01
Three item selection methods for criterion-referenced tests are examined: the classical theory of item difficulty and item-test correlation; the latent trait theory of item characteristic curves; and a decision-theoretic approach for optimal item selection. Item contribution to the standardized expected utility of mastery testing is discussed. (CM)
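Of the three methods listed, the classical one rests on two simple statistics, sketched below in Python: item difficulty as the proportion of examinees answering correctly, and the item-test correlation as the Pearson correlation between the 0/1 item score and the total score. The response matrix is invented for illustration, and the correlation is the uncorrected version (the item is included in the total), which is an assumption rather than something the abstract specifies.

```python
from math import sqrt
from statistics import mean


def item_difficulty(item_responses):
    """Classical difficulty (p-value): proportion of correct (1) responses."""
    return mean(item_responses)


def item_test_correlation(item_responses, total_scores):
    """Pearson correlation between a 0/1 item score and the total test score."""
    mx, my = mean(item_responses), mean(total_scores)
    cov = sum((x - mx) * (y - my) for x, y in zip(item_responses, total_scores))
    var_x = sum((x - mx) ** 2 for x in item_responses)
    var_y = sum((y - my) ** 2 for y in total_scores)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: rows are examinees, columns are items (1 = correct).
responses = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
totals = [sum(row) for row in responses]
first_item = [row[0] for row in responses]

print(item_difficulty(first_item))
print(round(item_test_correlation(first_item, totals), 3))
```

In classical item selection, items with mid-range difficulty and a high item-test correlation are typically preferred; the latent trait and decision-theoretic approaches mentioned in the abstract use different criteria and are not sketched here.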
Asymmetric effects of emotion on mnemonic interference
Leal, Stephanie L.; Tighe, Sarah K.; Yassa, Michael A.
2014-01-01
Emotional experiences can strengthen memories so that they can be used to guide future behavior. Emotional arousal, mediated by the amygdala, is thought to modulate storage by the hippocampus, which may encode unique episodic memories via pattern separation – the process by which similar memories are stored using non-overlapping representations. While prior work has examined mnemonic interference due to similarity and emotional modulation of memory independently, examining the mechanisms by which emotion influences mnemonic interference has not been previously accomplished in humans. To this end, we developed an emotional memory task where emotional content and stimulus similarity were varied to examine the effect of emotion on fine mnemonic discrimination (a putative behavioral correlate of hippocampal pattern separation). When tested immediately after encoding, discrimination was reduced for similar emotional items compared to similar neutral items, consistent with a reduced bias towards pattern separation. After 24 h, recognition of emotional target items was preserved compared to neutral items, whereas similar emotional item discrimination was further diminished. This suggests a potential mechanism for the emotional modulation of memory with a selective remembering of gist, as well as a selective forgetting of detail, indicating an emotion-induced reduction in pattern separation. This can potentially increase the effective signal-to-noise ratio in any given situation to promote survival. Furthermore, we found that individuals with depressive symptoms hyper-discriminate negative items, which correlated with their symptom severity. This suggests that utilizing mnemonic discrimination paradigms allows us to tease apart the nuances of disorders with aberrant emotional mnemonic processing. PMID:24607286
A Conflict Management Scale for Pharmacy
Gregory, Paul A.; Martin, Craig
2009-01-01
Objectives: To develop and establish the validity and reliability of a conflict management scale specific to pharmacy practice and education. Methods: A multistage inventory-item development process was undertaken involving 93 pharmacists and using a previously described explanatory model for conflict in pharmacy practice. A 19-item inventory was developed, field tested, and validated. Results: The conflict management scale (CMS) demonstrated an acceptable degree of reliability and validity for use in educational or practice settings to promote self-reflection and self-awareness regarding individuals' conflict management styles. Conclusions: The CMS provides a unique, pharmacy-specific method for individuals to determine and reflect upon their own conflict management styles. As part of an educational program to facilitate self-reflection and heighten self-awareness, the CMS may be a useful tool to promote discussions related to an important part of pharmacy practice. PMID:19960081
48 CFR 12.000 - Scope of part.
Code of Federal Regulations, 2010 CFR
2010-10-01
... (Public Law 103-355) by establishing acquisition policies more closely resembling those of the commercial... ACQUISITION OF COMMERCIAL ITEMS 12.000 Scope of part. This part prescribes policies and procedures unique to the acquisition of commercial items. It implements the Federal Government's preference for the...
Heinemann, Allen W; Miskovic, Ana; Semik, Patrick; Wong, Alex; Dashner, Jessica; Baum, Carolyn; Magasi, Susan; Hammel, Joy; Tulsky, David S; Garcia, Sofia F; Jerousek, Sara; Lai, Jin-Shei; Carlozzi, Noelle E; Gray, David B
2016-12-01
To describe the unique and overlapping content of the newly developed Environmental Factors Item Banks (EFIB) and 7 legacy environmental factor instruments, and to evaluate the EFIB's construct validity by examining associations with legacy instruments. Cross-sectional, observational cohort. Community. A sample of community-dwelling adults with stroke, spinal cord injury, and traumatic brain injury (N=568). None. EFIB covering domains of the built and natural environment; systems, services, and policies; social environment; and access to information and technology; the Craig Hospital Inventory of Environmental Factors (CHIEF) short form; the Facilitators and Barriers Survey/Mobility (FABS/M) short form; the Home and Community Environment Instrument (HACE); the Measure of the Quality of the Environment (MQE) short form; and 3 of the Patient Reported Outcomes Measurement Information System's (PROMIS) Quality of Social Support measures. The EFIB and legacy instruments assess most of the International Classification of Functioning, Disability and Health (ICF) environmental factors chapters, including chapter 1 (products and technology; 75 items corresponding to 11 codes), chapter 2 (natural environment and human-made changes; 31 items corresponding to 7 codes), chapter 3 (support and relationships; 74 items corresponding to 7 codes), chapter 4 (attitudes; 83 items corresponding to 8 codes), and chapter 5 (services, systems, and policies; 72 items corresponding to 16 codes). Construct validity is provided by moderate correlations between EFIB measures and the CHIEF, MQE barriers, HACE technology mobility, FABS/M community built features, and PROMIS item banks and by small correlations with other legacy instruments. Only 5 of the 66 legacy instrument correlation coefficients are moderate, suggesting they measure unique aspects of the environment, whereas all intra-EFIB correlations were at least moderate. 
The EFIB measures provide a brief and focused assessment of ICF environmental factor chapters. The pattern of correlations with legacy instruments provides initial evidence of construct validity. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
The Modified Abbreviated Math Anxiety Scale: A Valid and Reliable Instrument for Use with Children.
Carey, Emma; Hill, Francesca; Devine, Amy; Szűcs, Dénes
2017-01-01
Mathematics anxiety (MA) can be observed in children from primary school age into the teenage years and adulthood, but many MA rating scales are only suitable for use with adults or older adolescents. We have adapted one such rating scale, the Abbreviated Math Anxiety Scale (AMAS), to be used with British children aged 8-13. In this study, we assess the scale's reliability, factor structure, and divergent validity. The modified AMAS (mAMAS) was administered to a very large (n = 1746) cohort of British children and adolescents. This large sample size meant that as well as conducting confirmatory factor analysis on the scale itself, we were also able to split the sample to conduct exploratory and confirmatory factor analysis of items from the mAMAS alongside items from child test anxiety and general anxiety rating scales. Factor analysis of the mAMAS confirmed that it has the same underlying factor structure as the original AMAS, with subscales measuring anxiety about Learning and Evaluation in math. Furthermore, both exploratory and confirmatory factor analysis of the mAMAS alongside scales measuring test anxiety and general anxiety showed that mAMAS items cluster onto one factor (perceived to represent MA). The mAMAS provides a valid and reliable scale for measuring MA in children and adolescents, from a younger age than is possible with the original AMAS. Results from this study also suggest that MA is truly a unique construct, separate from both test anxiety and general anxiety, even in childhood.
Christensen, Stacy
2014-01-01
An experimental study was conducted using a 2-group randomized control pretest/posttest design to determine if knowledge about Pap testing could be increased through use of a nurse-designed mobile smartphone app developed to educate individuals about the Pap test. A 14-item pretest survey of knowledge about Pap tests was distributed to women attending a university in New England. Participants in the intervention group were provided with an Android device on which a digital health education application on Pap testing had been downloaded. The control group was given a standard pamphlet on Pap testing. Paired t test results demonstrated that knowledge scores on the posttest increased significantly in both groups, but were significantly higher in the intervention group. User satisfaction with the app was high. The results of this study may enhance nursing care by informing nurses about a unique way of learning about Pap testing to recommend to patients.
Estimating the Nominal Response Model under Nonnormal Conditions
ERIC Educational Resources Information Center
Preston, Kathleen Suzanne Johnson; Reise, Steven Paul
2014-01-01
The nominal response model (NRM), a much understudied polytomous item response theory (IRT) model, provides researchers the unique opportunity to evaluate within-item category distinctions. Polytomous IRT models, such as the NRM, are frequently applied to psychological assessments representing constructs that are unlikely to be normally…
Contextual Variability in Free Recall
ERIC Educational Resources Information Center
Lohnas, Lynn J.; Polyn, Sean M.; Kahana, Michael J.
2011-01-01
According to contextual-variability theory, experiences encoded at different times tend to be associated with different contextual states. The gradual evolution of context implies that spaced items will be associated with more distinct contextual states, and thus have more unique retrieval cues, than items presented in proximity. Ross and Landauer…
ERIC Educational Resources Information Center
Burns, Daniel J.; Martens, Nicholas J.; Bertoni, Alicia A.; Sweeney, Emily J.; Lividini, Michelle D.
2006-01-01
In a repeated testing paradigm, list items receiving item-specific processing are more likely to be recovered across successive tests (item gains), whereas items receiving relational processing are likely to be forgotten progressively less on successive tests. Moreover, analysis of cumulative-recall curves has shown that item-specific processing…
ERIC Educational Resources Information Center
Matlock, Ki Lynn; Turner, Ronna
2016-01-01
When constructing multiple test forms, the number of items and the total test difficulty are often equivalent. Not all test developers match the number of items and/or average item difficulty within subcontent areas. In this simulation study, six test forms were constructed having an equal number of items and average item difficulty overall.…
The multilingual naming test in Alzheimer's disease: clues to the origin of naming impairments.
Ivanova, Iva; Salmon, David P; Gollan, Tamar H
2013-03-01
The current study explored the picture naming performance of patients with Alzheimer’s disease (AD). First, we evaluated the utility of the multilingual naming test (MINT; Gollan et al., 2011), which was designed to assess naming skills in speakers of multiple languages, for detecting naming impairments in monolingual AD and amnestic mild cognitive impairment (MCI). If the MINT were sensitive to linguistic impairment in AD, using it in clinical practice might have advantages over using tests exclusively designed for English monolinguals. We found that the MINT can be used with both monolinguals and bilinguals: A 32-item subset of the MINT is best for distinguishing monolingual patients from controls, while the full MINT is best for assessing degree of bilingualism and language dominance in bilinguals. We then investigated the cognitive mechanisms underlying naming impairment in AD. To this end, we explored which MINT item characteristics best predicted performance differences between monolingual patients and controls. We found that contextual diversity and imageability, but not word frequency (nor words’ number of senses), contributed unique variance to explaining naming impairments in AD. These findings suggest a semantic component to the naming impairment in AD (modulated by names’ semantic richness and network size).
NASA Astrophysics Data System (ADS)
Clary, Renee M.; Wandersee, James H.
2010-01-01
Archive-based, historical research of materials produced during the Golden Age of Geology (1788-1840) uncovered scientific caricatures (SCs) which may serve as a unique form of knowledge representation for students today. SCs played important roles in the past, stimulating critical inquiry among early geologists and fueling debates that addressed key theoretical issues. When historical SCs were utilized in a large-enrollment college Earth History course, student response was positive. Therefore, we offered SCs as an optional assessment tool. Paired t-tests that compared individual students’ performances with the SC option, as well as without the SC option, showed a significant positive difference favoring scientific caricatures (α = 0.05). Content analysis of anonymous student survey responses revealed three consistent findings: (a) students enjoyed expressing science content correctly but creatively through SCs, (b) development of SCs required deeper knowledge integration and understanding of the content than conventional test items, and (c) students appreciated having SC item options on their examinations, whether or not they took advantage of them. We think that incorporation of SCs during assessment may effectively expand the variety of methods for probing understanding, thereby increasing the mode validity of current geoscience tests.
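The paired comparison described in this abstract works on the within-student difference between the two conditions. A minimal Python sketch of the paired t statistic follows; the scores are invented for illustration, the function name is hypothetical, and the resulting statistic would still need to be compared against the critical t value for the stated α = 0.05 and the computed degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(with_option, without_option):
    """Paired t statistic and degrees of freedom for two matched score lists."""
    diffs = [a - b for a, b in zip(with_option, without_option)]
    n = len(diffs)
    # Mean difference divided by its standard error (sample SD / sqrt(n)).
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_stat, n - 1


# Hypothetical scores for three students; not data from the study.
t_stat, df = paired_t([3, 5, 7], [2, 3, 4])
print(round(t_stat, 3), df)
```

Because each student serves as their own control, the test uses the spread of the differences rather than the spread of the raw scores, which is why it suits a with-option versus without-option comparison.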
Development and psychometric testing of the Supportive Supervisory Scale.
McGilton, Katherine S
2010-06-01
To describe the development and psychometric testing of the Supportive Supervisory Scale (SSS). The development of the items of the scale was based on Winnicott's relationship theory and on focus groups with 26 healthcare aides (HCAs) and 30 supervisors from six long-term care (LTC) facilities in Ontario, Canada. Content validity of the 15-item instrument was established by a panel of experts. Based on a secondary analysis of data collected from 222 HCAs in 10 LTC facilities in Ontario, Canada, the SSS was subjected to principal components analysis with oblique rotation. A two-factor solution was accepted, which is consistent with the theoretical conceptualization of the instrument. Factor I was labeled Respects Uniqueness and Factor II was labeled Being Reliable. Internal consistency of Factor I was .95, and that of Factor II was .91. Discriminant validity was also established. The focus groups revealed that "being available to staff" while "recognizing the HCA as an individual, and taking a moment to get to know them" was essential to feeling supported by their supervisor. The SSS is a reliable and valid measure of supervisory support of supervisors working in LTC facilities. At the core of supportive supervision is the supervisor's ability to develop and maintain positive relationships with each HCA. It is through respecting the uniqueness of each HCA and being reliable that supervisor-HCA relationships can flourish. Supportive leadership in LTC settings is a major contributor to HCAs' job satisfaction and retention and to quality of patient care. Therefore, a tool developed and tested to measure supervisors' supportive capacities in LTC is essential for evaluating the effectiveness of supervisors in these environments.
ERIC Educational Resources Information Center
Spaan, Mary
2007-01-01
This article follows the development of test items (see "Language Assessment Quarterly", Volume 3 Issue 1, pp. 71-79 for the article "Test and Item Specifications Development"), beginning with a review of test and item specifications, then proceeding to writing and editing of items, pretesting and analysis, and finally selection of an item for a…
ERIC Educational Resources Information Center
Hewitt, Margaret A.; Homan, Susan P.
2004-01-01
Test validity issues considered by test developers and school districts rarely include individual item readability levels. In this study, items from a major standardized test were examined for individual item readability level and item difficulty. The Homan-Hewitt Readability Formula was applied to items across three grade levels. Results of…
Examining Teacher Grades Using Rasch Measurement Theory
ERIC Educational Resources Information Center
Randall, Jennifer; Engelhard, George, Jr.
2009-01-01
In this study, we present an approach to questionnaire design within educational research based on Guttman's mapping sentences and Many-Facet Rasch Measurement Theory. We designed a 54-item questionnaire using Guttman's mapping sentences to examine the grading practices of teachers. Each item in the questionnaire represented a unique student…
Decoding the content of recollection within the core recollection network and beyond.
Thakral, Preston P; Wang, Tracy H; Rugg, Michael D
2017-06-01
Recollection - retrieval of qualitative information about a past event - is associated with enhanced neural activity in a consistent set of neural regions (the 'core recollection network') seemingly regardless of the nature of the recollected content. Here, we employed multi-voxel pattern analysis (MVPA) to assess whether retrieval-related functional magnetic resonance imaging (fMRI) activity in core recollection regions - including the hippocampus, angular gyrus, medial prefrontal cortex, retrosplenial/posterior cingulate cortex, and middle temporal gyrus - contains information about studied content and thus demonstrates retrieval-related 'reinstatement' effects. During study, participants viewed objects and concrete words that were subjected to different encoding tasks. Test items included studied words, the names of studied objects, or unstudied words. Participants judged whether the items were recollected, familiar, or new by making 'remember', 'know', and 'new' responses, respectively. The study history of remembered test items could be reliably decoded using MVPA in most regions, as well as from the dorsolateral prefrontal cortex, a region where univariate recollection effects could not be detected. The findings add to evidence that members of the core recollection network, as well as at least one neural region where mean signal is insensitive to recollection success, carry information about recollected content. Importantly, the study history of recognized items endorsed with a 'know' response could be decoded with equal accuracy. The results thus demonstrate a striking dissociation between mean signal and multi-voxel indices of recollection. Moreover, they converge with prior findings in suggesting that, as it is operationalized by classification-based MVPA, reinstatement is not uniquely a signature of recollection. Copyright © 2016 Elsevier Ltd. All rights reserved.
The Effect of the Position of an Item within a Test on the Item Difficulty Value.
ERIC Educational Resources Information Center
Rubin, Lois S.; Mott, David E. W.
An investigation of the effect on the difficulty value of an item due to position placement within a test was made. Using a 60-item operational test comprised of 5 subtests, 60 items were placed as experimental items on a number of spiralled test forms in three different positions (first, middle, last) within the subtest composed of like items.…
ERIC Educational Resources Information Center
Marie, S. Maria Josephine Arokia; Edannur, Sreekala
2015-01-01
This paper focused on the analysis of test items constructed in the paper of teaching Physical Science for B.Ed. class. It involved the analysis of difficulty level and discrimination power of each test item. Item analysis allows selecting or omitting items from the test, but more importantly item analysis is a tool to help the item writer improve…
Self-perceived Coparenting of Nonresident Fathers: Scale Development and Validation.
Dyer, W Justin; Fagan, Jay; Kaufman, Rebecca; Pearson, Jessica; Cabrera, Natasha
2017-11-16
This study reports on the development and validation of the Fatherhood Research and Practice Network coparenting perceptions scale for nonresident fathers. Although other measures of coparenting have been developed, this is the first measure developed specifically for low-income, nonresident fathers. Focus groups were conducted to determine various aspects of coparenting. Based on this, a scale was created and administered to 542 nonresident fathers. Participants also responded to items used to examine convergent and predictive validity (i.e., parental responsibility, contact with the mother, father self-efficacy and satisfaction, child behavior problems, and contact and engagement with the child). Factor analyses and reliability tests revealed three distinct and reliable perceived coparenting factors: undermining, alliance, and gatekeeping. Validity tests suggest substantial overlap between the undermining and alliance factors, though undermining was uniquely related to child behavior problems. The alliance and gatekeeping factors showed strong convergent validity and evidence for predictive validity. Taken together, results suggest this relatively short measure (11 items) taps into three coparenting dimensions significantly predictive of aspects of individual and family life. © 2017 Family Process Institute.
Bustamante, Eliseo; Sanabria, Álvaro
2014-01-01
Professionalism is a subject of interest in medical schools around the world. The use of a questionnaire could be useful to assess professionalism in Colombia. To adapt The Penn State University College of Medicine Professionalism Questionnaire as a culturally valid instrument in the Spanish language. We followed recommendations from the IQOLA project and used forward and back translation with four independent translations, as well as a pilot evaluation and an evaluation of psychometric features with 250 students. We evaluated item-scale correlations and internal consistency with Cronbach's alpha test and conducted a principal components factor analysis. Global Cronbach's alpha was 0.86, the Kaiser-Meyer-Olkin measure of sampling adequacy was 0.83, and Bartlett's test of sphericity had a p < 0.00001. We found six factors that explained 93% of the total variance and four new factors emerged in the factor analysis, while eight items had high uniqueness. The Penn State University College of Medicine Scale measures professionalism attitudes in medical students with good reliability. However, the structure of the scale demonstrated differences when used in the Latin American medical student population.
ERIC Educational Resources Information Center
Wang, Wei
2013-01-01
Mixed-format tests containing both multiple-choice (MC) items and constructed-response (CR) items are now widely used in many testing programs. Mixed-format tests often are considered to be superior to tests containing only MC items although the use of multiple item formats leads to measurement challenges in the context of equating conducted under…
Test item linguistic complexity and assessments for deaf students.
Cawthon, Stephanie
2011-01-01
Linguistic complexity of test items is one test format element that has been studied in the context of struggling readers and their participation in paper-and-pencil tests. The present article presents findings from an exploratory study on the potential relationship between linguistic complexity and test performance for deaf readers. A total of 64 students completed 52 multiple-choice items, 32 in mathematics and 20 in reading. These items were coded for linguistic complexity components of vocabulary, syntax, and discourse. Mathematics items had higher linguistic complexity ratings than reading items, but there were no significant relationships between item linguistic complexity scores and student performance on the test items. The discussion addresses issues related to the subject area, student proficiency levels in the test content, factors to look for in determining a "linguistic complexity effect," and areas for further research in test item development and deaf students.
NASA Astrophysics Data System (ADS)
Göttsche, Malte; Schirm, Janet; Glaser, Alexander
2016-12-01
Gamma-ray spectrometry has been successfully employed to identify unique items containing special nuclear materials. Template information barriers have been developed in the past to confirm items as warheads by comparing their gamma signature to the signature of true warheads. Their development has, however, not been fully transparent, and they may not be sensitive to some relevant evasion scenarios. We develop a fully open template information barrier concept, based on low-resolution measurements, which, by design, reduces the extent of revealed sensitive information. The concept is based on three signatures of an item to be compared to a recorded template. The similarity of the spectrum is assessed by a modification of the Kolmogorov-Smirnov test to confirm the isotopic composition. The total gamma count rate must agree with the template as a measure of the projected surface of the object. In order to detect the diversion of fissile material from the interior of an item, a polyethylene mask is placed in front of the detector. Neutrons from spontaneous and induced fission events in the item produce 2.223 MeV gamma rays from neutron capture by hydrogen-1 in the mask. This peak is detected and its intensity scales with the item's fissile mass. The analysis based on MCNP Monte Carlo simulations of various plutonium configurations suggests that this concept can distinguish a valid item from a variety of invalid ones. The concept intentionally avoids any assumptions about specific spectral features, such as looking for specific gamma peaks of specific isotopes, thereby facilitating a fully unclassified discussion. By making all aspects public and allowing interested participants to contribute to the development and benchmarking, we enable a more open and inclusive discourse on this matter.
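To make the spectrum-similarity step above more concrete, here is a minimal sketch of the plain two-sample Kolmogorov-Smirnov statistic applied to two binned spectra. It is not the authors' modified test; the function name `ks_statistic` and the channel counts are hypothetical and assume both spectra share the same energy binning.

```python
def ks_statistic(spectrum_a, spectrum_b):
    """Maximum distance between the normalized cumulative spectra."""
    if len(spectrum_a) != len(spectrum_b):
        raise ValueError("spectra must use the same energy bins")
    total_a, total_b = sum(spectrum_a), sum(spectrum_b)
    cum_a = cum_b = 0.0
    d = 0.0
    for counts_a, counts_b in zip(spectrum_a, spectrum_b):
        # Normalizing by total counts compares spectral shape only,
        # so an overall count-rate difference does not affect the statistic.
        cum_a += counts_a / total_a
        cum_b += counts_b / total_b
        d = max(d, abs(cum_a - cum_b))
    return d


# Hypothetical 4-channel spectra (illustration only).
template = [10, 40, 30, 20]
same_shape = [20, 80, 60, 40]   # same shape, twice the counts
altered = [40, 10, 30, 20]      # counts shifted toward the first channel

print(ks_statistic(template, same_shape))
print(ks_statistic(template, altered))
```

Because the statistic ignores overall intensity, a separate check on the total count rate, as the abstract describes, would still be needed.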
Tóth-Király, István; Bőthe, Beáta; Rigó, Adrien; Orosz, Gábor
2017-01-01
While exploratory factor analysis (EFA) provides a more realistic presentation of the data with the allowance of item cross-loadings, confirmatory factor analysis (CFA) includes many methodological advances that the former does not. To create a synergy of the two, exploratory structural equation modeling (ESEM) was proposed as an alternative solution, incorporating the advantages of EFA and CFA. The present investigation is thus an illustrative demonstration of the applicability and flexibility of ESEM. To achieve this goal, we compared CFA and ESEM models, then thoroughly tested measurement invariance and differential item functioning through multiple-indicators-multiple-causes (MIMIC) models on the Passion Scale, the only measure of the Dualistic Model of Passion (DMP) which differentiates between harmonious and obsessive forms of passion. Moreover, a hybrid model was also created to overcome the drawbacks of the two methods. Analyses of the first large community sample (N = 7,466; 67.7% females; Mage = 26.01) revealed the superiority of the ESEM model relative to CFA in terms of improved goodness-of-fit and less correlated factors, while at the same time retaining the high definition of the factors. However, this fit was only achieved with the inclusion of three correlated uniquenesses, two of which appeared in previous studies and one of which was specific to the current investigation. These findings were replicated on a second, comprehensive sample (N = 504; 51.8% females; Mage = 39.59). After combining the two samples, complete measurement invariance (factor loadings, item intercepts, item uniquenesses, factor variances-covariances, and latent means) was achieved across gender and partial invariance across age groups and their combination. Only one item intercept was non-invariant across both multigroup and MIMIC approaches, an observation that was further corroborated by the hybrid model. While obsessive passion showed a slight decline in the hybrid model, harmonious passion did not. Overall, the ESEM framework is a viable alternative of CFA that could be used and even extended to address substantially important questions and researchers should systematically compare these two approaches to identify the most suitable one. PMID:29163325
The Selection of Test Items for Decision Making with a Computer Adaptive Test.
ERIC Educational Resources Information Center
Spray, Judith A.; Reckase, Mark D.
The issue of test-item selection in support of decision making in adaptive testing is considered. The number of items needed to make a decision is compared for two approaches: selecting items from an item pool that are most informative at the decision point or selecting items that are most informative at the examinee's ability level. The first…
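The two selection strategies contrasted above can be sketched with an item information function. The example below assumes a two-parameter logistic (2PL) model, which the abstract does not specify, and uses a hypothetical three-item pool; `information` and `best_item` are illustrative names only.

```python
from math import exp


def information(a, b, theta):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def best_item(pool, theta):
    """Return the pool item that is most informative at theta."""
    return max(pool, key=lambda item: information(item["a"], item["b"], theta))


# Hypothetical item pool: a = discrimination, b = difficulty.
pool = [
    {"name": "A", "a": 1.0, "b": 0.0},
    {"name": "B", "a": 1.2, "b": 1.5},
    {"name": "C", "a": 0.8, "b": -1.0},
]

decision_point = 0.0     # assumed cut score on the ability scale
ability_estimate = 1.2   # assumed current estimate for one examinee

print("at decision point:", best_item(pool, decision_point)["name"])
print("at ability estimate:", best_item(pool, ability_estimate)["name"])
```

With this pool the two strategies pick different items, which is the situation in which the number of items needed to reach a decision can differ between them.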
Tepe, Rodger; Tepe, Chabha
2015-03-01
To develop and psychometrically evaluate an information literacy (IL) self-efficacy survey and an IL knowledge test. In this test-retest reliability study, a 25-item IL self-efficacy survey and a 50-item IL knowledge test were developed and administered to a convenience sample of 53 chiropractic students. Item analyses were performed on all questions. The IL self-efficacy survey demonstrated good reliability (test-retest correlation = 0.81) and good/very good internal consistency (mean κ = .56 and Cronbach's α = .92). A total of 25 questions with the best item analysis characteristics were chosen from the 50-item IL knowledge test, resulting in a 25-item IL knowledge test that demonstrated good reliability (test-retest correlation = 0.87), very good internal consistency (mean κ = .69, KR20 = 0.85), and good item discrimination (mean point-biserial = 0.48). This study resulted in the development of three instruments: a 25-item IL self-efficacy survey, a 50-item IL knowledge test, and a 25-item IL knowledge test. The information literacy self-efficacy survey and the 25-item version of the information literacy knowledge test have shown preliminary evidence of adequate reliability and validity to justify continuing study with these instruments.
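The internal-consistency coefficients reported above (Cronbach's α and KR-20) can be computed as in the following sketch. The response matrix is hypothetical and the function names are illustrative; population variances are used throughout, in which case the two coefficients coincide for dichotomously scored items.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix of examinees (rows) by items (columns)."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def kr20(scores):
    """Kuder-Richardson formula 20 for items scored 0/1."""
    n, k = len(scores), len(scores[0])
    # p_j is the proportion of examinees answering item j correctly.
    p = [sum(row[j] for row in scores) / n for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(pj * (1 - pj) for pj in p) / total_var)


# Hypothetical responses: 8 examinees, 4 dichotomous items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(f"alpha = {cronbach_alpha(responses):.3f}")
print(f"KR-20 = {kr20(responses):.3f}")
```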
A New Item Selection Procedure for Mixed Item Type in Computerized Classification Testing.
ERIC Educational Resources Information Center
Lau, C. Allen; Wang, Tianyou
This paper proposes a new Information-Time index as the basis for item selection in computerized classification testing (CCT) and investigates how this new item selection algorithm can help improve test efficiency for item pools with mixed item types. It also investigates how practical constraints such as item exposure rate control, test…
A Process for Reviewing and Evaluating Generated Test Items
ERIC Educational Resources Information Center
Gierl, Mark J.; Lai, Hollis
2016-01-01
Testing organizations need large numbers of high-quality items due to the proliferation of alternative test administration methods and modern test designs. But the current demand for items far exceeds the supply. Test items, as they are currently written, evoke a process that is both time-consuming and expensive because each item is written,…
Integrating Resources in the Education Library: Trends, Issues, and Reality
ERIC Educational Resources Information Center
Osa, Justina O.
2005-01-01
Resources found in the typical education library that supports teacher education programs often include print and non-print library items, and other items that are unique to education library collections. This article attempts to share what the education library is doing to integrate all of its resources irrespective of their formats. The main…
Item validity vs. item discrimination index: a redundancy?
NASA Astrophysics Data System (ADS)
Panjaitan, R. L.; Irawati, R.; Sujana, A.; Hanifah, N.; Djuanda, D.
2018-03-01
In the literature on evaluation and test analysis, it is common to find item validity and the item discrimination index (D) calculated separately, with a different formula for each. Other sources, however, state that the item discrimination index can be obtained by calculating the correlation between examinees' scores on a particular item and their scores on the overall test, which is the same concept as item validity. Some research reports, especially undergraduate theses, tend to include both item validity and the item discrimination index in the instrument analysis. The two concepts appear to overlap, since both reflect how well the test measures examinees' ability. In this paper, example results of data processing for item validity and the item discrimination index are compared. We discuss whether one of the two can represent both, or whether it is better to present both calculations in a simple test analysis, especially in undergraduate theses that include test analyses.
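The two quantities compared in this abstract can be computed side by side. The sketch below assumes the common upper-lower group definition of D (using 27% groups) and an uncorrected item-total Pearson correlation; the paper's own formulas are not given in the abstract, and the data and function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def discrimination_index(item, totals, fraction=0.27):
    """D = proportion correct in the upper group minus the lower group."""
    n = len(item)
    k = max(1, round(n * fraction))
    # Rank examinees by total score; ties are broken by original order.
    order = sorted(range(n), key=lambda i: totals[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item[i] for i in upper) / k
    p_lower = sum(item[i] for i in lower) / k
    return p_upper - p_lower


# Hypothetical responses: 8 examinees (rows) by 4 dichotomous items (columns).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
totals = [sum(row) for row in responses]

for j in range(len(responses[0])):
    item = [row[j] for row in responses]
    d = discrimination_index(item, totals)
    r = pearson(item, totals)  # item-total correlation ("item validity")
    print(f"item {j + 1}: D = {d:.2f}, r = {r:.2f}")
```

Both statistics rank items by how well they separate high and low scorers, which is the overlap the abstract questions; they differ in that D uses only the extreme groups while the correlation uses every examinee.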
A Comparison of Three Types of Test Development Procedures Using Classical and Latent Trait Methods.
ERIC Educational Resources Information Center
Benson, Jeri; Wilson, Michael
Three methods of item selection were used to select sets of 38 items from a 50-item verbal analogies test and the resulting item sets were compared for internal consistency, standard errors of measurement, item difficulty, biserial item-test correlations, and relative efficiency. Three groups of 1,500 cases each were used for item selection. First…
ERIC Educational Resources Information Center
Çokluk, Ömay; Gül, Emrah; Dogan-Gül, Çilem
2016-01-01
The study aims to examine whether differential item functioning is displayed in three different test forms that have item orders of random and sequential versions (easy-to-hard and hard-to-easy), based on Classical Test Theory (CTT) and Item Response Theory (IRT) methods and bearing item difficulty levels in mind. In the correlational research, the…
Effect of study context on item recollection.
Skinner, Erin I; Fernandes, Myra A
2010-07-01
We examined how visual context information provided during encoding, and unrelated to the target word, affected later recollection for words presented alone using a remember-know paradigm. Experiments 1A and 1B showed that participants had better overall memory (specifically, recollection) for words studied with pictures of intact faces than for words studied with pictures of scrambled or inverted faces. Experiment 2 replicated these results and showed that recollection was higher for words studied with pictures of faces than when no image accompanied the study word. In Experiment 3 participants showed equivalent memory for words studied with unique faces as for those studied with a repeatedly presented face. Results suggest that recollection benefits when visual context information high in meaningful content accompanies study words and that this benefit is not related to the uniqueness of the context. We suggest that participants use elaborative processes to integrate item and meaningful contexts into ensemble information, improving subsequent item recollection.
The Effects of Test Length and Sample Size on Item Parameters in Item Response Theory
ERIC Educational Resources Information Center
Sahin, Alper; Anil, Duygu
2017-01-01
This study investigates the effects of sample size and test length on item-parameter estimation in test development utilizing three unidimensional dichotomous models of item response theory (IRT). For this purpose, a real language test comprised of 50 items was administered to 6,288 students. Data from this test was used to obtain data sets of…
[Perceptions on item disclosure for the Korean medical licensing examination].
Yang, Eunbae B
2015-09-01
This study analyzed the perceptions of medical students and faculty regarding disclosure of test items on the Korean medical licensing examination. I conducted a survey of medical students from medical colleges and professional medical schools nationwide. Responses were analyzed from 718 participants as well as 69 faculty members who participated in creating the medical licensing examination item sets. Data were analyzed using descriptive statistics and the chi-square test. It is important to maintain test quality and to keep the test items unavailable to the public. There are also concerns among students that disclosure of test items would prompt increasing difficulty of test items (48.3%). Further, few students found it desirable to disclose test items regardless of any considerations (28.5%). The professors, who had experience in designing the test items, also expressed their opposition to test item disclosure (60.9%). It is desirable not to disclose the test items of the Korean medical licensing examination to the public on the condition that students are provided with a sufficient amount of information regarding the examination. This is so that the exam can appropriately identify candidates with the required qualifications.
A Review of Classical Methods of Item Analysis.
ERIC Educational Resources Information Center
French, Christine L.
Item analysis is a very important consideration in the test development process. It is a statistical procedure to analyze test items that combines methods used to evaluate the important characteristics of test items, such as difficulty, discrimination, and distractibility of the items in a test. This paper reviews some of the classical methods for…
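Two of the item characteristics named above, difficulty and distractor behavior, can be tabulated directly from raw answer choices. The sketch below uses hypothetical responses to one multiple-choice item; the function names are illustrative and are not drawn from the paper.

```python
from collections import Counter


def item_difficulty(choices, key):
    """Classical difficulty: proportion of examinees choosing the keyed answer."""
    return sum(choice == key for choice in choices) / len(choices)


def distractor_counts(choices, key):
    """How often each incorrect option was chosen."""
    return Counter(choice for choice in choices if choice != key)


# Hypothetical answers from ten examinees to one item keyed "B".
choices = list("ABBCBDBBAB")

print("difficulty:", item_difficulty(choices, "B"))
print("distractors:", dict(distractor_counts(choices, "B")))
```

A distractor that almost nobody selects, or one that attracts mostly high scorers, would typically be flagged for revision.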
Modeling Item-Position Effects within an IRT Framework
ERIC Educational Resources Information Center
Debeer, Dries; Janssen, Rianne
2013-01-01
Changing the order of items between alternate test forms to prevent copying and to enhance test security is a common practice in achievement testing. However, these changes in item order may affect item and test characteristics. Several procedures have been proposed for studying these item-order effects. The present study explores the use of…
ACER Chemistry Test Item Collection. ACER Chemtic Year 12.
ERIC Educational Resources Information Center
Australian Council for Educational Research, Hawthorn.
The chemistry test item bank contains 225 multiple-choice questions suitable for diagnostic and achievement testing; a three-page teacher's guide; answer key with item facilities; an answer sheet; and a 45-item sample achievement test. Although written for the new grade 12 chemistry course in Victoria, Australia, the items are widely applicable.…
Patient Safety Culture Survey in Pediatric Complex Care Settings: A Factor Analysis.
Hessels, Amanda J; Murray, Meghan; Cohen, Bevin; Larson, Elaine L
2017-04-19
Children with complex medical needs are increasing in number and demanding the services of pediatric long-term care facilities (pLTC), which require a focus on patient safety culture (PSC). However, no tool to measure PSC has been tested in this unique hybrid acute care-residential setting. The objective of this study was to evaluate the psychometric properties of the Nursing Home Survey on Patient Safety Culture tool slightly modified for use in the pLTC setting. Factor analyses were performed on data collected from 239 staff at 3 pLTC in 2012. Items were screened by principal axis factoring, and the original structure was tested using confirmatory factor analysis. Exploratory factor analysis was conducted to identify the best model fit for the pLTC data, and factor reliability was assessed by Cronbach alpha. The extracted, rotated factor solution suggested items in 4 (staffing, nonpunitive response to mistakes, communication openness, and organizational learning) of the original 12 dimensions may not be a good fit for this population. Nevertheless, in the pLTC setting, both the original and the modified factor solutions demonstrated similar reliabilities to the published consistencies of the survey when tested in adult nursing homes and the items factored nearly identically as theorized. This study demonstrates that the Nursing Home Survey on Patient Safety Culture with minimal modification may be an appropriate instrument to measure PSC in pLTC settings. Additional psychometric testing is recommended to further validate the use of this instrument in this setting, including examining the relationship to safety outcomes. Increased use will yield data for benchmarking purposes across these specialized settings to inform frontline workers and organizational leaders of areas of strength and opportunity for improvement.
ERIC Educational Resources Information Center
New South Wales Dept. of Education, Sydney (Australia).
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
Assembling a Computerized Adaptive Testing Item Pool as a Set of Linear Tests
ERIC Educational Resources Information Center
van der Linden, Wim J.; Ariel, Adelaide; Veldkamp, Bernard P.
2006-01-01
Test-item writing efforts typically result in item pools with an undesirable correlational structure between the content attributes of the items and their statistical information. If such pools are used in computerized adaptive testing (CAT), the algorithm may be forced to select items with less than optimal information, that violate the content…
Evaluation of Northwest University, Kano Post-UTME Test Items Using Item Response Theory
ERIC Educational Resources Information Center
Bichi, Ado Abdu; Hafiz, Hadiza; Bello, Samira Abdullahi
2016-01-01
High-stakes testing is used for the purposes of providing results that have important consequences. Validity is the cornerstone upon which all measurement systems are built. This study applied the Item Response Theory principles to analyse Northwest University Kano Post-UTME Economics test items. The developed fifty (50) economics test items were…
Item Specifications, Science Grade 8. Blue Prints for Testing Minimum Performance Test.
ERIC Educational Resources Information Center
Arkansas State Dept. of Education, Little Rock.
These item specifications were developed as a part of the Arkansas "Minimum Performance Testing Program" (MPT). There is one item specification for each instructional objective included in the MPT. The purpose of an item specification is to provide an overview of the general content and format of test items used to measure an…
Item Specifications, Science Grade 6. Blue Prints for Testing Minimum Performance Test.
ERIC Educational Resources Information Center
Arkansas State Dept. of Education, Little Rock.
These item specifications were developed as a part of the Arkansas "Minimum Performance Testing Program" (MPT). There is one item specification for each instructional objective included in the MPT. The purpose of an item specification is to provide an overview of the general content and format of test items used to measure an…
Criterion-Referenced Test Items for Welding.
ERIC Educational Resources Information Center
Davis, Diane, Ed.
This test item bank on welding contains test questions based upon competencies found in the Missouri Welding Competency Profile. Some test items are keyed for multiple competencies. These criterion-referenced test items are designed to work with the Vocational Instructional Management System. Questions have been statistically sampled and validated…
Jang, Yoonhee; Wixted, John T.; Pecher, Diane; Zeelenberg, René; Huber, David E.
2012-01-01
Even without feedback, test practice enhances delayed performance compared to study practice, but the size of the effect is variable across studies. We investigated the benefit of testing, separating initially retrievable items from initially non-retrievable items. In two experiments, an initial test determined item retrievability. Retrievable or non-retrievable items were subsequently presented for repeated study or test practice. Collapsing across items, in Experiment 1, we obtained the typical crossover interaction between retention interval and practice type. For retrievable items, however, the crossover interaction was quantitatively different, with a small study benefit for an immediate test and a larger testing benefit after a delay. For non-retrievable items, there was a large study benefit for an immediate test, but one week later there was no difference between the study and test practice conditions. In Experiment 2, initially non-retrievable items were given additional study followed by either an immediate test or even more additional study, and one week later performance did not differ between the two conditions. These results indicate that the effect size of study/test practice is due to the relative contribution of retrievable and non-retrievable items. PMID:22304454
Optimal Test Design with Rule-Based Item Generation
ERIC Educational Resources Information Center
Geerlings, Hanneke; van der Linden, Wim J.; Glas, Cees A. W.
2013-01-01
Optimal test-design methods are applied to rule-based item generation. Three different cases of automated test design are presented: (a) test assembly from a pool of pregenerated, calibrated items; (b) test generation on the fly from a pool of calibrated item families; and (c) test generation on the fly directly from calibrated features defining…
Sargeant, J M; O'Connor, A M; Dohoo, I R; Erb, H N; Cevallos, M; Egger, M; Ersbøll, A K; Martin, S W; Nielsen, L R; Pearl, D L; Pfeiffer, D U; Sanchez, J; Torrence, M E; Vigre, H; Waldner, C; Ward, M P
2016-12-01
Reporting of observational studies in veterinary research presents challenges that often are not addressed in published reporting guidelines. Our objective was to develop an extension of the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) statement that addresses unique reporting requirements for observational studies in veterinary medicine related to health, production, welfare, and food safety. We conducted a consensus meeting with 17 experts in Mississauga, Canada. Experts completed a premeeting survey about whether items in the STROBE statement should be modified or added to address unique issues related to observational studies in animal species with health, production, welfare, or food safety outcomes. During the meeting, each STROBE item was discussed to determine whether or not rewording was recommended, and whether additions were warranted. Anonymous voting was used to determine consensus. Six items required no modifications or additions. Modifications or additions were made to the STROBE items 1 (title and abstract), 3 (objectives), 5 (setting), 6 (participants), 7 (variables), 8 (data sources and measurement), 9 (bias), 10 (study size), 12 (statistical methods), 13 (participants), 14 (descriptive data), 15 (outcome data), 16 (main results), 17 (other analyses), 19 (limitations), and 22 (funding). The methods and processes used were similar to those used for other extensions of the STROBE statement. The use of this STROBE statement extension should improve reporting of observational studies in veterinary research by recognizing unique features of observational studies involving food-producing and companion animals, products of animal origin, aquaculture, and wildlife.
ERIC Educational Resources Information Center
New South Wales Dept. of Education, Sydney (Australia).
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…
Criterion-Referenced Test Items for Small Engines.
ERIC Educational Resources Information Center
Herd, Amon
This notebook contains criterion-referenced test items for testing students' knowledge of small engines. The test items are based upon competencies found in the Missouri Small Engine Competency Profile. The test item bank is organized in 18 sections that cover the following duties: shop procedures; tools and equipment; fasteners; servicing fuel…
An Investigation of the Impact of Guessing on Coefficient α and Reliability
2014-01-01
Guessing is known to influence the test reliability of multiple-choice tests. Although there are many studies that have examined the impact of guessing, they used rather restrictive assumptions (e.g., parallel test assumptions, homogeneous inter-item correlations, homogeneous item difficulty, and homogeneous guessing levels across items) to evaluate the relation between guessing and test reliability. Based on the item response theory (IRT) framework, this study investigated the extent of the impact of guessing on reliability under more realistic conditions where item difficulty, item discrimination, and guessing levels actually vary across items with three different test lengths (TL). By accommodating multiple item characteristics simultaneously, this study also focused on examining interaction effects between guessing and other variables entered in the simulation to be more realistic. The simulation of the more realistic conditions and calculations of reliability and classical test theory (CTT) item statistics were facilitated by expressing CTT item statistics, coefficient α, and reliability in terms of IRT model parameters. In addition to the general negative impact of guessing on reliability, results showed interaction effects between TL and guessing and between guessing and test difficulty.
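As a rough illustration of the guessing parameter discussed in the abstract above, the following minimal sketch evaluates the standard three-parameter logistic (3PL) item response function. The item parameter values are hypothetical and are not taken from the study; the sketch only shows how a nonzero lower asymptote raises the success probability of low-ability examinees.

```python
import math


def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability, a: discrimination, b: difficulty,
    c: pseudo-guessing (lower asymptote).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item (a = 1.2, b = 0.0), evaluated without and with guessing.
for c in (0.0, 0.25):
    probs = [round(p_3pl(theta, 1.2, 0.0, c), 3) for theta in (-2.0, 0.0, 2.0)]
    print(f"c = {c}: {probs}")
```

With guessing, the curve is compressed into the interval between c and 1, so responses carry less information about ability. This is consistent with the negative effect of guessing on reliability that the abstract reports.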
Federal Register 2010, 2011, 2012, 2013, 2014
2013-12-06
... Federal Acquisition Regulation Supplement: Domestically Nonavailable Articles--Elimination of DoD-Unique... Supplement (DFARS) to remove the DoD-unique list of nonavailable articles because these items have been found... remove section 225.104 in its entirety, because the articles currently listed no longer qualify as an...
Unusual Suspects: The Case of Insider Theft in Research Libraries and Special Collections
ERIC Educational Resources Information Center
Samuelson, Todd; Sare, Laura; Coker, Catherine
2012-01-01
The widespread theft of collection materials, including rare and unique items, continues to be an issue of great concern to libraries of all types. The potential loss of such items threatens not only an institution's operations but, in many cases, global cultural heritage. Despite an increasingly open attitude among institutions regarding sharing…
Evidences of School Related Alienation in Elementary School Pupils.
ERIC Educational Resources Information Center
McElhinney, James H.; And Others.
In the spring of 1969 over 6,000 students in grades four through six responded to a 72 item questionnaire. Of the 72, 11 include responses which suggest possible alienation of this age group. Each school's pupils produced a unique pattern of responses to the 11 items, which suggests that the immediate school environment is one contributing factor…
Score Equating and Item Response Theory: Some Practical Considerations.
ERIC Educational Resources Information Center
Cook, Linda L.; Eignor, Daniel R.
The purposes of this paper are fivefold: to discuss (1) when item response theory (IRT) equating methods should provide better results than traditional methods; (2) which IRT model, the three-parameter logistic or the one-parameter logistic (Rasch), is the most reasonable to use; (3) what unique contributions IRT methods can offer the equating…
Code of Federal Regulations, 2011 CFR
2011-10-01
... Organization/International Electrotechnical Commission (ISO/IEC) Standard 16022: Information Technology... for sale, and does not ordinarily lose its identity or become a component part of another article when... code (for items too small to individually tag or mark). (ii) Contents (the type of information recorded...
Non-developmental item computer systems and the malicious software threat
NASA Technical Reports Server (NTRS)
Bown, Rodney L.
1991-01-01
The following subject areas are covered: a DOD development system - the Army Secure Operating System; non-development commercial computer systems; security, integrity, and assurance of service (SI and A); post delivery SI and A and malicious software; computer system unique attributes; positive feedback to commercial computer systems vendors; and NDI (Non-Development Item) computers and software safety.
Evaluating the Psychometric Characteristics of Generated Multiple-Choice Test Items
ERIC Educational Resources Information Center
Gierl, Mark J.; Lai, Hollis; Pugh, Debra; Touchie, Claire; Boulais, André-Philippe; De Champlain, André
2016-01-01
Item development is a time- and resource-intensive process. Automatic item generation integrates cognitive modeling with computer technology to systematically generate test items. To date, however, items generated using cognitive modeling procedures have received limited use in operational testing situations. As a result, the psychometric…
Tepe, Rodger; Tepe, Chabha
2015-01-01
Objective To develop and psychometrically evaluate an information literacy (IL) self-efficacy survey and an IL knowledge test. Methods In this test–retest reliability study, a 25-item IL self-efficacy survey and a 50-item IL knowledge test were developed and administered to a convenience sample of 53 chiropractic students. Item analyses were performed on all questions. Results The IL self-efficacy survey demonstrated good reliability (test–retest correlation = 0.81) and good/very good internal consistency (mean κ = .56 and Cronbach's α = .92). A total of 25 questions with the best item analysis characteristics were chosen from the 50-item IL knowledge test, resulting in a 25-item IL knowledge test that demonstrated good reliability (test–retest correlation = 0.87), very good internal consistency (mean κ = .69, KR20 = 0.85), and good item discrimination (mean point-biserial = 0.48). Conclusions This study resulted in the development of three instruments: a 25-item IL self-efficacy survey, a 50-item IL knowledge test, and a 25-item IL knowledge test. The information literacy self-efficacy survey and the 25-item version of the information literacy knowledge test have shown preliminary evidence of adequate reliability and validity to justify continuing study with these instruments. PMID:25517736
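The KR20 coefficient reported above is a standard internal-consistency statistic for dichotomously scored tests. A minimal sketch of the usual formula follows; the response matrix is a hypothetical toy example, not data from the study, and the population variance (dividing by n) is assumed for the total scores.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for a 0/1 response matrix.

    responses: list of examinee rows, each a list of 0/1 item scores.
    Uses the population variance of total scores (an assumption here;
    some texts divide by n - 1 instead).
    """
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion correct on item j
        pq_sum += p * (1.0 - p)
    return (k / (k - 1)) * (1.0 - pq_sum / var_total)


# Hypothetical toy data: 4 examinees, 3 items.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(toy))
```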
Integrating Test-Form Formatting into Automated Test Assembly
ERIC Educational Resources Information Center
Diao, Qi; van der Linden, Wim J.
2013-01-01
Automated test assembly uses the methodology of mixed integer programming to select an optimal set of items from an item bank. Automated test-form generation uses the same methodology to optimally order the items and format the test form. From an optimization point of view, production of fully formatted test forms directly from the item pool using…
ERIC Educational Resources Information Center
Gierl, Mark J.; Lai, Hollis
2013-01-01
Changes to the design and development of our educational assessments are resulting in the unprecedented demand for a large and continuous supply of content-specific test items. One way to address this growing demand is with automatic item generation (AIG). AIG is the process of using item models to generate test items with the aid of computer…
A Procedure To Detect Test Bias Present Simultaneously in Several Items.
ERIC Educational Resources Information Center
Shealy, Robin; Stout, William
A statistical procedure is presented that is designed to test for unidirectional test bias existing simultaneously in several items of an ability test, based on the assumption that test bias is incipient within the two groups' ability differences. The proposed procedure--Simultaneous Item Bias (SIB)--is based on a multidimensional item response…
An Item Response Theory Model for Test Bias.
ERIC Educational Resources Information Center
Shealy, Robin; Stout, William
This paper presents a conceptualization of test bias for standardized ability tests which is based on multidimensional, non-parametric, item response theory. An explanation of how individually-biased items can combine through a test score to produce test bias is provided. It is contended that bias, although expressed at the item level, should be…
Sui, Jie; Humphreys, Glyn W
2013-11-01
We report data demonstrating that self-referential encoding facilitates memory performance in the absence of effects of semantic elaboration in a severely amnesic patient also suffering semantic problems. In Part 1, the patient, GA, was trained to associate items with the self or a familiar other during the encoding phase of a memory task (self-ownership decisions in Experiment 1 and self-evaluation decisions in Experiment 2). Tests of memory showed a consistent self-reference advantage, relative to a condition where the reference was another person in both experiments. The pattern of the self-reference advantage was similar to that in healthy controls. In Part 2 we demonstrate that GA showed minimal effects of semantic elaboration on memory for items he semantically classified, compared with items subject to physical size decisions; in contrast, healthy controls demonstrated enhanced memory performance after semantic relative to physical encoding. The results indicate that self-referential encoding, not semantic elaboration, improves memory in amnesia. Self-referential processing may provide a unique scaffold to help improve learning in amnesic cases. Copyright © 2013 Elsevier Ltd. All rights reserved.
Child and Adolescent Perceptions of Oral Health Over the Life Course
Maida, Carl A.; Marcus, Marvin; Hays, Ron D.; Coulter, Ian D.; Ramos-Gomez, Francisco; Lee, Steve Y.; McClory, Patricia S.; Van, Laura V.; Wang, Yan; Shen, Jie; Cai, Li; Spolsky, Vladimir W.; Crall, James J.; Liu, Honghu
2016-01-01
Purpose To elicit perceptions of oral health in children and adolescents as an initial step in the development of oral health item banks for the Patient-Reported Oral Health Outcomes Measurement Information System project. Methods We conducted focus groups with ethnically, socioeconomically, and geographically diverse youth (8-12, 13-17 years) to identify perceptions of oral health status. We performed content analysis, including a thematic and narrative analysis, to identify important themes. Results We identified three unique themes that the youth associated with their oral health status: 1) understanding the value of maintaining good oral health over the life course, with respect to longevity and quality of life in the adult years; 2) positive association between maintaining good oral health and interpersonal relationships at school, and dating, for older youth; and 3) knowledge of the benefits of orthodontic treatment to appearance and positive self-image, while holding a strong view as to the discomfort associated with braces. Conclusions The results provide valuable information about core domains for the oral health item banks to be developed and generated content for new items to be developed and evaluated with cognitive interviews and in a field test. PMID:26038216
Teicher, Martin H.; Parigger, Angelika
2015-01-01
There is increasing interest in childhood maltreatment as a potent stimulus that may alter trajectories of brain development, induce epigenetic modifications and enhance risk for medical and psychiatric disorders. Although a number of useful scales exist for retrospective assessment of abuse and neglect they have significant limitations. Moreover, they fail to provide detailed information on timing of exposure, which is critical for delineation of sensitive periods. The Maltreatment and Abuse Chronology of Exposure (MACE) scale was developed in a sample of 1051 participants using item response theory to gauge severity of exposure to ten types of maltreatment (emotional neglect, non-verbal emotional abuse, parental physical maltreatment, parental verbal abuse, peer emotional abuse, peer physical bullying, physical neglect, sexual abuse, witnessing interparental violence and witnessing violence to siblings) during each year of childhood. Items included in the subscales had acceptable psychometric properties based on infit and outfit mean square statistics, and each subscale passed Andersen’s Likelihood ratio test. The MACE provides an overall severity score and multiplicity score (number of types of maltreatment experienced) with excellent test-retest reliability. Each type of maltreatment showed good reliability as did severity of exposure across each year of childhood. MACE Severity correlated 0.738 with Childhood Trauma Questionnaire (CTQ) score and MACE Multiplicity correlated 0.698 with the Adverse Childhood Experiences scale (ACE). However, MACE accounted for 2.00- and 2.07-fold more of the variance, on average, in psychiatric symptom ratings than CTQ or ACE, respectively, based on variance decomposition. Different types of maltreatment had distinct and often unique developmental patterns. The 52-item MACE, a simpler Maltreatment Abuse and Exposure Scale (MAES) that only assesses overall exposure and the original test instrument (MACE-X) with several additional items plus spreadsheets and R code for scoring are provided to facilitate use and to spur further development. PMID:25714856
A new instrument to measure quality of life of heart failure family caregivers.
Nauser, Julie A; Bakas, Tamilyn; Welch, Janet L
2011-01-01
Family caregivers of heart failure (HF) patients experience poor physical and mental health leading to poor quality of life. Although several quality-of-life measures exist, they are often too generic to capture the unique experience of this population. The purpose of this study was to evaluate the psychometric properties of the Family Caregiver Quality of Life (FAMQOL) Scale that was designed to assess the physical, psychological, social, and spiritual dimensions of quality of life among caregivers of HF patients. Psychometric testing of the FAMQOL with 100 HF family caregivers was conducted using item analysis, Cronbach α, intraclass correlation, factor analysis, and hierarchical multiple regression guided by a conceptual model. Caregivers were predominately female (89%), white (73%), and spouses (62%). Evidence of internal consistency reliability (α=.89) was provided for the FAMQOL, with item-total correlations of 0.39 to 0.74. Two-week test-retest reliability was supported by an intraclass correlation coefficient of 0.91. Using a 1-factor solution and principal axis factoring, loadings ranged from 0.31 to 0.78, with 41% of the variance explained by the first factor (eigenvalue=6.5). With hierarchical multiple regression, 56% of the FAMQOL variance was explained by model constructs (F8,91=16.56, P<.001). Criterion-related validity was supported by correlations with SF-36 General (r=0.45, P<.001) and Mental (r=0.59, P<.001) Health subscales and Bakas Caregiving Outcomes Scale (r=0.73, P<.001). Evidence of internal and test-retest reliability and construct and criterion validity was provided for physical, psychological, and social well-being subscales. The 16-item FAMQOL is a brief, easy-to-administer instrument that has evidence of reliability and validity in HF family caregivers. Physical, psychological, and social well-being can be measured with 4-item subscales. The FAMQOL scale could serve as a valuable measure in research, as well as an assessment tool to identify caregivers in need of intervention.
ERIC Educational Resources Information Center
Quaigrain, Kennedy; Arhin, Ato Kwamina
2017-01-01
Item analysis is essential in improving items which will be used again in later tests; it can also be used to eliminate misleading items in a test. The study focused on item and test quality and explored the relationship between difficulty index (p-value) and discrimination index (DI) with distractor efficiency (DE). The study was conducted among…
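The difficulty index (p-value) and discrimination index (DI) named above can be sketched as follows. This is an illustrative version under common textbook conventions, not necessarily the authors' procedure: DI is taken as the difference in proportion correct between the top and bottom 27% of examinees ranked by total score, and the response data are hypothetical. Ties at the group boundaries are broken by input order, which is an arbitrary choice.

```python
def item_indices(responses, item, fraction=0.27):
    """Return (difficulty p, discrimination DI) for one item.

    responses: list of examinee rows of 0/1 item scores.
    item: zero-based column index of the item to analyse.
    fraction: share of examinees in each extreme group (27% by convention).
    """
    n = len(responses)
    ranked = sorted(responses, key=sum, reverse=True)  # highest totals first
    g = max(1, round(n * fraction))                    # size of each extreme group
    upper, lower = ranked[:g], ranked[-g:]
    p = sum(row[item] for row in responses) / n
    di = (sum(row[item] for row in upper) - sum(row[item] for row in lower)) / g
    return p, di


# Hypothetical responses: 10 examinees, 4 items.
data = [
    [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 1, 0],
    [1, 1, 0, 0], [0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0],
]
for j in range(4):
    print(j, item_indices(data, j))
```

An item with a DI near zero or below, such as the last item in this toy data set, would normally be flagged for review.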
Yoza, Yoshiyasu; Ariyoshi, Koya; Honda, Sumihisa; Taniguchi, Hiroyuki; Senjyu, Hideaki
2009-10-01
Patients with COPD often experience restriction in their activities of daily living (ADL) due to dyspnea. This type of restriction is unique to patients with COPD and cannot be adequately evaluated by the generic ADL scales. This study developed an ADL scale (the Activity of Daily Living Dyspnea scale [ADL-D scale]) for patients with COPD and investigated its validity and internal consistency. Patients with stable COPD were recruited and completed a pilot 26-item questionnaire. Patients also performed the Incremental Shuttle Walk Test (ISWT), and completed the St George's Respiratory Questionnaire (SGRQ), and Medical Research Council (MRC) dyspnea grade. There were 83 male participants who completed the pilot questionnaire. Following the pilot, 8 items that were not undertaken by the majority of subjects, and 3 items judged to be of low clinical importance by physical therapists were removed from the pilot questionnaire. The final ADL-D scale contained 15 items. Scores obtained with the ADL-D scale were significantly correlated with the MRC dyspnea grades, distance walked on the ISWT and SGRQ scores. The ADL-D scores were significantly different across the five grades of the MRC dyspnea grade. The ADL-D scale showed high consistency (Cronbach's alpha coefficient of 0.96). The ADL-D scale is a useful scale for assessing impairments in ADL in Japanese male patients with COPD.
Development of Islamic Spiritual Health Scale (ISHS).
Khorashadizadeh, Fatemeh; Heydari, Abbas; Nabavi, Fatemeh Heshmati; Mazlom, Seyed Reza; Ebrahimi, Mahdi; Esmaili, Habibollah
2017-03-01
To develop and psychometrically assess a spiritual health scale based on the Islamic view in Iran. The cross-sectional study was conducted at Imam Ali and Quem hospitals in Mashhad and Imam Ali and Imam Reza hospitals in Bojnurd, Iran, from 2015 to 2016. In the first stage, an 81-item Likert-type scale was developed using a qualitative approach. The second stage comprised a quantitative component. The scale's impact factor, content validity ratio, content validity index, face validity and exploratory factor analysis were calculated. Test-retest and internal consistency were used to examine the reliability of the instrument. Data analysis was done using SPSS 11. Of 81 items in the scale, those with impact factor above 1.5, content validity ratio above 0.62, and content validity index above 0.79 were considered valid and the rest were discarded, resulting in a 61-item scale. Exploratory factor analysis reduced the list of items to 30, which were divided into seven groups with a minimum eigenvalue of 1 for each factor. But according to scatter plot, attributes of the concept of spiritual health included love to creator, duty-based life, religious rationality, psychological balance, and attention to afterlife. Internal reliability of the scale was calculated by Cronbach's alpha coefficient as 0.91. There was solid evidence of the strength of the factor structure and reliability of the Islamic Spiritual Health Scale, which provides a unique way for spiritual health assessment of Muslims.
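The content validity ratio (CVR) cutoff mentioned above can be illustrated with a short sketch. It assumes Lawshe's formula, which the abstract does not name explicitly, and the panel size of ten is hypothetical. The 0.62 cutoff is the value commonly cited from Lawshe's table for a ten-member panel, but the abstract does not report how many experts took part.

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's CVR = (n_e - N/2) / (N/2).

    n_essential: number of panelists rating the item "essential".
    n_panelists: total number of panelists (hypothetical here).
    """
    half = n_panelists / 2
    return (n_essential - half) / half


# Hypothetical panel of 10 experts: retain an item only if CVR exceeds 0.62.
for n_e in (10, 9, 8, 5):
    cvr = content_validity_ratio(n_e, 10)
    print(n_e, round(cvr, 2), "keep" if cvr > 0.62 else "discard")
```

Under these assumptions, an item needs at least nine of ten experts to rate it essential in order to be retained.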
ERIC Educational Resources Information Center
Snyder, James
2010-01-01
This dissertation research examined the changes in item RIT calibration that occurred when adding audio to a set of currently calibrated RIT items and then placing these new items as field test items in the modified assessments on the NWEA MAP test platform. The researcher used test results from over 600 students in the Poway School District in…
Otzi, the iceman and his leather clothes.
Püntener, Alois G; Moss, Serge
2010-01-01
Over 5000 years ago, a man climbed up to the icy heights of the glacier in South Tyrol, Italy and died. He was found by accident in 1991, with his clothes and equipment, mummified and frozen: an archaeological sensation and a unique snapshot of a Copper Age man. For several years highly specialised research teams have examined the mummy and all accompanying items. This paper describes how fur and leather clothes of the iceman could have been tanned. Details of the analytical tests undertaken on the 5000 year old leather samples and what they revealed are presented.
The Reward-Based Eating Drive Scale: A Self-Report Index of Reward-Based Eating
Mason, Ashley E.; Laraia, Barbara A.; Hartman, William; Ready, Karen; Acree, Michael; Adam, Tanja C.; St. Jeor, Sachiko; Kessler, David
2014-01-01
Why are some individuals more vulnerable to persistent weight gain and obesity than are others? Some obese individuals report factors that drive overeating, including lack of control, lack of satiation, and preoccupation with food, which may stem from reward-related neural circuitry. These are normative and common symptoms and not the sole focus of any existing measures. Many eating scales capture these common behaviors, but are confounded with aspects of dysregulated eating such as binge eating or emotional overeating. Across five studies, we developed items that capture this reward-based eating drive (RED). Study 1 developed the items in lean to obese individuals (n = 327) and examined changes in weight over eight years. In Study 2, the scale was further developed and expert raters evaluated the set of items. Study 3 tested psychometric properties of the final 9 items in 400 participants. Study 4 examined psychometric properties and race invariance (n = 80 women). Study 5 examined psychometric properties and age/gender invariance (n = 381). Results showed that RED scores correlated with BMI and predicted earlier onset of obesity, greater weight fluctuations, and greater overall weight gain over eight years. Expert ratings of RED scale items indicated that the items reflected characteristics of reward-based eating. The RED scale evidenced high internal consistency and invariance across demographic factors. The RED scale, designed to tap vulnerability to reward-based eating behavior, appears to be a useful brief tool for identifying those at higher risk of weight gain over time. Given the heterogeneity of obesity, unique brief profiling of the reward-based aspect of obesity using a self-report instrument such as the RED scale may be critical for customizing effective treatments in the general population. PMID:24979216
Marsh, Herbert W; Lüdtke, Oliver; Nagengast, Benjamin; Morin, Alexandre J S; Von Davier, Matthias
2013-09-01
The present investigation has a dual focus: to evaluate problematic practice in the use of item parcels and to suggest exploratory structural equation models (ESEMs) as a viable alternative to the traditional independent clusters confirmatory factor analysis (ICM-CFA) model (with no cross-loadings, subsidiary factors, or correlated uniquenesses). Typically, it is ill-advised to (a) use item parcels when ICM-CFA models do not fit the data, and (b) retain ICM-CFA models when items cross-load on multiple factors. However, the combined use of (a) and (b) is widespread and often provides such misleadingly good fit indexes that applied researchers might believe that misspecification problems are resolved--that 2 wrongs really do make a right. Taking a pragmatist perspective, in 4 studies we demonstrate with responses to the Rosenberg Self-Esteem Inventory (Rosenberg, 1965), Big Five personality factors, and simulated data that even small cross-loadings seriously distort relations among ICM-CFA constructs or even decisions on the number of factors; although obvious in item-level analyses, this is camouflaged by the use of parcels. ESEMs provide a viable alternative to ICM-CFAs and a test for the appropriateness of parcels. The use of parcels with an ICM-CFA model is most justifiable when the fit of both ICM-CFA and ESEM models is acceptable and equally good, and when substantively important interpretations are similar. However, if the ESEM model fits the data better than the ICM-CFA model, then the use of parcels with an ICM-CFA model typically is ill-advised--particularly in studies that are also interested in scale development, latent means, and measurement invariance.
Student science achievement and the integration of Indigenous knowledge on standardized tests
NASA Astrophysics Data System (ADS)
Dupuis, Juliann; Abrams, Eleanor
2017-09-01
In this article, we examine how American Indian students in Montana performed on standardized state science assessments when a small number of test items based upon traditional science knowledge from a cultural curriculum, "Indian Education for All", were included. Montana is the first state in the US to mandate the use of a culturally relevant curriculum in all schools and to incorporate this curriculum into a portion of the standardized assessment items. This study compares White and American Indian student test scores on these particular test items to determine how White and American Indian students perform on culturally relevant test items compared to traditional standard science test items. The connections between student achievement on adapted culturally relevant science test items versus traditional items brings valuable insights to the fields of science education, research on student assessments, and Indigenous studies.
Computerized Adaptive Test (CAT) Applications and Item Response Theory Models for Polytomous Items
ERIC Educational Resources Information Center
Aybek, Eren Can; Demirtasli, R. Nukhet
2017-01-01
This article aims to provide a theoretical framework for computerized adaptive tests (CAT) and item response theory models for polytomous items. Besides that, it aims to introduce the simulation and live CAT software to the related researchers. Computerized adaptive test algorithm, assumptions of item response theory models, nominal response…
An Effect Size Measure for Raju's Differential Functioning for Items and Tests
ERIC Educational Resources Information Center
Wright, Keith D.; Oshima, T. C.
2015-01-01
This study established an effect size measure for differential functioning for items and tests' noncompensatory differential item functioning (NCDIF). The Mantel-Haenszel parameter served as the benchmark for developing NCDIF's effect size measure for reporting moderate and large differential item functioning in test items. The effect size of…
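The Mantel-Haenszel parameter that the abstract above uses as a benchmark can be sketched as a common odds ratio pooled over matched score levels. The counts below are hypothetical, and the conversion to the delta scale (multiplying the log odds ratio by -2.35) follows the widely used ETS convention rather than anything stated in the abstract.

```python
import math


def mantel_haenszel(strata):
    """Mantel-Haenszel common odds ratio and its delta-scale transform.

    strata: one tuple (a, b, c, d) per matched score level, where
      a = reference group correct,  b = reference group incorrect,
      c = focal group correct,      d = focal group incorrect.
    Returns (alpha_mh, mh_d_dif).
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)


# Hypothetical counts at two score levels with identical odds in both groups.
no_dif = [(20, 10, 10, 5), (30, 30, 10, 10)]
print(mantel_haenszel(no_dif))

# Hypothetical single level where the reference group has higher odds of success.
dif = [(30, 10, 20, 20)]
print(mantel_haenszel(dif))
```

An odds ratio of 1 (delta of 0) indicates no DIF; under the sign convention assumed here, a negative delta means the item favours the reference group.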
Detecting a Gender-Related DIF Using Logistic Regression and Transformed Item Difficulty
ERIC Educational Resources Information Center
Abedlaziz, Nabeel; Ismail, Wail; Hussin, Zaharah
2011-01-01
Test items are designed to provide information about the examinees. Difficult items are designed to be more demanding and easy items are less so. However, sometimes, test items carry with them demands other than those intended by the test developer (Scheuneman & Gerritz, 1990). When personal attributes such as gender systematically affect…
Influence of Fallible Item Parameters on Test Information During Adaptive Testing.
ERIC Educational Resources Information Center
Wetzel, C. Douglas; McBride, James R.
Computer simulation was used to assess the effects of item parameter estimation errors on different item selection strategies used in adaptive and conventional testing. To determine whether these effects reduced the advantages of certain optimal item selection strategies, simulations were repeated in the presence and absence of item parameter…
A Guide to Item Banking in Education. (Third Edition).
ERIC Educational Resources Information Center
Naccarato, Richard W.
The current status of banks of test items existing across the United States was determined through a survey conducted between September and December 1987. Item "bank" in this context does not imply that the test items are available in computerized form, but simply that "deposited" test items can be withdrawn for use. Emphasis…
Development and validation of an energy-balance knowledge test for fourth- and fifth-grade students.
Chen, Senlin; Zhu, Xihe; Kang, Minsoo
2017-05-01
A valid test measuring children's energy-balance (EB) knowledge is lacking in research. This study developed and validated the energy-balance knowledge test (EBKT) for fourth- and fifth-grade students. The original EBKT contained 25 items but was reduced to 23 items based on pilot results and intensive expert panel discussion. De-identified data were collected from 468 fourth- and fifth-grade students enrolled in four schools to examine the psychometric properties of the EBKT items. The Rasch model analysis was conducted using the Winsteps 3.65.0 software. Differential item functioning (DIF) analysis flagged 1 item (item #4) as functioning differently between boys and girls, and this item was deleted. The final 22-item EBKT showed desirable model-data fit indices. The item difficulties varied widely, ranging from -3.58 logit (item #10, the easiest) to 1.70 logit (item #3, the hardest). The average person ability on the test was 0.28 logit (SD = .78). Additional analyses supported known-group difference validity of the EBKT scores in capturing gender- and grade-based ability differences. The test was overall valid but could be further improved by expanding test items to discern various ability levels. For lack of a better test, researchers and practitioners may use the EBKT to assess fourth- and fifth-grade students' EB knowledge.
NASA Astrophysics Data System (ADS)
Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan
2016-12-01
This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discrimination, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the PARSCALE program. The TUV ability is an ability parameter, here estimated assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed from item response curves (IRC) that represent simplified IRT. Data were gathered on 2392 science and engineering freshmen, from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information, and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test's distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by the classical analysis methods of tests. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.
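As an illustration of the three-parameter logistic model named in the abstract above, the following minimal Python sketch computes the probability of a correct response and the item information at a given ability level. It assumes the common logistic form without the 1.7 scaling constant; the function and parameter names (theta for ability, a for discrimination, b for difficulty, c for guessing) are hypothetical and are not taken from the study or from the software it used.

```python
import math


def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    Assumed form: P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b))).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def item_information_3pl(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = p_correct_3pl(theta, a, b, c)
    q = 1.0 - p
    return (a ** 2) * (q / p) * ((p - c) / (1.0 - c)) ** 2


if __name__ == "__main__":
    # Hypothetical item: moderate discrimination, average difficulty,
    # guessing floor of 0.2 (e.g. a five-option multiple-choice item).
    for theta in (-2.0, 0.0, 2.0):
        print(theta, p_correct_3pl(theta, 1.0, 0.0, 0.2))
```

When theta equals b, the probability sits halfway between the guessing floor c and 1, which is one quick way to sanity-check fitted parameters.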
ERIC Educational Resources Information Center
Baghaei, Purya; Ravand, Hamdollah
2016-01-01
In this study the magnitudes of local dependence generated by cloze test items and reading comprehension items were compared and their impact on parameter estimates and test precision was investigated. An advanced English as a foreign language reading comprehension test containing three reading passages and a cloze test was analyzed with a…
Machine Shop. Criterion-Referenced Test (CRT) Item Bank.
ERIC Educational Resources Information Center
Davis, Diane, Ed.
This drafting criterion-referenced test item bank is keyed to the machine shop competency profile developed by industry and education professionals in Missouri. The 16 references used for drafting the test items are listed. Test items are arranged under these categories: orientation to machine shop; performing mathematical calculations; performing…
Rescuing Computerized Testing by Breaking Zipf's Law.
ERIC Educational Resources Information Center
Wainer, Howard
2000-01-01
Suggests that because of the nonlinear relationship between item usage and item security, the problems of test security posed by continuous administration of standardized tests cannot be resolved merely by increasing the size of the item pool. Offers alternative strategies to overcome these problems, distributing test items so as to avoid the…
Item-specific processing reduces false memories.
McCabe, David P; Presmanes, Alison G; Robertson, Chuck L; Smith, Anderson D
2004-12-01
We examined the effect of item-specific and relational encoding instructions on false recognition in two experiments in which the DRM paradigm was used (Deese, 1959; Roediger & McDermott, 1995). Type of encoding (item-specific or relational) was manipulated between subjects in Experiment 1 and within subjects in Experiment 2. Decision-based explanations (e.g., the distinctiveness heuristic) predict reductions in false recognition in between-subjects designs, but not in within-subjects designs, because they are conceptualized as global shifts in decision criteria. Memory-based explanations predict reductions in false recognition in both designs, resulting from enhanced recollection of item-specific details. False recognition was reduced following item-specific encoding instructions in both experiments, favoring a memory-based explanation. These results suggest that providing unique cues for the retrieval of individual studied items results in enhanced discrimination between those studied items and critical lures. Conversely, enhancing the similarity of studied items results in poor discrimination among items within a particular list theme. These results are discussed in terms of the item-specific/relational framework (Hunt & McDaniel, 1993).
ERIC Educational Resources Information Center
Ito, Kyoko; Sykes, Robert C.
This study investigated the practice of weighting a type of test item, such as constructed response, more than other types of items, such as selected response, to compute student scores for a mixed-item type of test. The study used data from statewide writing field tests in grades 3, 5, and 8 and considered two contexts, that in which a single…
ERIC Educational Resources Information Center
Atalmis, Erkan Hasan
2016-01-01
Multiple-choice (MC) items are commonly used in high-stakes tests. Thus, each item of such tests should be meticulously constructed to increase the accuracy of decisions based on test results. Haladyna and his colleagues (2002) addressed the valid item-writing guidelines to construct high quality MC items in order to increase test reliability and…
Choi, Yoonsun; Kim, You Seung; Drankus, Dina; Kim, Hyun Jee
2012-01-01
This study aims to describe the family socialization beliefs and practices of Korean immigrant parents through testing psychometric properties of several newly developed items and scales to assess the major components of the Korean traditional concept of family socialization, ga-jung-kyo-yuk. These new measures were examined for validity and reliability. The findings show that Korean immigrant parents largely preserve their traditional and core parenting values, while also showing meaningful, yet not very dramatic, signs of adopting new cultural traits. The results also suggest that the acculturative process may not be simply bilinear but may generate a new, unique and blended value and behavior set from the two (or more) cultures involved. Culturally appropriate practice requires not only further validation of existing knowledge with minority groups, but the development of a theoretical framework of family socialization that recognizes the cultural uniqueness of immigrant families. PMID:24765236
The Relation Between Inattentive and Hyperactive/Impulsive Behaviors and Early Mathematics Skills.
Sims, Darcey M; Purpura, David J; Lonigan, Christopher J
2016-08-01
Despite strong evidence that inattentive and hyperactive/impulsive behaviors are associated with mathematical difficulties in school-age children, little research has been conducted to examine the link between these constructs before the start of formal education. The purpose of this study was to examine how different manifestations of inattentive and hyperactive/impulsive behaviors, as measured by different assessment tools, are related to early mathematics skills in preschoolers. Eighty-two preschool children completed a measure of early mathematics and the Continuous Performance Test (CPT). Teachers rated children's behaviors using the Conners' Teacher Rating Scale-15 Item. Sixty-five of these children completed mathematics assessments 1 year later. Teacher ratings of inattention were uniquely related to concurrent early mathematics skills, whereas CPT errors were uniquely predictive of early mathematics skills 1 year later. Findings have implications for the understanding and assessment of behavior problems that are associated with early mathematics difficulties. © The Author(s) 2012.
Data mining and visualization techniques
Wong, Pak Chung [Richland, WA; Whitney, Paul [Richland, WA; Thomas, Jim [Richland, WA
2004-03-23
Disclosed are association rule identification and visualization methods, systems, and apparatus. An association rule in data mining is an implication of the form X → Y, where X is a set of antecedent items and Y is the consequent item. A unique visualization technique that provides multiple antecedent, consequent, confidence, and support information is disclosed to facilitate better presentation of large quantities of complex association rules.
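The support and confidence of an association rule, as mentioned in the abstract above, can be sketched in a few lines of Python. This is a minimal illustration using the standard textbook definitions (support as the fraction of transactions containing an itemset, confidence as the support of antecedent plus consequent divided by the support of the antecedent); the function names and the toy transactions are hypothetical and do not come from the disclosed method.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent (a single item)."""
    s_antecedent = support(transactions, antecedent)
    if s_antecedent == 0:
        return 0.0  # rule never applies; report zero rather than divide by zero
    s_both = support(transactions, set(antecedent) | {consequent})
    return s_both / s_antecedent


if __name__ == "__main__":
    # Toy data: each transaction is a set of items.
    baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
    print(support(baskets, {"a", "b"}))
    print(confidence(baskets, {"a", "b"}, "c"))
```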
ERIC Educational Resources Information Center
Solheim, Oddny Judith
2011-01-01
It has been hypothesized that students with low self-efficacy will struggle with complex reading tasks in assessment situations. In this study we examined whether perceived reading self-efficacy and reading task value uniquely predicted reading comprehension scores in two different item formats in a sample of fifth-grade students. Results showed…
Paired-Associate Learning Ability Accounts for Unique Variance in Orthographic Learning
ERIC Educational Resources Information Center
Wang, Hua-Chen; Wass, Malin; Castles, Anne
2017-01-01
Paired-associate learning is a dynamic measure of the ability to form new links between two items. This study aimed to investigate whether paired-associate learning ability is associated with success in orthographic learning, and if so, whether it accounts for unique variance beyond phonological decoding ability and orthographic knowledge. A group…
Integrated Vehicle Ground Vibration Testing in Support of Launch Vehicle Loads and Controls Analysis
NASA Technical Reports Server (NTRS)
Askins, Bruce R.; Davis, Susan R.; Salyer, Blaine H.; Tuma, Margaret L.
2008-01-01
All structural systems possess a basic set of physical characteristics unique to that system. These unique physical characteristics include items such as mass distribution and damping. When specified, they allow engineers to understand and predict how a structural system behaves under given loading conditions and different methods of control. These physical properties of launch vehicles may be predicted by analysis or measured by certain types of tests. Generally, these properties are predicted by analysis during the design phase of a launch vehicle and then verified by testing before the vehicle becomes operational. A ground vibration test (GVT) is intended to measure by test the fundamental dynamic characteristics of launch vehicles during various phases of flight. During the series of tests, properties such as natural frequencies, mode shapes, and transfer functions are measured directly. These data will then be used to calibrate loads and control systems analysis models for verifying analyses of the launch vehicle. NASA manned launch vehicles have undergone ground vibration testing leading to the development of successful launch vehicles. A GVT was not performed on the inaugural launch of the unmanned Delta III, which was lost during launch. Subsequent analyses indicated that, had a GVT been performed, it would have identified instability issues, avoiding loss of the vehicle. This discussion will address GVT planning, set-up, execution and analyses for the Saturn and Shuttle programs, and will also focus on the current and on-going planning for the Ares I and V Integrated Vehicle Ground Vibration Test (IVGVT).
Item difficulty and item validity for the Children's Group Embedded Figures Test.
Rusch, R R; Trigg, C L; Brogan, R; Petriquin, S
1994-02-01
The validity and reliability of the Children's Group Embedded Figures Test was reported for students in Grade 2 by Cromack and Stone in 1980; however, a search of the literature indicates no evidence for internal consistency or item analysis. Hence the purpose of this study was to examine the item difficulty and item validity of the test with children in Grades 1 and 2. Confusion in the literature over development and use of this test was seemingly resolved through analysis of these descriptions and through an interview with the test developer. One early-appearing item was unreasonably difficult. Two or three other items were quite difficult and made little contribution to the total score. Caution is recommended, however, in any reordering or elimination of items based on these findings, given the limited number of subjects (n = 84).
1976-01-01
items. The items tested were the MODI-PAC, a proprietary item of Remington Arms Company, a standard 12-gauge round of No. 4 lead shot, and an...to refrain from testing this item. Therefore, the final selection of items for testing were (1) the MODI-PAC, (2) a standard 12-gauge shotgun round of...The first item evaluated was the MODI-PAC. The MODI-PAC, which stands for "modified impact," is a 12-gauge shotgun shell loaded with approximately 320
Australian Chemistry Test Item Bank: Years 11 & 12. Volume 1.
ERIC Educational Resources Information Center
Commons, C., Ed.; Martin, P., Ed.
Volume 1 of the Australian Chemistry Test Item Bank, consisting of two volumes, contains nearly 2000 multiple-choice items related to the chemistry taught in Year 11 and Year 12 courses in Australia. Items which were written during 1979 and 1980 were initially published in the "ACER Chemistry Test Item Collection" and in the "ACER…
Australian Chemistry Test Item Bank: Years 11 and 12. Volume 2.
ERIC Educational Resources Information Center
Commons, C., Ed.; Martin, P., Ed.
The second volume of the Australian Chemistry Test Item Bank, consisting of two volumes, contains nearly 2000 multiple-choice items related to the chemistry taught in Year 11 and Year 12 courses in Australia. Items which were written during 1979 and 1980 were initially published in the "ACER Chemistry Test Item Collection" and in the…
Interactions Between Item Content And Group Membership on Achievement Test Items.
ERIC Educational Resources Information Center
Linn, Robert L.; Harnisch, Delwyn L.
The purpose of this investigation was to examine the interaction of item content and group membership on achievement test items. Estimates of the parameters of the three parameter logistic model were obtained on the 46 item math test for the sample of eighth grade students (N = 2055) participating in the Illinois Inventory of Educational Progress,…
Effects of Item Exposure for Conventional Examinations in a Continuous Testing Environment.
ERIC Educational Resources Information Center
Hertz, Norman R.; Chinn, Roberta N.
This study explored the effect of item exposure on two conventional examinations administered as computer-based tests. A principal hypothesis was that item exposure would have little or no effect on average difficulty of the items over the course of an administrative cycle. This hypothesis was tested by exploring conventional item statistics and…
McInnes, Matthew D F; Moher, David; Thombs, Brett D; McGrath, Trevor A; Bossuyt, Patrick M; Clifford, Tammy; Cohen, Jérémie F; Deeks, Jonathan J; Gatsonis, Constantine; Hooft, Lotty; Hunt, Harriet A; Hyde, Christopher J; Korevaar, Daniël A; Leeflang, Mariska M G; Macaskill, Petra; Reitsma, Johannes B; Rodin, Rachel; Rutjes, Anne W S; Salameh, Jean-Paul; Stevens, Adrienne; Takwoingi, Yemisi; Tonelli, Marcello; Weeks, Laura; Whiting, Penny; Willis, Brian H
2018-01-23
Systematic reviews of diagnostic test accuracy synthesize data from primary diagnostic studies that have evaluated the accuracy of 1 or more index tests against a reference standard, provide estimates of test performance, allow comparisons of the accuracy of different tests, and facilitate the identification of sources of variability in test accuracy. To develop the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) diagnostic test accuracy guideline as a stand-alone extension of the PRISMA statement. Modifications to the PRISMA statement reflect the specific requirements for reporting of systematic reviews and meta-analyses of diagnostic test accuracy studies and the abstracts for these reviews. Established standards from the Enhancing the Quality and Transparency of Health Research (EQUATOR) Network were followed for the development of the guideline. The original PRISMA statement was used as a framework on which to modify and add items. A group of 24 multidisciplinary experts used a systematic review of articles on existing reporting guidelines and methods, a 3-round Delphi process, a consensus meeting, pilot testing, and iterative refinement to develop the PRISMA diagnostic test accuracy guideline. The final version of the PRISMA diagnostic test accuracy guideline checklist was approved by the group. The systematic review (produced 64 items) and the Delphi process (provided feedback on 7 proposed items; 1 item was later split into 2 items) identified 71 potentially relevant items for consideration. The Delphi process reduced these to 60 items that were discussed at the consensus meeting. Following the meeting, pilot testing and iterative feedback were used to generate the 27-item PRISMA diagnostic test accuracy checklist. To reflect specific or optimal contemporary systematic review methods for diagnostic test accuracy, 8 of the 27 original PRISMA items were left unchanged, 17 were modified, 2 were added, and 2 were omitted. 
The 27-item PRISMA diagnostic test accuracy checklist provides specific guidance for reporting of systematic reviews. The PRISMA diagnostic test accuracy guideline can facilitate the transparent reporting of reviews, and may assist in the evaluation of validity and applicability, enhance replicability of reviews, and make the results from systematic reviews of diagnostic test accuracy studies more useful.
Cortisol mediates the effects of stress on the contextual dependency of memories.
van Ast, Vanessa A; Cornelisse, Sandra; Meeter, Martijn; Kindt, Merel
2014-03-01
Stress is known to exert considerable impact on learning and memory processes. Typically, human studies have investigated memory for single items (e.g., pictures, words), but it remains unresolved how exactly stress may alter the storage of memories into their original encoding context (i.e., memory contextualization). Since neurocircuitry underlying memory contextualization processes is sensitive to the well-known stress hormone cortisol, we here investigated whether cortisol mediates stress effects on memory contextualization. Forty healthy young men were randomly assigned to a psychosocial stress or control group. Ten minutes after stress manipulation offset, participants were instructed to learn and remember neutral and negative words, each of which was depicted against a unique background picture. Approximately 24h later, memory was tested by means of cued retrieval and recognition tasks. To assess memory contextualization half of the words were tested in intact item-contexts pairs, and half in rearranged item-context combinations. Recognition data showed that cortisol, but no other indices of stress such as heart rate or subjective stress, mediated the effects of stress on contextualization of neutral and negative memories. The mediation analysis further showed that stress resulted in increases in cortisol and that cortisol was positively related to memory contextualization, but unrelated to other measures of memory. Thus, there seems to be a specific role for cortisol in the integration of a central memory into its surrounding context. Copyright © 2013 Elsevier Ltd. All rights reserved.
An Efficiency Balanced Information Criterion for Item Selection in Computerized Adaptive Testing
ERIC Educational Resources Information Center
Han, Kyung T.
2012-01-01
Successful administration of computerized adaptive testing (CAT) programs in educational settings requires that test security and item exposure control issues be taken seriously. Developing an item selection algorithm that strikes the right balance between test precision and level of item pool utilization is the key to successful implementation…
ERIC Educational Resources Information Center
Arendasy, Martin E.; Sommer, Markus
2012-01-01
The use of new test administration technologies such as computerized adaptive testing in high-stakes educational and occupational assessments demands large item pools. Classic item construction processes and previous approaches to automatic item generation faced the problems of a considerable loss of items after the item calibration phase. In this…
Item Purification Does Not Always Improve DIF Detection: A Counterexample with Angoff's Delta Plot
ERIC Educational Resources Information Center
Magis, David; Facon, Bruno
2013-01-01
Item purification is an iterative process that is often advocated as improving the identification of items affected by differential item functioning (DIF). With test-score-based DIF detection methods, item purification iteratively removes the items currently flagged as DIF from the test scores to get purified sets of items, unaffected by DIF. The…
Jia, Lin-Zhi; Ya-Jun, Ma; Cao, Yi; Qian, Fen; Li, Xiang-Yu
2012-04-30
The quality indices of the "Medical Parasitology" exam papers and the measured data for students in three majors at the university in 2010 were compared and analyzed. The exam papers were assembled from the test item bank. The alpha reliability coefficients of the three exam papers were above 0.70. The knowledge structure and capacity structure of the exam papers were basically balanced. However, the alpha reliability coefficient for the second major was the lowest, mainly because of the quality of the test items in that exam paper and the failure to revise the indices in the test item bank in time. This observation demonstrated that revising the test items and their indices in the item bank according to the measured data can improve the quality of papers drawn from the test item bank and reduce the differences among exam papers.
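The alpha reliability coefficient reported in the abstract above is presumably Cronbach's alpha; under that assumption, a minimal Python sketch of the calculation from an examinee-by-item score matrix follows. The function name and the sample scores are hypothetical, and sample variances (n - 1 denominator) are assumed throughout.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of examinee rows, one score per item.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item (column) across examinees.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Variance of each examinee's total score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical scores: 4 examinees, 3 items.
    data = [[1, 2, 2], [2, 3, 3], [3, 3, 4], [4, 5, 5]]
    print(round(cronbach_alpha(data), 3))
```

Values above roughly 0.70, the threshold cited in the abstract, are conventionally read as acceptable internal consistency.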
The Role of Item Models in Automatic Item Generation
ERIC Educational Resources Information Center
Gierl, Mark J.; Lai, Hollis
2012-01-01
Automatic item generation represents a relatively new but rapidly evolving research area where cognitive and psychometric theories are used to produce tests that include items generated using computer technology. Automatic item generation requires two steps. First, test development specialists create item models, which are comparable to templates…
Item Review and the Rearrangement Procedure: Its Process and Its Results
ERIC Educational Resources Information Center
Papanastasiou, Elena C.
2005-01-01
Permitting item review is to the benefit of the examinees who typically increase their test scores with item review. However, testing companies do not prefer item review since it does not follow the logic on which adaptive tests are based, and since it is prone to cheating strategies. Consequently, item review is not permitted in many adaptive…
A Model-Based Method for Content Validation of Automatically Generated Test Items
ERIC Educational Resources Information Center
Zhang, Xinxin; Gierl, Mark
2016-01-01
The purpose of this study is to describe a methodology to recover the item model used to generate multiple-choice test items with a novel graph theory approach. Beginning with the generated test items and working backward to recover the original item model provides a model-based method for validating the content used to automatically generate test…
Testing complex animal cognition: Concept learning, proactive interference, and list memory.
Wright, Anthony A
2018-01-01
This article describes an approach for assessing and comparing complex cognition in rhesus monkeys and pigeons by training them in a sequence of synergistic tasks, each yielding a whole function for enhanced comparisons. These species were trained in similar same/different tasks with expanding training sets (8, 16, 32, 64, 128 … 1024 pictures) followed by novel-stimulus transfer eventually resulting in full abstract-concept learning. Concept-learning functions revealed better rhesus transfer throughout and full concept learning at the 128 set, versus pigeons at the 256 set. They were then tested in delayed same/different tasks for proactive interference by inserting occasional tests within trial-unique sessions where the test stimulus matched a previous sample stimulus (1, 2, 4, 8, 16 trials prior). Proactive-interference functions revealed time-based interference for pigeons (1, 10 s delays), but event-based interference for rhesus (no effect of 1, 10, 20 s delays). They were then tested in list-memory tasks by expanding the sample to four samples in trial-unique sessions (minimizing proactive interference). The four-item, list-memory functions revealed strong recency memory at short delays, gradually changing to strong primacy memory at long delays over 30 s for rhesus, and 10 s for pigeons. Other species comparisons and future directions are discussed. © 2018 Society for the Experimental Analysis of Behavior.
Optimal Bayesian Adaptive Design for Test-Item Calibration.
van der Linden, Wim J; Ren, Hao
2015-06-01
An optimal adaptive design for test-item calibration based on Bayesian optimality criteria is presented. The design adapts the choice of field-test items to the examinees taking an operational adaptive test using both the information in the posterior distributions of their ability parameters and the current posterior distributions of the field-test parameters. Different criteria of optimality based on the two types of posterior distributions are possible. The design can be implemented using an MCMC scheme with alternating stages of sampling from the posterior distributions of the test takers' ability parameters and the parameters of the field-test items while reusing samples from earlier posterior distributions of the other parameters. Results from a simulation study demonstrated the feasibility of the proposed MCMC implementation for operational item calibration. A comparison of performances for different optimality criteria showed faster calibration of substantial numbers of items for the criterion of D-optimality relative to A-optimality, a special case of c-optimality, and random assignment of items to the test takers.
State Assessment Program Item Banks: Model Language for Request for Proposals (RFP) and Contracts
ERIC Educational Resources Information Center
Swanson, Leonard C.
2010-01-01
This document provides recommendations for request for proposal (RFP) and contract language that state education agencies can use to specify their requirements for access to test item banks. An item bank is a repository for test items and data about those items. Item banks are used by state agency staff to view items and associated data; to…
The Impact of Receiving the Same Items on Consecutive Computer Adaptive Test Administrations.
ERIC Educational Resources Information Center
O'Neill, Thomas; Lunz, Mary E.; Thiede, Keith
2000-01-01
Studied item exposure in a computerized adaptive test when the item selection algorithm presents examinees with questions they were asked in a previous test administration. Results with 178 repeat examinees on a medical technologists' test indicate that the combined use of an adaptive algorithm to select items and latent trait theory to estimate…
ERIC Educational Resources Information Center
Saß, Steffani; Schütte, Kerstin
2016-01-01
Solving test items might require abilities in test-takers other than the construct the test was designed to assess. Item and student characteristics such as item format or reading comprehension can impact the test result. This experiment is based on cognitive theories of text and picture comprehension. It examines whether integration aids, which…
Uncertainties in the Item Parameter Estimates and Robust Automated Test Assembly
ERIC Educational Resources Information Center
Veldkamp, Bernard P.; Matteucci, Mariagiulia; de Jong, Martijn G.
2013-01-01
Item response theory parameters have to be estimated, and because of the estimation process, they do have uncertainty in them. In most large-scale testing programs, the parameters are stored in item banks, and automated test assembly algorithms are applied to assemble operational test forms. These algorithms treat item parameters as fixed values,…
Identifying Differential Item Functioning in Multi-Stage Computer Adaptive Testing
ERIC Educational Resources Information Center
Gierl, Mark J.; Lai, Hollis; Li, Johnson
2013-01-01
The purpose of this study is to evaluate the performance of CATSIB (Computer Adaptive Testing-Simultaneous Item Bias Test) for detecting differential item functioning (DIF) when items in the matching and studied subtest are administered adaptively in the context of a realistic multi-stage adaptive test (MST). MST was simulated using a 4-item…
Hold it! Memory affects attentional dwell time.
Parks, Emily L; Hopfinger, Joseph B
2008-12-01
The allocation of attention, including the initial orienting and the subsequent dwell time, is affected by several bottom-up and top-down factors. How item memory affects these processes, however, remains unclear. Here, we investigated whether item memory affects attentional dwell time by using a modified version of the attentional blink (AB) paradigm. Across four experiments, our results revealed that the AB was significantly affected by memory status (novel vs. old), but critically, this effect depended on the ongoing memory context. Specifically, items that were unique in terms of memory status demanded more resources, as measured by a protracted AB. The present findings suggest that a more comprehensive understanding of memory's effects on attention can be obtained by accounting for an item's memorial context, as well as its individual item memory strength. Our results provide new evidence that item memory and memory context play a significant role in the temporal allocation of attention.
A Stepwise Test Characteristic Curve Method to Detect Item Parameter Drift
ERIC Educational Resources Information Center
Guo, Rui; Zheng, Yi; Chang, Hua-Hua
2015-01-01
An important assumption of item response theory is item parameter invariance. Sometimes, however, item parameters are not invariant across different test administrations due to factors other than sampling error; this phenomenon is termed item parameter drift. Several methods have been developed to detect drifted items. However, most of the…
Shen, Linjun; Li, Feiming; Wattleworth, Roberta; Filipetto, Frank
2010-10-01
The Comprehensive Osteopathic Medical Licensing Examination conducted a trial of multimedia items in the 2008-2009 Level 3 testing cycle to determine (1) if multimedia items were able to test additional elements of medical knowledge and skills and (2) how to develop effective multimedia items. Forty-four content-matched multimedia and text multiple-choice items were randomly delivered to Level 3 candidates. Logistic regression and paired-samples t tests were used for pairwise and group-level comparisons, respectively. Nine pairs showed significant differences in difficulty, discrimination, or both. Content analysis found that, if text narrations were less direct, multimedia materials could make items easier. When textbook terminologies were replaced by multimedia presentations, multimedia items could become more difficult. Moreover, a multimedia item was found not to be uniformly difficult for candidates at different ability levels, possibly because multimedia and text items tested different elements of the same concept. Multimedia items may be capable of measuring some constructs different from what text items can measure. Effective multimedia items with reasonable psychometric properties can be intentionally developed.
Koh, Bongyeun; Hong, Sunggi; Kim, Soon-Sim; Hyun, Jin-Sook; Baek, Milye; Moon, Jundong; Kwon, Hayran; Kim, Gyoungyong; Min, Seonggi; Kang, Gu-Hyun
2016-01-01
The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination.
Item Analysis in Introductory Economics Testing.
ERIC Educational Resources Information Center
Tinari, Frank D.
1979-01-01
Computerized analysis of multiple choice test items is explained. Examples of item analysis applications in the introductory economics course are discussed with respect to three objectives: to evaluate learning; to improve test items; and to help improve classroom instruction. Problems, costs and benefits of the procedures are identified. (JMD)
Children with autism spectrum disorder show reduced adaptation to number
Turi, Marco; Burr, David C.; Igliozzi, Roberta; Aagten-Murphy, David; Muratori, Filippo; Pellicano, Elizabeth
2015-01-01
Autism is known to be associated with major perceptual atypicalities. We have recently proposed a general model to account for these atypicalities in Bayesian terms, suggesting that autistic individuals underuse predictive information or priors. We tested this idea by measuring adaptation to numerosity stimuli in children diagnosed with autism spectrum disorder (ASD). After exposure to large numbers of items, stimuli with fewer items appear to be less numerous (and vice versa). We found that children with ASD adapted much less to numerosity than typically developing children, although their precision for numerosity discrimination was similar to that of the typical group. This result reinforces recent findings showing reduced adaptation to facial identity in ASD and goes on to show that reduced adaptation is not unique to faces (social stimuli with special significance in autism), but occurs more generally, for both parietal and temporal functions, probably reflecting inefficiencies in the adaptive interpretation of sensory signals. These results provide strong support for the Bayesian theories of autism. PMID:26056294
Hybrid context aware recommender systems
NASA Astrophysics Data System (ADS)
Jain, Rajshree; Tyagi, Jaya; Singh, Sandeep Kumar; Alam, Taj
2017-10-01
Recommender systems and context awareness are currently vital fields of research. Most hybrid recommendation systems implement content-based and collaborative filtering techniques, whereas this work combines context and collaborative filtering. The paper presents a hybrid context-aware recommender system for books and movies that gives recommendations based on the user context as well as user or item similarity. It also addresses the issue of dimensionality reduction using weighted pre-filtering based on dynamically entered user context and preference of context. This unique step helps to reduce the size of the dataset for collaborative filtering. Bias-subtracted collaborative filtering is used so as to consider the relative rating of a particular user rather than the absolute values. Cosine similarity is used as a metric to determine the similarity between users or items. The unknown ratings are calculated and evaluated using MSE (Mean Squared Error) in test and train datasets. The overall process of recommendation has helped to personalize recommendations and give more accurate results with reduced complexity in collaborative filtering.
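The bias-subtracted, cosine-similarity step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the user-based (rather than item-based) weighting, and the restriction of cosine similarity to co-rated items are all assumptions made for illustration, and the context pre-filtering step is omitted.

```python
import math

def mean_center(ratings):
    """Subtract each user's mean rating (the user 'bias') from that user's ratings."""
    centered, means = {}, {}
    for user, items in ratings.items():
        mu = sum(items.values()) / len(items)
        means[user] = mu
        centered[user] = {item: r - mu for item, r in items.items()}
    return centered, means

def cosine(u, v):
    """Cosine similarity over co-rated items; returns 0.0 when it is undefined."""
    common = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict(ratings, user, item):
    """Predict a rating as the user's mean plus a similarity-weighted
    average of other users' mean-centered ratings for the item."""
    centered, means = mean_center(ratings)
    num = den = 0.0
    for other, items in centered.items():
        if other == user or item not in items:
            continue
        sim = cosine(centered[user], items)
        num += sim * items[item]
        den += abs(sim)
    return means[user] + (num / den if den else 0.0)

def mse(pairs):
    """Mean squared error over (predicted, actual) pairs."""
    return sum((p - a) ** 2 for p, a in pairs) / len(pairs)
```

Because the prediction adds a weighted deviation to the user's own mean, it can fall outside the rating scale; a practical system would presumably clip it.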
NASA Astrophysics Data System (ADS)
Wren, David A.
The research presented in this dissertation culminated in a 10-item Thermochemistry Concept Inventory (TCI). The development of the TCI can be divided into two main phases: qualitative studies and quantitative studies. Both phases focused on the primary stakeholders of the TCI, college-level general chemistry instructors and students. Each phase was designed to collect evidence for the validity of the interpretations and uses of TCI testing data. A central use of TCI testing data is to identify student conceptual misunderstandings, which are represented as incorrect options of multiple-choice TCI items. Therefore, quantitative and qualitative studies focused heavily on collecting evidence at the item-level, where important interpretations may be made by TCI users. Qualitative studies included student interviews (N = 28) and online expert surveys (N = 30). Think-aloud student interviews (N = 12) were used to identify conceptual misunderstandings used by students. Novice response process validity interviews (N = 16) helped provide information on how students interpreted and answered TCI items and were the basis of item revisions. Practicing general chemistry instructors (N = 18), or experts, defined boundaries of thermochemistry content included on the TCI. Once TCI items were in the later stages of development, an online version of the TCI was used in an expert response process validity survey (N = 12), to provide expert feedback on item content, format and consensus of the correct answer for each item. Quantitative studies included three phases: beta testing of TCI items (N = 280), pilot testing of a 12-item TCI (N = 485), and a large data collection using a 10-item TCI (N = 1331). In addition to traditional classical test theory analysis, Rasch model analysis was also used for evaluation of testing data at the test and item level.
The TCI was administered in both formative assessment (beta and pilot testing) and summative assessment (large data collection), with items performing well in both. One item, item K, did not have acceptable psychometric properties when the TCI was used as a quiz (summative assessment), but was retained in the final version of the TCI based on the acceptable psychometric properties displayed in pilot testing (formative assessment).
ERIC Educational Resources Information Center
Li, Yanmei
2012-01-01
In a common-item (anchor) equating design, the common items should be evaluated for item parameter drift. Drifted items are often removed. For a test that contains mostly dichotomous items and only a small number of polytomous items, removing some drifted polytomous anchor items may result in anchor sets that no longer resemble mini-versions of…
Sinharay, Sandip
2017-09-01
Benefiting from item preknowledge is a major type of fraudulent behavior during educational assessments. Belov suggested the posterior shift statistic for detection of item preknowledge and showed its performance to be better on average than that of seven other statistics for detection of item preknowledge for a known set of compromised items. Sinharay suggested a statistic based on the likelihood ratio test for detection of item preknowledge; the advantage of the statistic is that its null distribution is known. Results from simulated and real data and adaptive and nonadaptive tests are used to demonstrate that the Type I error rate and power of the statistic based on the likelihood ratio test are very similar to those of the posterior shift statistic. Thus, the statistic based on the likelihood ratio test appears promising in detecting item preknowledge when the set of compromised items is known.
ERIC Educational Resources Information Center
McLeod, Lori D.; Lewis, Charles; Thissen, David.
With the increased use of computerized adaptive testing, which allows for continuous testing, new concerns about test security have evolved, one being the assurance that items in an item pool are safeguarded from theft. In this paper, the risk of score inflation and procedures to detect test takers using item preknowledge are explored. When test…
Effect of Multiple Testing Adjustment in Differential Item Functioning Detection
ERIC Educational Resources Information Center
Kim, Jihye; Oshima, T. C.
2013-01-01
In a typical differential item functioning (DIF) analysis, a significance test is conducted for each item. As a test consists of multiple items, such multiple testing may increase the possibility of making a Type I error at least once. The goal of this study was to investigate how to control a Type I error rate and power using adjustment…
Item Response Theory Models for Performance Decline during Testing
ERIC Educational Resources Information Center
Jin, Kuan-Yu; Wang, Wen-Chung
2014-01-01
Sometimes, test-takers may not be able to attempt all items to the best of their ability (with full effort) due to personal factors (e.g., low motivation) or testing conditions (e.g., time limit), resulting in poor performances on certain items, especially those located toward the end of a test. Standard item response theory (IRT) models fail to…
Differential item functioning analysis of the Vanderbilt Expertise Test for cars.
Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W; Van Gulick, Ana Beth; Gauthier, Isabel
2015-01-01
The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge.
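For readers unfamiliar with the three-parameter logistic (3PL) model named in this abstract, the following sketch shows its item response function and one simple way to view DIF as a gap between two groups' curves. The parameter values in the usage note and the helper name `dif_gap` are hypothetical, the scaling constant D = 1.7 that some formulations include is omitted, and none of this reflects the estimation procedure used in the study.

```python
import math

def p_correct_3pl(theta, a, b, c):
    """3PL item response function.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: lower asymptote (pseudo-guessing).
    P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def dif_gap(theta, reference_item, focal_item):
    """Difference in success probability at the same ability level when an
    item's (a, b, c) parameters differ between a reference and a focal group.
    A nonzero gap at matched ability is what DIF analysis looks for."""
    return p_correct_3pl(theta, *reference_item) - p_correct_3pl(theta, *focal_item)
```

For example, `dif_gap(0.0, (1.2, -0.3, 0.2), (1.2, 0.4, 0.2))` is positive, meaning that with these made-up parameters the item is easier for the reference group at average ability.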
Samejima Items in Multiple-Choice Tests: Identification and Implications
ERIC Educational Resources Information Center
Rahman, Nazia
2013-01-01
Samejima hypothesized that non-monotonically increasing item response functions (IRFs) of ability might occur for multiple-choice items (referred to here as "Samejima items") if low ability test takers with some, though incomplete, knowledge or skill are drawn to a particularly attractive distractor, while very low ability test takers…
Computerized Numerical Control Test Item Bank.
ERIC Educational Resources Information Center
Reneau, Fred; And Others
This guide contains 285 test items for use in teaching a course in computerized numerical control. All test items were reviewed, revised, and validated by incumbent workers and subject matter instructors. Items are provided for assessing student achievement in such aspects of programming and planning, setting up, and operating machines with…
Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating
ERIC Educational Resources Information Center
He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei
2013-01-01
Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…
ERIC Educational Resources Information Center
He, Yong
2013-01-01
Common test items play an important role in equating multiple test forms under the common-item nonequivalent groups design. Inconsistent item parameter estimates among common items can lead to large bias in equated scores for IRT true score equating. Current methods extensively focus on detection and elimination of outlying common items, which…
ERIC Educational Resources Information Center
Scheuneman, Janice Dowd; Gerritz, Kalle
1990-01-01
Differential item functioning (DIF) methodology for revealing sources of item difficulty and performance characteristics of different groups was explored. A total of 150 Scholastic Aptitude Test items and 132 Graduate Record Examination general test items were analyzed. DIF was evaluated for males and females and Blacks and Whites. (SLD)
Item Structural Properties as Predictors of Item Difficulty and Item Association.
ERIC Educational Resources Information Center
Solano-Flores, Guillermo
1993-01-01
Studied the ability of logical test design (LTD) to predict student performance in reading Roman numerals for 211 sixth graders in Mexico City tested on Roman numeral items varying on LTD-related and non-LTD-related variables. The LTD-related variable item iterativity was found to be the best predictor of item difficulty. (SLD)
Investigating Item Exposure Control Methods in Computerized Adaptive Testing
ERIC Educational Resources Information Center
Ozturk, Nagihan Boztunc; Dogan, Nuri
2015-01-01
This study aims to investigate the effects of item exposure control methods on measurement precision and on test security under various item selection methods and item pool characteristics. In this study, the Randomesque (with item group sizes of 5 and 10), Sympson-Hetter, and Fade-Away methods were used as item exposure control methods. Moreover,…
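As a rough illustration of the Randomesque method mentioned in this abstract, the sketch below ranks the unused items by information at the current ability estimate and then picks one at random from the top group. The 2PL information function, the pool layout (a dict mapping item id to `(a, b)`), and the function names are assumptions for illustration; the study's own item selection methods and pool characteristics are not reproduced here.

```python
import math
import random

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def randomesque_pick(pool, theta, used, group_size=5, rng=random):
    """Pick the next item at random from the `group_size` most informative
    unused items, instead of always administering the single best one."""
    candidates = [item for item in pool if item not in used]
    if not candidates:
        raise ValueError("no unused items left in the pool")
    candidates.sort(key=lambda item: info_2pl(theta, *pool[item]), reverse=True)
    return rng.choice(candidates[:group_size])
```

With `group_size=1` this reduces to plain maximum-information selection; larger groups (the abstract mentions sizes of 5 and 10) spread exposure across more items, presumably at some cost in measurement precision.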
ERIC Educational Resources Information Center
Lee, Woo-yeol; Cho, Sun-Joo
2017-01-01
Cross-level invariance in a multilevel item response model can be investigated by testing whether the within-level item discriminations are equal to the between-level item discriminations. Testing the cross-level invariance assumption is important to understand constructs in multilevel data. However, in most multilevel item response model…
Item Pool Design for an Operational Variable-Length Computerized Adaptive Test
ERIC Educational Resources Information Center
He, Wei; Reckase, Mark D.
2014-01-01
For computerized adaptive tests (CATs) to work well, they must have an item pool with sufficient numbers of good quality items. Many researchers have pointed out that, in developing item pools for CATs, not only is the item pool size important but also the distribution of item parameters and practical considerations such as content distribution…
ERIC Educational Resources Information Center
Yoon, Su-Youn; Lee, Chong Min; Houghton, Patrick; Lopez, Melissa; Sakano, Jennifer; Loukina, Anastasia; Krovetz, Bob; Lu, Chi; Madani, Nitin
2017-01-01
In this study, we developed assistive tools and resources to support TOEIC® Listening test item generation. There has recently been an increased need for a large pool of items for these tests. This need has, in turn, inspired efforts to increase the efficiency of item generation while maintaining the quality of the created items. We aimed to…
ERIC Educational Resources Information Center
Nissan, Susan; And Others
One of the item types in the Listening Comprehension section of the Test of English as a Foreign Language (TOEFL) test is the dialogue. Because the dialogue item pool needs to have an appropriate balance of items at a range of difficulty levels, test developers have examined items at various difficulty levels in an attempt to identify their…
Park, In Sook; Suh, Yeon Ok; Park, Hae Sook; Kang, So Young; Kim, Kwang Sung; Kim, Gyung Hee; Choi, Yeon-Hee; Kim, Hyun-Ju
2017-01-01
The purpose of this study was to improve the quality of items on the Korean Nursing Licensing Examination by developing and evaluating case-based items that reflect integrated nursing knowledge. We conducted a cross-sectional observational study to develop new case-based items. The methods for developing test items included expert workshops, brainstorming, and verification of content validity. After a mock examination of undergraduate nursing students using the newly developed case-based items, we evaluated the appropriateness of the items through classical test theory and item response theory. A total of 50 case-based items were developed for the mock examination, and content validity was evaluated. The question items integrated 34 discrete elements of integrated nursing knowledge. The mock examination was taken by 741 baccalaureate students in their fourth year of study at 13 universities. Their average score on the mock examination was 57.4, and the examination showed a reliability of 0.40. According to classical test theory, the average level of item difficulty of the items was 57.4% (80%-100% for 12 items; 60%-80% for 13 items; and less than 60% for 25 items). The mean discrimination index was 0.19, and was above 0.30 for 11 items and 0.20 to 0.29 for 15 items. According to item response theory, the item discrimination parameter (in the logistic model) was none for 10 items (0.00), very low for 20 items (0.01 to 0.34), low for 12 items (0.35 to 0.64), moderate for 6 items (0.65 to 1.34), high for 1 item (1.35 to 1.69), and very high for 1 item (above 1.70). The item difficulty was very easy for 24 items (below -2.0), easy for 8 items (-2.0 to -0.5), medium for 6 items (-0.5 to 0.5), hard for 3 items (0.5 to 2.0), and very hard for 9 items (2.0 or above). The goodness-of-fit test in terms of the 2-parameter item response model between the range of 2.0 to 0.5 revealed that 12 items had an ideal correct answer rate. 
We surmised that the low reliability of the mock examination was influenced by the timing of the test for the examinees and the inappropriate difficulty of the items. Our study suggested a methodology for the development of future case-based items for the Korean Nursing Licensing Examination.
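The classical test theory indices reported in this abstract can be sketched as follows. The difficulty index is the proportion of correct responses. For discrimination, the abstract does not say which index was computed, so the upper-lower group method with a 27% split used here is an assumption (a point-biserial correlation would be another common choice), and the function names are illustrative.

```python
def difficulty_index(item_scores):
    """Proportion of examinees answering the item correctly (scores are 0 or 1)."""
    return sum(item_scores) / len(item_scores)

def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-lower discrimination index: proportion correct in the top-scoring
    group minus proportion correct in the bottom-scoring group, where the
    groups are the top and bottom `fraction` of examinees by total score."""
    n = len(total_scores)
    k = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])
    low, high = order[:k], order[-k:]
    p_high = sum(item_scores[i] for i in high) / k
    p_low = sum(item_scores[i] for i in low) / k
    return p_high - p_low
```

Under this definition an index near 0 means high and low scorers answer the item correctly about equally often, which is consistent with the abstract's treatment of values below 0.20 as weak.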
The beneficial effect of testing: an event-related potential study
Bai, Cheng-Hua; Bridger, Emma K.; Zimmer, Hubert D.; Mecklinger, Axel
2015-01-01
The enhanced memory performance for items that are tested as compared to being restudied (the testing effect) is a frequently reported memory phenomenon. According to the episodic context account of the testing effect, this beneficial effect of testing is related to a process which reinstates the previously learnt episodic information. Few studies have explored the neural correlates of this effect at the time point when testing takes place, however. In this study, we utilized the ERP correlates of successful memory encoding to address this issue, hypothesizing that if the benefit of testing is due to retrieval-related processes at test then subsequent memory effects (SMEs) should resemble the ERP correlates of retrieval-based processing in their temporal and spatial characteristics. Participants were asked to learn Swahili-German word pairs before items were presented in either a testing or a restudy condition. Memory performance was assessed immediately and 1 day later with a cued recall task. Successfully recalling items at test increased the likelihood that items were remembered over time compared to items which were only restudied. An ERP subsequent memory contrast (later remembered vs. later forgotten tested items), which reflects the engagement of processes that ensure items are recallable the next day, was topographically comparable with the ERP correlate of immediate recollection (immediately remembered vs. immediately forgotten tested items). This result shows that the processes which allow items to be more memorable over time share qualitatively similar neural correlates with the processes that relate to successful retrieval at test. This finding supports the notion that testing is more beneficial than restudying on memory performance over time because of its engagement of retrieval processes, such as the re-encoding of actively retrieved memory representations. PMID:26441577
The development of a science process assessment for fourth-grade students
NASA Astrophysics Data System (ADS)
Smith, Kathleen A.; Welliver, Paul W.
In this study, a multiple-choice test entitled the Science Process Assessment was developed to measure the science process skills of students in grade four. Based on the Recommended Science Competency Continuum for Grades K to 6 for Pennsylvania Schools, this instrument measured the skills of (1) observing, (2) classifying, (3) inferring, (4) predicting, (5) measuring, (6) communicating, (7) using space/time relations, (8) defining operationally, (9) formulating hypotheses, (10) experimenting, (11) recognizing variables, (12) interpreting data, and (13) formulating models. To prepare the instrument, classroom teachers and science educators were invited to participate in two science education workshops designed to develop an item bank of test questions applicable to measuring process skill learning. Participants formed writing teams and generated 65 test items representing the 13 process skills. After a comprehensive group critique of each item, 61 items were identified for inclusion into the Science Process Assessment item bank. To establish content validity, the item bank was submitted to a select panel of science educators for the purpose of judging item acceptability. This analysis yielded 55 acceptable test items and produced the Science Process Assessment, Pilot 1. Pilot 1 was administered to 184 fourth-grade students. Students were given a copy of the test booklet; teachers read each test aloud to the students. Upon completion of this first administration, data from the item analysis yielded a reliability coefficient of 0.73. Subsequently, 40 test items were identified for the Science Process Assessment, Pilot 2. Using the test-retest method, the Science Process Assessment, Pilot 2 (Test 1 and Test 2) was administered to 113 fourth-grade students. Reliability coefficients of 0.80 and 0.82, respectively, were ascertained. The correlation between Test 1 and Test 2 was 0.77. 
The results of this study indicate that (1) the Science Process Assessment, Pilot 2, is a valid and reliable instrument applicable to measuring the science process skills of students in grade four, (2) using educational workshops as a means of developing item banks of test questions is viable and productive in the test development process, and (3) involving classroom teachers and science educators in the test development process is educationally efficient and effective.
Impaired consciousness in partial seizures is bimodally distributed
Cunningham, Courtney; Chen, William C.; Shorten, Andrew; McClurkin, Michael; Choezom, Tenzin; Schmidt, Christian P.; Chu, Victoria; Bozik, Anne; Best, Cameron; Chapman, Melissa; Furman, Moran; Detyniecki, Kamil; Giacino, Joseph T.
2014-01-01
Objective: To investigate whether impaired consciousness in partial seizures can usually be attributed to specific deficits in the content of consciousness or to a more general decrease in the overall level of consciousness. Methods: Prospective testing during partial seizures was performed in patients with epilepsy using the Responsiveness in Epilepsy Scale (n = 83 partial seizures, 30 patients). Results were compared with responsiveness scores in a cohort of patients with severe traumatic brain injury evaluated with the JFK Coma Recovery Scale–Revised (n = 552 test administrations, 184 patients). Results: Standardized testing during partial seizures reveals a bimodal scoring distribution, such that most patients were either fully impaired or relatively spared in their ability to respond on multiple cognitive tests. Seizures with impaired performance on initial test items remained consistently impaired on subsequent items, while other seizures showed spared performance throughout. In the comparison group, we found that scores of patients with brain injury were more evenly distributed across the full range in severity of impairment. Conclusions: Partial seizures can often be cleanly separated into those with vs without overall impaired responsiveness. Results from similar testing in a comparison group of patients with brain injury suggest that the bimodal nature of Responsiveness in Epilepsy Scale scores is not a result of scale bias but may be a finding unique to partial seizures. These findings support a model in which seizures either propagate or do not propagate to key structures that regulate overall arousal and thalamocortical function. Future investigations are needed to relate these behavioral findings to the physiology underlying impaired consciousness in partial seizures. PMID:24727311
Impaired consciousness in partial seizures is bimodally distributed.
Cunningham, Courtney; Chen, William C; Shorten, Andrew; McClurkin, Michael; Choezom, Tenzin; Schmidt, Christian P; Chu, Victoria; Bozik, Anne; Best, Cameron; Chapman, Melissa; Furman, Moran; Detyniecki, Kamil; Giacino, Joseph T; Blumenfeld, Hal
2014-05-13
To investigate whether impaired consciousness in partial seizures can usually be attributed to specific deficits in the content of consciousness or to a more general decrease in the overall level of consciousness. Prospective testing during partial seizures was performed in patients with epilepsy using the Responsiveness in Epilepsy Scale (n = 83 partial seizures, 30 patients). Results were compared with responsiveness scores in a cohort of patients with severe traumatic brain injury evaluated with the JFK Coma Recovery Scale-Revised (n = 552 test administrations, 184 patients). Standardized testing during partial seizures reveals a bimodal scoring distribution, such that most patients were either fully impaired or relatively spared in their ability to respond on multiple cognitive tests. Seizures with impaired performance on initial test items remained consistently impaired on subsequent items, while other seizures showed spared performance throughout. In the comparison group, we found that scores of patients with brain injury were more evenly distributed across the full range in severity of impairment. Partial seizures can often be cleanly separated into those with vs without overall impaired responsiveness. Results from similar testing in a comparison group of patients with brain injury suggest that the bimodal nature of Responsiveness in Epilepsy Scale scores is not a result of scale bias but may be a finding unique to partial seizures. These findings support a model in which seizures either propagate or do not propagate to key structures that regulate overall arousal and thalamocortical function. Future investigations are needed to relate these behavioral findings to the physiology underlying impaired consciousness in partial seizures.
Michaelides, Michalis P.
2010-01-01
Many studies have investigated the topic of change or drift in item parameter estimates in the context of item response theory (IRT). Content effects, such as instructional variation and curricular emphasis, as well as context effects, such as the wording, position, or exposure of an item have been found to impact item parameter estimates. The issue becomes more critical when items with estimates exhibiting differential behavior across test administrations are used as common for deriving equating transformations. This paper reviews the types of effects on IRT item parameter estimates and focuses on the impact of misbehaving or aberrant common items on equating transformations. Implications relating to test validity and the judgmental nature of the decision to keep or discard aberrant common items are discussed, with recommendations for future research into more informed and formal ways of dealing with misbehaving common items. PMID:21833230
Raykov, Tenko; Marcoulides, George A
2016-04-01
The frequently neglected and often misunderstood relationship between classical test theory and item response theory is discussed for the unidimensional case with binary measures and no guessing. It is pointed out that popular item response models can be directly obtained from classical test theory-based models by accounting for the discrete nature of the observed items. Two distinct observational equivalence approaches are outlined that render the item response models from corresponding classical test theory-based models, and can each be used to obtain the former from the latter models. Similarly, classical test theory models can be furnished using the reverse application of either of those approaches from corresponding item response models.
Locally Dependent Linear Logistic Test Model with Person Covariates
ERIC Educational Resources Information Center
Ip, Edward H.; Smits, Dirk J. M.; De Boeck, Paul
2009-01-01
The article proposes a family of item-response models that allow the separate and independent specification of three orthogonal components: item attribute, person covariate, and local item dependence. Special interest lies in extending the linear logistic test model, which is commonly used to measure item attributes, to tests with embedded item…
Applying Bayesian Item Selection Approaches to Adaptive Tests Using Polytomous Items
ERIC Educational Resources Information Center
Penfield, Randall D.
2006-01-01
This study applied the maximum expected information (MEI) and the maximum posterior-weighted information (MPI) approaches of computer adaptive testing item selection to the case of a test using polytomous items following the partial credit model. The MEI and MPI approaches are described. A simulation study compared the efficiency of ability…
Do Reading Experts Agree with MCAT Verbal Reasoning Item Classifications?
ERIC Educational Resources Information Center
Jackson, Evelyn W.; And Others
1994-01-01
Examined whether expert raters (n=5) could agree about classification of Medical College Admission Test (MCAT) items and whether they agreed with MCAT student manual in labeling skill being measured by each test item. Results revealed difficulties in replicating authors' labeling of skills for reading items on practice test provided with 1991 MCAT…
ACER Chemistry Test Item Collection (ACER CHEMTIC Year 12 Supplement).
ERIC Educational Resources Information Center
Australian Council for Educational Research, Hawthorn.
This publication contains 317 multiple-choice chemistry test items related to topics covered in the Victorian (Australia) Year 12 chemistry course. It allows teachers access to a range of items suitable for diagnostic and achievement purposes, supplementing the ACER Chemistry Test Item Collection--Year 12 (CHEMTIC). The topics covered are: organic…
Differential Item Functioning: Its Consequences. Research Report. ETS RR-10-01
ERIC Educational Resources Information Center
Lee, Yi-Hsuan; Zhang, Jinming
2010-01-01
This report examines the consequences of differential item functioning (DIF) using simulated data. Its impact on total score, item response theory (IRT) ability estimate, and test reliability was evaluated in various testing scenarios created by manipulating the following four factors: test length, percentage of DIF items per form, sample sizes of…
Electronics. Criterion-Referenced Test (CRT) Item Bank.
ERIC Educational Resources Information Center
Davis, Diane, Ed.
This document contains 519 criterion-referenced multiple choice and true or false test items for a course in electronics. The test item bank is designed to work with both the Vocational Instructional Management System (VIMS) and the Vocational Administrative Management System (VAMS) in Missouri. The items are grouped into 15 units covering the…
Auto Mechanics. Criterion-Referenced Test (CRT) Item Bank.
ERIC Educational Resources Information Center
Tannehill, Dana, Ed.
This document contains 546 criterion-referenced multiple choice and true or false test items for a course in auto mechanics. The test item bank is designed to work with both the Vocational Instructional Management System (VIMS) and Vocational Administrative Management System (VAMS) in Missouri. The items are grouped into 35 units covering the…
Developing a Strategy for Using Technology-Enhanced Items in Large-Scale Standardized Tests
ERIC Educational Resources Information Center
Bryant, William
2017-01-01
As large-scale standardized tests move from paper-based to computer-based delivery, opportunities arise for test developers to make use of items beyond traditional selected and constructed response types. Technology-enhanced items (TEIs) have the potential to provide advantages over conventional items, including broadening construct measurement,…
Evaluating brain-computer interface performance using color in the P300 checkerboard speller.
Ryan, D B; Townsend, G; Gates, N A; Colwell, K; Sellers, E W
2017-10-01
Current Brain-Computer Interface (BCI) systems typically flash an array of items from grey to white (GW). The objective of this study was to evaluate BCI performance using uniquely colored stimuli. In addition to the GW stimuli, the current study tested two types of color stimuli (grey to color [GC] and color intensification [CI]). The main hypotheses were that in a checkerboard paradigm, unique color stimuli will: (1) increase BCI performance over the standard GW paradigm; (2) elicit larger event-related potentials (ERPs); and (3) improve offline performance with an electrode selection algorithm (i.e., Jumpwise). Online results (n=36) showed that GC provides higher accuracy and information transfer rate than the CI and GW conditions. Waveform analysis showed that GC produced higher amplitude ERPs than CI and GW. Information transfer rate was improved by the Jumpwise-selected channel locations in all conditions. Unique color stimuli (GC) improved BCI performance and enhanced ERPs. Jumpwise-selected electrode locations improved offline performance. These results show that in a checkerboard paradigm, unique color stimuli increase BCI performance, are preferred by participants, and are important to the design of end-user applications; they could thus lead to an increase in end-user performance and acceptance of BCI technology. Copyright © 2017 International Federation of Clinical Neurophysiology. All rights reserved.
2016-01-01
Purpose: The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. Methods: The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. Results: In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). Conclusion: In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination. PMID:26883810
Doig, Emmah; Prescott, Sarah; Fleming, Jennifer; Cornwell, Petrea; Kuipers, Pim
2016-01-01
To examine the internal reliability and test-retest reliability of the Client-Centeredness of Goal Setting (C-COGS) scale. The C-COGS scale was administered to 42 participants with acquired brain injury after completion of multidisciplinary goal planning. Internal reliability of scale items was examined using item-partial total correlations and Cronbach's α coefficient. The scale was readministered within a 1-mo period to a subsample of 12 participants to examine test-retest reliability by calculating exact and close percentage agreement for each item. After examination of item-partial total correlations, test items were revised. The revised items demonstrated stronger internal consistency than the original items. Preliminary evaluation of test-retest reliability was fair, with an average exact percent agreement across all test items of 67%. Findings support the preliminary reliability of the C-COGS scale as a tool to evaluate and promote client-centered goal planning in brain injury rehabilitation. Copyright © 2016 by the American Occupational Therapy Association, Inc.
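The two reliability statistics named in the abstract above, Cronbach's α and exact percentage agreement, can be illustrated with a short sketch. This is not the authors' analysis code: the function names and the data layout (one list of item scores per respondent) are assumptions made for illustration only.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents.
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of the total (summed) score across respondents.
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def exact_agreement(first, second):
    """Percentage of paired ratings (test vs. retest) that are identical."""
    matches = sum(a == b for a, b in zip(first, second))
    return 100.0 * matches / len(first)
```

Population variance is used for both the item and total variances; using sample variance for both would give the same α, because the ratio is unchanged.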
Item-Writing Guidelines for Physics
ERIC Educational Resources Information Center
Regan, Tom
2015-01-01
A teacher learning how to write test questions (test items) will almost certainly encounter item-writing guidelines--lists of item-writing do's and don'ts. Item-writing guidelines usually are presented as applicable across all assessment settings. Table I shows some guidelines that I believe to be generally applicable and two will be briefly…
Unidimensional Interpretations for Multidimensional Test Items
ERIC Educational Resources Information Center
Kahraman, Nilufer
2013-01-01
This article considers potential problems that can arise in estimating a unidimensional item response theory (IRT) model when some test items are multidimensional (i.e., show a complex factorial structure). More specifically, this study examines (1) the consequences of model misfit on IRT item parameter estimates due to unintended minor item-level…
Kisala, Pamela A.; Victorson, David; Pace, Natalie; Heinemann, Allen W.; Choi, Seung W.; Tulsky, David S.
2015-01-01
Objective: To describe the development and psychometric properties of the SCI-QOL Psychological Trauma item bank and short form. Design: Using a mixed-methods design, we developed and tested a Psychological Trauma item bank with patient and provider focus groups, cognitive interviews, and item response theory based analytic approaches, including tests of model fit, differential item functioning (DIF) and precision. Setting: We tested a 31-item pool at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Veterans Administration hospital. Participants: A total of 716 individuals with SCI completed the trauma items. Results: The 31 items fit a unidimensional model (CFI=0.952; RMSEA=0.061) and demonstrated good precision (theta range between 0.6 and 2.5). Nine items demonstrated negligible DIF with little impact on score estimates. The final calibrated item bank contains 19 items. Conclusion: The SCI-QOL Psychological Trauma item bank is a psychometrically robust measurement tool from which a short form and a computer adaptive test (CAT) version are available. PMID:26010967
Vaughn, Kalif E; Rawson, Katherine A; Pyc, Mary A
2013-12-01
A wealth of previous research has established that retrieval practice promotes memory, particularly when retrieval is successful. Although successful retrieval promotes memory, it remains unclear whether successful retrieval promotes memory equally well for items of varying difficulty. Will easy items still outperform difficult items on a final test if all items have been correctly recalled equal numbers of times during practice? In two experiments, normatively difficult and easy Lithuanian-English word pairs were learned via test-restudy practice until each item had been correctly recalled a preassigned number of times (from 1 to 11 correct recalls). Despite equating the numbers of successful recalls during practice, performance on a delayed final cued-recall test was lower for difficult than for easy items. Experiment 2 was designed to diagnose whether the disadvantage for difficult items was due to deficits in cue memory, target memory, and/or associative memory. The results revealed a disadvantage for the difficult versus the easy items only on the associative recognition test, with no differences on cue recognition, and even an advantage on target recognition. Although successful retrieval enhanced memory for both difficult and easy items, equating retrieval success during practice did not eliminate normative item difficulty differences.
Test Bias: An Objective Definition for Test Items.
ERIC Educational Resources Information Center
Durovic, Jerry J.
A test bias definition, applicable at the item level of a test, is presented. The definition conceptually equates test bias with measuring different things in different groups, and operationally equates test bias with a difference in item fit to the Rasch Model, greater than one, between groups. It is suggested that the proposed definition avoids…
Zoanetti, Nathan; Beaves, Mark; Griffin, Patrick; Wallace, Euan M
2013-03-04
Background: Despite the widespread use of multiple-choice assessments in medical education assessment, current practice and published advice concerning the number of response options remains equivocal. This article describes an empirical study contrasting the quality of three 60 item multiple-choice test forms within the Royal Australian and New Zealand College of Obstetricians and Gynaecologists (RANZCOG) Fetal Surveillance Education Program (FSEP). The three forms are described below. Methods: The first form featured four response options per item. The second form featured three response options, having removed the least functioning option from each item in the four-option counterpart. The third test form was constructed by retaining the best performing version of each item from the first two test forms. It contained both three and four option items. Results: Psychometric and educational factors were taken into account in formulating an approach to test construction for the FSEP. The four-option test performed better than the three-option test overall, but some items were improved by the removal of options. The mixed-option test demonstrated better measurement properties than the fixed-option tests, and has become the preferred test format in the FSEP program. The criteria used were reliability, errors of measurement and fit to the item response model. Conclusions: The position taken is that decisions about the number of response options be made at the item level, with plausible options being added to complete each item on both psychometric and educational grounds rather than complying with a uniform policy. The point is to construct the better performing item in providing the best psychometric and educational information. PMID:23453056
Code of Federal Regulations, 2014 CFR
2014-10-01
... research, review existing product literature generally available in the industry to determine its adequacy... literature from offerors of commercial items in lieu of unique technical proposals. (b) Contracting officers...
Code of Federal Regulations, 2013 CFR
2013-10-01
... research, review existing product literature generally available in the industry to determine its adequacy... literature from offerors of commercial items in lieu of unique technical proposals. (b) Contracting officers...
Code of Federal Regulations, 2012 CFR
2012-10-01
... research, review existing product literature generally available in the industry to determine its adequacy... literature from offerors of commercial items in lieu of unique technical proposals. (b) Contracting officers...
Detecting Gender Bias Through Test Item Analysis
NASA Astrophysics Data System (ADS)
González-Espada, Wilson J.
2009-03-01
Many physical science and physics instructors might not be trained in pedagogically appropriate test construction methods. This could lead to test items that do not measure what they are intended to measure. A subgroup of these items might show bias against some groups of students. This paper describes how the author became aware of potentially biased items against females in his examinations, which led to the exploration of fundamental issues related to item validity, gender bias, and differential item functioning, or DIF. A brief discussion of DIF in the context of university courses, as well as practical suggestions to detect possible gender-biased items, follows.
2014-12-01
chemical etching EDM electrical discharge machine EID enterprise identifier EOSS Engineering Operational Sequencing System F Fahrenheit...Center in Corona, California, released a DoN IUID Marking Guide, which made recommendations on how to mark legacy items. It provides technical...uploaded into the IUID registry managed by the Naval Surface Warfare Center (NSWC) in Corona, California. There is no set amount of information
Negative outcomes evoke cyclic irrational decisions in Rock, Paper, Scissors.
Dyson, Benjamin James; Wilbiks, Jonathan Michael Paul; Sandhu, Raj; Papanicolaou, Georgios; Lintag, Jaimie
2016-02-04
Rock, Paper, Scissors (RPS) represents a unique gaming space in which the predictions of human rational decision-making can be compared with actual performance. Playing a computerized opponent adopting a mixed-strategy equilibrium, participants revealed a non-significant tendency to over-select Rock. Further violations of rational decision-making were observed using an inter-trial analysis where participants were more likely to switch their item selection at trial n + 1 following a loss or draw at trial n, revealing the strategic vulnerability of individuals following the experience of negative rather than positive outcomes. Unique switch strategies related to each of these trial n outcomes were also identified: after losing, participants were more likely to 'downgrade' their item (e.g., Rock followed by Scissors), but after drawing, participants were more likely to 'upgrade' their item (e.g., Rock followed by Paper). Further repetition analysis revealed that participants were more likely to continue their specific cyclic item change strategy into trial n + 2. The data reveal the strategic vulnerability of individuals following the experience of negative rather than positive outcomes, the tensions between behavioural and cognitive influences on decision making, and underline the dangers of increased behavioural predictability in other recursive, non-cooperative environments such as economics and politics.
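The inter-trial analysis described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the function names are hypothetical, and the definitions of "downgrade" (switching to the item the previous choice beats) and "upgrade" (switching to the item that beats the previous choice) are inferred from the two examples given in the abstract.

```python
from collections import Counter

# Each key beats its value: Rock beats Scissors, and so on.
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}


def outcome(player, opponent):
    """Result of one trial from the player's point of view."""
    if player == opponent:
        return "draw"
    return "win" if BEATS[player] == opponent else "loss"


def classify_shift(previous, current):
    """Label the change in the player's item from trial n to trial n + 1."""
    if current == previous:
        return "stay"
    # Moving to the item the previous choice beats is a 'downgrade'
    # (e.g., Rock followed by Scissors); the remaining switch is an 'upgrade'.
    return "downgrade" if BEATS[previous] == current else "upgrade"


def tabulate_shifts(player_moves, opponent_moves):
    """Count (outcome at trial n, shift at trial n + 1) pairs."""
    counts = Counter()
    for n in range(len(player_moves) - 1):
        result = outcome(player_moves[n], opponent_moves[n])
        shift = classify_shift(player_moves[n], player_moves[n + 1])
        counts[(result, shift)] += 1
    return counts
```

Comparing the counts for `("loss", "downgrade")` and `("draw", "upgrade")` against the other cells would show whether a sequence of plays follows the pattern the abstract reports.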
Estimating Total-test Scores from Partial Scores in a Matrix Sampling Design.
ERIC Educational Resources Information Center
Sachar, Jane; Suppes, Patrick
It is sometimes desirable to obtain an estimated total-test score for an individual who was administered only a subset of the items in a total test. The present study compared six methods, two of which utilize the content structure of items, to estimate total-test scores using 450 students in grades 3-5 and 60 items of the 110-item Stanford Mental…
Differential item functioning analysis of the Vanderbilt Expertise Test for cars
Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W.; Van Gulick, Ana Beth; Gauthier, Isabel
2015-01-01
The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge. PMID:26418499
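For readers unfamiliar with the three-parameter logistic (3PL) model mentioned above, a minimal sketch of its item response function follows. The function and parameter names are illustrative, not taken from the article; the sketch uses the plain logistic metric, whereas some software applies an additional scaling constant (often 1.7) to the exponent.

```python
import math


def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability
    a: item discrimination
    b: item difficulty
    c: lower asymptote (pseudo-guessing parameter)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Example: when ability equals difficulty, the probability lies halfway
# between the guessing floor c and 1.
p_mid = p_correct_3pl(theta=0.0, a=1.2, b=0.0, c=0.2)
```

In a DIF analysis, an item is flagged when examinees of equal ability from different groups have different response probabilities, which in this parameterization corresponds to group-specific values of `a`, `b`, or `c` for the same item.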
Olsen, Lise L; Ishikawa, Takuro; Mâsse, Louise C; Chan, Grace; Brussoni, Mariana
2018-04-01
Fathers play a unique role in keeping children safe from injury, yet understanding of their views and attitudes towards protecting children from injury and allowing them to engage in risks is limited. The purpose of this study was to develop and validate an instrument to measure fathers' attitudes towards these two constructs. An instrument was developed that used prior qualitative research to inform item generation. The questions were assessed for content validity with experts, then pilot-tested with fathers. The survey was completed by 302 fathers attending hospital with their child for an injury or non-injury reason. Results of confirmatory factor analysis identified eight items relating to the protection from injury factor and six items relating to the risk engagement factor. Correlation between the two factors was low, suggesting these are two independent constructs. The Risk Engagement and Protection Survey offers a tool for measuring attitudes and assisting with intervention strategy development in ways that reflect fathers' views and promote a balanced view of children's needs for safety with their needs for engaging in active, healthy risk-taking. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Sargeant, J M; O'Connor, A M; Dohoo, I R; Erb, H N; Cevallos, M; Egger, M; Ersbøll, A K; Martin, S W; Nielsen, L R; Pearl, D L; Pfeiffer, D U; Sanchez, J; Torrence, M E; Vigre, H; Waldner, C; Ward, M P
2016-11-01
Reporting of observational studies in veterinary research presents challenges that often are not addressed in published reporting guidelines. To develop an extension of the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) statement that addresses unique reporting requirements for observational studies in veterinary medicine related to health, production, welfare, and food safety. Consensus meeting of experts. Mississauga, Canada. Seventeen experts from North America, Europe, and Australia. Experts completed a pre-meeting survey about whether items in the STROBE statement should be modified or added to address unique issues related to observational studies in animal species with health, production, welfare, or food safety outcomes. During the meeting, each STROBE item was discussed to determine whether or not rewording was recommended and whether additions were warranted. Anonymous voting was used to determine consensus. Six items required no modifications or additions. Modifications or additions were made to the STROBE items 1 (title and abstract), 3 (objectives), 5 (setting), 6 (participants), 7 (variables), 8 (data sources/measurement), 9 (bias), 10 (study size), 12 (statistical methods), 13 (participants), 14 (descriptive data), 15 (outcome data), 16 (main results), 17 (other analyses), 19 (limitations), and 22 (funding). The methods and processes used were similar to those used for other extensions of the STROBE statement. The use of this STROBE statement extension should improve reporting of observational studies in veterinary research by recognizing unique features of observational studies involving food-producing and companion animals, products of animal origin, aquaculture, and wildlife. Copyright © 2016 The Authors. Journal of Veterinary Internal Medicine published by Wiley Periodicals, Inc. on behalf of the American College of Veterinary Internal Medicine.
The Validity of the Teacher Burnout Scale for Use with Special Education Teachers
ERIC Educational Resources Information Center
Cook, Bradley Caro
2012-01-01
Unique stressors can cause special education teachers to experience burnout at twice the rate of their peers in general education. The purpose of this study was to determine if the Teacher Burnout Scale (TBS) is able to accurately predict burnout in special education teachers even though it does not include items that reflect the unique factors…
ERIC Educational Resources Information Center
Gattamorta, Karina A.; Penfield, Randall D.; Myers, Nicholas D.
2012-01-01
Measurement invariance is a common consideration in the evaluation of the validity and fairness of test scores when the tested population contains distinct groups of examinees, such as examinees receiving different forms of a translated test. Measurement invariance in polytomous items has traditionally been evaluated at the item-level,…
Science Library of Test Items. Volume Two.
ERIC Educational Resources Information Center
New South Wales Dept. of Education, Sydney (Australia).
The second volume of test items in the Science Library of Test Items is intended as a resource to assist teachers in implementing and evaluating science courses in the first 4 years of Australian secondary school. The items were selected from questions submitted to the School Certificate Development Unit by teachers in New South Wales. Only the…
Measuring the Instructional Sensitivity of ESL Reading Comprehension Items.
ERIC Educational Resources Information Center
Brutten, Sheila R.; And Others
A study attempted to estimate the instructional sensitivity of items in three reading comprehension tests in English as a second language (ESL). Instructional sensitivity is a test-item construct defined as the tendency for a test item to vary in difficulty as a function of instruction. Similar tasks were given to readers at different proficiency…
Reducing the Impact of Inappropriate Items on Reviewable Computerized Adaptive Testing
ERIC Educational Resources Information Center
Yen, Yung-Chin; Ho, Rong-Guey; Liao, Wen-Wei; Chen, Li-Ju
2012-01-01
In a test, the score would be closer to the examinee's actual ability if careless mistakes were corrected. In computerized adaptive testing (CAT), however, changing the answer to one item might make the following items no longer appropriate for estimating the examinee's ability. These inappropriate items in a reviewable CAT might in turn introduce bias in ability…
ERIC Educational Resources Information Center
Lau, C. Allen; Wang, Tianyou
The purposes of this study were to: (1) extend the sequential probability ratio testing (SPRT) procedure to polytomous item response theory (IRT) models in computerized classification testing (CCT); (2) compare polytomous items with dichotomous items using the SPRT procedure for their accuracy and efficiency; (3) study a direct approach in…
A Conditional Exposure Control Method for Multidimensional Adaptive Testing
ERIC Educational Resources Information Center
Finkelman, Matthew; Nering, Michael L.; Roussos, Louis A.
2009-01-01
In computerized adaptive testing (CAT), ensuring the security of test items is a crucial practical consideration. A common approach to reducing item theft is to define maximum item exposure rates, i.e., to limit the proportion of examinees to whom a given item can be administered. Numerous methods for controlling exposure rates have been proposed…
ERIC Educational Resources Information Center
Downing, Steven M.; Maatsch, Jack L.
To test the effect of clinically relevant multiple-choice item content on the validity of statistical discriminations of physicians' clinical competence, data were collected from a field test of the Emergency Medicine Examination, test items for the certification of specialists in emergency medicine. Two 91-item multiple-choice subscales were…
The Effect of Including or Excluding Students with Testing Accommodations on IRT Calibrations.
ERIC Educational Resources Information Center
Karkee, Thakur; Lewis, Dan M.; Barton, Karen; Haug, Carolyn
This study aimed to determine the degree to which the inclusion of accommodated students with disabilities in the calibration sample affects the characteristics of item parameters and the test results. Investigated were effects on test reliability, item fit to the applicable item response theory (IRT) model, item parameter estimates, and students'…
Three controversies over item disclosure in medical licensure examinations.
Park, Yoon Soo; Yang, Eunbae B
2015-01-01
In response to views on the public's right to know, there is growing attention to item disclosure - release of items, answer keys, and performance data to the public - in medical licensure examinations and its potential impact on the test's ability to measure competence and select qualified candidates. Recent debates on this issue have sparked legislative action internationally, including in South Korea, with prior discussions among North American countries dating back over three decades. The purpose of this study is to identify and analyze three issues associated with item disclosure in medical licensure examinations - 1) fairness and validity, 2) impact on passing levels, and 3) utility of item disclosure - by synthesizing existing literature in relation to standards in testing. Historically, the controversy over item disclosure has centered on fairness and validity. Proponents of item disclosure stress test takers' right to know, while opponents argue from a validity perspective. Item disclosure may bias item characteristics, such as difficulty and discrimination, and has consequences for setting passing levels. To date, there has been limited research on the utility of item disclosure for large scale testing. These issues require ongoing and careful consideration.
Online Calibration of Polytomous Items Under the Generalized Partial Credit Model
Zheng, Yi
2016-01-01
Online calibration is a technology-enhanced architecture for item calibration in computerized adaptive tests (CATs). Many CATs are administered continuously over a long term and rely on large item banks. To ensure test validity, these item banks need to be frequently replenished with new items, and these new items need to be pretested before being used operationally. Online calibration dynamically embeds pretest items in operational tests and calibrates their parameters as response data are gradually obtained through the continuous test administration. This study extends existing formulas, procedures, and algorithms for dichotomous item response theory models to the generalized partial credit model, a popular model for items scored in more than two categories. A simulation study was conducted to investigate the developed algorithms and procedures under a variety of conditions, including two estimation algorithms, three pretest item selection methods, three seeding locations, two numbers of score categories, and three calibration sample sizes. Results demonstrated acceptable estimation accuracy of the two estimation algorithms in some of the simulated conditions. A variety of findings were also revealed for the interaction effects of the included factors, and recommendations were made accordingly. PMID:29881063
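The generalized partial credit model (GPCM) named above assigns a probability to each score category of a polytomous item. The sketch below shows one common parameterization and is offered only as an illustration; the function name and argument layout are assumptions, and nothing here reproduces the calibration algorithms developed in the study.

```python
import math


def gpcm_probs(theta, a, steps):
    """Category probabilities for one item under the GPCM.

    theta: examinee ability
    a: item discrimination
    steps: step parameters b_1..b_m for an item scored 0..m
    Returns a list of m + 1 probabilities for scores 0, 1, ..., m.
    """
    # Cumulative sums of a * (theta - b_v); the score-0 term is defined as 0.
    z = [0.0]
    for b in steps:
        z.append(z[-1] + a * (theta - b))
    # Subtract the maximum before exponentiating for numerical stability.
    z_max = max(z)
    weights = [math.exp(v - z_max) for v in z]
    total = sum(weights)
    return [w / total for w in weights]
```

With a single step parameter the model reduces to the two-parameter logistic model, which is one way to check an implementation against dichotomous formulas.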
Evaluating Statistical Targets for Assembling Parallel Mixed-Format Test Forms
ERIC Educational Resources Information Center
Debeer, Dries; Ali, Usama S.; van Rijn, Peter W.
2017-01-01
Test assembly is the process of selecting items from an item pool to form one or more new test forms. Often new test forms are constructed to be parallel with an existing (or an ideal) test. Within the context of item response theory, the test information function (TIF) or the test characteristic curve (TCC) are commonly used as statistical…
Find the Hidden Object. Understanding Play in Psychological Assessments
Fasulo, Alessandra; Shukla, Janhavi; Bennett, Stephanie
2017-01-01
Standardized psychological assessments are extensively used by practitioners to determine rate and level of development in different domains of ability in both typical and atypical children. The younger the children, the more likely the trials will resemble play activities. However, mode of administration, timing and use of objects involved are constrained. The purpose of this study is to explore what kind of play is play in psychological assessments, what are the expectations about children's performance and what are the abilities supporting the test activities. Conversation Analysis (CA) was applied to the video recording of an interaction between a child and a practitioner during the administration of the Bayley Scale of Infant and Toddler Development, III edition. The analysis focuses on a 2′07″ long sequence relative to the administration of the test item “Find the hidden object” to a 23-month-old child with Down syndrome. The analysis of the sequence shows that the assessor promotes the child's engagement by couching the actions required to administer the item in utterances with marked child-directed features. The analysis also shows that the objects constituting the test item did not suggest to the child a unique course of action, leading to the assessor's modeling of the successful sequence. We argue that when a play frame is activated by an interactional partner, the relational aspect of the activity is foregrounded and the co-player becomes a source of cues for ways in which playing can develop. We discuss the assessment interaction as orienting the child toward a right-or-wrong interpretation, leaving the realm of play, which is inherently exploratory and inventive, to enter that of instructional activities. Finally, we argue that the sequential analysis of the interaction and of the mutual sense-making procedures that partners put in place during the administration of an assessment could be used in the design and evaluation of tests for a finer understanding of the abilities involved. PMID:28392771
Normal Theory Two-Stage ML Estimator When Data Are Missing at the Item Level
Savalei, Victoria; Rhemtulla, Mijke
2017-01-01
In many modeling contexts, the variables in the model are linear composites of the raw items measured for each participant; for instance, regression and path analysis models rely on scale scores, and structural equation models often use parcels as indicators of latent constructs. Currently, no analytic estimation method exists to appropriately handle missing data at the item level. Item-level multiple imputation (MI), however, can handle such missing data straightforwardly. In this article, we develop an analytic approach for dealing with item-level missing data—that is, one that obtains a unique set of parameter estimates directly from the incomplete data set and does not require imputations. The proposed approach is a variant of the two-stage maximum likelihood (TSML) methodology, and it is the analytic equivalent of item-level MI. We compare the new TSML approach to three existing alternatives for handling item-level missing data: scale-level full information maximum likelihood, available-case maximum likelihood, and item-level MI. We find that the TSML approach is the best analytic approach, and its performance is similar to item-level MI. We recommend its implementation in popular software and its further study. PMID:29276371
Normal Theory Two-Stage ML Estimator When Data Are Missing at the Item Level.
Savalei, Victoria; Rhemtulla, Mijke
2017-08-01
In many modeling contexts, the variables in the model are linear composites of the raw items measured for each participant; for instance, regression and path analysis models rely on scale scores, and structural equation models often use parcels as indicators of latent constructs. Currently, no analytic estimation method exists to appropriately handle missing data at the item level. Item-level multiple imputation (MI), however, can handle such missing data straightforwardly. In this article, we develop an analytic approach for dealing with item-level missing data, that is, one that obtains a unique set of parameter estimates directly from the incomplete data set and does not require imputations. The proposed approach is a variant of the two-stage maximum likelihood (TSML) methodology, and it is the analytic equivalent of item-level MI. We compare the new TSML approach to three existing alternatives for handling item-level missing data: scale-level full information maximum likelihood, available-case maximum likelihood, and item-level MI. We find that the TSML approach is the best analytic approach, and its performance is similar to item-level MI. We recommend its implementation in popular software and its further study.
Nickel and cobalt release from jewellery and metal clothing items in Korea.
Cheong, Seung Hyun; Choi, You Won; Choi, Hae Young; Byun, Ji Yeon
2014-01-01
In Korea, the prevalence of nickel allergy has shown a sharply increasing trend. Cobalt contact allergy is often associated with concomitant reactions to nickel, and is more common in Korea than in western countries. The aim of the present study was to investigate the prevalence of items that release nickel and cobalt on the Korean market. A total of 471 items that included 193 branded jewellery, 202 non-branded jewellery and 76 metal clothing items were sampled and studied with a dimethylglyoxime (DMG) test and a cobalt spot test to detect nickel and cobalt release, respectively. Nickel release was detected in 47.8% of the tested items. The positive rates in the DMG test were 12.4% for the branded jewellery, 70.8% for the non-branded jewellery, and 76.3% for the metal clothing items. Cobalt release was found in 6.2% of items. Among the types of jewellery, belts and hair pins showed higher positive rates in both the DMG test and the cobalt spot test. Our study shows that the prevalence of items that release nickel or cobalt among jewellery and metal clothing items is high in Korea. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
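The overall nickel-release rate in the abstract above can be cross-checked against the three per-group rates. The short Python sketch below does that arithmetic. The per-group positive counts are an assumption: the abstract reports only sample sizes and percentages, so the counts here are back-calculated by rounding and are not reported figures.

```python
# Sample sizes and DMG-positive percentages as reported in the abstract.
reported = {
    "branded jewellery": (193, 12.4),
    "non-branded jewellery": (202, 70.8),
    "metal clothing items": (76, 76.3),
}

# Assumption: positive counts are recovered by rounding n * pct / 100;
# the abstract itself does not state these counts.
positives = {name: round(n * pct / 100) for name, (n, pct) in reported.items()}

total_items = sum(n for n, _ in reported.values())
total_positive = sum(positives.values())
overall_pct = round(100 * total_positive / total_items, 1)

print(positives)
print(total_items, total_positive, overall_pct)
```

Under that rounding assumption the groups contribute 24, 143 and 58 positive items, which gives 225 of 471 items, or 47.8%, in agreement with the overall rate stated in the abstract.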
The Role of Item Feedback in Self-Adapted Testing.
ERIC Educational Resources Information Center
Roos, Linda L.; And Others
1997-01-01
The importance of item feedback in self-adapted testing was studied by comparing feedback and no feedback conditions for computerized adaptive tests and self-adapted tests taken by 363 college students. Results indicate that item feedback is not necessary to realize score differences between self-adapted and computerized adaptive testing. (SLD)
Criterion-Referenced Test Items for Auto Body.
ERIC Educational Resources Information Center
Tannehill, Dana, Ed.
This test item bank on auto body repair contains criterion-referenced test questions based upon competencies found in the Missouri Auto Body Competency Profile. Some test items are keyed for multiple competencies. The tests cover the following 26 competency areas in the auto body curriculum: auto body careers; measuring and mixing; tools and…
Automated Test-Form Generation
ERIC Educational Resources Information Center
van der Linden, Wim J.; Diao, Qi
2011-01-01
In automated test assembly (ATA), the methodology of mixed-integer programming is used to select test items from an item bank to meet the specifications for a desired test form and optimize its measurement accuracy. The same methodology can be used to automate the formatting of the set of selected items into the actual test form. Three different…
ERIC Educational Resources Information Center
Kouimanos, John, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Solving the measurement invariance anchor item problem in item response theory.
Meade, Adam W; Wright, Natalie A
2012-09-01
The efficacy of tests of differential item functioning (measurement invariance) has been well established. It is clear that when properly implemented, these tests can successfully identify differentially functioning (DF) items when they exist. However, an assumption of these analyses is that the metric for different groups is linked using anchor items that are invariant. In practice, however, it is impossible to be certain which items are DF and which are invariant. This problem of anchor items, or referent indicators, has long plagued invariance research, and a multitude of suggested approaches have been put forth. Unfortunately, the relative efficacy of these approaches has not been tested. This study compares 11 variations on 5 qualitatively different approaches from recent literature for selecting optimal anchor items. A large-scale simulation study indicates that for nearly all conditions, an easily implemented 2-stage procedure recently put forth by Lopez Rivas, Stark, and Chernyshenko (2009) provided optimal power while maintaining nominal Type I error. With this approach, appropriate anchor items can be easily and quickly located, resulting in more efficacious invariance tests. Recommendations for invariance testing are illustrated using a pedagogical example of employee responses to an organizational culture measure.
When Listening Is Better Than Reading: Performance Gains on Cardiac Auscultation Test Questions.
Short, Kathleen; Bucak, S Deniz; Rosenthal, Francine; Raymond, Mark R
2018-05-01
In 2007, the United States Medical Licensing Examination embedded multimedia simulations of heart sounds into multiple-choice questions. This study investigated changes in item difficulty as determined by examinee performance over time. The data reflect outcomes obtained following initial use of multimedia items from 2007 through 2012, after which an interface change occurred. A total of 233,157 examinees responded to 1,306 cardiology test items over the six-year period; 138 items included multimedia simulations of heart sounds, while 1,168 text-based items without multimedia served as controls. The authors compared changes in difficulty of multimedia items over time with changes in difficulty of text-based cardiology items over time. Further, they compared changes in item difficulty for both groups of items between graduates of Liaison Committee on Medical Education (LCME)-accredited and non-LCME-accredited (i.e., international) medical schools. Examinee performance on cardiology test items with multimedia heart sounds improved by 12.4% over the six-year period, while performance on text-based cardiology items improved by approximately 1.4%. These results were similar for graduates of LCME-accredited and non-LCME-accredited medical schools. Examinees' ability to interpret auscultation findings in test items that include multimedia presentations increased from 2007 to 2012.
Revisiting the role of recollection in item versus forced-choice recognition memory.
Cook, Gabriel I; Marsh, Richard L; Hicks, Jason L
2005-08-01
Many memory theorists have assumed that forced-choice recognition tests can rely more on familiarity, whereas item (yes-no) tests must rely more on recollection. In actuality, several studies have found no differences in the contributions of recollection and familiarity underlying the two different test formats. Using word frequency to manipulate stimulus characteristics, the present study demonstrated that the contribution of recollection to item versus forced-choice tests is variable. Low word frequency resulted in significantly more recollection in an item test than did a forced-choice procedure, but high word frequency produced the opposite result. These results clearly constrain any uniform claim about the degree to which recollection supports responding in item versus forced-choice tests.
A Comparison of Methods of Vertical Equating.
ERIC Educational Resources Information Center
Loyd, Brenda H.; Hoover, H. D.
Rasch model vertical equating procedures were applied to three mathematics computation tests for grades six, seven, and eight. Each level of the test was composed of 45 items in three sets of 15 items, arranged in such a way that tests for adjacent grades had two sets (30 items) in common, and the sixth and eighth grades had 15 items in common. In…
ERIC Educational Resources Information Center
Zebehazy, Kim T.; Zigmond, Naomi; Zimmerman, George J.
2012-01-01
Introduction: This study investigated differential item functioning (DIF) of test items on Pennsylvania's Alternate System of Assessment (PASA) for students with visual impairments and severe cognitive disabilities and what the reasons for the differences may be. Methods: The Wilcoxon signed ranks test was used to analyze differences in the scores…
Objective and Item Banking Computer Software and Its Use in Comprehensive Achievement Monitoring.
ERIC Educational Resources Information Center
Schriber, Peter E.; Gorth, William P.
The current emphasis on objectives and test item banks for constructing more effective tests is being augmented by increasingly sophisticated computer software. Items can be catalogued in numerous ways for retrieval. The items as well as instructional objectives can be stored and test forms can be selected and printed by the computer. It is also…
An Item-Driven Adaptive Design for Calibrating Pretest Items. Research Report. ETS RR-14-38
ERIC Educational Resources Information Center
Ali, Usama S.; Chang, Hua-Hua
2014-01-01
Adaptive testing is advantageous in that it provides more efficient ability estimates with fewer items than linear testing does. Item-driven adaptive pretesting may also offer similar advantages, and verification of such a hypothesis about item calibration was the main objective of this study. A suitability index (SI) was introduced to adaptively…
Fitting the Rasch Model to Account for Variation in Item Discrimination
ERIC Educational Resources Information Center
Weitzman, R. A.
2009-01-01
Building on the Kelley and Gulliksen versions of classical test theory, this article shows that a logistic model having only a single item parameter can account for varying item discrimination, as well as difficulty, by using item-test correlations to adjust incorrect-correct (0-1) item responses prior to an initial model fit. The fit occurs…
Weighted Maximum-a-Posteriori Estimation in Tests Composed of Dichotomous and Polytomous Items
ERIC Educational Resources Information Center
Sun, Shan-Shan; Tao, Jian; Chang, Hua-Hua; Shi, Ning-Zhong
2012-01-01
For mixed-type tests composed of dichotomous and polytomous items, polytomous items often yield more information than dichotomous items. To reflect the difference between the two types of items and to improve the precision of ability estimation, an adaptive weighted maximum-a-posteriori (WMAP) estimation is proposed. To evaluate the performance of…
ERIC Educational Resources Information Center
Sengul Avsar, Asiye; Tavsancil, Ezel
2017-01-01
This study analysed polytomous items' psychometric properties according to nonparametric item response theory (NIRT) models. Thus, simulated datasets--three different test lengths (10, 20 and 30 items), three sample distributions (normal, right and left skewed) and three samples sizes (100, 250 and 500)--were generated by conducting 20…
Rasch Measurement and Item Banking: Theory and Practice.
ERIC Educational Resources Information Center
Nakamura, Yuji
The Rasch Model is a one-parameter item response theory model which states that the probability of a correct response on a test is a function of the difficulty of the item and the ability of the candidate. Item banking is useful for language testing. The Rasch Model provides estimates of item difficulties that are meaningful,…
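The relationship described in the abstract above, in which the probability of a correct response depends on item difficulty and candidate ability, can be sketched in a few lines of Python. This is the standard logistic form of the Rasch model with ability and difficulty on the same logit scale; the function name and the example values are illustrative assumptions, not material from the report.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the Rasch (one-parameter) model.

    Only the difference between ability and difficulty matters:
    P = exp(ability - difficulty) / (1 + exp(ability - difficulty)).
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical candidate (ability 0.0) facing items of increasing difficulty.
for difficulty in (-1.0, 0.0, 1.0):
    print(difficulty, round(rasch_probability(0.0, difficulty), 3))
```

When ability equals difficulty the probability is exactly 0.5, which is why item difficulties estimated this way can be read directly against candidate abilities in an item bank.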
Test Design Project: Studies in Test Bias. Annual Report.
ERIC Educational Resources Information Center
McArthur, David
Item bias in a multiple-choice test can be detected by appropriate analyses of the persons x items scoring matrix. This permits comparison of groups of examinees tested with the same instrument. The test may be biased if it is not measuring the same thing in comparable groups, if groups are responding to different aspects of the test items, or if…
ERIC Educational Resources Information Center
Truell, Allen D.; Zhao, Jensen J.; Alexander, Melody W.
2005-01-01
The purposes of this study were to determine if there is a significant difference in postsecondary business student scores and test completion time based on settable test item exposure control interface format, and to determine if there is a significant difference in student scores and test completion time based on settable test item exposure…
Estimating Total-Test Scores from Partial Scores in a Matrix Sampling Design.
ERIC Educational Resources Information Center
Sachar, Jane; Suppes, Patrick
1980-01-01
The present study compared six methods, two of which utilize the content structure of items, to estimate total-test scores using 450 students and 60 items of the 110-item Stanford Mental Arithmetic Test. Three methods yielded fairly good estimates of the total-test score. (Author/RL)
ERIC Educational Resources Information Center
Penfield, Randall D.; Algina, James
2006-01-01
One approach to measuring unsigned differential test functioning is to estimate the variance of the differential item functioning (DIF) effect across the items of the test. This article proposes two estimators of the DIF effect variance for tests containing dichotomous and polytomous items. The proposed estimators are direct extensions of the…
Smolen, Tomasz; Chuderski, Adam
2015-01-01
Fluid intelligence (Gf) is a crucial cognitive ability that involves abstract reasoning in order to solve novel problems. Recent research demonstrated that Gf strongly depends on the individual effectiveness of working memory (WM). We investigated a popular claim that if the storage capacity underlay the WM-Gf correlation, then such a correlation should increase with an increasing number of items or rules (load) in a Gf-test. As often no such link is observed, on that basis the storage-capacity account is rejected, and alternative accounts of Gf (e.g., related to executive control or processing speed) are proposed. Using both analytical inference and numerical simulations, we demonstrated that the load-dependent change in correlation is primarily a function of the amount of floor/ceiling effect for particular items. Thus, the item-wise WM correlation of a Gf-test depends on its overall difficulty, and the difficulty distribution across its items. When the early test items yield huge ceiling, but the late items do not approach floor, that correlation will increase throughout the test. If the early items locate themselves between ceiling and floor, but the late items approach floor, the respective correlation will decrease. For a hallmark Gf-test, the Raven-test, whose items span from ceiling to floor, the quadratic relationship is expected, and it was shown empirically using a large sample and two types of WMC tasks. In consequence, no changes in correlation due to varying WM/Gf load, or lack of them, can yield an argument for or against any theory of WM/Gf. Moreover, as the mathematical properties of the correlation formula make it relatively immune to ceiling/floor effects for overall moderate correlations, only minor changes (if any) in the WM-Gf correlation should be expected for many psychological tests.
The Classroom Environment Questionnaire (CEQ): Development and preliminary structural validity.
Lyons, Carissa; Brown, Ted; Bourke-Taylor, Helen
2018-04-16
Occupational therapists offer a unique perspective regarding the contribution of the environment to occupational performance. Therefore, a scale that measures the unique characteristics of the primary school classroom environment where children complete their daily schoolwork occupations is needed. The aim of this study was to develop and psychometrically evaluate a new teacher-report questionnaire that measures a number of environmental characteristics of primary school classrooms. Participants (N = 117) completed the Classroom Environment Questionnaire (CEQ), which utilises a 4-point Likert scale where teachers rate 51 environmental characteristics of their classroom. Teachers also rate the extent to which they believe the physical, social, temporal, institutional and cultural classroom environmental domains contribute to students' schoolwork performance using a 10-point scale. The structural validity of the CEQ was examined using principal component analysis (PCA). Inter-item correlations were examined using Pearson r correlations, while the internal consistency of the CEQ was assessed using Cronbach's alpha. PCA revealed the CEQ to be multidimensional, with 31 items loading onto nine viable factors, representing the unique nature of classroom environments. Based on the PCA results, 20 items were removed from the CEQ. Cronbach's alpha and correlation analysis indicated that most CEQ subsections had acceptable internal consistency (alpha range 0.70-0.82), with four subsections demonstrating a lower level of internal consistency (alpha range 0.55-0.69). Preliminary structural validity and internal consistency analysis findings confirm that the CEQ has potential to be a useful scale for professionals wishing to examine the unique characteristics of primary school classrooms that influence the occupational performance of students. Ongoing analyses will be undertaken to further explore the CEQ's validity and reliability. © 2018 Occupational Therapy Australia.
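The internal-consistency statistic used in the abstract above, Cronbach's alpha, has a simple closed form that a short Python sketch can make concrete. The function below uses the usual formula based on item variances and total-score variance; the function name and the tiny data set are made-up illustrations and have no connection to the CEQ data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items table of scores.

    rows -- list of per-respondent lists, one score per item
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


# Hypothetical ratings from three respondents on two items.
example = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(example), 3))
```

The sketch assumes complete data and a non-zero total-score variance; a real analysis would need to handle missing responses and would normally rely on an established statistics package.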
Fabricant, Peter D; Robles, Alex; Downey-Zayas, Timothy; Do, Huong T; Marx, Robert G; Widmann, Roger F; Green, Daniel W
2013-10-01
Having simple and reliable validated outcome measures is vital to conducting high-quality outcomes research in the field of orthopaedic surgery. Activity level is a key prognostic variable for patients with sports injuries. There is a paucity of such activity scales for children and adolescents who are otherwise healthy and athletically active. In addition to frequency and intensity of athletic activity, level of play and coach/trainer supervision are important variables unique to children and adolescents that are not captured in available adult scoring systems. To create and validate a concise and comprehensive activity rating scale for athletically active children and adolescents 10 to 18 years of age. Cohort study (diagnosis); Level of evidence, 2. Item generation was performed with a panel of orthopaedic surgeons and adolescent athletes. Item reduction, pilot testing and scale refinement resulted in a final 8-item instrument, the Hospital for Special Surgery Pediatric Functional Activity Brief Scale (HSS Pedi-FABS). Existing methods were used to determine reliability and validation. The Flesch-Kincaid score was calculated at a 6.6th-grade reading level (approximately 13 years old); therefore, although all subjects provided their own answers, parents were allowed to assist children younger than 13 years with reading the questionnaire. Scale reliability was excellent (test-retest reliability, intraclass correlation coefficient = 0.91; internal consistency, Cronbach alpha = .914), and there were no floor or ceiling effects. There was also robust construct validity: Convergent validity testing revealed positive correlations between the HSS Pedi-FABS and level of competition in athletic activity, number of reported hours of athletic activity per week, and existing comparable adult and pediatric scales. Discriminant validity was shown with age, body mass index, and type of sport as measured by the Daniel scale. 
The 8-item HSS Pedi-FABS can be used to reliably and accurately evaluate activity level as a prognostic variable for clinical research studies. It is a simple, reliable, and valid metric to assess activity in children and adolescents 10 to 18 years of age. This instrument will lead to better evaluation of posttreatment outcomes and patient-reported activity for child and adolescent athletes.
Item response theory analysis of the mechanics baseline test
NASA Astrophysics Data System (ADS)
Cardamone, Caroline N.; Abbott, Jonathan E.; Rayyan, Saif; Seaton, Daniel T.; Pawl, Andrew; Pritchard, David E.
2012-02-01
Item response theory is useful in both the development and evaluation of assessments and in computing standardized measures of student performance. In item response theory, individual parameters (difficulty, discrimination) for each item or question are fit by item response models. These parameters provide a means for evaluating a test and offer a better measure of student skill than a raw test score, because each skill calculation considers not only the number of questions answered correctly, but the individual properties of all questions answered. Here, we present the results from an analysis of the Mechanics Baseline Test given at MIT during 2005-2010. Using the item parameters, we identify questions on the Mechanics Baseline Test that are not effective in discriminating between MIT students of different abilities. We show that a limited subset of the highest quality questions on the Mechanics Baseline Test returns accurate measures of student skill. We compare student skills as determined by item response theory to the more traditional measurement of the raw score and show that a comparable measure of learning gain can be computed.
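To illustrate how item parameters can flag questions that discriminate poorly, the Python sketch below uses a two-parameter logistic item response function and its item information. The choice of the two-parameter logistic form is an assumption made for illustration, since the abstract above names difficulty and discrimination parameters but not a specific model; the function names and parameter values are likewise hypothetical.

```python
import math


def p_correct(theta, a, b):
    """Two-parameter logistic model: probability of a correct answer.

    theta -- student skill, a -- item discrimination, b -- item difficulty
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at skill theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


# Two hypothetical items of equal difficulty but different discrimination.
for a in (0.3, 1.5):
    print(a, round(item_information(theta=0.0, a=a, b=0.0), 4))
```

Because information scales with the square of the discrimination parameter, a low-discrimination item adds little precision to a skill estimate at any skill level, which is one way a shorter test built from the most informative items can still measure skill accurately.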
Computerized adaptive testing: the capitalization on chance problem.
Olea, Julio; Barrada, Juan Ramón; Abad, Francisco J; Ponsoda, Vicente; Cuevas, Lara
2012-03-01
This paper describes several simulation studies that examine the effects of capitalization on chance in the selection of items and the ability estimation in CAT, employing the 3-parameter logistic model. In order to generate different estimation errors for the item parameters, the calibration sample size was manipulated (N = 500, 1000 and 2000 subjects) as was the ratio of item bank size to test length (banks of 197 and 788 items, test lengths of 20 and 40 items), both in a CAT and in a random test. Results show that capitalization on chance is particularly serious in CAT, as revealed by the large positive bias found in the small sample calibration conditions. For broad ranges of theta, the overestimation of the precision (asymptotic Se) reaches levels of 40%, something that does not occur with the RMSE (theta). The problem is greater as the item bank size to test length ratio increases. Potential solutions were tested in a second study, where two exposure control methods were incorporated into the item selection algorithm. Some alternative solutions are discussed.
ERIC Educational Resources Information Center
Öztürk-Gübes, Nese; Kelecioglu, Hülya
2016-01-01
The purpose of this study was to examine the impact of dimensionality, common-item set format, and different scale linking methods on preserving equity property with mixed-format test equating. Item response theory (IRT) true-score equating (TSE) and IRT observed-score equating (OSE) methods were used under common-item nonequivalent groups design.…
Ackerman, Robert A; Donnellan, M Brent; Roberts, Brent W; Fraley, R Chris
2016-04-01
The Narcissistic Personality Inventory (NPI) is currently the most widely used measure of narcissism in social/personality psychology. It is also relatively unique because it uses a forced-choice response format. We investigate the consequences of changing the NPI's response format for item meaning and factor structure. Participants were randomly assigned to one of three conditions: 40 forced-choice items (n = 2,754), 80 single-stimulus dichotomous items (i.e., separate true/false responses for each item; n = 2,275), or 80 single-stimulus rating scale items (i.e., 5-point Likert-type response scales for each item; n = 2,156). Analyses suggested that the "narcissistic" and "nonnarcissistic" response options from the Entitlement and Superiority subscales refer to independent personality dimensions rather than high and low levels of the same attribute. In addition, factor analyses revealed that although the Leadership dimension was evident across formats, dimensions with entitlement and superiority were not as robust. Implications for continued use of the NPI are discussed. © The Author(s) 2015.
Kuijpers, Rowella C. W. M.; Otten, Roy; Vermulst, Ad A.; Bitfoi, Adina; Goelitz, Dietmar; Koç, Ceren; Mihova, Zlatka; Pez, Ondine; Carta, Mauro; Keyes, Katherine; Lesinskiene, Sigita; Engels, Rutger C. M. E.; Kovess, Viviane
2015-01-01
Large-scale international surveys are important to globally evaluate, monitor, and promote children's mental health. However, use of young children's self-reports in these studies is still controversial. The Dominic Interactive, a computerized DSM-IV–based child mental health self-report questionnaire, has unique characteristics that may make it preeminently appropriate for usage in cross-country comparisons. This study aimed to determine scale score reliabilities (omega) of the Dominic Interactive in a sample of 8,135 primary school children, ages 6–11 years old, in 7 European countries, to confirm the proposed 7-scale factor structure, and to test for measurement invariance of scale and item scores across countries. Omega reliability values for scale scores were good to high in every country, and the factor structure was confirmed for all countries. A thorough examination of measurement invariance provided evidence for cross-country test score comparability of 5 of the 7 scales and partial scale score invariance of 2 anxiety scales. Possible explanations for this partial invariance include cross-country differences in conceptualizing items and defining what is socially and culturally acceptable anxiety. The convincing evidence for validity of score interpretation makes the Dominic Interactive an indispensable tool for cross-country screening purposes. PMID:26237209
ERIC Educational Resources Information Center
Ali, Usama S.; Chang, Hua-Hua; Anderson, Carolyn J.
2015-01-01
Polytomous items are typically described by multiple category-related parameters; situations, however, arise in which a single index is needed to describe an item's location along a latent trait continuum. Situations in which a single index would be needed include item selection in computerized adaptive testing or test assembly. Therefore single…
Designing a Virtual Item Bank Based on the Techniques of Image Processing
ERIC Educational Resources Information Center
Liao, Wen-Wei; Ho, Rong-Guey
2011-01-01
One of the major weaknesses of the item exposure rates of figural items in Intelligence Quotient (IQ) tests lies in their inaccuracies. In this study, a new approach is proposed and a useful test tool known as the Virtual Item Bank (VIB) is introduced. The VIB combines Automatic Item Generation theory and image processing theory with the concepts of…
The Rasch Model and Missing Data, with an Emphasis on Tailoring Test Items.
ERIC Educational Resources Information Center
de Gruijter, Dato N. M.
Many applications of educational testing have a missing data aspect (MDA). This MDA is perhaps most pronounced in item banking, where each examinee responds to a different subtest of items from a large item pool and where both person and item parameter estimates are needed. The Rasch model is emphasized, and its non-parametric counterpart (the…
Three controversies over item disclosure in medical licensure examinations
Park, Yoon Soo; Yang, Eunbae B.
2015-01-01
In response to views on the public's right to know, there is growing attention to item disclosure – release of items, answer keys, and performance data to the public – in medical licensure examinations and their potential impact on the test's ability to measure competence and select qualified candidates. Recent debates on this issue have sparked legislative action internationally, including in South Korea, with prior discussions among North American countries dating back over three decades. The purpose of this study is to identify and analyze three issues associated with item disclosure in medical licensure examinations – 1) fairness and validity, 2) impact on passing levels, and 3) utility of item disclosure – by synthesizing existing literature in relation to standards in testing. Historically, the controversy over item disclosure has centered on fairness and validity. Proponents of item disclosure stress test takers’ right to know, while opponents argue from a validity perspective. Item disclosure may bias item characteristics, such as difficulty and discrimination, and has consequences for setting passing levels. To date, there has been limited research on the utility of item disclosure for large-scale testing. These issues require ongoing and careful consideration. PMID:26374693
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fu, L.J.; Johnson, E.M.; Newman, L.M.
A series of seven randomly selected potential halogenated water disinfection by-products were evaluated in vitro by the hydra assay to determine their developmental toxicity hazard potential. For six of the chemicals tested by this assay (dibromoacetonitrile; trichloroacetonitrile; 2-chlorophenol; 2,4,6-trichlorophenol; trichloroacetic acid; dichloroacetone) it was predicted that they would be generally equally toxic to both adult and embryonic mammals when studied by means of standard developmental toxicity teratology tests. However, the potential water disinfection by-product chloroacetic acid (CA) was determined to be over eight times more toxic to the embryonic developmental portion of the assay than it was to the adults. Because of this potential selectivity, CA is a high-priority item for developmental toxicity tests in pregnant mammals to confirm or refute its apparent unique developmental hazard potential and/or to establish a NOAEL by the route of most likely human exposure.
The eyes test is influenced more by artistic inclination and less by sex
Guariglia, Paola; Piccardi, Laura; Giaimo, Flavio; Alaimo, Sofia; Miccichè, Giusy; Antonucci, Gabriella
2015-01-01
The “Reading the Mind in the Eyes” test was developed by Baron-Cohen and his co-workers. The test provides a unique opportunity to evaluate social cognition by assessing the ability to recognize the mental states of others using only the expressions around the eyes. In healthy populations, however, it has produced conflicting results, particularly regarding sex differences and the number of items to use. We performed two studies: the first investigated the presence of gender effects and the sensitivity of the test stimuli; the second considered other individual factors (i.e., artistic attitude, social empathy, and personality traits) that could influence the ability to understand emotions from gaze. Our results demonstrated a sex effect, which can be more or less attenuated by the nature of the stimuli. This could be, as aforementioned, the result of empathy or artistic attitude in being proficient in understanding the mental states of others. PMID:26052278
Bayesian Item Selection in Constrained Adaptive Testing Using Shadow Tests
ERIC Educational Resources Information Center
Veldkamp, Bernard P.
2010-01-01
Application of Bayesian item selection criteria in computerized adaptive testing might result in improvement of bias and MSE of the ability estimates. The question remains how to apply Bayesian item selection criteria in the context of constrained adaptive testing, where large numbers of specifications have to be taken into account in the item…
Mathematics Library of Test Items. Volume One.
ERIC Educational Resources Information Center
Fraser, Graham, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from previous tests are made available to teachers for the construction of pretests or posttests, reference tests for inter-class comparisons and general assignments. The collection was reviewed for content…
Are Learning Disabled Students "Test-Wise?": An Inquiry into Reading Comprehension Test Items.
ERIC Educational Resources Information Center
Scruggs, Thomas E.; Lifson, Steve
The ability to correctly answer reading comprehension test items, without having read the accompanying reading passage, was compared for third grade learning disabled students and their peers from a regular classroom. In the first experiment, fourteen multiple choice items were selected from the Stanford Achievement Test. No reading passages were…
Agriculture Library of Test Items.
ERIC Educational Resources Information Center
Sutherland, Duncan, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection is reviewed for content validity and reliability. The test…
ERIC Educational Resources Information Center
Bermundo, Cesar B.; Bermundo, Alex B.; Ballester, Rex C.
2012-01-01
iBank is a project that uses software to create an item bank that stores quality questions, generates tests, and prints exams. The items come from analyzed teacher-constructed test questions, which provides the basis for discussing test results, by determining why a test item is or is not discriminating between the better and poorer students, and by…
Effects of Test Item Disclosure on Medical Licensing Examination
ERIC Educational Resources Information Center
Yang, Eunbae B.; Lee, Myung Ae; Park, Yoon Soo
2018-01-01
In 2012, the National Health Personnel Licensing Examination Board of Korea decided to publicly disclose all test items and answers to satisfy the test takers' right to know and enhance the transparency of tests administered by the government. This study investigated the effects of item disclosure on the medical licensing examination (MLE),…
Controlling Item Exposure Conditional on Ability in Computerized Adaptive Testing.
ERIC Educational Resources Information Center
Stocking, Martha L.; Lewis, Charles
1998-01-01
Ensuring item and pool security in a continuous testing environment is explored through a new method of controlling exposure rate of items conditional on ability level in computerized testing. Properties of this conditional control on exposure rate, when used in conjunction with a particular adaptive testing algorithm, are explored using simulated…
Battalion Combat Operations Center (COC) Test. Volume II. Test Report,
1982-02-08
reveal, perhaps, that item X can perform a task faster than item Y. A utility assessment from an experienced, knowledgeable test participant, however...can ascertain whether or not item X can better enable him to accomplish his mission than item Y. 2.4 GENERALIZED TEST FACILITY. The capabilities of... [Rating-scale residue: mixes A, C, and D rated from easy to very difficult on ability to control data and ability to exploit data.]
V-TECS Criterion-Referenced Test Item Bank for Radiologic Technology Occupations.
ERIC Educational Resources Information Center
Reneau, Fred; And Others
This Vocational-Technical Education Consortium of States (V-TECS) criterion-referenced test item bank provides 696 multiple-choice items and 33 matching items for radiologic technology occupations. These job titles are included: radiologic technologist, chief; radiologic technologist; nuclear medicine technologist; radiation therapy technologist;…
ERIC Educational Resources Information Center
Magno, Carlo
2009-01-01
The present report demonstrates the difference between the classical test theory (CTT) and item response theory (IRT) approaches using actual test data from junior high school chemistry students. CTT and IRT were compared across two samples and two test forms on their item difficulty, internal consistency, and measurement errors. The specific…
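The quantities compared in the record above (item difficulty and internal consistency under CTT, and an item response function under IRT) can be illustrated with a minimal Python sketch. It is not the report's analysis: the function names and the tiny 0/1 response matrix are hypothetical, Cronbach's alpha is assumed as the internal-consistency index, and the Rasch model is assumed as the IRT model.

```python
import math
from statistics import pvariance


def item_difficulty(responses):
    """CTT item difficulty: proportion of examinees answering each item correctly."""
    n_persons = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_persons for j in range(n_items)]


def cronbach_alpha(responses):
    """Cronbach's alpha from a persons-by-items matrix of item scores."""
    n_items = len(responses[0])
    item_vars = [pvariance([row[j] for row in responses]) for j in range(n_items)]
    total_var = pvariance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


def rasch_probability(theta, b):
    """Rasch model: probability of a correct response given ability theta and difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Hypothetical data: 4 examinees (rows) by 3 dichotomous items (columns).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(item_difficulty(responses))   # proportions correct per item
print(cronbach_alpha(responses))    # internal consistency of the 3-item set
print(rasch_probability(0.0, 0.0))  # ability equal to difficulty
```

Note the difference in scale: the CTT difficulty is a sample-dependent proportion, whereas the Rasch difficulty `b` sits on the same latent scale as the ability `theta`.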
Modeling Local Item Dependence Due to Common Test Format with a Multidimensional Rasch Model
ERIC Educational Resources Information Center
Baghaei, Purya; Aryadoust, Vahid
2015-01-01
Research shows that test method can exert a significant impact on test takers' performance and thereby contaminate test scores. We argue that common test method can exert the same effect as common stimuli and violate the conditional independence assumption of item response theory models because, in general, subsets of items which have a shared…
Sargeant, J M; O'Connor, A M; Dohoo, I R; Erb, H N; Cevallos, M; Egger, M; Ersbøll, A K; Martin, S W; Nielsen, L R; Pearl, D L; Pfeiffer, D U; Sanchez, J; Torrence, M E; Vigre, H; Waldner, C; Ward, M P
2016-11-01
The reporting of observational studies in veterinary research presents many challenges that often are not adequately addressed in published reporting guidelines. To develop an extension of the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) statement that addresses unique reporting requirements for observational studies in veterinary medicine related to health, production, welfare, and food safety. A consensus meeting of experts was organized to develop an extension of the STROBE statement to address observational studies in veterinary medicine with respect to animal health, animal production, animal welfare, and food safety outcomes. Consensus meeting May 11-13, 2014 in Mississauga, Ontario, Canada. Seventeen experts from North America, Europe, and Australia attended the meeting. The experts were epidemiologists and biostatisticians, many of whom hold or have held editorial positions with relevant journals. Prior to the meeting, 19 experts completed a survey about whether they felt any of the 22 items of the STROBE statement should be modified and if items should be added to address unique issues related to observational studies in animal species with health, production, welfare, or food safety outcomes. At the meeting, the participants were provided with the survey responses and relevant literature concerning the reporting of veterinary observational studies. During the meeting, each STROBE item was discussed to determine whether or not re-wording was recommended, and whether additions were warranted. Anonymous voting was used to determine whether there was consensus for each item change or addition. The consensus was that six items needed no modifications or additions. 
Modifications or additions were made to the STROBE items numbered: 1 (title and abstract), 3 (objectives), 5 (setting), 6 (participants), 7 (variables), 8 (data sources/measurement), 9 (bias), 10 (study size), 12 (statistical methods), 13 (participants), 14 (descriptive data), 15 (outcome data), 16 (main results), 17 (other analyses), 19 (limitations), and 22 (funding). Published literature was not always available to support modification to, or inclusion of, an item. The methods and processes used in the development of this statement were similar to those used for other extensions of the STROBE statement. The use of this extension to the STROBE statement should improve the reporting of observational studies in veterinary research related to animal health, production, welfare, or food safety outcomes by recognizing the unique features of observational studies involving food-producing and companion animals, products of animal origin, aquaculture, and wildlife. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Garcia, Sofia F.; Hahn, Elizabeth A.; Magasi, Susan; Lai, Jin-Shei; Semik, Patrick; Hammel, Joy; Heinemann, Allen W.
2014-01-01
Objective To describe the development of new self-report measures of social attitudes that act as environmental facilitators or barriers to the participation of people with disabilities in society. Design A mixed methods approach included a literature review; item classification, selection and writing; cognitive interviews and field testing with participants with spinal cord injury (SCI), traumatic brain injury (TBI) or stroke; and rating scale analysis to evaluate initial psychometric properties. Setting General community. Participants Nine individuals with SCI, TBI or stroke participated in cognitive interviews; 305 community residents with those same conditions participated in field testing. Interventions None. Main Outcome Measure(s) Self-report item pool of social attitudes that act as facilitators or barriers to people with disabilities participating in society. Results An interdisciplinary team of experts classified 710 existing social environment items into content areas and wrote 32 new items. Additional qualitative item review included item refinement and winnowing of the pool prior to cognitive interviews and field testing 82 items. Field test data indicated that the pool satisfies a one-parameter item response theory measurement model and would be appropriate for development into a calibrated item bank. Conclusions Our qualitative item review process supported a social environment conceptual framework that includes both social support and social attitudes. We developed a new social attitudes self-report item pool. Calibration testing of that pool is underway with a larger sample in order to develop a social attitudes item bank for persons with disabilities. PMID:25045803
Garcia, Sofia F; Hahn, Elizabeth A; Magasi, Susan; Lai, Jin-Shei; Semik, Patrick; Hammel, Joy; Heinemann, Allen W
2015-04-01
To describe the development of new self-report measures of social attitudes that act as environmental facilitators or barriers to the participation of people with disabilities in society. A mixed-methods approach included a literature review; item classification, selection, and writing; cognitive interviews and field testing of participants with spinal cord injury (SCI), traumatic brain injury (TBI), or stroke; and rating scale analysis to evaluate initial psychometric properties. General community. Individuals with SCI, TBI, or stroke participated in cognitive interviews (n=9); community residents with those same conditions participated in field testing (n=305). None. Self-report item pool of social attitudes that act as facilitators or barriers to people with disabilities participating in society. An interdisciplinary team of experts classified 710 existing social environment items into content areas and wrote 32 new items. Additional qualitative item review included item refinement and winnowing of the pool prior to cognitive interviews and field testing of 82 items. Field test data indicated that the pool satisfies a 1-parameter item response theory measurement model and would be appropriate for development into a calibrated item bank. Our qualitative item review process supported a social environment conceptual framework that includes both social support and social attitudes. We developed a new social attitudes self-report item pool. Calibration testing of that pool is underway with a larger sample to develop a social attitudes item bank for persons with disabilities. Copyright © 2015 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Mitchell, Alex J; Smith, Adam B; Al-salihy, Zerak; Rahim, Twana A; Mahmud, Mahmud Q; Muhyaldin, Asma S
2011-10-01
We aimed to redefine the optimal self-report symptoms of depression suitable for creation of an item bank that could be used in computer adaptive testing or to develop a simplified screening tool for DSM-V. Four hundred subjects (200 patients with primary depression and 200 non-depressed subjects) living in Iraqi Kurdistan were interviewed. The Mini International Neuropsychiatric Interview (MINI) was used to define the presence of major depression (DSM-IV criteria). We examined symptoms of depression using four well-known scales delivered in Kurdish. The Partial Credit Model was applied to each instrument. Common-item equating was subsequently used to create an item bank, and differential item functioning (DIF) was explored for known subgroups. A symptom-level Rasch analysis reduced the original 45 items to 24 after the exclusion of 21 misfitting items. A further six items (CESD13 and CESD17, HADS-D4, HADS-D5 and HADS-D7, and CDSS3 and CDSS4) were removed due to misfit as the items were added together to form the item bank, and two items were subsequently removed following the DIF analysis by diagnosis (CESD20 and CDSS9, both of which were harder to endorse for women). Therefore the remaining optimal item bank consisted of 17 items and produced an area under the curve (AUC) of 0.987. Using a bank restricted to the optimal nine items revealed only minor loss of accuracy (AUC = 0.989, sensitivity 96%, specificity 95%). Finally, when restricted to only four items, accuracy was still high (AUC = 0.976; sensitivity 93%, specificity 96%). An item bank of 17 items may be useful in computer adaptive testing and nine or even four items may be used to develop a simplified screening tool for DSM-V major depressive disorder (MDD). Further examination of this item bank should be conducted in different cultural settings.
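The accuracy indices reported in the record above (sensitivity, specificity, AUC) can be computed as in the following minimal Python sketch. The abstract does not report the underlying counts or scores, so the numbers here are hypothetical: the counts are chosen only so that two groups of 200 give 96% and 95%, and the function names are illustrative.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores_pos, scores_neg):
    """AUC as the probability that a randomly chosen case outscores a
    randomly chosen non-case, with ties counted as one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical counts for 200 cases and 200 non-cases.
sens, spec = sensitivity_specificity(tp=192, fn=8, tn=190, fp=10)
print(sens, spec)

# Hypothetical screening scores for three cases and three non-cases.
print(auc([3, 4, 5], [1, 2, 3]))
```

The pairwise definition of AUC used here is equivalent to the area under the ROC curve and avoids choosing a cut-off, which is why it can be reported alongside the threshold-dependent sensitivity and specificity.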
Kalpakjian, Claire Z.; Tate, Denise G.; Kisala, Pamela A.; Tulsky, David S.
2015-01-01
Objective To describe the development and psychometric properties of the Spinal Cord Injury-Quality of Life (SCI-QOL) Self-esteem item bank. Design Using a mixed-methods design, we developed and tested a self-esteem item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory- (IRT) based analytic approaches, including tests of model fit, differential item functioning (DIF) and precision. Setting We tested a pool of 30 items at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital, and the James J. Peters/Bronx Department of Veterans Affairs hospital. Participants A total of 717 individuals with SCI completed the self-esteem items. Results A unidimensional model was observed (CFI = 0.946; RMSEA = 0.087) and measurement precision was good (theta range between −2.7 and 0.7). Eleven items were flagged for DIF; however, effect sizes were negligible with little practical impact on score estimates. The final calibrated item bank resulted in 23 retained items. Conclusion This study indicates that the SCI-QOL Self-esteem item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available. PMID:26010972
Kalpakjian, Claire Z; Tate, Denise G; Kisala, Pamela A; Tulsky, David S
2015-05-01
To describe the development and psychometric properties of the Spinal Cord Injury-Quality of Life (SCI-QOL) Self-esteem item bank. Using a mixed-methods design, we developed and tested a self-esteem item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory-(IRT) based analytic approaches, including tests of model fit, differential item functioning (DIF) and precision. We tested a pool of 30 items at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital, and the James J. Peters/Bronx Department of Veterans Affairs hospital. A total of 717 individuals with SCI completed the self-esteem items. A unidimensional model was observed (CFI=0.946; RMSEA=0.087) and measurement precision was good (theta range between -2.7 and 0.7). Eleven items were flagged for DIF; however, effect sizes were negligible with little practical impact on score estimates. The final calibrated item bank resulted in 23 retained items. This study indicates that the SCI-QOL Self-esteem item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available.
Victorson, David; Tulsky, David S; Kisala, Pamela A; Kalpakjian, Claire Z; Weiland, Brian; Choi, Seung W
2015-05-01
To describe the development and psychometric properties of the Spinal Cord Injury--Quality of Life (SCI-QOL) Resilience item bank and short form. Using a mixed-methods design, we developed and tested a resilience item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory based analytic approaches, including tests of model fit and differential item functioning (DIF). We tested a 32-item pool at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Department of Veterans Affairs medical center. A total of 717 individuals with SCI completed the Resilience items. A unidimensional model was observed (CFI=0.968; RMSEA=0.074) and measurement precision was good (theta range between -3.1 and 0.9). Ten items were flagged for DIF, however, after examination of effect sizes we found this to be negligible with little practical impact on score estimates. The final calibrated item bank resulted in 21 retained items. This study indicates that the SCI-QOL Resilience item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available.
Victorson, David; Tulsky, David S.; Kisala, Pamela A.; Kalpakjian, Claire Z.; Weiland, Brian; Choi, Seung W.
2015-01-01
Objective To describe the development and psychometric properties of the Spinal Cord Injury - Quality of Life (SCI-QOL) Resilience item bank and short form. Design Using a mixed-methods design, we developed and tested a resilience item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory based analytic approaches, including tests of model fit and differential item functioning (DIF). Setting We tested a 32-item pool at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Department of Veterans Affairs medical center. Participants A total of 717 individuals with SCI completed the Resilience items. Results A unidimensional model was observed (CFI = 0.968; RMSEA = 0.074) and measurement precision was good (theta range between −3.1 and 0.9). Ten items were flagged for DIF, however, after examination of effect sizes we found this to be negligible with little practical impact on score estimates. The final calibrated item bank resulted in 21 retained items. Conclusion This study indicates that the SCI-QOL Resilience item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available. PMID:26010971
Grundgeiger, Tobias
2014-04-01
Retrieving a subset of learned items can lead to the forgetting of related items. Such retrieval-induced forgetting (RIF) can be explained by the inhibition of irrelevant items in order to overcome retrieval competition when the target item is retrieved. According to the retrieval inhibition account, such retrieval competition is a necessary condition for RIF. However, research has indicated that noncompetitive retrieval practice can also cause RIF by strengthening cue-item associations. According to the strength-dependent competition account, the strengthened items interfere with the retrieval of weaker items, resulting in impaired recall of weaker items in the final memory test. The aim of this study was to replicate RIF caused by noncompetitive retrieval practice and to determine whether this forgetting is also observed in recognition tests. In the context of RIF, it has been assumed that recognition tests circumvent interference and, therefore, should not be sensitive to forgetting due to strength-dependent competition. However, this has not been empirically tested, and it has been suggested that participants may reinstate learned cues as retrieval aids during the final test. In the present experiments, competitive practice or noncompetitive practice was followed by either final cued-recall tests or recognition tests. In cued-recall tests, RIF was observed in both competitive and noncompetitive conditions. However, in recognition tests, RIF was observed only in the competitive condition and was absent in the noncompetitive condition. The result underscores the contribution of strength-dependent competition to RIF. However, recognition tests seem to be a reliable way of distinguishing between RIF due to retrieval inhibition or strength-dependent competition.
Adaptive Mental Testing: The State of the Art
1979-11-01
typically vary in their psychometric properties--particularly in their difficulty--the test designer must decide what configuration of these item...psychometric properties best suits the test’s purpose. There are two extreme rationales to guide that decision. One rationale is to choose items that are...development of item response theory (Rasch, 1960; Lord, 1952, 1970, 1974a; Birnbaum, 1968) that provided the needed invariance properties for item
ERIC Educational Resources Information Center
Pohl, Steffi; Gräfe, Linda; Rose, Norman
2014-01-01
Data from competence tests usually show a number of missing responses on test items due to both omitted and not-reached items. Different approaches for dealing with missing responses exist, and there are no clear guidelines on which of those to use. While classical approaches rely on an ignorable missing data mechanism, the most recently developed…
Procedures for Selecting Items for Computerized Adaptive Tests.
ERIC Educational Resources Information Center
Kingsbury, G. Gage; Zara, Anthony R.
1989-01-01
Several classical approaches and alternative approaches to item selection for computerized adaptive testing (CAT) are reviewed and compared. The study also describes procedures for constrained CAT that may be added to classical item selection approaches to allow them to be used for applied testing. (TJH)
Efforts Toward the Development of Unbiased Selection and Assessment Instruments.
ERIC Educational Resources Information Center
Rudner, Lawrence M.
Investigations into item bias provide an empirical basis for the identification and elimination of test items which appear to measure different traits across populations or cultural groups. The psychometric rationales for six approaches to the identification of biased test items are reviewed: (1) Transformed item difficulties: within-group…
Effect of Differential Item Functioning on Test Equating
ERIC Educational Resources Information Center
Kabasakal, Kübra Atalay; Kelecioglu, Hülya
2015-01-01
This study examines the effect of differential item functioning (DIF) items on test equating through multilevel item response models (MIRMs) and traditional IRMs. The performances of three different equating models were investigated under 24 different simulation conditions, and the variables whose effects were examined included sample size, test…
Ramsay-Curve Differential Item Functioning
ERIC Educational Resources Information Center
Woods, Carol M.
2011-01-01
Differential item functioning (DIF) occurs when an item on a test, questionnaire, or interview has different measurement properties for one group of people versus another, irrespective of true group-mean differences on the constructs being measured. This article is focused on item response theory based likelihood ratio testing for DIF (IRT-LR or…
Code of Federal Regulations, 2013 CFR
2013-07-01
... identifies, relates to or is unique to, or describes him or her; e.g., a social security number, age.... Any item, collection, or grouping of information, whatever the storage media (e.g., paper, electronic...
Code of Federal Regulations, 2011 CFR
2011-07-01
... identifies, relates to or is unique to, or describes him or her; e.g., a social security number, age.... Any item, collection, or grouping of information, whatever the storage media (e.g., paper, electronic...
ERIC Educational Resources Information Center
Çikirikçi Demirtasli, Nükhet; Ulutas, Seher
2015-01-01
Problem Statement: Item bias occurs when individuals from different groups (different gender, cultural background, etc.) have different probabilities of responding correctly to a test item despite having the same skill levels. It is important that tests or items do not have bias in order to ensure the accuracy of decisions taken according to test…
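The definition of item bias in the record above (different probabilities of a correct response for examinees of the same skill level) is what matched-group DIF statistics check. As a minimal sketch, the Python below computes the Mantel-Haenszel common odds ratio, one widely used DIF index; the record does not say which method the authors used, so this technique is an assumption for illustration. The function name and the counts are hypothetical, and each stratum stands for examinees matched on total test score as a proxy for skill.

```python
def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across score-matched strata.

    Each stratum is a tuple:
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    A value near 1 suggests no DIF; a value above 1 means the item
    favors the reference group among examinees of matched skill.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_ok, ref_bad, foc_ok, foc_bad in strata:
        n = ref_ok + ref_bad + foc_ok + foc_bad
        if n == 0:
            continue  # skip empty score levels
        numerator += ref_ok * foc_bad / n
        denominator += ref_bad * foc_ok / n
    if denominator == 0:
        raise ValueError("odds ratio undefined: no discordant counts")
    return numerator / denominator


# Hypothetical counts for two total-score levels.
strata = [
    (30, 10, 20, 20),  # lower score level
    (40, 10, 30, 20),  # higher score level
]
print(mantel_haenszel_odds_ratio(strata))  # about 2.82 for these counts
```

Stratifying on total score is what separates DIF from a plain group difference in proportion correct: groups may differ in overall ability without any single item being biased.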
ERIC Educational Resources Information Center
Egberink, Iris J. L.; Meijer, Rob R.; Tendeiro, Jorge N.
2015-01-01
A popular method to assess measurement invariance of a particular item is based on likelihood ratio tests with all other items as anchor items. The results of this method are often only reported in terms of statistical significance, and researchers proposed different methods to empirically select anchor items. It is unclear, however, how many…
ERIC Educational Resources Information Center
Masters, James S.
2010-01-01
With the need for larger and larger banks of items to support adaptive testing and to meet security concerns, large-scale item generation is a requirement for many certification and licensure programs. As part of the mass production of items, it is critical that the difficulty and the discrimination of the items be known without the need for…
Unilateral neglect: further validation of the baking tray task.
Appelros, Peter; Karlsson, Gunnel M; Thorwalls, Annika; Tham, Kerstin; Nydevik, Ingegerd
2004-11-01
The Baking Tray Task is a comprehensible, simple-to-perform test for use in assessing unilateral neglect. The aim of this study was to validate further its use with stroke patients. The Baking Tray Task was compared with 2 versions of the Behaviour Inattention Test and a test for personal neglect. A total of 270 patients were subjected to a 3-item version of the Behaviour Inattention Test and 40 patients were subjected to an 8-item version of the Behaviour Inattention Test, besides the Baking Tray Task and the personal neglect test. The Baking Tray Task was more sensitive than the 3-item Behaviour Inattention Test, but the 8-item Behaviour Inattention Test was more sensitive than the Baking Tray Task. The best combination of any 3 tests was Baking Tray Task, Reading an article, and Figure copying; the 2 last-mentioned being a part of the 8-item Behaviour Inattention Test. Multi-item tests detect more cases of neglect than do single tests. However, it is tiresome for the patient to undergo a larger test battery than necessary. It is also time-consuming for the staff. Behavioural tests seem more appropriate when assessing neglect. The Baking Tray Task seems to be one of the most sensitive single tests, but its sensitivity can be further enhanced when it is used in combination with other tests.
Adjusting for cross-cultural differences in computer-adaptive tests of quality of life.
Gibbons, C J; Skevington, S M
2018-04-01
Previous studies using the WHOQOL measures have demonstrated that the relationship between individual items and the underlying quality of life (QoL) construct may differ between cultures. If unaccounted for, these differing relationships can lead to measurement bias which, in turn, can undermine the reliability of results. We used item response theory (IRT) to assess differential item functioning (DIF) in WHOQOL data from diverse language versions collected in the UK, Zimbabwe, Russia, and India (total N = 1332). Data were fitted to the partial credit 'Rasch' model. We used four item banks previously derived from the WHOQOL-100 measure, which provided excellent measurement for physical, psychological, social, and environmental quality of life domains (40 items overall). Cross-cultural differential item functioning was assessed using analysis of variance for item residuals and post hoc Tukey tests. Simulated computer-adaptive tests (CATs) were conducted to assess the efficiency and precision of the four item banks. Splitting item parameters by DIF results in four linked item banks without DIF or other breaches of IRT model assumptions. Simulated CATs were more precise and efficient than longer paper-based alternatives. Assessing differential item functioning using item response theory can identify measurement invariance between cultures which, if uncontrolled, may undermine accurate comparisons in computer-adaptive testing assessments of QoL. We demonstrate how compensating for DIF using item anchoring allowed data from all four countries to be compared on a common metric, thus facilitating assessments which were both sensitive to cultural nuance and comparable between countries.
Item analysis of three Spanish naming tests: a cross-cultural investigation.
Marquez de la Plata, Carlos; Arango-Lasprilla, Juan Carlos; Alegret, Montse; Moreno, Alexander; Tárraga, Luis; Lara, Mar; Hewlitt, Margaret; Hynan, Linda; Cullum, C Munro
2009-01-01
Neuropsychological evaluations conducted in the United States and abroad commonly include the use of tests translated from English to Spanish. The use of translated naming tests for evaluating predominately Spanish-speakers has recently been challenged on the grounds that translating test items may compromise a test's construct validity. The Texas Spanish Naming Test (TNT) has been developed in Spanish specifically for use with Spanish-speakers; however, it is unlikely patients from diverse Spanish-speaking geographical regions will perform uniformly on a naming test. The present study evaluated and compared the internal consistency and patterns of item-difficulty and -discrimination for the TNT and two commonly used translated naming tests in three countries (i.e., United States, Colombia, Spain). Two hundred fifty two subjects (136 demented, 116 nondemented) across three countries were administered the TNT, Modified Boston Naming Test-Spanish, and the naming subtest from the CERAD. The TNT demonstrated superior internal consistency to its counterparts, a superior item difficulty pattern than the CERAD naming test, and a superior item discrimination pattern than the MBNT-S across countries. Overall, all three Spanish naming tests differentiated nondemented and moderately demented individuals, but the results suggest the items of the TNT are most appropriate to use with Spanish-speakers. Preliminary normative data for the three tests examined in each country are provided.
Miller, William C; Deathe, A Barry; Speechley, Mark
2003-05-01
To evaluate the internal consistency, test-retest reliability, and construct validity of the Activities-specific Balance Confidence (ABC) Scale among people who have a lower-limb amputation. Retest design. A university-affiliated outpatient amputee clinic in Ontario. Two samples of individuals who have unilateral transtibial and transfemoral amputation. Sample 1 (n=54) was a consecutive and sample 2 (n=329) a convenience sample of all members of the clinic population. Not applicable. Repeated application of the ABC Scale, a 16-item questionnaire that assesses confidence in performing various mobility-related tasks. Correlation to test hypothesized relationships between the ABC Scale and the 2-minute walk (2MWT) and the timed up-and-go (TUG) tests; and assessment of the ability of the ABC Scale to discriminate among groups based on amputation cause, amputation level, mobility device use, automatic stepping ability, wearing time, stair climbing ability, and walking distance. Test-retest reliability (intraclass correlation coefficient) of the ABC Scale was .91 (95% confidence interval [CI], .84-.95) with individual item test-retest coefficients ranging from .53 to .87. Internal consistency, measured by Cronbach alpha, was .95. Hypothesized associations with the 2MWT and TUG test were observed with correlations of .72 (95% CI, .56-.84) and -.70 (95% CI, -.82 to -.53), respectively. The ABC Scale discriminated between all groups except those based on amputation level. Balance confidence, as measured by the ABC Scale, is a construct that provides unique information potentially useful to clinicians who provide amputee rehabilitation. The ABC Scale is reliable, with strong support for validity. Study of the scale's responsiveness is recommended.
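The record above reports internal consistency as Cronbach alpha. As a minimal sketch of how that coefficient is commonly computed, the Python below applies the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the score matrix are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = len(rows[0])  # number of items
    # Sample variance of each item across respondents.
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 4 respondents, 3 items.
scores = [[3, 4, 3], [5, 5, 4], [2, 3, 2], [4, 4, 5]]
alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.2f}")
```

Values close to 1 indicate that the items vary together, which is how a figure such as the .95 reported for the 16-item ABC Scale is usually read.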
Testing enhances both encoding and retrieval for both tested and untested items.
Cho, Kit W; Neely, James H; Crocco, Stephanie; Vitrano, Deana
2017-07-01
In forward testing effects, taking a test enhances memory for subsequently studied material. These effects have been observed for previously studied and tested items, a potentially item-specific testing effect, and newly studied untested items, a purely generalized testing effect. We directly compared item-specific and generalized forward testing effects using procedures to separate testing benefits due to encoding versus retrieval. Participants studied two lists of Swahili-English word pairs, with the second study list containing "new" pairs intermixed with the previously studied "old" pairs. Participants completed a review phase in which they took a cued-recall test on only the "old" pairs or restudied them. In Experiments 1a, 1b, and 2, the review phase was given either before or after the second study list. Testing benefited memory to the same degree for both "new" and "old" pairs, suggesting that there were no pair-specific benefits of testing. The larger benefit from testing when review was given before rather than after the second study list suggests that the memory enhancement was due to both testing-enhanced encoding and testing-enhanced retrieval. To better equate generalized testing effects for "new" and "old" pairs, Experiment 3 intermixed them in the review phase. A statistically significant pair-specific testing effect for "old" items was now observed. Overall, these results show that forward testing effects are due to both testing-enhanced encoding and retrieval effects and that direct, pair-specific forward testing benefits are considerably smaller than indirect, generalized forward testing benefits.
The Influence of Item Calibration Error on Variable-Length Computerized Adaptive Testing
ERIC Educational Resources Information Center
Patton, Jeffrey M.; Cheng, Ying; Yuan, Ke-Hai; Diao, Qi
2013-01-01
Variable-length computerized adaptive testing (VL-CAT) allows both items and test length to be "tailored" to examinees, thereby achieving the measurement goal (e.g., scoring precision or classification) with as few items as possible. Several popular test termination rules depend on the standard error of the ability estimate, which in turn depends…
A Paradox in the Study of the Benefits of Test-Item Review
ERIC Educational Resources Information Center
van der Linden, Wim J.; Jeon, Minjeong; Ferrara, Steve
2011-01-01
According to a popular belief, test takers should trust their initial instinct and retain their initial responses when they have the opportunity to review test items. More than 80 years of empirical research on item review, however, has contradicted this belief and shown minor but consistently positive score gains for test takers who changed…
Geography Library of Test Items. Volume Four.
ERIC Educational Resources Information Center
Kouimanos, John, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Home Science Library of Test Items. Volume One.
ERIC Educational Resources Information Center
Smith, Jan, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection is reviewed for content validity and reliability. The test…
Languages Library of Test Items. Volume Two: German, Latin.
ERIC Educational Resources Information Center
Campbell, Thomas; And Others
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Languages Library of Test Items. Volume One: French, Indonesian.
ERIC Educational Resources Information Center
Campbell, Thomas; And Others
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Geography Library of Test Items. Volume Three.
ERIC Educational Resources Information Center
Kouimanos, John, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Commerce Library of Test Items. Volume One.
ERIC Educational Resources Information Center
Meeve, Brian, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Geography Library of Test Items. Volume Five.
ERIC Educational Resources Information Center
Kouimanos, John, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Textiles and Design Library of Test Items. Volume I.
ERIC Educational Resources Information Center
Smith, Jan, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection is reviewed for content validity and reliability. The test…
Commerce Library of Test Items. Volume Two.
ERIC Educational Resources Information Center
Meeve, Brian, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Geography Library of Test Items. Volume Six.
ERIC Educational Resources Information Center
Kouimanos, John, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Geography: Library of Test Items. Volume II.
ERIC Educational Resources Information Center
Kouimanos, John, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Sex Differences in the Tendency to Omit Items on Multiple-Choice Tests: 1980-2000
ERIC Educational Resources Information Center
von Schrader, Sarah; Ansley, Timothy
2006-01-01
Much has been written concerning the potential group differences in responding to multiple-choice achievement test items. This discussion has included references to possible disparities in tendency to omit such test items. When test scores are used for high-stakes decision making, even small differences in scores and rankings that arise from male…
A Person Fit Test for IRT Models for Polytomous Items
ERIC Educational Resources Information Center
Glas, C. A. W.; Dagohoy, Anna Villa T.
2007-01-01
A person fit test based on the Lagrange multiplier test is presented for three item response theory models for polytomous items: the generalized partial credit model, the sequential model, and the graded response model. The test can also be used in the framework of multidimensional ability parameters. It is shown that the Lagrange multiplier…
How Big Is Big Enough? Sample Size Requirements for CAST Item Parameter Estimation
ERIC Educational Resources Information Center
Chuah, Siang Chee; Drasgow, Fritz; Luecht, Richard
2006-01-01
Adaptive tests offer the advantages of reduced test length and increased accuracy in ability estimation. However, adaptive tests require large pools of precalibrated items. This study looks at the development of an item pool for 1 type of adaptive administration: the computer-adaptive sequential test. An important issue is the sample size required…
An Explanatory Item Response Theory Approach for a Computer-Based Case Simulation Test
ERIC Educational Resources Information Center
Kahraman, Nilüfer
2014-01-01
Problem: Practitioners working with multiple-choice tests have long utilized Item Response Theory (IRT) models to evaluate the performance of test items for quality assurance. The use of similar applications for performance tests, however, is often encumbered due to the challenges encountered in working with complicated data sets in which local…
Geography Library of Test Items. Volume One.
ERIC Educational Resources Information Center
Kouimanos, John, Ed.
As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…
Electronic Quality of Life Assessment Using Computer-Adaptive Testing
2016-01-01
Background: Quality of life (QoL) questionnaires are desirable for clinical practice but can be time-consuming to administer and interpret, making their widespread adoption difficult. Objective: Our aim was to assess the performance of the World Health Organization Quality of Life (WHOQOL)-100 questionnaire as four item banks to facilitate adaptive testing using simulated computer adaptive tests (CATs) for physical, psychological, social, and environmental QoL. Methods: We used data from the UK WHOQOL-100 questionnaire (N=320) to calibrate item banks using item response theory, which included psychometric assessments of differential item functioning, local dependency, unidimensionality, and reliability. We simulated CATs to assess the number of items administered before prespecified levels of reliability were met. Results: The item banks (40 items) all displayed good model fit (P>.01) and were unidimensional (fewer than 5% of t tests significant), reliable (Person Separation Index>.70), and free from differential item functioning (no significant analysis of variance interaction) or local dependency (residual correlations < +.20). When matched for reliability, the item banks were between 45% and 75% shorter than paper-based WHOQOL measures. Across the four domains, a high standard of reliability (alpha>.90) could be gained with a median of 9 items. Conclusions: Using CAT, simulated assessments were as reliable as paper-based forms of the WHOQOL with a fraction of the number of items. These properties suggest that these item banks are suitable for computerized adaptive assessment. These item banks have the potential for international development using existing alternative language versions of the WHOQOL items. PMID:27694100
Bernhardt, Jay M; Stellefson, Michael; Weiler, Robert M; Anderson-Lewis, Charkarra; Miller, M David; MacInnes, Jann
2015-01-01
Background: Social media can promote healthy behaviors by facilitating engagement and collaboration among health professionals and the public. Thus, social media is quickly becoming a vital tool for health promotion. While guidelines and trainings exist for public health professionals, there are currently no standardized measures to assess individual social media competency among Certified Health Education Specialists (CHES) and Master Certified Health Education Specialists (MCHES). Objective: The aim of this study was to design, develop, and test the Social Media Competency Inventory (SMCI) for CHES and MCHES. Methods: The SMCI was designed in three sequential phases: (1) Conceptualization and Domain Specifications, (2) Item Development, and (3) Inventory Testing and Finalization. Phase 1 consisted of a literature review, concept operationalization, and expert reviews. Phase 2 involved an expert panel (n=4) review, think-aloud sessions with a small representative sample of CHES/MCHES (n=10), a pilot test (n=36), and classical test theory analyses to develop the initial version of the SMCI. Phase 3 included a field test of the SMCI with a random sample of CHES and MCHES (n=353), factor and Rasch analyses, and development of SMCI administration and interpretation guidelines. Results: Six constructs adapted from the unified theory of acceptance and use of technology and the integrated behavioral model were identified for assessing social media competency: (1) Social Media Self-Efficacy, (2) Social Media Experience, (3) Effort Expectancy, (4) Performance Expectancy, (5) Facilitating Conditions, and (6) Social Influence. The initial item pool included 148 items. After the pilot test, 16 items were removed or revised because of low item discrimination (r<.30), high interitem correlations (ρ>.90), or based on feedback received from pilot participants.
During the psychometric analysis of the field test data, 52 items were removed due to low discrimination, evidence of content redundancy, low R-squared value, or poor item infit or outfit. Psychometric analyses of the data revealed acceptable reliability evidence for the following scales: Social Media Self-Efficacy (alpha=.98, item reliability=.98, item separation=6.76), Social Media Experience (alpha=.98, item reliability=.98, item separation=6.24), Effort Expectancy (alpha=.74, item reliability=.95, item separation=4.15), Performance Expectancy (alpha=.81, item reliability=.99, item separation=10.09), Facilitating Conditions (alpha=.66, item reliability=.99, item separation=16.04), and Social Influence (alpha=.66, item reliability=.93, item separation=3.77). There was some evidence of local dependence among the scales, with several observed residual correlations above |.20|. Conclusions: Through the multistage instrument-development process, sufficient reliability and validity evidence was collected in support of the purpose and intended use of the SMCI. The SMCI can be used to assess the readiness of health education specialists to effectively use social media for health promotion research and practice. Future research should explore associations across constructs within the SMCI and evaluate the ability of SMCI scores to predict social media use and performance among CHES and MCHES. PMID:26399428
Alber, Julia M; Bernhardt, Jay M; Stellefson, Michael; Weiler, Robert M; Anderson-Lewis, Charkarra; Miller, M David; MacInnes, Jann
2015-09-23
Social media can promote healthy behaviors by facilitating engagement and collaboration among health professionals and the public. Thus, social media is quickly becoming a vital tool for health promotion. While guidelines and trainings exist for public health professionals, there are currently no standardized measures to assess individual social media competency among Certified Health Education Specialists (CHES) and Master Certified Health Education Specialists (MCHES). The aim of this study was to design, develop, and test the Social Media Competency Inventory (SMCI) for CHES and MCHES. The SMCI was designed in three sequential phases: (1) Conceptualization and Domain Specifications, (2) Item Development, and (3) Inventory Testing and Finalization. Phase 1 consisted of a literature review, concept operationalization, and expert reviews. Phase 2 involved an expert panel (n=4) review, think-aloud sessions with a small representative sample of CHES/MCHES (n=10), a pilot test (n=36), and classical test theory analyses to develop the initial version of the SMCI. Phase 3 included a field test of the SMCI with a random sample of CHES and MCHES (n=353), factor and Rasch analyses, and development of SMCI administration and interpretation guidelines. Six constructs adapted from the unified theory of acceptance and use of technology and the integrated behavioral model were identified for assessing social media competency: (1) Social Media Self-Efficacy, (2) Social Media Experience, (3) Effort Expectancy, (4) Performance Expectancy, (5) Facilitating Conditions, and (6) Social Influence. The initial item pool included 148 items. After the pilot test, 16 items were removed or revised because of low item discrimination (r<.30), high interitem correlations (ρ>.90), or based on feedback received from pilot participants.
During the psychometric analysis of the field test data, 52 items were removed due to low discrimination, evidence of content redundancy, low R-squared value, or poor item infit or outfit. Psychometric analyses of the data revealed acceptable reliability evidence for the following scales: Social Media Self-Efficacy (alpha=.98, item reliability=.98, item separation=6.76), Social Media Experience (alpha=.98, item reliability=.98, item separation=6.24), Effort Expectancy (alpha=.74, item reliability=.95, item separation=4.15), Performance Expectancy (alpha=.81, item reliability=.99, item separation=10.09), Facilitating Conditions (alpha=.66, item reliability=.99, item separation=16.04), and Social Influence (alpha=.66, item reliability=.93, item separation=3.77). There was some evidence of local dependence among the scales, with several observed residual correlations above |.20|. Through the multistage instrument-development process, sufficient reliability and validity evidence was collected in support of the purpose and intended use of the SMCI. The SMCI can be used to assess the readiness of health education specialists to effectively use social media for health promotion research and practice. Future research should explore associations across constructs within the SMCI and evaluate the ability of SMCI scores to predict social media use and performance among CHES and MCHES.
ERIC Educational Resources Information Center
Lee, Guemin; Park, In-Yong
2012-01-01
Previous assessments of the reliability of test scores for testlet-composed tests have indicated that item-based estimation methods overestimate reliability. This study was designed to address issues related to the extent to which item-based estimation methods overestimate the reliability of test scores composed of testlets and to compare several…
Harrison, Peter M C; Collins, Tom; Müllensiefen, Daniel
2017-06-15
Modern psychometric theory provides many useful tools for ability testing, such as item response theory, computerised adaptive testing, and automatic item generation. However, these techniques have yet to be integrated into mainstream psychological practice. This is unfortunate, because modern psychometric techniques can bring many benefits, including sophisticated reliability measures, improved construct validity, avoidance of exposure effects, and improved efficiency. In the present research we therefore use these techniques to develop a new test of a well-studied psychological capacity: melodic discrimination, the ability to detect differences between melodies. We calibrate and validate this test in a series of studies. Studies 1 and 2 respectively calibrate and validate an initial test version, while Studies 3 and 4 calibrate and validate an updated test version incorporating additional easy items. The results support the new test's viability, with evidence for strong reliability and construct validity. We discuss how these modern psychometric techniques may also be profitably applied to other areas of music psychology and psychological science in general.
Application of Item Response Theory to Tests of Substance-related Associative Memory
Shono, Yusuke; Grenard, Jerry L.; Ames, Susan L.; Stacy, Alan W.
2015-01-01
A substance-related word association test (WAT) is one of the commonly used indirect tests of substance-related implicit associative memory and has been shown to predict substance use. This study applied an item response theory (IRT) modeling approach to evaluate psychometric properties of the alcohol- and marijuana-related WATs and their items among 775 ethnically diverse at-risk adolescents. After examining the IRT assumptions, item fit, and differential item functioning (DIF) across gender and age groups, the original 18 WAT items were reduced to 14 and 15 items in the alcohol- and marijuana-related WATs, respectively. Thereafter, unidimensional one- and two-parameter logistic models (1PL and 2PL models) were fitted to the revised WAT items. The results demonstrated that both alcohol- and marijuana-related WATs have good psychometric properties. These results were discussed in light of the framework of a unified concept of construct validity (Messick, 1975, 1989, 1995). PMID:25134051
Wongpakaran, Tinakon; Wongpakaran, Nahathai
2012-01-01
Objective: The Rosenberg Self-Esteem Scale (RSES) is a widely used instrument that has been tested for reliability and validity in many settings; however, some negatively worded items appear to have caused it to reveal low reliability in a number of studies. In this study, we revised one negative item that had produced the worst outcome in previous studies in terms of the structure of the scale, then re-analyzed the new version for its reliability and construct validity, comparing it to the original version with respect to fit indices. Methods: In total, 851 students from Chiang Mai University (mean age: 19.51±1.7, 57% of whom were female) participated in this study. Of these, 664 students completed the Thai version of the original RSES, containing five positively worded and five negatively worded items, while 187 students used the revised version containing six positively worded and four negatively worded items. Confirmatory factor analysis was applied, using a uni-dimensional model with method effects and a correlated uniqueness approach. Results: The revised version showed the same level of reliability (good) as the original, but yielded a better model fit. The revised RSES demonstrated excellent fit statistics, with χ²=29.19 (df=19, n=187, p=0.063), GFI=0.970, TFI=0.969, NFI=0.964, CFI=0.987, SRMR=0.040 and RMSEA=0.054. Conclusion: The revised version of the Thai RSES demonstrated an equivalent level of reliability but a better construct validity when compared to the original. PMID:22396685
Smith, William Pastor
2013-09-01
The primary purpose of this two-phased study was to examine the structural validity and statistical utility of a racism scale specific to Black men who have sex with men (MSM) who resided in the Washington, DC, metropolitan area and Baltimore, Maryland. Phase I involved pretesting a 10-item racism measure with 20 Black MSM. Based on pretest findings, the scale was adapted into a 21-item racism scale for use in collecting data on 166 respondents in Phase II. Exploratory factor analysis of the 21-item racism scale resulted in a 19-item, two-factor solution. The two factors or subscales were the following: General Racism and Relationships and Racism. Confirmatory factor analysis was used in testing construct validity of the factored racism scale. Specifically, the two racism factors were combined with three homophobia factors into a confirmatory factor analysis model. Based on a summary of the fit indices, both comparative and incremental were equal to .90, suggesting an adequate convergence of the racism and homophobia dimensions into a single social oppression construct. Statistical utility of the two racism subscales was demonstrated when regression analysis revealed that the gay-identified men versus bisexual-identified men in the sample were more likely to experience increased racism within the context of intimate relationships and less likely to be exposed to repeated experiences of general racism. Overall, the findings in this study highlight the importance of continuing to explore the psychometric properties of a racism scale that accounts for the unique psychosocial concerns experienced by Black MSM.
Wongpakaran, Tinakon; Wongpakaran, Nahathai
2012-03-01
The Rosenberg Self-Esteem Scale (RSES) is a widely used instrument that has been tested for reliability and validity in many settings; however, some negative-worded items appear to have caused it to reveal low reliability in a number of studies. In this study, we revised one negative item that had previously (from the previous studies) produced the worst outcome in terms of the structure of the scale, then re-analyzed the new version for its reliability and construct validity, comparing it to the original version with respect to fit indices. In total, 851 students from Chiang Mai University (mean age: 19.51±1.7, 57% of whom were female), participated in this study. Of these, 664 students completed the Thai version of the original RSES - containing five positively worded and five negatively worded items, while 187 students used the revised version containing six positively worded and four negatively worded items. Confirmatory factor analysis was applied, using a uni-dimensional model with method effects and a correlated uniqueness approach. The revised version showed the same level of reliability (good) as the original, but yielded a better model fit. The revised RSES demonstrated excellent fit statistics, with χ²=29.19 (df=19, n=187, p=0.063), GFI=0.970, TFI=0.969, NFI=0.964, CFI=0.987, SRMR=0.040 and RMSEA=0.054. The revised version of the Thai RSES demonstrated an equivalent level of reliability but a better construct validity when compared to the original.
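As a rough consistency check on the fit statistics reported above, the RMSEA can be recomputed from the reported χ², df and n. The sketch below assumes the common point-estimate formula sqrt(max(χ² − df, 0) / (df · (n − 1))); the abstract does not say which formula or software the authors used, and the function name `rmsea` is illustrative only.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of RMSEA, assuming the (n - 1) formulation.

    This is an illustrative helper, not the authors' procedure.
    """
    # Negative values under the root are truncated to zero by convention.
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Values taken from the abstract: chi-square = 29.19, df = 19, n = 187.
value = rmsea(29.19, 19, 187)
print(round(value, 3))  # 0.054
```

Under this assumed formula the recomputed value agrees with the reported RMSEA of 0.054 to three decimal places.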
Bäuml, Karl-Heinz T; Holterman, Christoph; Abel, Magdalena
2014-11-01
The testing effect refers to the finding that retrieval practice in comparison to restudy of previously encoded contents can improve memory performance and reduce time-dependent forgetting. Naturally, long retention intervals include both wake and sleep delay, which can influence memory contents differently. In fact, sleep immediately after encoding can induce a mnemonic benefit, stabilizing and strengthening the encoded contents. We investigated in a series of 5 experiments whether sleep influences the testing effect. After initial study of categorized item material (Experiments 1, 2, and 4A), paired associates (Experiment 3), or educational text material (Experiment 4B), subjects were asked to restudy encoded contents or engage in active retrieval practice. A final recall test was conducted after a 12-hr delay that included diurnal wakefulness or nocturnal sleep. The results consistently showed typical testing effects after the wake delay. However, these testing effects were reduced or even eliminated after sleep, because sleep benefited recall of restudied items but left recall of retrieved items unaffected. The findings are consistent with the bifurcation model of the testing effect (Kornell, Bjork, & Garcia, 2011), according to which the distribution of memory strengths across items is shifted differentially by retrieving and restudying, with retrieval strengthening items to a much higher degree than restudy does. On the basis of this model, most of the retrieved items already fall above recall threshold in the absence of sleep, so additional sleep-induced strengthening may not improve recall of retrieved items any further. PsycINFO Database Record (c) 2014 APA, all rights reserved.
ERIC Educational Resources Information Center
van der Linden, Wim J.; Scrams, David J.; Schnipke, Deborah L.
This paper proposes an item selection algorithm that can be used to neutralize the effect of time limits in computer adaptive testing. The method is based on a statistical model for the response-time distributions of the test takers on the items in the pool that is updated each time a new item has been administered. Predictions from the model are…
Identification of metallic items that caused nickel dermatitis in Danish patients.
Thyssen, Jacob P; Menné, Torkil; Johansen, Jeanne D
2010-09-01
Nickel allergy is prevalent as assessed by epidemiological studies. In an attempt to further identify and characterize sources that may result in nickel allergy and dermatitis, we analysed items identified by nickel-allergic dermatitis patients as causative of nickel dermatitis by using the dimethylglyoxime (DMG) test. Dermatitis patients with nickel allergy of current relevance were identified over a 2-year period in a tertiary referral patch test centre. When possible, their work tools and personal items were examined with the DMG test. Among 95 nickel-allergic dermatitis patients, 70 (73.7%) had metallic items investigated for nickel release. A total of 151 items were investigated, and 66 (43.7%) gave positive DMG test reactions. Objects were nearly all purchased or acquired after the introduction of the EU Nickel Directive. Only one object had been inherited, and only two objects had been purchased outside of Denmark. DMG testing is valuable as a screening test for nickel release and should be used to identify relevant exposures in nickel-allergic patients. Mainly consumer items, but also work tools used in an occupational setting, released nickel in dermatitis patients. This study confirmed 'risk items' from previous studies, including mobile phones.
A Comparison of the One-and Three-Parameter Logistic Models on Measures of Test Efficiency.
ERIC Educational Resources Information Center
Benson, Jeri
Two methods of item selection were used to select sets of 40 items from a 50-item verbal analogies test, and the resulting item sets were compared for relative efficiency. The BICAL program was used to select the 40 items having the best mean square fit to the one parameter logistic (Rasch) model. The LOGIST program was used to select the 40 items…
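For orientation, the two models being compared can be written in their usual textbook form. This is a sketch in standard notation, not taken from the paper: θ is examinee ability, and a_i, b_i and c_i are the discrimination, difficulty and lower-asymptote (guessing) parameters of item i. Some presentations also include a scaling constant D ≈ 1.7 in the exponent, which is omitted here.

```latex
% Three-parameter logistic (3PL) model
P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + \exp\!\bigl(-a_i(\theta - b_i)\bigr)}

% One-parameter logistic (Rasch) model: the special case a_i = 1, c_i = 0
P_i(\theta) = \frac{1}{1 + \exp\!\bigl(-(\theta - b_i)\bigr)}
```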
ERIC Educational Resources Information Center
Liu, Jinghua; Zu, Jiyun; Curley, Edward; Carey, Jill
2014-01-01
The purpose of this study is to investigate the impact of discrete anchor items versus passage-based anchor items on observed score equating using empirical data. This study compares an "SAT"® critical reading anchor that contains more discrete items proportionally, compared to the total tests to be equated, to another anchor that…
Computerized Adaptive Testing: Overview and Introduction.
ERIC Educational Resources Information Center
Meijer, Rob R.; Nering, Michael L.
1999-01-01
Provides an overview of computerized adaptive testing (CAT) and introduces contributions to this special issue. CAT elements discussed include item selection, estimation of the latent trait, item exposure, measurement precision, and item-bank development. (SLD)
Flens, Gerard; Smits, Niels; Terwee, Caroline B; Dekker, Joost; Huijbrechts, Irma; de Beurs, Edwin
2017-03-01
We developed a Dutch-Flemish version of the patient-reported outcomes measurement information system (PROMIS) adult V1.0 item bank for depression as input for computerized adaptive testing (CAT). As the item bank, we used the Dutch-Flemish translation of the original PROMIS item bank (28 items) and additionally translated 28 U.S. depression items that failed to make the final U.S. item bank. Through psychometric analysis of a combined clinical and general population sample (N = 2,010), 8 added items were removed. With the final item bank, we performed several CAT simulations to assess the efficiency of the extended (48 items) and the original item bank (28 items), using various stopping rules. Both item banks resulted in highly efficient and precise measurement of depression and showed high similarity between the CAT simulation scores and the full item bank scores. We discuss the implications of using each item bank and stopping rule for further CAT development.
ERIC Educational Resources Information Center
Swiggett, Wanda D.; Kotloff, Laurie; Ezzo, Chelsea; Adler, Rachel; Oliveri, Maria Elena
2014-01-01
The computer-based "Graduate Record Examinations"® ("GRE"®) revised General Test includes interactive item types and testing environment tools (e.g., test navigation, on-screen calculator, and help). How well do test takers understand these innovations? If test takers do not understand the new item types, these innovations may…
Christens, Brian D; Speer, Paul W; Peterson, N Andrew
2016-06-01
How well do self-reported levels of community and organizational participation align with recorded acts of community and organizational participation? This study explores this question among participants in social action community organizing initiatives by comparing responses on a community participation scale designed to retrospectively assess community participation (T1, n = 482; T2, n = 220) with individual participants' attendance records in various social action organizing activities over two 1-year periods. By testing the self-reported measure's overall and item-by-item association with documented participation in various types of organizing activities, we find that the self-report measure is positively, but weakly correlated with actual participation levels in community organizing activities. Moreover, associations between self-report and recorded acts of participation differ by types of activity. Examining this unique source of data raises important questions about how community participation is conceptualized and measured in our field. Implications are explored for theory and measurement of participation in community and organizational contexts. © Society for Community Research and Action 2016.
Severity of Organized Item Theft in Computerized Adaptive Testing: A Simulation Study
ERIC Educational Resources Information Center
Yi, Qing; Zhang, Jinming; Chang, Hua-Hua
2008-01-01
Criteria had been proposed for assessing the severity of possible test security violations for computerized tests with high-stakes outcomes. However, these criteria resulted from theoretical derivations that assumed uniformly randomized item selection. This study investigated potential damage caused by organized item theft in computerized adaptive…
Detecting Item Drift in Large-Scale Testing
ERIC Educational Resources Information Center
Guo, Hongwen; Robin, Frederic; Dorans, Neil
2017-01-01
The early detection of item drift is an important issue for frequently administered testing programs because items are reused over time. Unfortunately, operational data tend to be very sparse and do not lend themselves to frequent monitoring analyses, particularly for on-demand testing. Building on existing residual analyses, the authors propose…
Tree versus Geometric Representation of Tests and Items.
ERIC Educational Resources Information Center
Beller, Michael
1990-01-01
Geometric approaches to representing interrelations among tests and items are compared with an additive tree model (ATM), using 2,644 examinees and 2 other data sets. The ATM's close fit to the data and its coherence of presentation indicate that it is the best means of representing tests and items. (TJH)
Superficial Priming in Episodic Recognition
ERIC Educational Resources Information Center
Dopkins, Stephen; Sargent, Jesse; Ngo, Catherine T.
2010-01-01
We explored the effect of superficial priming in episodic recognition and found it to be different from the effect of semantic priming in episodic recognition. Participants made recognition judgments to pairs of items, with each pair consisting of a prime item and a test item. Correct positive responses to the test item were impeded if the prime…
Statistical Indexes for Monitoring Item Behavior under Computer Adaptive Testing Environment.
ERIC Educational Resources Information Center
Zhu, Renbang; Yu, Feng; Liu, Su
A computerized adaptive test (CAT) administration usually requires a large supply of items with accurately estimated psychometric properties, such as item response theory (IRT) parameter estimates, to ensure the precision of examinee ability estimation. However, an estimated IRT model of a given item in any given pool does not always correctly…
Using Item Response Theory to Describe the Nonverbal Literacy Assessment (NVLA)
ERIC Educational Resources Information Center
Fleming, Danielle; Wilson, Mark; Ahlgrim-Delzell, Lynn
2018-01-01
The Nonverbal Literacy Assessment (NVLA) is a literacy assessment designed for students with significant intellectual disabilities. The 218-item test was initially examined using confirmatory factor analysis. This method showed that the test worked as expected, but the items loaded onto a single factor. This article uses item response theory to…
Aggregating Polytomous DIF Results over Multiple Test Administrations
ERIC Educational Resources Information Center
Zwick, Rebecca; Ye, Lei; Isham, Steven
2018-01-01
In typical differential item functioning (DIF) assessments, an item's DIF status is not influenced by its status in previous test administrations. An item that has shown DIF at multiple administrations may be treated the same way as an item that has shown DIF in only the most recent administration. Therefore, much useful information about the…
A Comparison of Linking and Concurrent Calibration under the Graded Response Model.
ERIC Educational Resources Information Center
Kim, Seock-Ho; Cohen, Allan S.
Applications of item response theory to practical testing problems including equating, differential item functioning, and computerized adaptive testing, require that item parameter estimates be placed onto a common metric. In this study, two methods for developing a common metric for the graded response model under item response theory were…
ERIC Educational Resources Information Center
Missouri State Dept. of Elementary and Secondary Education, Jefferson City.
This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to fifth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…
ERIC Educational Resources Information Center
Nitko, Anthony J.; Hsu, Tse-chi
Item analysis procedures appropriate for domain-referenced classroom testing are described. A conceptual framework within which item statistics can be considered and promising statistics in light of this framework are presented. The sampling fluctuations of the more promising item statistics for sample sizes comparable to the typical classroom…
ERIC Educational Resources Information Center
Bennett, Randy Elliot; And Others
1990-01-01
The relationship of an expert-system-scored constrained free-response item type to multiple-choice and free-response items was studied using data for 614 students on the College Board's Advanced Placement Computer Science (APCS) Examination. Implications for testing and the APCS test are discussed. (SLD)
Fissile interrogation using gamma rays from oxygen
Smith, Donald; Micklich, Bradley J.; Fessler, Andreas
2004-04-20
The subject apparatus provides a means to identify the presence of fissionable material or other nuclear material contained within an item to be tested. The system employs a portable accelerator to accelerate and direct protons to a fluorine-compound target. The interaction of the protons with the fluorine-compound target produces gamma rays which are directed at the item to be tested. If the item to be tested contains either a fissionable material or other nuclear material, the interaction of the gamma rays with the material contained within the test item will result in the production of neutrons. A system of neutron detectors is positioned to intercept any neutrons generated by the test item. The results from the neutron detectors are analyzed to determine the presence of a fissionable material or other nuclear material.
Validation of a clinical critical thinking skills test in nursing.
Shin, Sujin; Jung, Dukyoo; Kim, Sungeun
2015-01-27
The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Out of the initial 30 items, 11 items were excluded after the analysis of difficulty and discrimination parameters. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. From the above results, evidence of the response process validity was demonstrated, indicating that subjects responded as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and therefore represents a more convenient measurement of critical thinking ability.
Validation of a clinical critical thinking skills test in nursing
2015-01-01
Purpose: The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. Methods: This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Results: Out of the initial 30 items, 11 items were excluded after the analysis of difficulty and discrimination parameters. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. Conclusion: From the above results, evidence of the response process validity was demonstrated, indicating that subjects responded as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and therefore represents a more convenient measurement of critical thinking ability. PMID:25622716
ERIC Educational Resources Information Center
Schroeders, Ulrich; Robitzsch, Alexander; Schipolowski, Stefan
2014-01-01
C-tests are a specific variant of cloze tests that are considered time-efficient, valid indicators of general language proficiency. They are commonly analyzed with models of item response theory assuming local item independence. In this article we estimated local interdependencies for 12 C-tests and compared the changes in item difficulties,…
ERIC Educational Resources Information Center
Lynch, Mervin D.; Chaves, John
Items from Piers-Harris and Coopersmith self-concept tests were evaluated against independent measures on three self-constructs, idealized, empathic, and worth. Construct measurements were obtained with the semantic differential and D statistic. Ratings were obtained from 381 children, grades 4-6. For each test, item ratings and construct measures…
ERIC Educational Resources Information Center
Browning, Robert; And Others
1979-01-01
Effects that item order and basal and ceiling rules have on test means, variances, and internal consistency estimates for the Peabody Individual Achievement Test mathematics and reading recognition subtests were examined. Items on the math and reading recognition subtests were significantly easier or harder than test placements indicated. (Author)
Current State of Test Development, Administration, and Analysis: A Study of Faculty Practices.
Bristol, Timothy J; Nelson, John W; Sherrill, Karin J; Wangerin, Virginia S
Developing valid and reliable test items is a critical skill for nursing faculty. This research analyzed the test item writing practice of 674 nursing faculty. Relationships between faculty characteristics and their test item writing practices were analyzed. Findings reveal variability in practice and a gap in implementation of evidence-based standards when developing and evaluating teacher-made examinations.
A Review of Guidelines on Home Drug Testing Websites for Parents
Washio, Yukiko; Fairfax-Columbo, Jaymes; Ball, Emily; Cassey, Heather; Arria, Amelia M.; Bresani, Elena; Curtis, Brenda L.; Kirby, Kimberly C.
2014-01-01
Purpose: To update and extend prior work reviewing websites that discuss home drug testing for parents and assess the quality of information that the websites provide to assist them to decide when and how to use home drug testing. Methods: We conducted a world-wide web search that identified eight websites providing information for parents on home drug testing. We assessed the information on the sites using a checklist developed with field experts in adolescent substance abuse and psychosocial interventions that focus on urine testing. Results: None of the websites covered all of the items on the 24-item checklist, and only three covered at least half of the items (12, 14, and 21 items, respectively). The five remaining websites covered less than half the checklist items. The mean number of items covered by the websites was 11. Conclusions: Among the websites that we reviewed, few provided thorough information to parents regarding empirically-supported strategies to effectively use drug testing to intervene on adolescent substance use. Furthermore, most websites did not provide thorough information regarding the risks and benefits to inform parents’ decision to use home drug testing. Empirical evidence regarding efficacy, benefits, risks, and limitations of home drug testing is needed. PMID:25026103
ERIC Educational Resources Information Center
New South Wales Dept. of Education, Sydney (Australia).
Continuing a series of short tests aimed at measuring student mastery of specific skills in the natural sciences, this supplementary volume includes teachers' notes, a users' guide and inspection copies of test items 27 to 50. Answer keys and test scoring statistics are provided. The items are designed for grades 7 through 10, and a list of the…
ERIC Educational Resources Information Center
Weiss, David J., Ed.
This symposium consists of five papers and presents some recent developments in adaptive testing which have applications to several military testing problems. The overview, by James R. McBride, defines adaptive testing and discusses some of its item selection and scoring strategies. Item response theory, or item characteristic curve theory, is…
DeGeest, David Scott; Schmidt, Frank
2015-01-01
Our objective was to apply the rigorous test developed by Browne (1992) to determine whether the circumplex model fits Big Five personality data. This test has yet to be applied to personality data. Another objective was to determine whether blended items explained correlations among the Big Five traits. We used two working adult samples, the Eugene-Springfield Community Sample and the Professional Worker Career Experience Survey. Fit to the circumplex was tested via Browne's (1992) procedure. Circumplexes were graphed to identify items with loadings on multiple traits (blended items), and to determine whether removing these items changed five-factor model (FFM) trait intercorrelations. In both samples, the circumplex structure fit the FFM traits well. Each sample had items with dual-factor loadings (8 items in the first sample, 21 in the second). Removing blended items had little effect on construct-level intercorrelations among FFM traits. We conclude that rigorous tests show that the fit of personality data to the circumplex model is good. This finding means the circumplex model is competitive with the factor model in understanding the organization of personality traits. The circumplex structure also provides a theoretically and empirically sound rationale for evaluating intercorrelations among FFM traits. Even after eliminating blended items, FFM personality traits remained correlated.
[Mokken scaling of the Cognitive Screening Test].
Diesfeldt, H F A
2009-10-01
The Cognitive Screening Test (CST) is a twenty-item orientation questionnaire in Dutch, that is commonly used to evaluate cognitive impairment. This study applied Mokken Scale Analysis, a non-parametric set of techniques derived from item response theory (IRT), to CST-data of 466 consecutive participants in psychogeriatric day care. The full item set and the standard short version of fourteen items both met the assumptions of the monotone homogeneity model, with scalability coefficient H = 0.39, which is considered weak. In order to select items that would fulfil the assumption of invariant item ordering or the double monotonicity model, the subjects were randomly partitioned into a training set (50% of the sample) and a test set (the remaining half). By means of an automated item selection eleven items were found to measure one latent trait, with H = 0.67 and item H coefficients larger than 0.51. Cross-validation of the item analysis in the remaining half of the subjects gave comparable values (H = 0.66; item H coefficients larger than 0.56). The selected items involve year, place of residence, birth date, the monarch's and prime minister's names, and their predecessors. Applying optimal discriminant analysis (ODA) it was found that the full set of twenty CST items performed best in distinguishing two predefined groups of patients of lower or higher cognitive ability, as established by an independent criterion derived from the Amsterdam Dementia Screening Test. The chance corrected predictive value or prognostic utility was 47.5% for the full item set, 45.2% for the fourteen items of the standard short version of the CST, and 46.1% for the homogeneous, unidimensional set of selected eleven items. The results of the item analysis support the application of the CST in cognitive assessment, and revealed a more reliable 'short' version of the CST than the standard short version (CST14).
Jones, G L; Morrell, C J; Cooke, J M; Speier, D; Anumba, D; Stewart-Brown, S
2011-09-01
To develop and psychometrically evaluate two questionnaires measuring both positive and negative postnatal health of mothers (M-PHI) and fathers (F-PHI) during the first year of parenting. The M-PHI and the F-PHI were developed in four stages. Stage 1: Postnatal women's focus group (M-PHI) and postnatal fathers' postal questionnaire (F-PHI); Stage 2: Qualitative interviews; Stage 3: Pilot postal survey and main postal survey; and Stage 4: Test-retest postal survey. The M-PHI consisted of a 29-item core questionnaire with six main scales and five conditional scales. The F-PHI consisted of a 27-item questionnaire with six main scales. All scales achieved good internal reliability (Cronbach's α 0.66-0.87 for M-PHI, 0.72-0.90 for F-PHI). Intraclass correlation coefficients demonstrated high test-retest reliability (0.60-0.88). Correlation coefficients supported the criterion validity of the M-PHI and the F-PHI when tested against the Short-Form-12 (SF-12), Edinburgh Postnatal Depression Scale (EPDS) and the Warwick and Edinburgh Mental Well-Being Scale (WEMWBS). The M-PHI and F-PHI are valid, reliable, parent-generated instruments. These unique instruments will be invaluable for practitioners wishing to promote family-centred care and for trialists and other researchers requiring a validated instrument to measure both positive and negative health during the first postnatal year, as to date no such measurement has existed.
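The internal reliability figures above are reported as Cronbach's α. As a minimal sketch of how that statistic is computed, the function below uses the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The response data are hypothetical and are not taken from the M-PHI or F-PHI study; the function name `cronbach_alpha` is illustrative.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: four respondents, three items.
responses = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 5, 5],
    [1, 2, 1],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

The same variance estimator (sample, not population) is used for items and totals, so the choice cancels out in the ratio.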
Spacing and lag effects in free recall of pure lists.
Kahana, Michael J; Howard, Marc W
2005-02-01
Repeating list items leads to better recall when the repetitions are separated by several unique items than when they are presented successively; the spacing effect refers to improved recall for spaced versus successive repetition (lag > 0 vs. lag = 0); the lag effect refers to improved recall for long lags versus short lags. Previous demonstrations of the lag effect have utilized lists containing a mixture of items with varying degrees of spacing. Because differential rehearsal of items in mixed lists may exaggerate any effects of spacing, it is important to demonstrate these effects in pure lists. As in Toppino and Schneider (1999), we found an overall advantage for recall of spaced lists. We further report the first demonstration of a lag effect in pure lists, with significantly better recall for lists with widely spaced repetitions than for those with moderately spaced repetitions.
Osth, Adam F; Jansson, Anna; Dennis, Simon; Heathcote, Andrew
2018-08-01
A robust finding in recognition memory is that performance declines monotonically across test trials. Despite the prevalence of this decline, there is a lack of consensus on the mechanism responsible. Three hypotheses have been put forward: (1) interference is caused by learning of test items, (2) the test items cause a shift in the context representation used to cue memory, and (3) participants change their speed-accuracy thresholds through the course of testing. We implemented all three possibilities in a combined model of recognition memory and decision making, which inherits the memory retrieval elements of the Osth and Dennis (2015) model and uses the diffusion decision model (DDM: Ratcliff, 1978) to generate choice and response times. We applied the model to four datasets that represent three challenges, the findings that: (1) the number of test items plays a larger role in determining performance than the number of studied items, (2) performance decreases less for strong items than weak items in pure lists but not in mixed lists, and (3) lexical decision trials interspersed between recognition test trials do not increase the rate at which performance declines. Analysis of the model's parameter estimates suggests that item interference plays a weak role in explaining the effects of recognition testing, while context drift plays a very large role. These results are consistent with prior work showing a weak role for item noise in recognition memory and that retrieval is a strong cause of context change in episodic memory. Copyright © 2018 Elsevier Inc. All rights reserved.
48 CFR 246.710 - Solicitation provision and contract clauses.
Code of Federal Regulations, 2014 CFR
2014-10-01
... of Construction (Germany), instead of the clause at FAR 52.246-21, Warranty of Construction, in... performance will be in Germany. (3)(i) In addition to 252.211-7003, Item Unique Identification and Valuation...
Multistage Computerized Adaptive Testing with Uniform Item Exposure
ERIC Educational Resources Information Center
Edwards, Michael C.; Flora, David B.; Thissen, David
2012-01-01
This article describes a computerized adaptive test (CAT) based on the uniform item exposure multi-form structure (uMFS). The uMFS is a specialization of the multi-form structure (MFS) idea described by Armstrong, Jones, Berliner, and Pashley (1998). In an MFS CAT, the examinee first responds to a small fixed block of items. The items comprising…
Primary Science Assessment Item Setters' Misconceptions Concerning the State Changes of Water
ERIC Educational Resources Information Center
Boo, Hong Kwen
2006-01-01
Assessment is an integral and vital part of teaching and learning, providing feedback on progress through the assessment period to both learners and teachers. However, if test items are flawed because of misconceptions held by the question setter, then such test items are invalid as assessment tools. Moreover, such flawed items are also likely to…
Stratified and Maximum Information Item Selection Procedures in Computer Adaptive Testing
ERIC Educational Resources Information Center
Deng, Hui; Ansley, Timothy; Chang, Hua-Hua
2010-01-01
In this study we evaluated and compared three item selection procedures: the maximum Fisher information procedure (F), the a-stratified multistage computer adaptive testing (CAT) (STR), and a refined stratification procedure that allows more items to be selected from the high a strata and fewer items from the low a strata (USTR), along with…
Assessment of Differential Item Functioning in Testlet-Based Items Using the Rasch Testlet Model
ERIC Educational Resources Information Center
Wang, Wen-Chung; Wilson, Mark
2005-01-01
This study presents a procedure for detecting differential item functioning (DIF) for dichotomous and polytomous items in testlet-based tests, whereby DIF is taken into account by adding DIF parameters into the Rasch testlet model. Simulations were conducted to assess recovery of the DIF and other parameters. Two independent variables, test type…
Ethnic Group Bias in Intelligence Test Items.
ERIC Educational Resources Information Center
Scheuneman, Janice
In previous studies of ethnic group bias in intelligence test items, the question of bias has been confounded with ability differences between the ethnic group samples compared. The present study is based on a conditional probability model in which an unbiased item is defined as one where the probability of a correct response to an item is the…
Primary Science Assessment Item Setters' Misconceptions Concerning Biological Science Concepts
ERIC Educational Resources Information Center
Boo, Hong Kwen
2007-01-01
Assessment is an integral and vital part of teaching and learning, providing feedback on progress through the assessment period to both learners and teachers. However, if test items are flawed because of misconceptions held by the question setter, then such test items are invalid as assessment tools. Moreover, such flawed items are also likely to…
Examination of Different Item Response Theory Models on Tests Composed of Testlets
ERIC Educational Resources Information Center
Kogar, Esin Yilmaz; Kelecioglu, Hülya
2017-01-01
The purpose of this research is to first estimate the item and ability parameters and the standard error values related to those parameters obtained from Unidimensional Item Response Theory (UIRT), bifactor (BIF) and Testlet Response Theory models (TRT) in the tests including testlets, when the number of testlets, number of independent items, and…
A Monte Carlo Study of an Iterative Wald Test Procedure for DIF Analysis
ERIC Educational Resources Information Center
Cao, Mengyang; Tay, Louis; Liu, Yaowu
2017-01-01
This study examined the performance of a proposed iterative Wald approach for detecting differential item functioning (DIF) between two groups when preknowledge of anchor items is absent. The iterative approach utilizes the Wald-2 approach to identify anchor items and then iteratively tests for DIF items with the Wald-1 approach. Monte Carlo…
A Semiparametric Model for Jointly Analyzing Response Times and Accuracy in Computerized Testing
ERIC Educational Resources Information Center
Wang, Chun; Fan, Zhewen; Chang, Hua-Hua; Douglas, Jeffrey A.
2013-01-01
The item response times (RTs) collected from computerized testing represent an underutilized type of information about items and examinees. In addition to knowing the examinees' responses to each item, we can investigate the amount of time examinees spend on each item. Current models for RTs mainly focus on parametric models, which have the…
ERIC Educational Resources Information Center
Missouri State Dept. of Elementary and Secondary Education, Jefferson City.
This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to ninth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…
An Empirical Investigation of Methods for Assessing Item Fit for Mixed Format Tests
ERIC Educational Resources Information Center
Chon, Kyong Hee; Lee, Won-Chan; Ansley, Timothy N.
2013-01-01
Empirical information regarding performance of model-fit procedures has been a persistent need in measurement practice. Statistical procedures for evaluating item fit were applied to real test examples that consist of both dichotomously and polytomously scored items. The item fit statistics used in this study included the PARSCALE's G[squared],…
ERIC Educational Resources Information Center
Missouri State Dept. of Elementary and Secondary Education, Jefferson City.
This document deals with testing in intermediate communication arts for seventh graders in Missouri public schools. The document contains the following items from the Session 1 Test Booklet: "Swimming in Snow" (Diana C. Conway) (Items 1, 2, and 5); "Discovery" (Marion Dane Bauer) (Item 13); writing prompt; and a writer's…
Automated Item Generation with Recurrent Neural Networks.
von Davier, Matthias
2018-03-12
Utilizing technology for automated item generation is not a new idea. However, test items used in commercial testing programs or in research are still predominantly written by humans, in most cases by content experts or professional item writers. Human experts are a limited resource and testing agencies incur high costs in the process of continuous renewal of item banks to sustain testing programs. Using algorithms instead holds the promise of providing unlimited resources for this crucial part of assessment development. The approach presented here deviates in several ways from previous attempts to solve this problem. In the past, automatic item generation relied either on generating clones of narrowly defined item types such as those found in language free intelligence tests (e.g., Raven's progressive matrices) or on an extensive analysis of task components and derivation of schemata to produce items with pre-specified variability that are hoped to have predictable levels of difficulty. It is somewhat unlikely that researchers utilizing these previous approaches would look at the proposed approach with favor; however, recent applications of machine learning show success in solving tasks that seemed impossible for machines not too long ago. The proposed approach uses deep learning to implement probabilistic language models, not unlike what Google brain and Amazon Alexa use for language processing and generation.
Assessing the Conceptual Understanding about Heat and Thermodynamics at Undergraduate Level
ERIC Educational Resources Information Center
Kulkarni, Vasudeo Digambar; Tambade, Popat Savaleram
2013-01-01
In this study, a Thermodynamic Concept Test (TCT) was designed to assess students' conceptual understanding of heat and thermodynamics at undergraduate level. Different statistical measures, such as the item difficulty index, item discrimination index, and point biserial coefficient, were used for assessing the TCT. For each item of the test these indices were…
A Study of Inference in Standardized Reading Test Items and Its Relationship to Difficulty.
ERIC Educational Resources Information Center
Marzano, Robert J.
To study the relationship between inferences made on standardized reading tests and item difficulty, 50 items on the reading comprehension section of the Metropolitan Achievement Test were analyzed independently in this study by two raters using four general categories of inferences: (1) reference inferences, (2) between proposition inferences,…
Questions and Problems in Science.
ERIC Educational Resources Information Center
Dressel, Paul L.; Nelson, Clarence H.
This folio of test items, contributed by a number of colleges and universities from their course, placement, entrance, or other institutional examinations, was compiled to aid teachers in constructing tests. Only those science courses offered in the first two years of college are represented by the scope of the items. The test items may also serve…
Effects of Using Modified Items to Test Students with Persistent Academic Difficulties
ERIC Educational Resources Information Center
Elliott, Stephen N.; Kettler, Ryan J.; Beddow, Peter A.; Kurz, Alexander; Compton, Elizabeth; McGrath, Dawn; Bruen, Charles; Hinton, Kent; Palmer, Porter; Rodriguez, Michael C.; Bolt, Daniel; Roach, Andrew T.
2010-01-01
This study investigated the effects of using modified items in achievement tests to enhance accessibility. An experiment determined whether tests composed of modified items would reduce the performance gap between students eligible for an alternate assessment based on modified achievement standards (AA-MAS) and students not eligible, and the…
Optimal Stratification of Item Pools in a-Stratified Computerized Adaptive Testing.
ERIC Educational Resources Information Center
Chang, Hua-Hua; van der Linden, Wim J.
2003-01-01
Developed a method based on 0-1 linear programming to stratify an item pool optimally for use in alpha-stratified adaptive testing. Applied the method to a previous item pool from the computerized adaptive test of the Graduate Record Examinations. Results show the new method performs well in practical situations. (SLD)
The Development and Validation of a Formula for Measuring Single-Sentence Test Item Readability.
ERIC Educational Resources Information Center
Homan, Susan; And Others
1994-01-01
A study was conducted with 782 elementary school students to determine whether the Homan-Hewitt Readability Formula could identify the readability of a single-sentence test item. Results indicate that a relationship exists between students' reading grade levels and responses to test items written at higher readability levels. (SLD)
Development and Validation of a Computer Adaptive EFL Test
ERIC Educational Resources Information Center
He, Lianzhen; Min, Shangchao
2017-01-01
The first aim of this study was to develop a computer adaptive EFL test (CALT) that assesses test takers' listening and reading proficiency in English with dichotomous items and polytomous testlets. We reported in detail on the development of the CALT, including item banking, determination of suitable item response theory (IRT) models for item…
The Development and Management of Banks of Performance Based Test Items.
ERIC Educational Resources Information Center
Curtis, H. A., Ed.
Symposium papers presented at an Annual Meeting of the National Council on Measurement in Education (Chicago, 1972), all of which concern banks of test items for use in constructing criterion referenced tests, comprise this document. The first paper, "Locally Produced Item Banks" by Thomas J. Slocum, presents information on the…
Gjersoe, Nathalia L.; Newman, George E.; Chituc, Vladimir; Hood, Bruce
2014-01-01
The current studies examine how valuation of authentic items varies as a function of culture. We find that U.S. respondents value authentic items associated with individual persons (a sweater or an artwork) more than Indian respondents, but that both cultures value authentic objects not associated with persons (a dinosaur bone or a moon rock) equally. These differences cannot be attributed to more general cultural differences in the value assigned to authenticity. Rather, the results support the hypothesis that individualistic cultures place a greater value on objects associated with unique persons and in so doing, offer the first evidence for how valuation of certain authentic items may vary cross-culturally. PMID:24658437
Test-retest stability of the Task and Ego Orientation Questionnaire.
Lane, Andrew M; Nevill, Alan M; Bowes, Neal; Fox, Kenneth R
2005-09-01
Establishing stability, defined as observing minimal measurement error in a test-retest assessment, is vital to validating psychometric tools. Correlational methods, such as Pearson product-moment, intraclass, and kappa are tests of association or consistency, whereas stability or reproducibility (regarded here as synonymous) assesses the agreement between test-retest scores. Indexes of reproducibility using the Task and Ego Orientation in Sport Questionnaire (TEOSQ; Duda & Nicholls, 1992) were investigated using correlational (Pearson product-moment, intraclass, and kappa) methods, repeated measures multivariate analysis of variance, and calculating the proportion of agreement within a referent value of +/-1 as suggested by Nevill, Lane, Kilgour, Bowes, and Whyte (2001). Two hundred thirteen soccer players completed the TEOSQ on two occasions, 1 week apart. Correlation analyses indicated a stronger test-retest correlation for the Ego subscale than the Task subscale. Multivariate analysis of variance indicated stability for ego items but with significant increases in four task items. The proportion of test-retest agreement scores indicated that all ego items reported relatively poor stability statistics with test-retest scores within a range of +/-1, ranging from 82.7-86.9%. By contrast, all task items showed test-retest difference scores ranging from 92.5-99%, although further analysis indicated that four task subscale items increased significantly. Findings illustrated that correlational methods (Pearson product-moment, intraclass, and kappa) are influenced by the range in scores, and calculating the proportion of agreement of test-retest differences with a referent value of +/-1 could provide additional insight into the stability of the questionnaire. It is suggested that the item-by-item proportion of agreement method proposed by Nevill et al. (2001) should be used to supplement existing methods and could be especially helpful in identifying rogue items in the initial stages of psychometric questionnaire validation.
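The proportion-of-agreement statistic described in this abstract can be illustrated with a minimal Python sketch. It assumes that, for one item, agreement is simply the share of respondents whose test and retest scores differ by no more than the referent value of 1. The function name `proportion_within` and the sample scores are hypothetical and are not taken from the study.

```python
def proportion_within(test, retest, tolerance=1):
    """Share of respondents whose retest score lies within +/-tolerance of the test score."""
    if len(test) != len(retest) or not test:
        raise ValueError("test and retest must be non-empty and of equal length")
    # Count respondents whose absolute test-retest difference is at most the tolerance.
    hits = sum(1 for a, b in zip(test, retest) if abs(a - b) <= tolerance)
    return hits / len(test)


# Hypothetical 5-point Likert responses of eight respondents to a single item.
test_scores = [4, 5, 3, 2, 4, 5, 1, 3]
retest_scores = [4, 4, 5, 2, 3, 5, 4, 3]

agreement = proportion_within(test_scores, retest_scores)
print(f"{agreement:.1%}")  # 6 of 8 differences are within +/-1
```

Unlike a correlation coefficient, this figure does not depend on the spread of scores in the sample, which appears to be the reason the abstract recommends it as a supplement.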
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tulay, M.P.; Yurich, F.J.; Schremser, F.M. Jr.
1988-06-01
This guideline provides direction for the procurement and use of Commercial Grade Items (CGI) in safety-related applications. It is divided into five major sections. A glossary of terms and definitions, an acronym listing, and seven appendices have been included. The glossary defines terms used in this guideline. In certain instances, the definitions may be unique to this guideline. Identification of acronyms utilized in this guideline is also provided. Section 1 provides a background of the commercial grade item issues facing the nuclear industry. It provides a historical perspective of commercial grade item issues. Section 2 discusses the generic process for the acceptance of a commercial grade item for safety-related use. Section 3 defines the four distinct methods used to accept commercial grade items for safety-related applications. Section 4 lists specific references that are identified in this guideline. Section 5 is a bibliography of documents that were considered in developing this guideline, but were not directly referenced in the document.
Assessing cross-cultural validity of scales: a methodological review and illustrative example.
Beckstead, Jason W; Yang, Chiu-Yueh; Lengacher, Cecile A
2008-01-01
In this article, we assessed the cross-cultural validity of the Women's Role Strain Inventory (WRSI), a multi-item instrument that assesses the degree of strain experienced by women who juggle the roles of working professional, student, wife and mother. Cross-cultural validity is evinced by demonstrating the measurement invariance of the WRSI. Measurement invariance is the extent to which items of multi-item scales function in the same way across different samples of respondents. We assessed measurement invariance by comparing a sample of working women in Taiwan with a similar sample from the United States. Structural equation models (SEMs) were employed to determine the invariance of the WRSI and to estimate the unique validity variance of its items. This article also provides nurse-researchers with the necessary underlying measurement theory and illustrates how SEMs may be applied to assess cross-cultural validity of instruments used in nursing research. Overall performance of the WRSI was acceptable but our analysis showed that some items did not display invariance properties across samples. Item analysis is presented and recommendations for improving the instrument are discussed.
ERIC Educational Resources Information Center
Samejima, Fumiko; Changas, Paul S.
The methods and approaches for estimating the operating characteristics of the discrete item responses without assuming any mathematical form have been developed and expanded. It has been made possible that, even if the test information function of a given test is not constant for the interval of ability of interest, it is used as the Old Test.…
Automatic Generation of Rasch-Calibrated Items: Figural Matrices Test GEOM and Endless-Loops Test EC
ERIC Educational Resources Information Center
Arendasy, Martin
2005-01-01
The future of test construction for certain psychological ability domains that can be analyzed well in a structured manner may lie--at the very least for reasons of test security--in the field of automatic item generation. In this context, a question that has not been explicitly addressed is whether it is possible to embed an item response theory…
Evaluation of Floors and Item Gradients for Reading and Math Tests for Young Children
ERIC Educational Resources Information Center
Bradley-Johnson, Sharon; Durmusoglu, Gokce
2005-01-01
Ignoring the adequacy of floors and item gradients for tests used with young children can have serious consequences. Thus, because of the importance of early intervention for reading and math problems, we used the criteria suggested by Bracken for adequate floors and item gradients, and reviewed 15 reading tests and 12 math tests for ages 4-0…
ERIC Educational Resources Information Center
Khaksefidi, Saman
2017-01-01
This study investigates the psychological effect of a wrong question with wrong items on answering the next question in a test of structure. Forty students selected through stratified random sampling are given 15 questions from a standardized test, namely a TOEFL structure test, in which questions number 7 and number 11 are wrong and their answers…
Basra, M K A; Salek, M S; Fenech, D; Finlay, A Y
2018-01-01
Skin disease can affect the quality of life (QoL) of teenagers in a variety of different ways, some being unique to this age group. To develop and validate a dermatology-specific QoL instrument for adolescents with skin diseases. Qualitative semistructured interviews were conducted with adolescents with skin disease to gain in-depth understanding of how skin diseases affect their QoL. A prototype instrument based on the themes identified from content analysis of interviews was tested in several stages, using classical test theory and item response theory models to develop this new tool and conduct its psychometric evaluation. Thirty-three QoL issues were identified from semistructured interviews with 50 adolescents. A questionnaire based on items derived from content analysis of interviews was subjected to Rasch analysis: factor analysis identified three domains, therefore not supporting the validity of T-QoL as a unidimensional measure. Psychometric evaluation of the final 18-item questionnaire was carried out in a cohort of 203 adolescents. Convergent validity was demonstrated by significant correlation with Skindex-Teen and Dermatology Life Quality Index (DLQI) or Children's DLQI. The T-QoL showed excellent internal consistency reliability: Cronbach's α = 0·89 for total scale score and 0·85, 0·60 and 0·74, respectively, for domains 1, 2 and 3. Test-retest reliability was high in stable volunteers. T-QoL showed sensitivity to change in two subgroups of patients who indicated change in their self-assessed disease severity. Built on rich qualitative data from patients, the T-QoL is a simple and valid tool to quantify the impact of skin disease on adolescents' QoL; it could be used as an outcome measure in both clinical practice and clinical research. © 2017 British Association of Dermatologists.
Hernan, Andrea L; Giles, Sally J; O'Hara, Jane K; Fuller, Jeffrey; Johnson, Julie K; Dunbar, James A
2016-04-01
Patients are a valuable source of information about ways to prevent harm in primary care and are in a unique position to provide feedback about the factors that contribute to safety incidents. Unlike in the hospital setting, there are currently no tools that allow the systematic capture of this information from patients. The aim of this study was to develop a quantitative primary care patient measure of safety (PC PMOS). A two-stage approach was undertaken to develop questionnaire domains and items. Stage 1 involved a modified Delphi process. An expert panel reached consensus on domains and items based on three sources of information (validated hospital PMOS, previous research conducted by our study team and literature on threats to patient safety). Stage 2 involved testing the face validity of the questionnaire developed during stage 1 with patients and primary care staff using the 'think aloud' method. Following this process, the questionnaire was revised accordingly. The PC PMOS was received positively by both patients and staff during face validity testing. Barriers to completion included the length, relevance and clarity of questions. The final PC PMOS consisted of 50 items across 15 domains. The contributory factors to safety incidents centred on communication, access to care, patient-related factors, organisation and care planning, task performance and information flow. This is the first tool specifically designed for primary care settings, which allows patients to provide feedback about factors contributing to potential safety incidents. The PC PMOS provides a way for primary care organisations to learn about safety from the patient perspective and make service improvements with the aim of reducing harm in this setting. Future research will explore the reliability and construct validity of the PC PMOS. Published by the BMJ Publishing Group Limited. 
ITEM ANALYSIS OF THREE SPANISH NAMING TESTS: A CROSS-CULTURAL INVESTIGATION
de la Plata, Carlos Marquez; Arango-Lasprilla, Juan Carlos; Alegret, Montse; Moreno, Alexander; Tárraga, Luis; Lara, Mar; Hewlitt, Margaret; Hynan, Linda; Cullum, C. Munro
2009-01-01
Neuropsychological evaluations conducted in the United States and abroad commonly include the use of tests translated from English to Spanish. The use of translated naming tests for evaluating predominately Spanish-speakers has recently been challenged on the grounds that translating test items may compromise a test’s construct validity. The Texas Spanish Naming Test (TNT) has been developed in Spanish specifically for use with Spanish-speakers; however, it is unlikely patients from diverse Spanish-speaking geographical regions will perform uniformly on a naming test. The present study evaluated and compared the internal consistency and patterns of item-difficulty and -discrimination for the TNT and two commonly used translated naming tests in three countries (i.e., United States, Colombia, Spain). Two hundred fifty two subjects (126 demented, 116 nondemented) across three countries were administered the TNT, Modified Boston Naming Test-Spanish, and the naming subtest from the CERAD. The TNT demonstrated superior internal consistency to its counterparts, a superior item difficulty pattern than the CERAD naming test, and a superior item discrimination pattern than the MBNT-S across countries. Overall, all three Spanish naming tests differentiated nondemented and moderately demented individuals, but the results suggest the items of the TNT are most appropriate to use with Spanish-speakers. Preliminary normative data for the three tests examined in each country are provided. PMID:19208960
Identifying predictors of physics item difficulty: A linear regression approach
NASA Astrophysics Data System (ADS)
Mesic, Vanes; Muratovic, Hasnija
2011-06-01
Large-scale assessments of student achievement in physics are often approached with an intention to discriminate among students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinarily difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model the item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained, quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. Foremost, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables which reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created. 
It has been shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal physics knowledge structures. Identified predictors point out the fundamental cognitive dimensions of student physics achievement at the end of compulsory education in Bosnia and Herzegovina, whose level of development influenced the test results within the conducted assessments.
Stochl, Jan; Böhnke, Jan R; Pickett, Kate E; Croudace, Tim J
2016-05-20
Recent developments in psychometric modeling and technology allow pooling well-validated items from existing instruments into larger item banks and their deployment through methods of computerized adaptive testing (CAT). Use of item response theory-based bifactor methods and integrative data analysis overcomes barriers in cross-instrument comparison. This paper presents the joint calibration of an item bank for researchers keen to investigate population variations in general psychological distress (GPD). Multidimensional item response theory was used on existing health survey data from the Scottish Health Education Population Survey (n = 766) to calibrate an item bank consisting of pooled items from the short common mental disorder screen (GHQ-12) and the Affectometer-2 (a measure of "general happiness"). Computer simulation was used to evaluate usefulness and efficacy of its adaptive administration. A bifactor model capturing variation across a continuum of population distress (while controlling for artefacts due to item wording) was supported. The numbers of items for different required reliabilities in adaptive administration demonstrated promising efficacy of the proposed item bank. Psychometric modeling of the common dimension captured by more than one instrument offers the potential of adaptive testing for GPD using individually sequenced combinations of existing survey items. The potential for linking other item sets with alternative candidate measures of positive mental health is discussed since an optimal item bank may require even more items than these.
Expertise sensitive item selection.
Chow, P; Russell, H; Traub, R E
2000-12-01
In this paper we describe and illustrate a procedure for selecting items from a large pool for a certification test. The proposed procedure, which is intended to improve the alignment of the certification test with on-the-job performance, is based on an expertise sensitive index. This index for an item is the difference between the item's p values for experts and novices. An example is provided of the application of the index for selecting items to be used in certifying bakers.
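The expertise sensitive index defined in this abstract (the difference between an item's p values for experts and novices) can be sketched in a few lines of Python. The sketch assumes dichotomously scored responses (1 = correct, 0 = incorrect) and, as a further assumption not stated in the abstract, ranks items from the highest index to the lowest as a selection aid. The item names and responses are hypothetical.

```python
def p_value(responses):
    """Classical item p value: proportion of correct (1) responses."""
    return sum(responses) / len(responses)


def expertise_index(expert_responses, novice_responses):
    """Difference between the item's p value for experts and for novices."""
    return p_value(expert_responses) - p_value(novice_responses)


# Hypothetical scored responses per item: (experts, novices).
items = {
    "item_a": ([1, 1, 1, 1], [1, 0, 0, 0]),  # experts 1.00, novices 0.25
    "item_b": ([1, 1, 0, 1], [1, 1, 1, 0]),  # experts 0.75, novices 0.75
    "item_c": ([1, 0, 1, 1], [0, 0, 1, 0]),  # experts 0.75, novices 0.25
}

# Assumed selection rule: prefer items that separate experts from novices most.
ranked = sorted(items, key=lambda name: expertise_index(*items[name]), reverse=True)
for name in ranked:
    print(name, expertise_index(*items[name]))
```

An item such as `item_b`, which experts and novices answer equally well, gets an index of zero and would contribute little to aligning the test with on-the-job performance under this reading.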
Chen, Cheng-Te; Chen, Yu-Lan; Lin, Yu-Ching; Hsieh, Ching-Lin; Tzeng, Jeng-Yi
2018-01-01
Objective: The purpose of this study was to construct a computerized adaptive test (CAT) for measuring self-care performance (the CAT-SC) in children with developmental disabilities (DD) aged from 6 months to 12 years in a content-inclusive, precise, and efficient fashion. Methods: The study was divided into 3 phases: (1) item bank development, (2) item testing, and (3) a simulation study to determine the stopping rules for the administration of the CAT-SC. A total of 215 caregivers of children with DD were interviewed with the 73-item CAT-SC item bank. An item response theory model was adopted for examining the construct validity to estimate item parameters after investigation of the unidimensionality, equality of slope parameters, item fitness, and differential item functioning (DIF). In the last phase, the reliability and concurrent validity of the CAT-SC were evaluated. Results: The final CAT-SC item bank contained 56 items. The stopping rules suggested were (a) reliability coefficient greater than 0.9 or (b) 14 items administered. The results of simulation also showed that 85% of the estimated self-care performance scores would reach a reliability higher than 0.9 with a mean test length of 8.5 items, and the mean reliability for the rest was 0.86. Administering the CAT-SC could reduce the number of items administered by 75% to 84%. In addition, self-care performances estimated by the CAT-SC and the full item bank were very similar to each other (Pearson r = 0.98). Conclusion: The newly developed CAT-SC can efficiently measure self-care performance in children with DD whose performances are comparable to those of TD children aged from 6 months to 12 years as precisely as the whole item bank. The item bank of the CAT-SC has good reliability and a unidimensional self-care construct, and the CAT can estimate self-care performance with less than 25% of the items in the item bank. Therefore, the CAT-SC could be useful for measuring self-care performance in children with DD in clinical and research settings. PMID:29561879
Chen, Cheng-Te; Chen, Yu-Lan; Lin, Yu-Ching; Hsieh, Ching-Lin; Tzeng, Jeng-Yi; Chen, Kuan-Lin
2018-01-01
The purpose of this study was to construct a computerized adaptive test (CAT) for measuring self-care performance (the CAT-SC) in children with developmental disabilities (DD) aged from 6 months to 12 years in a content-inclusive, precise, and efficient fashion. The study was divided into 3 phases: (1) item bank development, (2) item testing, and (3) a simulation study to determine the stopping rules for the administration of the CAT-SC. A total of 215 caregivers of children with DD were interviewed with the 73-item CAT-SC item bank. An item response theory model was adopted for examining the construct validity to estimate item parameters after investigation of the unidimensionality, equality of slope parameters, item fitness, and differential item functioning (DIF). In the last phase, the reliability and concurrent validity of the CAT-SC were evaluated. The final CAT-SC item bank contained 56 items. The stopping rules suggested were (a) reliability coefficient greater than 0.9 or (b) 14 items administered. The results of simulation also showed that 85% of the estimated self-care performance scores would reach a reliability higher than 0.9 with a mean test length of 8.5 items, and the mean reliability for the rest was 0.86. Administering the CAT-SC could reduce the number of items administered by 75% to 84%. In addition, self-care performances estimated by the CAT-SC and the full item bank were very similar to each other (Pearson r = 0.98). The newly developed CAT-SC can efficiently measure self-care performance in children with DD whose performances are comparable to those of typically developing (TD) children aged from 6 months to 12 years as precisely as the whole item bank. The item bank of the CAT-SC has good reliability and a unidimensional self-care construct, and the CAT can estimate self-care performance with fewer than 25% of the items in the item bank. Therefore, the CAT-SC could be useful for measuring self-care performance in children with DD in clinical and research settings.
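The two stopping rules reported above (reliability greater than 0.9, or 14 items administered) can be illustrated with a short sketch. This is not the authors' implementation. It assumes the latent trait is scaled to unit variance, so that reliability can be approximated as 1 minus the squared standard error, and that the standard error is the inverse square root of the summed item information. The function name `should_stop` and its parameters are hypothetical.

```python
def should_stop(item_informations, min_reliability=0.9, max_items=14):
    """Decide whether a CAT session should end.

    item_informations: Fisher information contributed by each item
    administered so far, evaluated at the current ability estimate.
    Assumption (not from the abstract): unit trait variance, so
    reliability is approximated as 1 - SE^2 with SE^2 = 1 / total information.
    """
    # Rule (b): stop once the maximum test length is reached.
    if len(item_informations) >= max_items:
        return True

    total_information = sum(item_informations)
    if total_information <= 0:
        return False  # nothing is known about the examinee yet

    # Rule (a): stop once the approximate reliability exceeds the threshold.
    reliability = 1.0 - 1.0 / total_information
    return reliability > min_reliability
```

Under these assumptions, a session ends early as soon as the accumulated information exceeds 10, because 1 - 1/10 = 0.9; otherwise it ends at the 14th item.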
Procedures to develop a computerized adaptive test to assess patient-reported physical functioning.
McCabe, Erin; Gross, Douglas P; Bulut, Okan
2018-06-07
The purpose of this paper is to demonstrate the procedures to develop and implement a computerized adaptive patient-reported outcome (PRO) measure using secondary analysis of a dataset and items from fixed-format legacy measures. We conducted secondary analysis of a dataset of responses from 1429 persons with work-related lower extremity impairment. We calibrated three measures of physical functioning on the same metric, based on item response theory (IRT). We evaluated efficiency and measurement precision of various computerized adaptive test (CAT) designs using computer simulations. IRT and confirmatory factor analyses support combining the items from the three scales for a CAT item bank of 31 items. The item parameters for IRT were calculated using the generalized partial credit model. CAT simulations show that reducing the test length from the full 31 items to a maximum of 8 or 20 items is possible without a significant loss of information (95% and 99% correlation with legacy measure scores, respectively). We demonstrated feasibility and efficiency of using CAT for PRO measurement of physical functioning. The procedures we outlined are straightforward and can be applied to other PRO measures. Additionally, we have included all the information necessary to implement the CAT of physical functioning in the electronic supplementary material of this paper.
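For readers unfamiliar with the generalized partial credit model mentioned above, the sketch below computes the category probabilities for a single item. It follows the standard textbook form of the model; the function name `gpcm_probabilities` and the parameter values in the usage note are illustrative assumptions, not values from the paper.

```python
import math

def gpcm_probabilities(theta, discrimination, step_difficulties):
    """Category probabilities for one generalized partial credit item.

    theta: respondent's latent trait level.
    discrimination: item slope parameter a.
    step_difficulties: step parameters b_1..b_m; response categories are 0..m.
    """
    # Cumulative sums of a * (theta - b_v); category 0 has an empty sum of 0.
    cumulative = [0.0]
    for b in step_difficulties:
        cumulative.append(cumulative[-1] + discrimination * (theta - b))

    # Subtract the maximum before exponentiating for numerical stability.
    largest = max(cumulative)
    weights = [math.exp(z - largest) for z in cumulative]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, `gpcm_probabilities(0.0, 1.0, [-1.0, 1.0])` describes a three-category item for which a respondent at theta = 0 is most likely to choose the middle category.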
Survey Development to Assess College Students' Perceptions of the Campus Environment.
Sowers, Morgan F; Colby, Sarah; Greene, Geoffrey W; Pickett, Mackenzie; Franzen-Castle, Lisa; Olfert, Melissa D; Shelnutt, Karla; Brown, Onikia; Horacek, Tanya M; Kidd, Tandalayo; Kattelmann, Kendra K; White, Adrienne A; Zhou, Wenjun; Riggsbee, Kristin; Yan, Wangcheng; Byrd-Bredbenner, Carol
2017-11-01
We developed and tested a College Environmental Perceptions Survey (CEPS) to assess college students' perceptions of the healthfulness of their campus. CEPS was developed in 3 stages: questionnaire development, validity testing, and reliability testing. Questionnaire development was based on an extensive literature review and input from an expert panel to establish content validity. Face validity was established with the target population using cognitive interviews with 100 college students. Concurrent-criterion validity was established with in-depth interviews (N = 30) of college students compared to surveys completed by the same 30 students. Surveys completed by college students from 8 universities (N = 1147) were used to test internal structure (factor analysis) and internal consistency (Cronbach's alpha). After development and testing, 15 items remained from the original 48 items. A 5-factor solution emerged: physical activity (4 items, α = .635), water (3 items, α = .773), vending (2 items, α = .680), healthy food (2 items, α = .631), and policy (2 items, α = .573). The mean total score for all universities was 62.71 (±11.16) on a 100-point scale. CEPS appears to be a valid and reliable tool for assessing college students' perceptions of their health-related campus environment.
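The internal-consistency figures above are Cronbach's alpha values. As a reminder of how that statistic is obtained, here is a minimal sketch using the usual formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The function name `cronbach_alpha` and the input layout (one row of item scores per respondent) are assumptions made for illustration; this is not the authors' analysis code.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondent rows of item scores."""
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")

    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])

    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)
```

With only 2 to 4 items per factor, as in the subscales reported here, alpha tends to be modest even when the items are reasonably correlated, because the statistic increases with the number of items.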
The precategorical nature of visual short-term memory.
Quinlan, Philip T; Cohen, Dale J
2016-11-01
We conducted a series of recognition experiments that assessed whether visual short-term memory (VSTM) is sensitive to shared category membership of to-be-remembered (tbr) images of common objects. In Experiment 1 some of the tbr items shared the same basic level category (e.g., hand axe): Such items were no better retained than others. In the remaining experiments, displays contained different images of items from the same higher-level category (e.g., food: a bagel, a sandwich, a pizza). Evidence from the later experiments did suggest that participants were sensitive to the categorical relations present in the displays. However, when separate measures of sensitivity and bias were computed, the data revealed no effects on sensitivity, but a greater tendency to respond positively to noncategory items relative to items from the depicted category. Across all experiments, there was no evidence that items from a common category were better remembered than unique items. Previous work has shown that principles of perceptual organization do affect the storage and maintenance of tbr items. The present work shows that there are no corresponding conceptual principles of organization in VSTM. It is concluded that the sort of VSTM tapped by single probe recognition methods is precategorical in nature. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
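The abstract does not say which measures of sensitivity and bias were computed. One common choice for yes/no recognition data, assumed here purely for illustration, is the equal-variance signal detection pair d' (sensitivity) and c (criterion). The function name `sensitivity_and_bias` is hypothetical.

```python
from statistics import NormalDist

def sensitivity_and_bias(hit_rate, false_alarm_rate):
    """Return (d_prime, criterion) under equal-variance signal detection.

    Both rates must lie strictly between 0 and 1; rates of exactly 0 or 1
    need a correction before the inverse normal transform can be applied.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit = z(hit_rate)
    z_false_alarm = z(false_alarm_rate)

    d_prime = z_hit - z_false_alarm              # separation of the distributions
    criterion = -0.5 * (z_hit + z_false_alarm)   # negative = bias toward "yes"
    return d_prime, criterion
```

Separating the two quantities matters for the argument above: a condition can raise the rate of positive responses (a shift in c) without improving discrimination between old and new items (no change in d').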
Implicit and explicit forgetting: when is gist remembered?
Dorfman, J; Mandler, G
1994-08-01
Recognition (YES/NO) and stem completion (cued: complete with a word from the list; and uncued: complete with the first word that comes to mind) were tested following either semantic or non-semantic processing of a categorized input list. Item/instance information was tested by contrasting target items from the input list with new items that were categorically related to them; gist/categorical information was tested by comparing target items semantically related to the input items with unrelated new items. For both recognition and stem completion, regardless of initial processing condition, item information decayed rapidly over a period of one week. Gist information was maintained over the same period when initial processing was semantic but only in the cued condition for completion. These results are discussed in terms of dual process theory, which postulates activation/integration of a representation as primarily relevant to implicit item information and elaboration of a representation as mainly relevant to semantic (i.e. categorical) information.
Incidental retrieval-induced forgetting of location information.
Gómez-Ariza, Carlos J; Fernandez, Angel; Bajo, M Teresa
2012-06-01
Retrieval-induced forgetting (RIF) has been studied with different types of tests and materials. However, RIF has always been tested on the items' central features, and there is no information on whether inhibition also extends to peripheral features of the events in which the items are embedded. In two experiments, we specifically tested the presence of RIF in a task in which recall of peripheral information was required. After a standard retrieval practice task oriented to item identity, participants were cued with colors (Exp. 1) or with the items themselves (Exp. 2) and asked to recall the screen locations where the items had been displayed during the study phase. RIF for locations was observed after retrieval practice, an effect that was not present when participants were asked to read the items rather than retrieve them. Our findings provide evidence that peripheral location information associated with an item during study can also be inhibited when the retrieval conditions promote the inhibition of more central, item identity information.
Computerized Adaptive Testing with Item Clones. Research Report.
ERIC Educational Resources Information Center
Glas, Cees A. W.; van der Linden, Wim J.
To reduce the cost of item writing and to enhance the flexibility of item presentation, items can be generated by item-cloning techniques. An important consequence of cloning is that it may cause variability in the item parameters. Therefore, a multilevel item response model is presented in which it is assumed that the item parameters of a…
Hjermstad, Marianne J; Bergenmar, Mia; Bjordal, Kristin; Fisher, Sheila E; Hofmeister, Dirk; Montel, Sébastien; Nicolatou-Galitis, Ourania; Pinto, Monica; Raber-Durlacher, Judith; Singer, Susanne; Tomaszewska, Iwona M; Tomaszewski, Krzysztof A; Verdonck-de Leeuw, Irma; Yarom, Noam; Winstanley, Julie B; Herlofson, Bente B
2016-09-01
This international EORTC validation study (phase IV) is aimed at testing the psychometric properties of a quality of life (QoL) module related to oral health problems in cancer patients. The phase III module comprised 17 items with four hypothesized multi-item scales and three single items. In phase IV, patients with mixed cancers in different treatment phases, from 10 countries, completed the EORTC QLQ-C30, the QLQ-OH module, and a debriefing interview. The hypothesized structure was tested using combinations of classical test theory and item response theory, following EORTC guidelines. Test-retest assessments and responsiveness to change analysis (RCA) were performed after 2 weeks. Five hundred seventy-two patients (median age 60.3, 54% female) were analyzed. Completion took <10 min for 84%; 40% expressed satisfaction that these issues were addressed. Analyses suggested a revision of the phase III hypothesized scale structure. Two items were deleted based on a high degree of item misfit, together with negative patient feedback. The remaining 15 items formed one eight-item scale named OH-QoL score, a two-item information scale, a two-item scale regarding dentures, and three single items (sticky saliva/mouth soreness/sensitivity to food/drink). Face and convergent validity and internal consistency were confirmed. Test-retest reliability (n = 60) was demonstrated as was RCA for patients undergoing chemotherapy (n = 117; p = 0.06). The resulting QLQ-OH15 discriminated between clinically distinct patient groups, e.g., low performance status vs. higher (p < 0.001), and head-and-neck cancer versus other cancers (p < 0.03). The EORTC module QLQ-OH15 is a short, well-accepted assessment tool focusing on oral problems and QoL to improve clinical management. ClinicalTrials.gov Identifier: NCT01724333.
Item Selection and Pre-equating with Empirical Item Characteristic Curves.
ERIC Educational Resources Information Center
Livingston, Samuel A.
An empirical item characteristic curve shows the probability of a correct response as a function of the student's total test score. These curves can be estimated from large-scale pretest data. They enable test developers to select items that discriminate well in the score region where decisions are made. A similar set of curves can be used to…
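An empirical item characteristic curve of the kind described above can be estimated by grouping examinees by total score and taking the proportion answering the item correctly in each group. The sketch below shows that tabulation only; the function name `empirical_icc` and the input layout are assumptions, and it applies no smoothing or pooling of sparse score groups, which operational use with real pretest data would likely need.

```python
from collections import defaultdict

def empirical_icc(total_scores, item_correct):
    """Proportion correct on one item at each observed total test score.

    total_scores: each examinee's total test score.
    item_correct: 1/0 (or True/False) for the same examinees on the item.
    Returns a dict mapping total score to proportion correct, in score order.
    """
    counts = defaultdict(lambda: [0, 0])  # score -> [n_correct, n_examinees]
    for score, correct in zip(total_scores, item_correct):
        counts[score][0] += int(correct)
        counts[score][1] += 1

    return {
        score: n_correct / n_examinees
        for score, (n_correct, n_examinees) in sorted(counts.items())
    }
```

An item discriminates well in a given score region when this curve rises steeply there, which is the property the abstract says test developers look for near the decision point.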
ERIC Educational Resources Information Center
Hol, A. Michiel; Vorst, Harrie C. M.; Mellenbergh, Gideon J.
2007-01-01
In a randomized experiment (n = 515), a computerized test and a computerized adaptive test (CAT) are compared. The item pool consists of 24 polytomous motivation items. Although items are carefully selected, calibration data show that Samejima's graded response model did not fit the data optimally. A simulation study is done to assess possible…
The Effect of Error in Item Parameter Estimates on the Test Response Function Method of Linking.
ERIC Educational Resources Information Center
Kaskowitz, Gary S.; De Ayala, R. J.
2001-01-01
Studied the effect of item parameter estimation for computation of linking coefficients for the test response function (TRF) linking/equating method. Simulation results showed that linking was more accurate when there was less error in the parameter estimates, and that 15 or 25 common items provided better results than 5 common items under both…
ERIC Educational Resources Information Center
Dutke, Stephan; Barenberg, Jonathan
2015-01-01
We introduce a specific type of item for knowledge tests, confidence-weighted true-false (CTF) items, and review experiences of its application in psychology courses. A CTF item is a statement about the learning content to which students respond whether the statement is true or false, and they rate their confidence level. Previous studies using…
ERIC Educational Resources Information Center
Brown, Frank N.; And Others
The successful Wisconsin Title 1 project item bank offers a valid, flexible, and efficient means of providing migrant student tests in reading and mathematics tailored to instructor curricula. The item bank system consists of nine PASCAL computer programs which maintain, search, and select from approximately 1,000 test items stored on floppy disks…
Cupani, Marcos; Zamparella, Tatiana Castro; Piumatti, Gisella; Vinculado, Grupo
The calibration of item banks provides the basis for computerized adaptive testing that ensures high diagnostic precision and minimizes participants' test burden. This study aims to develop a bank of items to measure the level of knowledge of biology using the Rasch model. The sample consisted of 1219 participants who studied in different faculties of the National University of Cordoba (mean age = 21.85 years, SD = 4.66; 66.9% were women). The items were organized in different forms and into separate subtests, with some common items across subtests. The students were told they had to answer 60 biology knowledge questions. Evaluation of Rasch model fit (Zstd >|2.0|), differential item functioning, dimensionality, local independence, item and person separation (>2.0), and reliability (>.80) resulted in a bank of 180 items with good psychometric properties. The bank provides items with a wide range of content coverage and may serve as a sound basis for computerized adaptive testing applications. The contribution of this work is significant in the field of educational assessment in Argentina.
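As background to the Rasch calibration described above, the sketch below shows the model's response function and the information an item provides at a given ability level. It is the standard dichotomous Rasch form, not code from the study; the function names are hypothetical.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def rasch_information(ability, difficulty):
    """Fisher information of one Rasch item at the given ability level."""
    p = rasch_probability(ability, difficulty)
    return p * (1.0 - p)
```

Information peaks at 0.25 when item difficulty equals the examinee's ability, which is why a calibrated bank covering a wide difficulty range is a useful basis for adaptive testing.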
ERIC Educational Resources Information Center
Davis, Laurie Laughlin
2004-01-01
Choosing a strategy for controlling item exposure has become an integral part of test development for computerized adaptive testing (CAT). This study investigated the performance of six procedures for controlling item exposure in a series of simulated CATs under the generalized partial credit model. In addition to a no-exposure control baseline…
Effects of Differential Item Functioning on Examinees' Test Performance and Reliability of Test
ERIC Educational Resources Information Center
Lee, Yi-Hsuan; Zhang, Jinming
2017-01-01
Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The…
Application of Computerized Adaptive Testing to Entrance Examination for Graduate Studies in Turkey
ERIC Educational Resources Information Center
Bulut, Okan; Kan, Adnan
2012-01-01
Problem Statement: Computerized adaptive testing (CAT) is a sophisticated and efficient way of delivering examinations. In CAT, items for each examinee are selected from an item bank based on the examinee's responses to the items. In this way, the difficulty level of the test is adjusted based on the examinee's ability level. Instead of…