Sample records for test items relating

  1. Item Structural Properties as Predictors of Item Difficulty and Item Association.

    ERIC Educational Resources Information Center

    Solano-Flores, Guillermo

    1993-01-01

    Studied the ability of logical test design (LTD) to predict student performance in reading Roman numerals for 211 sixth graders in Mexico City tested on Roman numeral items varying on LTD-related and non-LTD-related variables. The LTD-related variable item iterativity was found to be the best predictor of item difficulty. (SLD)

  2. Application of Item Response Theory to Tests of Substance-related Associative Memory

    PubMed Central

    Shono, Yusuke; Grenard, Jerry L.; Ames, Susan L.; Stacy, Alan W.

    2015-01-01

    A substance-related word association test (WAT) is one of the commonly used indirect tests of substance-related implicit associative memory and has been shown to predict substance use. This study applied an item response theory (IRT) modeling approach to evaluate psychometric properties of the alcohol- and marijuana-related WATs and their items among 775 ethnically diverse at-risk adolescents. After examining the IRT assumptions, item fit, and differential item functioning (DIF) across gender and age groups, the original 18 WAT items were reduced to 14- and 15-items in the alcohol- and marijuana-related WAT, respectively. Thereafter, unidimensional one- and two-parameter logistic models (1PL and 2PL models) were fitted to the revised WAT items. The results demonstrated that both alcohol- and marijuana-related WATs have good psychometric properties. These results were discussed in light of the framework of a unified concept of construct validity (Messick, 1975, 1989, 1995). PMID:25134051
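
    The 1PL and 2PL models named in this abstract have a standard closed form: the probability of a keyed response is a logistic function of the distance between the latent trait (theta) and the item difficulty (b), scaled by an item discrimination (a) that the 1PL holds constant across items. A minimal Python sketch under that standard definition follows; the function names and parameter values are illustrative assumptions, not taken from the study.

```python
import math

def irf_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a keyed response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def irf_1pl(theta: float, b: float) -> float:
    """1PL model: the 2PL with discrimination fixed at 1 (an assumed scaling)."""
    return irf_2pl(theta, 1.0, b)

# Hypothetical item parameters, for illustration only.
print(irf_1pl(0.0, 0.0))       # trait equals difficulty, so probability is one half
print(irf_2pl(1.0, 2.0, 0.0))  # more discriminating item, trait above difficulty
```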

  3. An Item Gains and Losses Analysis of False Memories Suggests Critical Items Receive More Item-Specific Processing than List Items

    ERIC Educational Resources Information Center

    Burns, Daniel J.; Martens, Nicholas J.; Bertoni, Alicia A.; Sweeney, Emily J.; Lividini, Michelle D.

    2006-01-01

    In a repeated testing paradigm, list items receiving item-specific processing are more likely to be recovered across successive tests (item gains), whereas items receiving relational processing are likely to be forgotten progressively less on successive tests. Moreover, analysis of cumulative-recall curves has shown that item-specific processing…

  4. A Comparison of Three Types of Test Development Procedures Using Classical and Latent Trait Methods.

    ERIC Educational Resources Information Center

    Benson, Jeri; Wilson, Michael

    Three methods of item selection were used to select sets of 38 items from a 50-item verbal analogies test and the resulting item sets were compared for internal consistency, standard errors of measurement, item difficulty, biserial item-test correlations, and relative efficiency. Three groups of 1,500 cases each were used for item selection. First…

  5. The beneficial effect of testing: an event-related potential study

    PubMed Central

    Bai, Cheng-Hua; Bridger, Emma K.; Zimmer, Hubert D.; Mecklinger, Axel

    2015-01-01

    The enhanced memory performance for items that are tested as compared to being restudied (the testing effect) is a frequently reported memory phenomenon. According to the episodic context account of the testing effect, this beneficial effect of testing is related to a process which reinstates the previously learnt episodic information. Few studies have explored the neural correlates of this effect at the time point when testing takes place, however. In this study, we utilized the ERP correlates of successful memory encoding to address this issue, hypothesizing that if the benefit of testing is due to retrieval-related processes at test then subsequent memory effects (SMEs) should resemble the ERP correlates of retrieval-based processing in their temporal and spatial characteristics. Participants were asked to learn Swahili-German word pairs before items were presented in either a testing or a restudy condition. Memory performance was assessed immediately and 1 day later with a cued recall task. Successfully recalling items at test increased the likelihood that items were remembered over time compared to items which were only restudied. An ERP subsequent memory contrast (later remembered vs. later forgotten tested items), which reflects the engagement of processes that ensure items are recallable the next day, was topographically comparable with the ERP correlate of immediate recollection (immediately remembered vs. immediately forgotten tested items). This result shows that the processes which allow items to be more memorable over time share qualitatively similar neural correlates with the processes that relate to successful retrieval at test. This finding supports the notion that testing is more beneficial than restudying on memory performance over time because of its engagement of retrieval processes, such as the re-encoding of actively retrieved memory representations. PMID:26441577

  6. Science Library of Test Items. Volume Nineteen. A Collection of Multiple Choice Test Items Relating Mainly to Geology.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  7. Science Library of Test Items. Volume Seventeen. A Collection of Multiple Choice Test Items Relating Mainly to Biology.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  8. Science Library of Test Items. Volume Eighteen. A Collection of Multiple Choice Test Items Relating Mainly to Chemistry.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  9. Memory for Items and Relationships among Items Embedded in Realistic Scenes: Disproportionate Relational Memory Impairments in Amnesia

    PubMed Central

    Hannula, Deborah E.; Tranel, Daniel; Allen, John S.; Kirchhoff, Brenda A.; Nickel, Allison E.; Cohen, Neal J.

    2014-01-01

    Objective: The objective of this study was to examine the dependence of item memory and relational memory on medial temporal lobe (MTL) structures. Patients with amnesia, who either had extensive MTL damage or damage that was relatively restricted to the hippocampus, were tested, as was a matched comparison group. Disproportionate relational memory impairments were predicted for both patient groups, and those with extensive MTL damage were also expected to have impaired item memory. Method: Participants studied scenes, and were tested with interleaved two-alternative forced-choice probe trials. Probe trials were either presented immediately after the corresponding study trial (lag 1), five trials later (lag 5), or nine trials later (lag 9) and consisted of the studied scene along with a manipulated version of that scene in which one item was replaced with a different exemplar (item memory test) or was moved to a new location (relational memory test). Participants were to identify the exact match of the studied scene. Results: As predicted, patients were disproportionately impaired on the test of relational memory. Item memory performance was marginally poorer among patients with extensive MTL damage, but both groups were impaired relative to matched comparison participants. Impaired performance was evident at all lags, including the shortest possible lag (lag 1). Conclusions: The results are consistent with the proposed role of the hippocampus in relational memory binding and representation, even at short delays, and suggest that the hippocampus may also contribute to successful item memory when items are embedded in complex scenes. PMID:25068665

  10. Computerized Adaptive Test (CAT) Applications and Item Response Theory Models for Polytomous Items

    ERIC Educational Resources Information Center

    Aybek, Eren Can; Demirtasli, R. Nukhet

    2017-01-01

    This article aims to provide a theoretical framework for computerized adaptive tests (CAT) and item response theory models for polytomous items. Besides that, it aims to introduce the simulation and live CAT software to the related researchers. Computerized adaptive test algorithm, assumptions of item response theory models, nominal response…

  11. Detecting a Gender-Related DIF Using Logistic Regression and Transformed Item Difficulty

    ERIC Educational Resources Information Center

    Abedlaziz, Nabeel; Ismail, Wail; Hussin, Zaharah

    2011-01-01

    Test items are designed to provide information about the examinees. Difficult items are designed to be more demanding and easy items are less so. However, sometimes, test items carry with them demands other than those intended by the test developer (Scheuneman & Gerritz, 1990). When personal attributes such as gender systematically affect…

  12. Science Library of Test Items. Volume Twenty. A Collection of Multiple Choice Test Items Relating Mainly to Physics, 1.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  13. Science Library of Test Items. Volume Twenty-One. A Collection of Multiple Choice Test Items Relating Mainly to Physics, 2.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  14. Science Library of Test Items. Volume Twenty-Two. A Collection of Multiple Choice Test Items Relating Mainly to Skills.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  15. The Role of Item Models in Automatic Item Generation

    ERIC Educational Resources Information Center

    Gierl, Mark J.; Lai, Hollis

    2012-01-01

    Automatic item generation represents a relatively new but rapidly evolving research area where cognitive and psychometric theories are used to produce tests that include items generated using computer technology. Automatic item generation requires two steps. First, test development specialists create item models, which are comparable to templates…

  16. Australian Chemistry Test Item Bank: Years 11 & 12. Volume 1.

    ERIC Educational Resources Information Center

    Commons, C., Ed.; Martin, P., Ed.

    Volume 1 of the Australian Chemistry Test Item Bank, consisting of two volumes, contains nearly 2000 multiple-choice items related to the chemistry taught in Year 11 and Year 12 courses in Australia. Items which were written during 1979 and 1980 were initially published in the "ACER Chemistry Test Item Collection" and in the "ACER…

  17. Australian Chemistry Test Item Bank: Years 11 and 12. Volume 2.

    ERIC Educational Resources Information Center

    Commons, C., Ed.; Martin, P., Ed.

    The second volume of the Australian Chemistry Test Item Bank, consisting of two volumes, contains nearly 2000 multiple-choice items related to the chemistry taught in Year 11 and Year 12 courses in Australia. Items which were written during 1979 and 1980 were initially published in the "ACER Chemistry Test Item Collection" and in the…

  18. Serial-position effects for items and relations in short-term memory.

    PubMed

    Jones, Tim; Oberauer, Klaus

    2013-04-01

    Two experiments used immediate probed recall of words to investigate serial-position effects. Item memory was tested through probing with a semantic category. Relation memory was tested through probing with the word's spatial location of presentation. Input order and output order were deconfounded by presenting and probing items in different orders. Primacy and recency effects over input position were found for both item memory and relation memory. Both item and relation memory declined over output position. The finding of a U-shaped input position function for item memory rules out an explanation purely in terms of positional confusions (e.g., edge effects). Either these serial-position effects arise from variations in the intrinsic memory strength of the items, or they arise from variations in the strength of item-position bindings, together with retrieval by scanning.

  19. Negative and Positive Testing Effects in Terms of Item-Specific and Relational Information

    ERIC Educational Resources Information Center

    Mulligan, Neil W.; Peterson, Daniel J.

    2015-01-01

    Though retrieving information typically results in improved memory on a subsequent test (the testing effect), Peterson and Mulligan (2013) outlined the conditions under which retrieval practice results in poorer recall relative to restudy, a phenomenon dubbed the "negative testing effect." The item-specific-relational account proposes…

  20. ACER Chemistry Test Item Collection (ACER CHEMTIC Year 12 Supplement).

    ERIC Educational Resources Information Center

    Australian Council for Educational Research, Hawthorn.

    This publication contains 317 multiple-choice chemistry test items related to topics covered in the Victorian (Australia) Year 12 chemistry course. It allows teachers access to a range of items suitable for diagnostic and achievement purposes, supplementing the ACER Chemistry Test Item Collection--Year 12 (CHEMTIC). The topics covered are: organic…

  1. Test item linguistic complexity and assessments for deaf students.

    PubMed

    Cawthon, Stephanie

    2011-01-01

    Linguistic complexity of test items is one test format element that has been studied in the context of struggling readers and their participation in paper-and-pencil tests. The present article presents findings from an exploratory study on the potential relationship between linguistic complexity and test performance for deaf readers. A total of 64 students completed 52 multiple-choice items, 32 in mathematics and 20 in reading. These items were coded for linguistic complexity components of vocabulary, syntax, and discourse. Mathematics items had higher linguistic complexity ratings than reading items, but there were no significant relationships between item linguistic complexity scores and student performance on the test items. The discussion addresses issues related to the subject area, student proficiency levels in the test content, factors to look for in determining a "linguistic complexity effect," and areas for further research in test item development and deaf students.

  2. Detecting a Gender-Related Differential Item Functioning Using Transformed Item Difficulty

    ERIC Educational Resources Information Center

    Abedalaziz, Nabeel; Leng, Chin Hai; Alahmadi, Ahlam

    2014-01-01

    The purpose of the study was to examine gender differences in performance on a multiple-choice mathematical ability test administered within the context of a high school graduation test that was designed to match the eleventh grade curriculum. The transformed item difficulty (TID) method was used to detect gender-related DIF. A random sample of 1400 eleventh…
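
    The transformed item difficulty (delta) approach is commonly described as converting each item's proportion correct p within a group to the delta scale, delta = 13 - 4 * z, where z is the standard normal quantile of p, and then comparing the two groups' deltas item by item. The sketch below covers only that transformation; it assumes this standard definition, which the abstract does not spell out, and the proportions are made up.

```python
from statistics import NormalDist

def delta(p: float) -> float:
    """Delta-scale difficulty: 13 - 4 * z, with z the normal quantile of p."""
    return 13.0 - 4.0 * NormalDist().inv_cdf(p)

# Hypothetical proportions correct for one item in two groups.
p_group_a, p_group_b = 0.70, 0.55
print(round(delta(p_group_a), 2), round(delta(p_group_b), 2))
```

    A lower proportion correct maps to a larger delta, so an item that is harder for one group sits higher on that group's axis when the paired deltas are plotted.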

  3. Location Indices for Ordinal Polytomous Items Based on Item Response Theory. Research Report. ETS RR-15-20

    ERIC Educational Resources Information Center

    Ali, Usama S.; Chang, Hua-Hua; Anderson, Carolyn J.

    2015-01-01

    Polytomous items are typically described by multiple category-related parameters; situations, however, arise in which a single index is needed to describe an item's location along a latent trait continuum. Situations in which a single index would be needed include item selection in computerized adaptive testing or test assembly. Therefore single…

  4. Measuring the effects of online health information: Scale validation for the e-Health Impact Questionnaire.

    PubMed

    Kelly, Laura; Ziebland, Sue; Jenkinson, Crispin

    2015-11-01

    Health-related websites have developed to be much more than information sites: they are used to exchange experiences and find support as well as information and advice. This paper documents the development of a tool to compare the potential consequences and experiences a person may encounter when using health-related websites. Questionnaire items were developed following a review of relevant literature and qualitative secondary analysis of interviews relating to experiences of health. Item reduction steps were performed on pilot survey data (n=167). Tests of validity and reliability were subsequently performed (n=170) to determine the psychometric properties of the questionnaire. Two independent item pools entered psychometric testing: (1) items relating to general views of using the internet in relation to health and (2) items relating to the consequences of using a specific health-related website. Identified sub-scales were found to have high construct validity, internal consistency and test-retest reliability. Analyses confirmed good psychometric properties in the eHIQ-Part 1 (11 items) and the eHIQ-Part 2 (26 items). This tool will facilitate the measurement of the potential consequences of using websites containing different types of material (scientific facts and figures, blogs, experiences, images) across a range of health conditions. Copyright © 2015 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.

  5. Not all order memory is equal: Test demands reveal dissociations in memory for sequence information.

    PubMed

    Jonker, Tanya R; MacLeod, Colin M

    2017-02-01

    Remembering the order of a sequence of events is a fundamental feature of episodic memory. Indeed, a number of formal models represent temporal context as part of the memory system, and memory for order has been researched extensively. Yet, the nature of the code(s) underlying sequence memory is still relatively unknown. Across 4 experiments that manipulated encoding task, we found evidence for 3 dissociable facets of order memory. Experiment 1 introduced a test requiring a judgment of which of 2 alternatives had immediately followed a word during encoding. This measure revealed better retention of interitem associations following relational encoding (silent reading) than relatively item-specific encoding (judging referent size), a pattern consistent with that observed in previous research using order reconstruction tests. In sharp contrast, Experiment 2 demonstrated the reverse pattern: Memory for the studied order of 2 sequentially presented items was actually better following item-specific encoding than following relational encoding. Experiment 3 reproduced this dissociation in a single experiment using both tests. Experiment 4 extended these findings by further dissociating the roles of relational encoding and item strength in the 2 tests. Taken together, these results indicate that memory for event sequence is influenced by (a) interitem associations, (b) the emphasized directionality of an association, and (c) an item's strength independent of other items. Memory for order is more complicated than has been portrayed in theories of memory and its nuances should be carefully considered when designing tests and models of temporal and relational memory. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  6. A Comparison of the Approaches of Generalizability Theory and Item Response Theory in Estimating the Reliability of Test Scores for Testlet-Composed Tests

    ERIC Educational Resources Information Center

    Lee, Guemin; Park, In-Yong

    2012-01-01

    Previous assessments of the reliability of test scores for testlet-composed tests have indicated that item-based estimation methods overestimate reliability. This study was designed to address issues related to the extent to which item-based estimation methods overestimate the reliability of test scores composed of testlets and to compare several…

  7. Testing of electrical equipment for a commercial grade dedication program

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown, J.L.; Srinivas, N.

    1995-10-01

    The availability of qualified safety-related replacement parts for use in nuclear power plants has decreased over time. This has caused many nuclear power plants to purchase commercial grade items (CGI) and utilize the commercial grade dedication process to qualify the items for use in nuclear safety-related applications. The laboratories of Technical and Engineering Services (the testing facility of Detroit Edison) have been providing testing services for verification of critical characteristics of these items. This paper presents an overview of the experience in testing electrical equipment with an emphasis on fuses.

  8. An Investigation of the Impact of Guessing on Coefficient α and Reliability

    PubMed Central

    2014-01-01

    Guessing is known to influence the test reliability of multiple-choice tests. Although there are many studies that have examined the impact of guessing, they used rather restrictive assumptions (e.g., parallel test assumptions, homogeneous inter-item correlations, homogeneous item difficulty, and homogeneous guessing levels across items) to evaluate the relation between guessing and test reliability. Based on the item response theory (IRT) framework, this study investigated the extent of the impact of guessing on reliability under more realistic conditions where item difficulty, item discrimination, and guessing levels actually vary across items with three different test lengths (TL). By accommodating multiple item characteristics simultaneously, this study also focused on examining interaction effects between guessing and other variables entered in the simulation to be more realistic. The simulation of the more realistic conditions and calculations of reliability and classical test theory (CTT) item statistics were facilitated by expressing CTT item statistics, coefficient α, and reliability in terms of IRT model parameters. In addition to the general negative impact of guessing on reliability, results showed interaction effects between TL and guessing and between guessing and test difficulty.
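
    For reference, coefficient α is conventionally computed from an examinee-by-item score matrix as α = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below applies that formula to a made-up response matrix; the study itself derived α from IRT model parameters rather than from raw data, so this illustrates the statistic and not the authors' procedure.

```python
from statistics import pvariance

def coefficient_alpha(scores: list[list[float]]) -> float:
    """Coefficient alpha; rows are examinees, columns are items."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)

# Hypothetical 0/1 responses: 5 examinees by 4 items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(coefficient_alpha(responses), 3))
```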

  9. Implicit and explicit forgetting: when is gist remembered?

    PubMed

    Dorfman, J; Mandler, G

    1994-08-01

    Recognition (YES/NO) and stem completion (cued: complete with a word from the list; and uncued: complete with the first word that comes to mind) were tested following either semantic or non-semantic processing of a categorized input list. Item/instance information was tested by contrasting target items from the input list with new items that were categorically related to them; gist/categorical information was tested by comparing target items semantically related to the input items with unrelated new items. For both recognition and stem completion, regardless of initial processing condition, item information decayed rapidly over a period of one week. Gist information was maintained over the same period when initial processing was semantic but only in the cued condition for completion. These results are discussed in terms of dual process theory, which postulates activation/integration of a representation as primarily relevant to implicit item information and elaboration of a representation as mainly relevant to semantic (i.e. categorical) information.

  10. Less we forget: retrieval cues and release from retrieval-induced forgetting.

    PubMed

    Jonker, Tanya R; Seli, Paul; MacLeod, Colin M

    2012-11-01

    Retrieving some items from memory can impair the subsequent recall of other related but not retrieved items, a phenomenon called retrieval-induced forgetting (RIF). The dominant explanation of RIF, the inhibition account, asserts that forgetting occurs because related items are suppressed during retrieval practice to reduce retrieval competition. This item inhibition persists, making it more difficult to recall the related items on a later test. In our set of experiments, each category was designed such that each exemplar belonged to one of two subcategories (e.g., each BIRD exemplar was either a bird of prey or a pet bird), but this subcategory information was not made explicit during study or retrieval practice. Practicing retrieval of items from only one subcategory led to RIF for items from the other subcategory when cued only with the overall category label (BIRD) at test. However, adapting the technique of Gardiner, Craik, and Birtwistle (Journal of Verbal Learning and Verbal Behavior 11:778-783, 1972), providing subcategory cues during the final test eliminated RIF. The results challenge the inhibition account's fundamental assumption of cue independence but are consistent with a cue-based interference account.

  11. A Combined IRT and SEM Approach for Individual-Level Assessment in Test-Retest Studies

    ERIC Educational Resources Information Center

    Ferrando, Pere J.

    2015-01-01

    The standard two-wave multiple-indicator model (2WMIM) commonly used to analyze test-retest data provides information at both the group and item level. Furthermore, when applied to binary and graded item responses, it is related to well-known item response theory (IRT) models. In this article the IRT-2WMIM relations are used to obtain additional…

  12. Detecting Gender Bias Through Test Item Analysis

    NASA Astrophysics Data System (ADS)

    González-Espada, Wilson J.

    2009-03-01

    Many physical science and physics instructors might not be trained in pedagogically appropriate test construction methods. This could lead to test items that do not measure what they are intended to measure. A subgroup of these items might show bias against some groups of students. This paper describes how the author became aware of potentially biased items against females in his examinations, which led to the exploration of fundamental issues related to item validity, gender bias, and differential item functioning, or DIF. A brief discussion of DIF in the context of university courses, as well as practical suggestions to detect possible gender-biased items, follows.

  13. Examination of Different Item Response Theory Models on Tests Composed of Testlets

    ERIC Educational Resources Information Center

    Kogar, Esin Yilmaz; Kelecioglu, Hülya

    2017-01-01

    The purpose of this research is to first estimate the item and ability parameters and the standard error values related to those parameters obtained from Unidimensional Item Response Theory (UIRT), bifactor (BIF) and Testlet Response Theory models (TRT) in the tests including testlets, when the number of testlets, number of independent items, and…

  14. Decomposing the interaction between retention interval and study/test practice: The role of retrievability

    PubMed Central

    Jang, Yoonhee; Wixted, John T.; Pecher, Diane; Zeelenberg, René; Huber, David E.

    2012-01-01

    Even without feedback, test practice enhances delayed performance compared to study practice, but the size of the effect is variable across studies. We investigated the benefit of testing, separating initially retrievable items from initially non-retrievable items. In two experiments, an initial test determined item retrievability. Retrievable or non-retrievable items were subsequently presented for repeated study or test practice. Collapsing across items, in Experiment 1, we obtained the typical crossover interaction between retention interval and practice type. For retrievable items, however, the crossover interaction was quantitatively different, with a small study benefit for an immediate test and a larger testing benefit after a delay. For non-retrievable items, there was a large study benefit for an immediate test, but one week later there was no difference between the study and test practice conditions. In Experiment 2, initially non-retrievable items were given additional study followed by either an immediate test or even more additional study, and one week later performance did not differ between the two conditions. These results indicate that the effect size of study/test practice is due to the relative contribution of retrievable and non-retrievable items. PMID:22304454

  15. Decomposing the interaction between retention interval and study/test practice: the role of retrievability.

    PubMed

    Jang, Yoonhee; Wixted, John T; Pecher, Diane; Zeelenberg, René; Huber, David E

    2012-01-01

    Even without feedback, test practice enhances delayed performance compared to study practice, but the size of the effect is variable across studies. We investigated the benefit of testing, separating initially retrievable items from initially nonretrievable items. In two experiments, an initial test determined item retrievability. Retrievable or nonretrievable items were subsequently presented for repeated study or test practice. Collapsing across items, in Experiment 1, we obtained the typical cross-over interaction between retention interval and practice type. For retrievable items, however, the cross-over interaction was quantitatively different, with a small study benefit for an immediate test and a larger testing benefit after a delay. For nonretrievable items, there was a large study benefit for an immediate test, but one week later there was no difference between the study and test practice conditions. In Experiment 2, initially nonretrievable items were given additional study followed by either an immediate test or even more additional study, and one week later performance did not differ between the two conditions. These results indicate that the effect size of study/test practice is due to the relative contribution of retrievable and nonretrievable items.

  16. A Comparison of the One- and Three-Parameter Logistic Models on Measures of Test Efficiency.

    ERIC Educational Resources Information Center

    Benson, Jeri

    Two methods of item selection were used to select sets of 40 items from a 50-item verbal analogies test, and the resulting item sets were compared for relative efficiency. The BICAL program was used to select the 40 items having the best mean square fit to the one-parameter logistic (Rasch) model. The LOGIST program was used to select the 40 items…

  17. A new computerized adaptive test advancing the measurement of health-related quality of life (HRQoL) in children: the Kids-CAT.

    PubMed

    Devine, J; Otto, C; Rose, M; Barthel, D; Fischer, F; Mühlan, H; Mülhan, H; Nolte, S; Schmidt, S; Ottova-Jordan, V; Ravens-Sieberer, U

    2015-04-01

    Assessing health-related quality of life (HRQoL) via Computerized Adaptive Tests (CAT) provides greater measurement precision coupled with a lower test burden compared to conventional tests. Currently, there are no European pediatric HRQoL CATs available. This manuscript aims at describing the development of a HRQoL CAT for children and adolescents: the Kids-CAT, which was developed based on the established KIDSCREEN-27 HRQoL domain structure. The Kids-CAT was developed combining classical test theory and item response theory methods and using large archival data of European KIDSCREEN norm studies (n = 10,577-19,580). Methods were applied in line with the US PROMIS project. Item bank development included the investigation of unidimensionality, local independence, exploration of Differential Item Functioning (DIF), evaluation of Item Response Curves (IRCs), estimation and norming of item parameters as well as first CAT simulations. The Kids-CAT was successfully built covering five item banks (with 26-46 items each) to measure physical well-being, psychological well-being, parent relations, social support and peers, and school well-being. The Kids-CAT item banks showed excellent psychometric properties: high content validity, unidimensionality, local independence, low DIF, and model-conforming IRCs. In CAT simulations, seven items were needed to achieve a measurement precision between .8 and .9 (reliability). It has a child-friendly design, is easily accessible online and gives immediate feedback reports of scores. The Kids-CAT has the potential to advance pediatric HRQoL measurement by making it less burdensome and enhancing the patient-doctor communication.
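The reliability range quoted in this record can be related to a standard error of measurement, which is the quantity a CAT stopping rule usually monitors. The sketch below assumes the common convention reliability = 1 - SE^2 / trait variance, with the latent trait scaled to unit variance. That convention and the function name are assumptions, not something the record states.

```python
import math


def se_for_reliability(reliability, trait_variance=1.0):
    """Standard error implied by a target reliability.

    Assumes reliability = 1 - SE**2 / trait_variance (a common convention,
    not taken from the record).
    """
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must lie strictly between 0 and 1")
    return math.sqrt(trait_variance * (1.0 - reliability))


if __name__ == "__main__":
    for rel in (0.8, 0.9):
        print(rel, round(se_for_reliability(rel), 3))
```

Under these assumptions, a reliability of .8 corresponds to a standard error of about 0.45 trait standard deviations, and .9 to about 0.32.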

  18. Measuring more than we know? An examination of the motivational and situational influences in science achievement

    NASA Astrophysics Data System (ADS)

    Haydel, Angela Michelle

    The purpose of this dissertation was to advance theoretical understanding about fit between the personal resources of individuals and the characteristics of science achievement tasks. Testing continues to be pervasive in schools, yet we know little about how students perceive tests and what they think and feel while they are actually working on test items. This study focused on both the personal (cognitive and motivational) and situational factors that may contribute to individual differences in achievement-related outcomes. 387 eighth grade students first completed a survey including measures of science achievement goals, capability beliefs, efficacy related to multiple-choice items and performance assessments, validity beliefs about multiple-choice items and performance assessments, and other perceptions of these item formats. Students then completed science achievement tests including multiple-choice items and two performance assessments. A sample of students was asked to verbalize both thoughts and feelings as they worked through the test items. These think-alouds were transcribed and coded for evidence of cognitive, metacognitive and motivational engagement. Following each test, all students completed measures of effort, mood, energy level and strategy use during testing. Students reported that performance assessments were more challenging, authentic, interesting and valid than multiple-choice tests. They also believed that comparisons between students were easier using multiple-choice items. Overall, students tried harder, felt better, had higher levels of energy and used more strategies while working on performance assessments. Findings suggested that performance assessments might be more congruent with a mastery achievement goal orientation, while multiple-choice tests might be more congruent with a performance achievement goal orientation. 
    A variable-centered analytic approach including regression analyses provided information about how students, on average, who differed in terms of their teachers' ratings of their science ability, achievement goals, capability beliefs and experiences with science achievement tasks perceived, engaged in, and performed on multiple-choice items and performance assessments. Person-centered analyses provided information about the perceptions, engagement and performance of subgroups of individuals who had different motivational characteristics. Generally, students' personal goals and capability beliefs related more strongly to test perceptions, but not performance, while teacher ratings of ability and test-specific beliefs related to performance.

  19. The quadratic relationship between difficulty of intelligence test items and their correlations with working memory.

    PubMed

    Smolen, Tomasz; Chuderski, Adam

    2015-01-01

    Fluid intelligence (Gf) is a crucial cognitive ability that involves abstract reasoning in order to solve novel problems. Recent research demonstrated that Gf strongly depends on the individual effectiveness of working memory (WM). We investigated a popular claim that if the storage capacity underlay the WM-Gf correlation, then such a correlation should increase with an increasing number of items or rules (load) in a Gf-test. As often no such link is observed, on that basis the storage-capacity account is rejected, and alternative accounts of Gf (e.g., related to executive control or processing speed) are proposed. Using both analytical inference and numerical simulations, we demonstrated that the load-dependent change in correlation is primarily a function of the amount of floor/ceiling effect for particular items. Thus, the item-wise WM correlation of a Gf-test depends on its overall difficulty, and the difficulty distribution across its items. When the early test items yield huge ceiling, but the late items do not approach floor, that correlation will increase throughout the test. If the early items locate themselves between ceiling and floor, but the late items approach floor, the respective correlation will decrease. For a hallmark Gf-test, the Raven-test, whose items span from ceiling to floor, the quadratic relationship is expected, and it was shown empirically using a large sample and two types of WMC tasks. In consequence, no changes in correlation due to varying WM/Gf load, or lack of them, can yield an argument for or against any theory of WM/Gf. Moreover, as the mathematical properties of the correlation formula make it relatively immune to ceiling/floor effects for overall moderate correlations, only minor changes (if any) in the WM-Gf correlation should be expected for many psychological tests.
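The dependence of an item-wise correlation on floor and ceiling effects can be illustrated with a deliberately simplified model, which is not the authors' analysis. Assume a standard-normal latent ability X, an item that is solved exactly when X exceeds a threshold, and a criterion Y (for example a WM score) that is bivariate normal with X at correlation `rho`. The correlation between the pass/fail item score and Y is then rho * phi(t) / sqrt(p * (1 - p)), where p is the pass rate. All names below are hypothetical.

```python
import math


def normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def item_criterion_correlation(threshold, rho):
    """Correlation between a pass/fail item and a criterion Y.

    Assumes the item is passed iff latent X > threshold, with (X, Y)
    standard bivariate normal and corr(X, Y) = rho (a simplification).
    """
    p_pass = 1.0 - normal_cdf(threshold)
    return rho * normal_pdf(threshold) / math.sqrt(p_pass * (1.0 - p_pass))


if __name__ == "__main__":
    # Very easy (ceiling) to very hard (floor) items, same latent correlation.
    for t in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(t, round(item_criterion_correlation(t, rho=0.6), 3))
```

In this toy model the item-wise correlation peaks for items of medium difficulty and shrinks symmetrically toward ceiling and floor, giving an inverted-U pattern across items that span the full difficulty range.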

  20. Optimal Bayesian Adaptive Design for Test-Item Calibration.

    PubMed

    van der Linden, Wim J; Ren, Hao

    2015-06-01

    An optimal adaptive design for test-item calibration based on Bayesian optimality criteria is presented. The design adapts the choice of field-test items to the examinees taking an operational adaptive test using both the information in the posterior distributions of their ability parameters and the current posterior distributions of the field-test parameters. Different criteria of optimality based on the two types of posterior distributions are possible. The design can be implemented using an MCMC scheme with alternating stages of sampling from the posterior distributions of the test takers' ability parameters and the parameters of the field-test items while reusing samples from earlier posterior distributions of the other parameters. Results from a simulation study demonstrated the feasibility of the proposed MCMC implementation for operational item calibration. A comparison of performances for different optimality criteria showed faster calibration of substantial numbers of items for the criterion of D-optimality relative to A-optimality, a special case of c-optimality, and random assignment of items to the test takers.

  1. Measuring stigma after spinal cord injury: Development and psychometric characteristics of the SCI-QOL Stigma item bank and short form.

    PubMed

    Kisala, Pamela A; Tulsky, David S; Pace, Natalie; Victorson, David; Choi, Seung W; Heinemann, Allen W

    2015-05-01

    To develop a calibrated item bank and computer adaptive test (CAT) to assess the effects of stigma on health-related quality of life in individuals with spinal cord injury (SCI). Grounded-theory based qualitative item development methods, large-scale item calibration field testing, confirmatory factor analysis, and item response theory (IRT)-based psychometric analyses. Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. SCI-QOL Stigma Item Bank. A sample of 611 individuals with traumatic SCI completed 30 items assessing SCI-related stigma. After 7 items were iteratively removed, factor analyses confirmed a unidimensional pool of items. Graded Response Model IRT analyses were used to estimate slopes and thresholds for the final 23 items. The SCI-QOL Stigma item bank is unique not only in the assessment of SCI-related stigma but also in the inclusion of individuals with SCI in all phases of its development. Use of confirmatory factor analytic and IRT methods provides flexibility and precision of measurement. The item bank may be administered as a CAT or as a 10-item fixed-length short form and can be used for research and clinical applications.
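For readers unfamiliar with the "slopes and thresholds" mentioned in this record, the sketch below shows how a graded response model turns one slope and an ordered list of thresholds into response-category probabilities. The function name and the example parameter values are hypothetical and are not estimates from the SCI-QOL item bank.

```python
import math


def grm_category_probs(theta, slope, thresholds):
    """Category probabilities for one item under a graded response model.

    thresholds must be in increasing order; K - 1 thresholds give K categories.
    """
    # Cumulative probabilities P(X >= k), padded with 1 (lowest) and 0 (beyond highest).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-slope * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulative values.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


if __name__ == "__main__":
    probs = grm_category_probs(theta=0.0, slope=1.5, thresholds=[-1.0, 0.0, 1.0])
    print([round(p, 3) for p in probs], round(sum(probs), 3))
```

The probabilities sum to 1 by construction, and a larger slope concentrates them more sharply around the category matching the respondent's trait level.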

  2. Measuring stigma after spinal cord injury: Development and psychometric characteristics of the SCI-QOL Stigma item bank and short form

    PubMed Central

    Kisala, Pamela A.; Tulsky, David S.; Pace, Natalie; Victorson, David; Choi, Seung W.; Heinemann, Allen W.

    2015-01-01

    Objective: To develop a calibrated item bank and computer adaptive test (CAT) to assess the effects of stigma on health-related quality of life in individuals with spinal cord injury (SCI). Design: Grounded-theory based qualitative item development methods, large-scale item calibration field testing, confirmatory factor analysis, and item response theory (IRT)-based psychometric analyses. Setting: Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Participants: Adults with traumatic SCI. Main Outcome Measures: SCI-QOL Stigma Item Bank. Results: A sample of 611 individuals with traumatic SCI completed 30 items assessing SCI-related stigma. After 7 items were iteratively removed, factor analyses confirmed a unidimensional pool of items. Graded Response Model IRT analyses were used to estimate slopes and thresholds for the final 23 items. Conclusions: The SCI-QOL Stigma item bank is unique not only in the assessment of SCI-related stigma but also in the inclusion of individuals with SCI in all phases of its development. Use of confirmatory factor analytic and IRT methods provides flexibility and precision of measurement. The item bank may be administered as a CAT or as a 10-item fixed-length short form and can be used for research and clinical applications. PMID:26010973

  3. Differential item functioning analysis with ordinal logistic regression techniques. DIFdetect and difwithpar.

    PubMed

    Crane, Paul K; Gibbons, Laura E; Jolley, Lance; van Belle, Gerald

    2006-11-01

    We present an ordinal logistic regression model for identification of items with differential item functioning (DIF) and apply this model to a Mini-Mental State Examination (MMSE) dataset. We employ item response theory ability estimation in our models. Three nested ordinal logistic regression models are applied to each item. Model testing begins with examination of the statistical significance of the interaction term between ability and the group indicator, consistent with nonuniform DIF. Then we turn our attention to the coefficient of the ability term in models with and without the group term. If including the group term has a marked effect on that coefficient, we declare that it has uniform DIF. We examined DIF related to language of test administration in addition to self-reported race, Hispanic ethnicity, age, years of education, and sex. We used PARSCALE for IRT analyses and STATA for ordinal logistic regression approaches. We used an iterative technique for adjusting IRT ability estimates on the basis of DIF findings. Five items were found to have DIF related to language. These same items also had DIF related to other covariates. The ordinal logistic regression approach to DIF detection, when combined with IRT ability estimates, provides a reasonable alternative for DIF detection. There appear to be several items with significant DIF related to language of test administration in the MMSE. More attention needs to be paid to the specific criteria used to determine whether an item has DIF, not just the technique used to identify DIF.
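The two-step decision described in this record can be sketched as a small classification rule applied to the output of the nested models. The sketch takes already-fitted quantities as inputs and does not fit any regression. The significance level `alpha` and the 10% relative-change cutoff are hypothetical placeholders, since the record gives no numeric criteria and itself stresses that the choice of criteria matters.

```python
def classify_dif(p_interaction, beta_without_group, beta_with_group,
                 alpha=0.05, max_relative_change=0.10):
    """Classify one item as 'nonuniform', 'uniform', or 'none'.

    p_interaction: p-value of the ability-by-group interaction term.
    beta_without_group / beta_with_group: ability coefficient in the models
    without and with the group term. Both cutoffs are hypothetical.
    """
    # Step 1: a significant interaction is read as nonuniform DIF.
    if p_interaction < alpha:
        return "nonuniform"
    # Step 2: a marked shift in the ability coefficient is read as uniform DIF.
    relative_change = abs(beta_with_group - beta_without_group) / abs(beta_without_group)
    if relative_change > max_relative_change:
        return "uniform"
    return "none"


if __name__ == "__main__":
    print(classify_dif(0.01, 1.00, 1.02))
    print(classify_dif(0.40, 1.00, 1.20))
    print(classify_dif(0.40, 1.00, 1.05))
```

Changing either cutoff can move an item from one class to another, which is the sensitivity the record's closing sentence points to.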

  4. Validation of a clinical critical thinking skills test in nursing.

    PubMed

    Shin, Sujin; Jung, Dukyoo; Kim, Sungeun

    2015-01-27

    The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Out of the initial 30 items, 11 items were excluded after the analysis of difficulty and discrimination parameters. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. From the above results, evidence of the response process validity was demonstrated, indicating that subjects responded as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and therefore represents a more convenient measurement of critical thinking ability.

  5. Validation of a clinical critical thinking skills test in nursing

    PubMed Central

    2015-01-01

    Purpose: The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. Methods: This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Results: Out of the initial 30 items, 11 items were excluded after the analysis of difficulty and discrimination parameters. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. Conclusion: From the above results, evidence of the response process validity was demonstrated, indicating that subjects responded as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and therefore represents a more convenient measurement of critical thinking ability. PMID:25622716

  6. A Review of the Effects on IRT Item Parameter Estimates with a Focus on Misbehaving Common Items in Test Equating

    PubMed Central

    Michaelides, Michalis P.

    2010-01-01

    Many studies have investigated the topic of change or drift in item parameter estimates in the context of item response theory (IRT). Content effects, such as instructional variation and curricular emphasis, as well as context effects, such as the wording, position, or exposure of an item have been found to impact item parameter estimates. The issue becomes more critical when items with estimates exhibiting differential behavior across test administrations are used as common for deriving equating transformations. This paper reviews the types of effects on IRT item parameter estimates and focuses on the impact of misbehaving or aberrant common items on equating transformations. Implications relating to test validity and the judgmental nature of the decision to keep or discard aberrant common items are discussed, with recommendations for future research into more informed and formal ways of dealing with misbehaving common items. PMID:21833230

  7. A Review of the Effects on IRT Item Parameter Estimates with a Focus on Misbehaving Common Items in Test Equating.

    PubMed

    Michaelides, Michalis P

    2010-01-01

    Many studies have investigated the topic of change or drift in item parameter estimates in the context of item response theory (IRT). Content effects, such as instructional variation and curricular emphasis, as well as context effects, such as the wording, position, or exposure of an item have been found to impact item parameter estimates. The issue becomes more critical when items with estimates exhibiting differential behavior across test administrations are used as common for deriving equating transformations. This paper reviews the types of effects on IRT item parameter estimates and focuses on the impact of misbehaving or aberrant common items on equating transformations. Implications relating to test validity and the judgmental nature of the decision to keep or discard aberrant common items are discussed, with recommendations for future research into more informed and formal ways of dealing with misbehaving common items.

  8. The Act of Answering Questions Elicited Differentiated Responses in a Concealed Information Test.

    PubMed

    Otsuka, Takuro; Mizutani, Mitsuyoshi; Yagi, Akihiro; Katayama, Jun'ichi

    2018-04-17

    The concealed information test (CIT), a psychophysiological detection of deception test, compares physiological responses between crime-related and crime-unrelated items. In previous studies, whether the act of answering questions affected physiological responses was unclear. This study examined effects of both question-related and answer-related processes on physiological responses. Twenty participants received a modified CIT, in which the interval between presentation of questions and answering them was 27 s. Differentiated respiratory movements and cardiovascular responses between items were observed for both questions (items) and answers, while differentiated skin conductance response was observed only for questions. These results suggest that physiological responses to questions reflected orientation to a crime-related item, while physiological responses during answering reflected inhibition of psychological arousal caused by orienting. Regarding the CIT's accuracy, participants' perception of the questions themselves more strongly influenced physiological responses than answering them. © 2018 American Academy of Forensic Sciences.

  9. Calibrating Item Families and Summarizing the Results Using Family Expected Response Functions

    ERIC Educational Resources Information Center

    Sinharay, Sandip; Johnson, Matthew S.; Williamson, David M.

    2003-01-01

    Item families, which are groups of related items, are becoming increasingly popular in complex educational assessments. For example, in automatic item generation (AIG) systems, a test may consist of multiple items generated from each of a number of item models. Item calibration or scoring for such an assessment requires fitting models that can…

  10. PISA Test Items and School-Based Examinations in Greece: Exploring the Relationship between Global and Local Assessment Discourses

    ERIC Educational Resources Information Center

    Anagnostopoulou, Kyriaki; Hatzinikita, Vassilia; Christidou, Vasilia; Dimopoulos, Kostas

    2013-01-01

    The paper explores the relationship of the global and the local assessment discourses as expressed by Programme for International Student Assessment (PISA) test items and school-based examinations, respectively. To this end, the paper compares PISA test items related to living systems and the context of life, health, and environment, with Greek…

  11. Precision-Based Item Selection for Exposure Control in Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Carroll, Ian A.

    2017-01-01

    Item exposure control is, relative to adaptive testing, a nascent concept that has emerged only in the last two to three decades on an academic basis as a practical issue in high-stakes computerized adaptive tests. This study aims to implement a new strategy in item exposure control by incorporating the standard error of the ability estimate into…

  12. Enhanced Automatic Question Creator--EAQC: Concept, Development and Evaluation of an Automatic Test Item Creation Tool to Foster Modern e-Education

    ERIC Educational Resources Information Center

    Gutl, Christian; Lankmayr, Klaus; Weinhofer, Joachim; Hofler, Margit

    2011-01-01

    Research in automated creation of test items for assessment purposes became increasingly important during the recent years. Due to automatic question creation it is possible to support personalized and self-directed learning activities by preparing appropriate and individualized test items quite easily with relatively little effort or even fully…

  13. Development and initial psychometric evaluation of an item bank created to measure upper extremity function in persons with stroke.

    PubMed

    Higgins, Johanne; Finch, Lois E; Kopec, Jacek; Mayo, Nancy E

    2010-02-01

    To create and illustrate the development of a method to parsimoniously and hierarchically assess upper extremity function in persons after stroke. Data were analyzed using Rasch analysis. Re-analysis of data from 8 studies involving persons after stroke. Over 4000 patients with stroke who participated in various studies in Montreal and elsewhere in Canada. Data comprised 17 tests or indices of upper extremity function and health-related quality of life, for a total of 99 items related to upper extremity function. Tests and indices included, among others, the Box and Block Test, the Nine-Hole Peg Test and the Stroke Impact Scale. Data were collected at various times post-stroke from 3 days to 1 year. Once the data fit the model, a bank of items measuring upper extremity function with persons and items organized hierarchically by difficulty and ability in log units was produced. This bank forms the basis for eventual computer adaptive testing. The calibration of the items should be tested further psychometrically, as should the interpretation of the metric arising from using the item calibration to measure the upper extremity of individuals.

  14. Distinctions between Item Format and Objectivity in Scoring.

    ERIC Educational Resources Information Center

    Terwilliger, James S.

    This paper clarifies important distinctions in item writing and item scoring and considers the implications of these distinctions for developing guidelines related to test construction for training teachers. The terminology used to describe and classify paper and pencil test questions frequently confuses two distinct features of questions:…

  15. Retrieval orientation and the control of recollection: an fMRI study

    PubMed Central

    Morcom, Alexa M.; Rugg, Michael D.

    2012-01-01

    The present study used event-related fMRI to examine the impact of the adoption of different retrieval orientations on the neural correlates of recollection. In each of two study-test blocks, subjects encoded a mixed list of words and pictures, and then performed a recognition memory task with words as the test items. In one block, the requirement was to respond positively to test items corresponding to studied words, and to reject both new items and items corresponding to the studied pictures. In the other block, positive responses were made to test items corresponding to pictures, and items corresponding to words were classified along with the new items. Based on previous event-related potential (ERP) findings, we predicted that in the word task, recollection-related effects would be found for target information only. This prediction was fulfilled. In both tasks, targets elicited the characteristic pattern of recollection-related activity. By contrast, non-targets elicited this pattern in the picture task, but not in the word task. Importantly, the left angular gyrus was among the regions demonstrating this dissociation of non-target recollection effects according to retrieval orientation. The findings for the angular gyrus parallel prior findings for the 'left-parietal' ERP old/new effect, and add to the evidence that the effect reflects recollection-related neural activity originating in left ventral parietal cortex. Thus, the results converge with the previous ERP findings to suggest that the processing of retrieval cues can be constrained to prevent the retrieval of goal-irrelevant information. PMID:23110678

  16. Retrieval orientation and the control of recollection: an fMRI study.

    PubMed

    Morcom, Alexa M; Rugg, Michael D

    2012-12-01

    This study used event-related fMRI to examine the impact of the adoption of different retrieval orientations on the neural correlates of recollection. In each of two study-test blocks, participants encoded a mixed list of words and pictures and then performed a recognition memory task with words as the test items. In one block, the requirement was to respond positively to test items corresponding to studied words and to reject both new items and items corresponding to the studied pictures. In the other block, positive responses were made to test items corresponding to pictures, and items corresponding to words were classified along with the new items. On the basis of previous ERP findings, we predicted that in the word task, recollection-related effects would be found for target information only. This prediction was fulfilled. In both tasks, targets elicited the characteristic pattern of recollection-related activity. By contrast, nontargets elicited this pattern in the picture task, but not in the word task. Importantly, the left angular gyrus was among the regions demonstrating this dissociation of nontarget recollection effects according to retrieval orientation. The findings for the angular gyrus parallel prior findings for the "left-parietal" ERP old/new effect and add to the evidence that the effect reflects recollection-related neural activity originating in left ventral parietal cortex. Thus, the results converge with the previous ERP findings to suggest that the processing of retrieval cues can be constrained to prevent the retrieval of goal-irrelevant information.

  17. Development and psychometric characteristics of the SCI-QOL Bladder Management Difficulties and Bowel Management Difficulties item banks and short forms and the SCI-QOL Bladder Complications scale.

    PubMed

    Tulsky, David S; Kisala, Pamela A; Tate, Denise G; Spungen, Ann M; Kirshblum, Steven C

    2015-05-01

    To describe the development and psychometric properties of the Spinal Cord Injury--Quality of Life (SCI-QOL) Bladder Management Difficulties and Bowel Management Difficulties item banks and Bladder Complications scale. Using a mixed-methods design, a pool of items assessing bladder and bowel-related concerns were developed using focus groups with individuals with spinal cord injury (SCI) and SCI clinicians, cognitive interviews, and item response theory (IRT) analytic approaches, including tests of model fit and differential item functioning. Thirty-eight bladder items and 52 bowel items were tested at the University of Michigan, Kessler Foundation Research Center, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital, and the James J. Peters VA Medical Center, Bronx, NY. Seven hundred fifty-seven adults with traumatic SCI. The final item banks demonstrated unidimensionality (Bladder Management Difficulties CFI=0.965; RMSEA=0.093; Bowel Management Difficulties CFI=0.955; RMSEA=0.078) and acceptable fit to a graded response IRT model. The final calibrated Bladder Management Difficulties bank includes 15 items, and the final Bowel Management Difficulties item bank consists of 26 items. Additionally, 5 items related to urinary tract infections (UTI) did not fit with the larger Bladder Management Difficulties item bank but performed relatively well independently (CFI=0.992, RMSEA=0.050) and were thus retained as a separate scale. The SCI-QOL Bladder Management Difficulties and Bowel Management Difficulties item banks are psychometrically robust and are available as computer adaptive tests or short forms. The SCI-QOL Bladder Complications scale is a brief, fixed-length outcomes instrument for individuals with a UTI.

  18. Screening tools for the identification of dementia for adults with age-related acquired hearing or vision impairment: a scoping review.

    PubMed

    Pye, Annie; Charalambous, Anna Pavlina; Leroi, Iracema; Thodi, Chrysoulla; Dawes, Piers

    2017-11-01

    Cognitive screening tests frequently rely on items being correctly heard or seen. We aimed to identify, describe, and evaluate the adaptation, validity, and availability of cognitive screening and assessment tools for dementia which have been developed or adapted for adults with acquired hearing and/or vision impairment. Electronic databases were searched using subject terms "hearing disorders" OR "vision disorders" AND "cognitive assessment," supplemented by exploring reference lists of included papers and via consultation with health professionals to identify additional literature. 1,551 papers were identified, of which 13 met inclusion criteria. Four papers related to tests adapted for hearing impairment; 11 papers related to tests adapted for vision impairment. Frequently adapted tests were the Mini-Mental State Examination (MMSE) and the Montreal Cognitive Assessment (MOCA). Adaptations for hearing impairment involved deleting or creating written versions for hearing-dependent items. Adaptations for vision impairment involved deleting vision-dependent items or spoken/tactile versions of visual tasks. No study reported validity of the test in relation to detection of dementia in people with hearing/vision impairment. Item deletion had a negative impact on the psychometric properties of the test. While attempts have been made to adapt cognitive tests for people with acquired hearing and/or vision impairment, the primary limitation of these adaptations is that their validity in accurately detecting dementia among those with acquired hearing or vision impairment is yet to be established. It is likely that the sensitivity and specificity of the adapted versions are poorer than the original, especially if the adaptation involved item deletion. One solution would involve item substitution in an alternative sensory modality followed by re-validation of the adapted test.

  19. A Knowledge-Based Approach for Item Exposure Control in Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Doong, Shing H.

    2009-01-01

    The purpose of this study is to investigate a functional relation between item exposure parameters (IEPs) and item parameters (IPs) over parallel pools. This functional relation is approximated by a well-known tool in machine learning. Let P and Q be parallel item pools and suppose IEPs for P have been obtained via a Sympson and Hetter-type…

  20. Neural correlates of differential retrieval orientation: Sustained and item-related components.

    PubMed

    Woodruff, C Chad; Uncapher, Melina R; Rugg, Michael D

    2006-01-01

    Retrieval orientation refers to a cognitive state that biases processing of retrieval cues in service of a specific goal. The present study used a mixed fMRI design to investigate whether adoption of different retrieval orientations - as indexed by differences in the activity elicited by retrieval cues corresponding to unstudied items - is associated with differences in the state-related activity sustained across a block of test trials sharing a common retrieval goal. Subjects studied mixed lists comprising visually presented words and pictures. They then undertook a series of short test blocks in which all test items were visually presented words. The blocks varied according to whether the test items were used to cue retrieval of studied words or studied pictures. In several regions, neural activity elicited by correctly classified new items differed according to whether words or pictures were the targeted material. The loci of these effects suggest that one factor driving differential cue processing is modulation of the degree of overlap between cue and targeted memory representations. In addition to these item-related effects, neural activity sustained throughout the test blocks also differed according to the nature of the targeted material. These findings indicate that the adoption of different retrieval orientations is associated with distinct neural states. The loci of these sustained effects were distinct from those where new item activity varied, suggesting that the effects may play a role in biasing retrieval cue processing in favor of the current retrieval goal.

  1. 75 FR 82407 - Submission for OMB Review; Comment Request; Testing Successful Health Communications Surrounding...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-12-30

    ... surrounding aging-related issues from the National Institute on Aging (NIA). Type of Information Collection... information technology. Direct Comments to OMB: Written comments and/or suggestions regarding the item(s...; Comment Request; Testing Successful Health Communications Surrounding Aging-Related Issues From the...

  2. Item response theory scoring and the detection of curvilinear relationships.

    PubMed

    Carter, Nathan T; Dalal, Dev K; Guan, Li; LoPilato, Alexander C; Withrow, Scott A

    2017-03-01

    Psychologists are increasingly positing theories of behavior that suggest psychological constructs are curvilinearly related to outcomes. However, results from empirical tests for such curvilinear relations have been mixed. We propose that correctly identifying the response process underlying responses to measures is important for the accuracy of these tests. Indeed, past research has indicated that item responses to many self-report measures follow an ideal point response process, wherein respondents agree only to items that reflect their own standing on the measured variable, as opposed to a dominance process, wherein stronger agreement, regardless of item content, is always indicative of higher standing on the construct. We test whether item response theory (IRT) scoring appropriate for the underlying response process to self-report measures results in more accurate tests for curvilinearity. In 2 simulation studies, we show that, regardless of the underlying response process used to generate the data, using the traditional sum-score generally results in high Type 1 error rates or low power for detecting curvilinearity, depending on the distribution of item locations. With few exceptions, appropriate power and Type 1 error rates are achieved when dominance-based and ideal point-based IRT scoring are correctly used to score dominance and ideal point response data, respectively. We conclude that (a) researchers should be theory-guided when hypothesizing and testing for curvilinear relations; (b) correctly identifying whether responses follow an ideal point versus dominance process, particularly when items are not extreme, is critical; and (c) IRT model-based scoring is crucial for accurate tests of curvilinearity.
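
    The contrast between the two response processes described in this abstract can be illustrated with a minimal sketch. The dominance curve below is a standard two-parameter logistic function; the ideal point curve is a deliberately simplified bell-shaped stand-in chosen for illustration, not the model the authors used. All function names, parameters, and values are hypothetical.

```python
import math

def dominance_prob(theta, a=1.0, b=0.0):
    """2PL-style dominance response: agreement rises monotonically with theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def ideal_point_prob(theta, delta=0.0, width=1.0):
    """Toy ideal point response (an assumption, not a published model):
    agreement peaks where the person's standing theta equals the item location delta."""
    return math.exp(-((theta - delta) ** 2) / (2.0 * width ** 2))

# Evaluate both curves on the same grid of trait levels.
thetas = [-3, -2, -1, 0, 1, 2, 3]
dominance = [dominance_prob(t) for t in thetas]
ideal = [ideal_point_prob(t) for t in thetas]

for t, d, i in zip(thetas, dominance, ideal):
    print(f"theta={t:+d}  dominance={d:.3f}  ideal_point={i:.3f}")
```

    Under the dominance curve, agreement keeps increasing with the trait level; under the ideal point curve, respondents far above the item location agree less, which is why a sum score can misrepresent standing when the wrong process is assumed.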

  3. Development of Test Items Related to Selected Concepts Within the Scheme the Particle Nature of Matter.

    ERIC Educational Resources Information Center

    Doran, Rodney L.; Pella, Milton O.

    The purpose of this study was to develop test items with a minimum reading demand for use with pupils at grade levels two through six. An item was judged to be acceptable if the item satisfied at least four of six criteria. Approximately 250 students in grades 2-6 participated in the study. Half of the students were given instruction to develop…

  4. Linguistic Simplification of Mathematics Items: Effects for Language Minority Students in Germany

    ERIC Educational Resources Information Center

    Haag, Nicole; Heppt, Birgit; Roppelt, Alexander; Stanat, Petra

    2015-01-01

    In large-scale assessment studies, language minority students typically obtain lower test scores in mathematics than native speakers. Although this performance difference was related to the linguistic complexity of test items in some studies, other studies did not find linguistically demanding math items to be disproportionally more difficult for…

  5. Improving Cancer-Related Outcomes with Connected Health - Action Items at a Glance

    Cancer.gov

    Action Item 1.1: Health IT stakeholder groups should continue to collaborate to overcome policy and technical barriers to a nationwide, interoperable health IT system. Action Item 1.2: Technical standards for information related to cancer care across the continuum should be developed, tested, disseminated, and adopted.

  6. Validity and Reliability of the 8-Item Work Limitations Questionnaire.

    PubMed

    Walker, Timothy J; Tullar, Jessica M; Diamond, Pamela M; Kohl, Harold W; Amick, Benjamin C

    2017-12-01

    Purpose: To evaluate factorial validity, scale reliability, test-retest reliability, convergent validity, and discriminant validity of the 8-item Work Limitations Questionnaire (WLQ) among employees from a public university system. Methods: A secondary analysis using de-identified data from employees who completed an annual Health Assessment between the years 2009-2015 tested research aims. Confirmatory factor analysis (CFA) (n = 10,165) tested the latent structure of the 8-item WLQ. Scale reliability was determined using a CFA-based approach while test-retest reliability was determined using the intraclass correlation coefficient. Convergent/discriminant validity was tested by evaluating relations between the 8-item WLQ with health/performance variables for convergent validity (health-related work performance, number of chronic conditions, and general health) and demographic variables for discriminant validity (gender and institution type). Results: A 1-factor model with three correlated residuals demonstrated excellent model fit (CFI = 0.99, TLI = 0.99, RMSEA = 0.03, and SRMR = 0.01). The scale reliability was acceptable (0.69, 95% CI 0.68-0.70) and the test-retest reliability was very good (ICC = 0.78). Low-to-moderate associations were observed between the 8-item WLQ and the health/performance variables while weak associations were observed with the demographic variables. Conclusions: The 8-item WLQ demonstrated sufficient reliability and validity among employees from a public university system. Results suggest the 8-item WLQ is a usable alternative for studies when the more comprehensive 25-item WLQ is not available.

  7. Test-retest reliability and construct validity of the ENERGY-child questionnaire on energy balance-related behaviours and their potential determinants: the ENERGY-project.

    PubMed

    Singh, Amika S; Vik, Froydis N; Chinapaw, Mai J M; Uijtdewilligen, Léonie; Verloigne, Maïté; Fernández-Alvira, Juan M; Stomfai, Sarolta; Manios, Yannis; Martens, Marloes; Brug, Johannes

    2011-12-09

    Insight into children's energy balance-related behaviours (EBRBs) and their determinants is important to inform obesity prevention research. Therefore, reliable and valid tools to measure these variables in large-scale population research are needed. The objective was to examine the test-retest reliability and construct validity of the child questionnaire used in the ENERGY-project, measuring EBRBs and their potential determinants among 10-12 year old children. We collected data among 10-12 year old children (n = 730 in the test-retest reliability study; n = 96 in the construct validity study) in six European countries, i.e. Belgium, Greece, Hungary, the Netherlands, Norway, and Spain. Test-retest reliability was assessed using the intra-class correlation coefficient (ICC) and percentage agreement comparing scores from two measurements, administered one week apart. To assess construct validity, the agreement between questionnaire responses and a subsequent face-to-face interview was assessed using ICC and percentage agreement. Of the 150 questionnaire items, 115 (77%) showed good to excellent test-retest reliability as indicated by ICCs > .60 or percentage agreement ≥ 75%. Test-retest reliability was moderate for 34 items (23%) and poor for one item. Construct validity appeared to be good to excellent for 70 (47%) of the 150 items, as indicated by ICCs > .60 or percentage agreement ≥ 75%. From the other 80 items, construct validity was moderate for 39 (26%) and poor for 41 items (27%). Our results demonstrate that the ENERGY-child questionnaire, assessing EBRBs of the child as well as personal, family, and school-environmental determinants related to these EBRBs, has good test-retest reliability and moderate to good construct validity for the large majority of items.
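
    The two agreement statistics named in this abstract can be sketched for a single item measured twice. The abstract does not say which ICC form was used, so the one-way random-effects ICC below is an assumption; the data and function names are hypothetical. Only the cut-offs (ICC > .60 or percentage agreement ≥ 75%) are taken from the abstract.

```python
def percentage_agreement(first, second):
    """Share of respondents (in percent) giving the identical answer on both occasions."""
    matches = sum(1 for a, b in zip(first, second) if a == b)
    return 100.0 * matches / len(first)

def icc_oneway(first, second):
    """One-way random-effects ICC for two measurements per respondent (assumed form)."""
    n, k = len(first), 2
    grand_mean = (sum(first) + sum(second)) / (n * k)
    subject_means = [(a + b) / 2 for a, b in zip(first, second)]
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum((a - m) ** 2 + (b - m) ** 2
                    for a, b, m in zip(first, second, subject_means)) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

def is_good_to_excellent(icc, agreement):
    """Cut-offs as stated in the abstract: ICC > .60 or percentage agreement >= 75%."""
    return icc > 0.60 or agreement >= 75.0

# Hypothetical answers of five children to one item, one week apart.
test_scores = [1, 2, 3, 4, 5]
retest_scores = [1, 2, 3, 4, 4]

agreement = percentage_agreement(test_scores, retest_scores)
icc = icc_oneway(test_scores, retest_scores)
print(agreement, round(icc, 3), is_good_to_excellent(icc, agreement))
```

    Reporting both statistics is useful because the ICC depends on between-respondent variance, so an item that nearly everyone answers the same way can show a low ICC despite high percentage agreement.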

  8. Test-retest reliability and construct validity of the ENERGY-child questionnaire on energy balance-related behaviours and their potential determinants: the ENERGY-project

    PubMed Central

    2011-01-01

    Background: Insight into children's energy balance-related behaviours (EBRBs) and their determinants is important to inform obesity prevention research. Therefore, reliable and valid tools to measure these variables in large-scale population research are needed. Objective: To examine the test-retest reliability and construct validity of the child questionnaire used in the ENERGY-project, measuring EBRBs and their potential determinants among 10-12 year old children. Methods: We collected data among 10-12 year old children (n = 730 in the test-retest reliability study; n = 96 in the construct validity study) in six European countries, i.e. Belgium, Greece, Hungary, the Netherlands, Norway, and Spain. Test-retest reliability was assessed using the intra-class correlation coefficient (ICC) and percentage agreement comparing scores from two measurements, administered one week apart. To assess construct validity, the agreement between questionnaire responses and a subsequent face-to-face interview was assessed using ICC and percentage agreement. Results: Of the 150 questionnaire items, 115 (77%) showed good to excellent test-retest reliability as indicated by ICCs > .60 or percentage agreement ≥ 75%. Test-retest reliability was moderate for 34 items (23%) and poor for one item. Construct validity appeared to be good to excellent for 70 (47%) of the 150 items, as indicated by ICCs > .60 or percentage agreement ≥ 75%. From the other 80 items, construct validity was moderate for 39 (26%) and poor for 41 items (27%). Conclusions: Our results demonstrate that the ENERGY-child questionnaire, assessing EBRBs of the child as well as personal, family, and school-environmental determinants related to these EBRBs, has good test-retest reliability and moderate to good construct validity for the large majority of items. PMID:22152048

  9. Memory in pregnancy and post-partum: Item specific and relational encoding processes in recall and recognition.

    PubMed

    Spataro, Pietro; Saraulli, Daniele; Oriolo, Debora; Costanzi, Marco; Zanetti, Humberto; Cestari, Vincenzo; Rossi-Arnaud, Clelia

    2016-08-01

    It has been recently proposed that pregnant women would perform memory tasks by focusing more on item-specific processes and less on relational processing, compared to post-partum women (Mickes, Wixted, Shapiro & Scarff). The present cross-sectional study tested this hypothesis by directly manipulating the type of encoding employed in the study phase. Pregnant, post-partum and control women either rated the pleasantness of word meaning (which induced item-specific elaboration) or named the semantic category to which they belonged (which induced relational elaboration). Memory for the encoded words was later tested in free recall (which emphasizes relational processing) and in recognition (which emphasizes item-specific processing). In line with Mickes et al.'s conclusions, pregnant women in the item-specific condition performed worse than post-partum women in the relational condition in free recall, but not in recognition. However, compared to the other two groups, pregnant women also exhibited lower recognition accuracy in the item-specific condition. Overall, these results confirm that pregnant women rely on relational encoding less than post-partum women, but additionally suggest that the former group might use item-specific processes less efficiently than post-partum and control women.

  10. No retrieval-induced forgetting using item-specific independent cues: evidence against a general inhibitory account.

    PubMed

    Camp, Gino; Pecher, Diane; Schmidt, Henk G

    2007-09-01

    Retrieval practice with particular items from memory can impair the recall of related items on a later memory test. This retrieval-induced forgetting effect has been ascribed to inhibitory processes (M. C. Anderson & B. A. Spellman, 1995). A critical finding that distinguishes inhibitory from interference explanations is that forgetting is found with independent (or extralist) cues. In 4 experiments, the authors tested whether the forgetting effect is cue-independent. Forgetting was investigated for both studied and unstudied semantically related items. Retrieval-induced forgetting was not found using item-specific independent cues for either studied or unstudied items. However, forgetting was found for both item types when studied categories were used as cues. These results are not in line with a general inhibitory account, because this account predicts retrieval-induced forgetting with independent cues. Interference and context-specific inhibition are discussed as possible explanations for the data.

  11. Assertive Behavior and Cognitive Performance in Preschool Children

    ERIC Educational Resources Information Center

    Dorman, Lynn

    1973-01-01

    Assertive behaviors were related to each other and to intelligence test scores. An item analysis revealed that more assertive children did better on certain intelligence test items: comprehension, verbal, and discrimination. (ST)

  12. The Dependence on Mathematical Theory in TIMSS, PISA and TIMSS Advanced Test Items and Its Relation to Student Achievement

    ERIC Educational Resources Information Center

    Hole, Arne; Grønmo, Liv Sissel; Onstad, Torgeir

    2018-01-01

    Background: This paper discusses a framework for analyzing the dependence on mathematical theory in test items, that is, a framework for discussing to what extent knowledge of mathematical theory is helpful for the student in solving the item. The framework can be applied to any test in which some knowledge of mathematical theory may be useful,…

  13. Social Desirability Bias Against Admitting Anger: Bias in the Test-Taker or Bias in the Test?

    PubMed

    Fernandez, Ephrem; Woldgabreal, Yilma; Guharajan, Deepan; Day, Andrew; Kiageri, Vasiliki; Ramtahal, Nirvana

    2018-05-09

    The veracity of self-report is often questioned, especially in anger, which is particularly susceptible to social desirability bias (SDB). However, could tests of SDB be themselves susceptible to bias? This study aimed to replicate the inverse correlation between a common test of SDB and a test of anger, to deconstruct this relationship according to anger-related versus non-anger-related items, and to reevaluate factor structure and reliability of the SDB test. More than 200 students were administered the Marlowe-Crowne Social Desirability Scale Short Version [M-C1(10)] and the Anger Parameters Scale (APS). Results confirmed that anger and SDB scores were significantly and inversely correlated. This intercorrelation became nonsignificant when the 4 anger-related items were omitted from the M-C1(10). Confirmatory factor analyses showed excellent fit for a model comprising anger items of the M-C1(10) but not for models of the entire instrument or nonanger items. The first model also attained high internal consistency. Thus, the significant negative correlation between anger and SDB is attributable to 4 M-C1(10) anger items, for which low ratings are automatically scored as high SDB; this stems from a tenuous assumption that low anger reports are invariably biased. The SDB test risks false positives of faking good and should be used with caution.

  14. Three controversies over item disclosure in medical licensure examinations.

    PubMed

    Park, Yoon Soo; Yang, Eunbae B

    2015-01-01

    In response to views on the public's right to know, there is growing attention to item disclosure - release of items, answer keys, and performance data to the public - in medical licensure examinations and their potential impact on the test's ability to measure competence and select qualified candidates. Recent debates on this issue have sparked legislative action internationally, including in South Korea, with prior discussions among North American countries dating back over three decades. The purpose of this study is to identify and analyze three issues associated with item disclosure in medical licensure examinations - 1) fairness and validity, 2) impact on passing levels, and 3) utility of item disclosure - by synthesizing existing literature in relation to standards in testing. Historically, the controversy over item disclosure has centered on fairness and validity. Proponents of item disclosure stress test takers' right to know, while opponents argue from a validity perspective. Item disclosure may bias item characteristics, such as difficulty and discrimination, and has consequences for setting passing levels. To date, there has been limited research on the utility of item disclosure for large-scale testing. These issues require ongoing and careful consideration.

  15. On Maximizing Item Information and Matching Difficulty with Ability.

    ERIC Educational Resources Information Center

    Bickel, Peter; Buyske, Steven; Chang, Huahua; Ying, Zhiliang

    2001-01-01

    Examined the assumption that matching difficulty levels of test items with an examinee's ability makes a test more efficient and challenged this assumption through a class of one-parameter item response theory models. Found the validity of the fundamental assumption to be closely related to the van Zwet tail ordering of symmetric distributions (W.…
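
    The assumption under examination can be made concrete for the familiar logistic (Rasch) case, which is only one member of the class of one-parameter models the abstract refers to. In that case item information is P(1 - P), which peaks when item difficulty equals ability. The sketch below uses hypothetical names and values and does not reproduce the authors' analysis.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response under the logistic one-parameter model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta: P * (1 - P)."""
    p = rasch_prob(theta, b)
    return p * (1.0 - p)

# Hypothetical examinee ability and candidate item difficulties.
theta = 0.5
difficulties = [-2.0, -1.0, 0.0, 0.5, 1.0, 2.0]

# Pick the item that is most informative at this ability level.
best_b = max(difficulties, key=lambda b: item_information(theta, b))
print(best_b, item_information(theta, best_b))
```

    For other response functions the information-maximizing difficulty need not coincide with the examinee's ability, which is the kind of case the abstract says the study explores.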

  16. Structural, Linguistic and Topic Variables in Verbal and Computational Problems in Elementary Mathematics.

    ERIC Educational Resources Information Center

    Beardslee, Edward C.; Jerman, Max E.

    Five structural, four linguistic and twelve topic variables are used in regression analyses on results of a 50-item achievement test. The test items are related to 12 topics from the third-grade mathematics curriculum. The items reflect one of two cases of the structural variable, cognitive level; the two levels are characterized, inductive…

  17. Japanese-English language equivalence of the Cognitive Abilities Screening Instrument among Japanese-Americans.

    PubMed

    Gibbons, Laura E; McCurry, Susan; Rhoads, Kristoffer; Masaki, Kamal; White, Lon; Borenstein, Amy R; Larson, Eric B; Crane, Paul K

    2009-02-01

    The Cognitive Abilities Screening Instrument (CASI) was designed for use in cross-cultural studies of Japanese and Japanese-American elderly in Japan and the U.S.A. The measurement equivalence in Japanese and English had not been confirmed in prior studies. We analyzed the 40 CASI items for differential item functioning (DIF) related to test language, as well as self-reported proficiency with written Japanese, age, and educational attainment in two large epidemiologic studies of Japanese-American elderly: the Kame Project (n=1708) and the Honolulu-Asia Aging Study (HAAS; n = 3148). DIF was present if the demographic groups differed in the probability of success on an item, after controlling for their underlying cognitive functioning ability. While seven CASI items had DIF related to language of testing in Kame (registration of one item; recall of one item; similes; judgment; repeating a phrase; reading and performing a command; and following a three-step instruction), the impact of DIF on participants' scores was minimal. Mean scores for Japanese and English speakers in Kame changed by <0.1 SD after accounting for DIF related to test language. In HAAS, insufficient numbers of participants were tested in Japanese to assess DIF related to test language. In both studies, DIF related to written Japanese proficiency, age, and educational attainment had minimal impact. To the extent that DIF could be assessed, the CASI appeared to meet the goal of measuring cognitive function equivalently in Japanese and English. Stratified data collection would be needed to confirm this conclusion. DIF assessment should be used in other studies with multiple language groups to confirm that measures function equivalently or, if not, form scores that account for DIF.

  18. Bias in Testing: A Presentation of Selected Methods.

    ERIC Educational Resources Information Center

    Merz, William R.; Rudner, Lawrence M.

    A variety of terms related to test bias or test fairness have been used in a variety of ways, but in this document the "fair use of tests" is defined as equitable selection procedures by means of intact tests, and "test item bias" refers to the study of separate items with respect to the tests of which they are a part. Seven…

  19. Solving the measurement invariance anchor item problem in item response theory.

    PubMed

    Meade, Adam W; Wright, Natalie A

    2012-09-01

    The efficacy of tests of differential item functioning (measurement invariance) has been well established. It is clear that when properly implemented, these tests can successfully identify differentially functioning (DF) items when they exist. However, an assumption of these analyses is that the metric for different groups is linked using anchor items that are invariant. In practice, however, it is impossible to be certain which items are DF and which are invariant. This problem of anchor items, or referent indicators, has long plagued invariance research, and a multitude of suggested approaches have been put forth. Unfortunately, the relative efficacy of these approaches has not been tested. This study compares 11 variations on 5 qualitatively different approaches from recent literature for selecting optimal anchor items. A large-scale simulation study indicates that for nearly all conditions, an easily implemented 2-stage procedure recently put forth by Lopez Rivas, Stark, and Chernyshenko (2009) provided optimal power while maintaining nominal Type I error. With this approach, appropriate anchor items can be easily and quickly located, resulting in more efficacious invariance tests. Recommendations for invariance testing are illustrated using a pedagogical example of employee responses to an organizational culture measure.

  20. The Child-care Food and Activity Practices Questionnaire (CFAPQ): development and first validation steps.

    PubMed

    Gubbels, Jessica S; Sleddens, Ester Fc; Raaijmakers, Lieke Ch; Gies, Judith M; Kremers, Stef Pj

    2016-08-01

    To develop and validate a questionnaire to measure food-related and activity-related practices of child-care staff, based on existing, validated parenting practices questionnaires. A selection of items from the Comprehensive Feeding Practices Questionnaire (CFPQ) and the Preschooler Physical Activity Parenting Practices (PPAPP) questionnaire was made to include items most suitable for the child-care setting. The converted questionnaire was pre-tested among child-care staff during cognitive interviews and pilot-tested among a larger sample of child-care staff. Factor analyses with Varimax rotation and internal consistencies were used to examine the scales. Spearman correlations, t tests and ANOVA were used to examine associations between the scales and staff's background characteristics (e.g. years of experience, gender). Child-care centres in the Netherlands. The qualitative pre-test included ten child-care staff members. The quantitative pilot test included 178 child-care staff members. The new questionnaire, the Child-care Food and Activity Practices Questionnaire (CFAPQ), consists of sixty-three items (forty food-related and twenty-three activity-related items), divided over twelve scales (seven food-related and five activity-related scales). The CFAPQ scales are to a large extent similar to the original CFPQ and PPAPP scales. The CFAPQ scales show sufficient internal consistency with Cronbach's α ranging between 0·53 and 0·96, and average corrected item-total correlations within acceptable ranges (0·30-0·89). Several of the scales were significantly associated with child-care staff's background characteristics. Scale psychometrics of the CFAPQ indicate it is a valid questionnaire that assesses child-care staff's practices related to both food and activities.
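
    The two scale statistics reported in this abstract, Cronbach's α and corrected item-total correlations, can be sketched with the standard library only. The response matrix is invented for illustration and is unrelated to the CFAPQ data; function names are hypothetical.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items matrix of scores."""
    k = len(rows[0])
    item_variance_sum = sum(variance(col) for col in zip(*rows))
    total_variance = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1.0 - item_variance_sum / total_variance)

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x)
    sy = sum((b - my) ** 2 for b in y)
    return cov / (sx * sy) ** 0.5

def corrected_item_total(rows):
    """Correlate each item with the sum of the remaining items of the scale."""
    k = len(rows[0])
    return [
        pearson([r[j] for r in rows], [sum(r) - r[j] for r in rows])
        for j in range(k)
    ]

# Hypothetical answers of five staff members to a three-item scale.
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 4],
]

alpha = cronbach_alpha(responses)
item_total = corrected_item_total(responses)
print(round(alpha, 2), [round(r, 2) for r in item_total])
```

    The correction (leaving the item out of the total it is correlated with) matters most for short scales, where an item would otherwise correlate with itself as part of the total.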

  1. What Is Learned when Concept Learning Fails?--A Theory of Restricted-Domain Relational Learning

    ERIC Educational Resources Information Center

    Wright, Anthony A.; Lickteig, Mark T.

    2010-01-01

    Two matching-to-sample (MTS) and four same/different (S/D) experiments employed tests to distinguish between item-specific learning and relational learning. One MTS experiment showed item-specific learning when concept learning failed (i.e., no novel-stimulus transfer). Another MTS experiment showed item-specific learning when pigeons'…

  2. Comparison promotes learning and transfer of relational categories.

    PubMed

    Kurtz, Kenneth J; Boukrina, Olga; Gentner, Dedre

    2013-07-01

    We investigated the effect of co-presenting training items during supervised classification learning of novel relational categories. Strong evidence exists that comparison induces a structural alignment process that renders common relational structure more salient. We hypothesized that comparisons between exemplars would facilitate learning and transfer of categories that cohere around a common relational property. The effect of comparison was investigated using learning trials that elicited a separate classification response for each item in presentation pairs that could be drawn from the same or different categories. This methodology ensures consideration of both items and invites comparison through an implicit same-different judgment inherent in making the two responses. In a test phase measuring learning and transfer, the comparison group significantly outperformed a control group receiving an equivalent training session of single-item classification learning. Comparison-based learners also outperformed the control group on a test of far transfer, that is, the ability to accurately classify items from a novel domain that was relationally alike, but surface-dissimilar, to the training materials. Theoretical and applied implications of this comparison advantage are discussed.

  3. Food Service Supervisor. Dietetic Support Personnel Achievement Test.

    ERIC Educational Resources Information Center

    Oklahoma State Dept. of Vocational and Technical Education, Stillwater.

    This guide contains a series of multiple-choice items and guidelines to assist instructors in composing criterion-referenced tests for use in the food service supervisor component of Oklahoma's Dietetic Support Personnel training program. Test items addressing each of the following occupational duty areas are provided: human relations; nutrient…

  4. Food Production Worker. Dietetic Support Personnel Achievement Test.

    ERIC Educational Resources Information Center

    Oklahoma State Dept. of Vocational and Technical Education, Stillwater.

    This guide contains a series of multiple-choice items and guidelines to assist instructors in composing criterion-referenced tests for use in the food production worker component of Oklahoma's Dietetic Support Personnel training program. Test items addressing each of the following occupational duty areas are provided: human relations; hygiene and…

  5. Mutual Information Item Selection in Adaptive Classification Testing

    ERIC Educational Resources Information Center

    Weissman, Alexander

    2007-01-01

    A general approach for item selection in adaptive multiple-category classification tests is provided. The approach uses mutual information (MI), a special case of the Kullback-Leibler distance, or relative entropy. MI works efficiently with the sequential probability ratio test and alleviates the difficulties encountered with using other local-…
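
    The relation stated in this abstract, mutual information as a special case of the Kullback-Leibler distance, can be sketched as the divergence between a joint distribution and the product of its marginals. The joint tables below (examinee category by item response) and the item names are hypothetical, and the selection rule is only an illustration of the idea, not the procedure from the paper.

```python
import math

def mutual_information(joint):
    """KL divergence between a joint distribution p(x, y) and p(x) * p(y), in nats."""
    row_marginals = [sum(row) for row in joint]
    col_marginals = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # cells with zero probability contribute nothing
                mi += p * math.log(p / (row_marginals[i] * col_marginals[j]))
    return mi

# Rows: examinee category; columns: incorrect / correct response (hypothetical values).
candidate_items = {
    "item_A": [[0.40, 0.10], [0.10, 0.40]],  # response depends on category
    "item_B": [[0.25, 0.25], [0.25, 0.25]],  # response independent of category
}

# Administer next the item whose response says most about the category.
selected = max(candidate_items, key=lambda name: mutual_information(candidate_items[name]))
print(selected)
```

    An item whose response distribution is the same in every category has zero mutual information with the classification and would never be preferred under this rule.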

  6. Food Service Worker. Dietetic Support Personnel Achievement Test.

    ERIC Educational Resources Information Center

    Oklahoma State Dept. of Vocational and Technical Education, Stillwater.

    This guide contains a series of multiple-choice items and guidelines to assist instructors in composing criterion-referenced tests for use in the food service worker component of Oklahoma's Dietetic Support Personnel training program. Test items addressing each of the following occupational duty areas are provided: human relations; personal…

  7. Shortening of an existing generic online health-related quality of life instrument for dogs.

    PubMed

    Reid, J; Wiseman-Orr, L; Scott, M

    2017-10-11

    Development, initial validation and reliability testing of a shortened version of a web-based questionnaire instrument to measure generic health-related quality of life in companion dogs, to facilitate smartphone and online use. The original 46 items were reduced using expert judgment and factor analysis. Items were removed on the basis of item loadings and communalities on factors identified through factor analysis of responses from owners of healthy and unwell dogs, intrafactor item correlations, readability of items in the UK, USA and Australia and ability of individual items to discriminate between healthy and unwell dogs. Validity was assessed through factor analysis and a field trial using a "known groups" approach. Test-retest reliability was assessed using intraclass correlation coefficients. The new instrument comprises 22 items, each of which was rated by dog owners using a 7-point Likert scale. Factor analysis revealed a structure with four health-related quality of life domains (energetic/enthusiastic, happy/content, active/comfortable, and calm/relaxed) accounting for 72% of the variability in the data compared with 64% for the original instrument. The field test involving 153 healthy and unwell dogs demonstrated good discriminative properties and high intraclass correlation coefficients. The 22-item shortened form is superior to the original instrument and can be accessed via a mobile phone app. This is likely to increase the acceptability to dog owners as a routine wellness measure in health care packages and as a therapeutic monitoring tool.

  8. HIV-Related Stigma Among Spanish-speaking Latinos in an Emerging Immigrant Receiving City.

    PubMed

    Dolwick Grieb, Suzanne M; Shah, Harita; Flores-Miller, Alejandra; Zelaya, Carla; Page, Kathleen R

    2017-08-01

    HIV-related stigma has been associated with a reluctance to test for HIV among Latinos. This study assessed community HIV-related stigma within an emerging Latino immigrant receiving city. We conducted a brief survey among a convenience sample of 312 Spanish-speaking Latinos in Baltimore, Maryland. HIV-related stigma was assessed through six items. Associations between stigma items, socio-demographic characteristics, and HIV testing history were considered. Gender, education, and religiosity were significantly associated with stigmatizing HIV-related beliefs. For example, men were 3.4 times more likely to hold more than three stigmatizing beliefs than women, and were also twice as likely as women to report feeling hesitant to test for HIV for fear of people's reaction if the test is positive. These findings can help inform future stigma interventions in this community. In particular, we were able to distinguish between drivers of stigma such as fear and moralistic attitudes, highlighting specific actionable items.

  9. Using the Rasch Measurement Model in Psychometric Analysis of the Family Effectiveness Measure

    PubMed Central

    McCreary, Linda L.; Conrad, Karen M.; Conrad, Kendon J.; Scott, Christy K; Funk, Rodney R.; Dennis, Michael L.

    2013-01-01

    Background Valid assessment of family functioning can play a vital role in optimizing client outcomes. Because family functioning is influenced by family structure, socioeconomic context, and culture, existing measures of family functioning--primarily developed with nuclear, middle class European American families--may not be valid assessments of families in diverse populations. The Family Effectiveness Measure was developed to address this limitation. Objectives To test the Family Effectiveness Measure with data from a primarily low-income African American convenience sample, using the Rasch measurement model. Method A sample of 607 adult women completed the measure. Rasch analysis was used to assess unidimensionality, response category functioning, item fit, person reliability, differential item functioning by race and parental status, and item hierarchy. Criterion-related validity was tested using correlations with five other variables related to family functioning. Results The Family Effectiveness Measure measures two separate constructs: The effective family functioning construct was a psychometrically sound measure of the target construct that was more efficient due to the deletion of 22 items. The ineffective family functioning construct consisted of 16 of those deleted items but was not as strong psychometrically. Items in both constructs evidenced no differential item functioning by race. Criterion-related validity was supported for both. Discussion In contrast to the prevailing conceptualization that family functioning is a single construct, assessed by positively and negatively worded items, use of the Rasch analysis suggested the existence of two constructs. While the effective family functioning is a strong and efficient measure of family functioning, the ineffective family functioning will require additional item development and psychometric testing. PMID:23636342

  10. Three controversies over item disclosure in medical licensure examinations

    PubMed Central

    Park, Yoon Soo; Yang, Eunbae B.

    2015-01-01

    In response to views on the public's right to know, there is growing attention to item disclosure – release of items, answer keys, and performance data to the public – in medical licensure examinations and its potential impact on the test's ability to measure competence and select qualified candidates. Recent debates on this issue have sparked legislative action internationally, including in South Korea, with prior discussions among North American countries dating back over three decades. The purpose of this study is to identify and analyze three issues associated with item disclosure in medical licensure examinations – 1) fairness and validity, 2) impact on passing levels, and 3) utility of item disclosure – by synthesizing existing literature in relation to standards in testing. Historically, the controversy over item disclosure has centered on fairness and validity. Proponents of item disclosure stress test takers’ right to know, while opponents argue from a validity perspective. Item disclosure may bias item characteristics, such as difficulty and discrimination, and has consequences for setting passing levels. To date, there has been limited research on the utility of item disclosure for large scale testing. These issues require ongoing and careful consideration. PMID:26374693

  11. Modeling the Severity of Drinking Consequences in First-Year College Women: An Item Response Theory Analysis of the Rutgers Alcohol Problem Index*

    PubMed Central

    Cohn, Amy M.; Hagman, Brett T.; Graff, Fiona S.; Noel, Nora E.

    2011-01-01

    Objective: The present study examined the latent continuum of alcohol-related negative consequences among first-year college women using methods from item response theory and classical test theory. Method: Participants (N = 315) were college women in their freshman year who reported consuming any alcohol in the past 90 days and who completed assessments of alcohol consumption and alcohol-related negative consequences using the Rutgers Alcohol Problem Index. Results: Item response theory analyses showed poor model fit for five items identified in the Rutgers Alcohol Problem Index. Two-parameter item response theory logistic models were applied to the remaining 18 items to examine estimates of item difficulty (i.e., severity) and discrimination parameters. The item difficulty parameters ranged from 0.591 to 2.031, and the discrimination parameters ranged from 0.321 to 2.371. Classical test theory analyses indicated that the omission of the five misfit items did not significantly alter the psychometric properties of the construct. Conclusions: Findings suggest that those consequences that had greater severity and discrimination parameters may be used as screening items to identify female problem drinkers at risk for an alcohol use disorder. PMID:22051212
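
    As an illustration of the two-parameter logistic (2PL) model named in this abstract, the following minimal sketch computes the probability of endorsing an item from a latent severity level, a discrimination parameter, and a difficulty parameter. The function name and the pairing of parameter values are hypothetical; the abstract reports only the ranges of the estimates, not which item had which values.

```python
import math


def p_endorse_2pl(theta: float, a: float, b: float) -> float:
    """Probability of endorsing an item under the standard 2PL model.

    theta: respondent's position on the latent continuum
    a:     item discrimination
    b:     item difficulty (severity)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical items built from the extremes of the reported ranges;
# the abstract does not say that these values belong to the same item.
for a, b in [(0.321, 0.591), (2.371, 2.031)]:
    for theta in (0.0, 1.0, 2.0, 3.0):
        print(f"a={a:.3f} b={b:.3f} theta={theta:.1f} "
              f"P={p_endorse_2pl(theta, a, b):.3f}")
```

    When theta equals b the probability is exactly 0.5, and a larger discrimination value makes the curve rise more steeply around that point.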

  12. Monitoring memory errors: the influence of the veracity of retrieved information on the accuracy of judgements of learning.

    PubMed

    Rhodes, Matthew G; Tauber, Sarah K

    2011-11-01

    The current study examined the degree to which predictions of memory performance made immediately or at a delay are sensitive to confidently held memory illusions. Participants studied unrelated pairs of words and made judgements of learning (JOLs) for each item, either immediately or after a delay. Half of the unrelated pairs (deceptive items; e.g., nurse-dollar) had a semantically related competitor (e.g., doctor) that was easily accessible when given a test cue (e.g., nurse-do_ _ _r) and half had no semantically related competitor (control items; e.g., subject-dollar). Following the study phase, participants were administered a cued recall test. Results from Experiment 1 showed that memory performance was less accurate for deceptive compared with control items. In addition, delaying judgement improved the relative accuracy of JOLs for control items but not for deceptive items. Subsequent experiments explored the degree to which the relative accuracy of delayed JOLs for deceptive items improved as a result of a warning to ensure that retrieved memories were accurate (Experiment 2) and corrective feedback regarding the veracity of information retrieved prior to making a JOL (Experiment 3). In all, these data suggest that delayed JOLs may be largely insensitive to memory errors unless participants are provided with feedback regarding memory accuracy.

  13. Development of Physical Activity-Related Parenting Practices Scales for Urban Chinese Parents of Preschoolers: Confirmatory Factor Analysis and Reliability.

    PubMed

    Suen, Yi-Nam; Cerin, Ester; Barnett, Anthony; Huang, Wendy Y J; Mellecker, Robin R

    2017-09-01

    Valid instruments of parenting practices related to children's physical activity (PA) are essential to understand how parents affect preschoolers' PA. This study developed and validated a questionnaire of PA-related parenting practices for Chinese-speaking parents of preschoolers in Hong Kong. Parents (n = 394) completed a questionnaire developed using findings from formative qualitative research and literature searches. Test-retest reliability was determined on a subsample (n = 61). Factorial validity was assessed using confirmatory factor analysis. Subscale internal consistency was determined. The scale of parenting practices encouraging PA comprised 2 latent factors: Modeling, structure and participatory engagement in PA (23 items), and Provision of appropriate places for child's PA (4 items). The scale of parenting practices discouraging PA scale encompassed 4 latent factors: Safety concern/overprotection (6 items), Psychological/behavioral control (5 items), Promoting inactivity (4 items), and Promoting screen time (2 items). Test-retest reliabilities were moderate to excellent (0.58 to 0.82), and internal subscale reliabilities were acceptable (0.63 to 0.89). We developed a theory-based questionnaire for assessing PA-related parenting practices among Chinese-speaking parents of Hong Kong preschoolers. While some items were context and culture specific, many were similar to those previously found in other populations, indicating a degree of construct generalizability across cultures.

  14. Development and reliability of a scale of physical-activity related informal social control for parents of Chinese pre-schoolers.

    PubMed

    Suen, Yi-Nam; Cerin, Ester; Mellecker, Robin R

    2014-07-18

    Parents' perceived informal social control, defined as the informal ways residents intervene to create a safe and orderly neighbourhood environment, may influence young children's physical activity (PA) in the neighbourhood. This study aimed to develop and test the reliability of a scale of PA-related informal social control relevant to Chinese parents/caregivers of pre-schoolers (children aged 3 to 5 years) living in Hong Kong. Nominal Group Technique (NGT), a structured, multi-step brainstorming technique, was conducted with two groups of caregivers (mainly parents; n = 11) of Hong Kong pre-schoolers in June 2011. Items collected in the NGT sessions and those generated by a panel of experts were used to compile a list of items (n = 22) for a preliminary version of a questionnaire of informal social control. The newly-developed scale was tested with 20 Chinese-speaking parents/caregivers using cognitive interviews (August 2011). The modified scale, including all 22 original items of which a few were slightly reworded, was subsequently administered on two occasions, a week apart, to 61 Chinese parents/caregivers of Hong Kong pre-schoolers in early 2012. The test-retest reliability and internal consistency of the items and scale were examined using intraclass correlation coefficients (ICC), paired t-tests, relative percentages of shifts in responses to items, and Cronbach's α coefficient. Thirteen items generated by parents/caregivers and nine items generated by the panel of experts (total 22 items) were included in a first working version of the scale and classified into three subscales: "Personal involvement and general informal supervision", "Civic engagement for the creation of a better neighbourhood environment" and "Educating and assisting neighbourhood children". Twenty out of 22 items showed moderate to excellent test-retest reliability (ICC range: 0.40-0.81). All three subscales of informal social control showed acceptable levels of internal consistency (Cronbach's α >0.70). A reliable scale examining PA-related informal social control relevant to Chinese parents/caregivers of pre-schoolers living in Hong Kong was developed. Further studies should examine the factorial validity of the scale, its associations with Chinese children's PA and its appropriateness for other populations of parents of young children.
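
    The internal-consistency statistic cited in this abstract, Cronbach's α, can be sketched as follows using the usual formula α = k/(k-1) · (1 - Σ item variances / variance of total scores). The function name and the toy response matrix are illustrative assumptions and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up responses from five respondents to a three-item subscale.
toy_scores = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
print(round(cronbach_alpha(toy_scores), 3))
```

    Values above roughly 0.70 are the threshold the abstract treats as acceptable.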

  15. Development of Two-Tier Diagnostic Test Pictorial-Based for Identifying High School Students Misconceptions on the Mole Concept

    NASA Astrophysics Data System (ADS)

    Siswaningsih, W.; Firman, H.; Zackiyah; Khoirunnisa, A.

    2017-02-01

    The aim of this study was to develop a two-tier pictorial-based diagnostic test for identifying student misconceptions on the mole concept. The study used a development and validation method. The test was developed through four phases: item development, validation, key determination, and test application. The test was pictorial and consisted of two tiers: the first tier consisted of four possible answers and the second tier consisted of four possible reasons. Based on the content validity results for 20 items using the CVR (Content Validity Ratio), 18 items were declared valid. Based on the results of the reliability test using SPSS, 17 items were obtained with a Cronbach’s Alpha value of 0.703, which means that the items were accepted. A total of 10 items were administered to 35 senior high school students who had studied the mole concept at one of the high schools in Cimahi. Based on the results of the application test, student misconceptions were identified in each concept label of the mole concept, with the percentage of misconceptions on the concept labels of mole (60.15%), Avogadro’s number (34.28%), relative atomic mass (62.84%), relative molecular mass (77.08%), molar mass (68.53%), molar volume of gas (57.11%), molarity (71.32%), chemical equation (82.77%), limiting reactants (91.40%), and molecular formula (77.13%).
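
    The abstract does not state how the CVR was computed. A minimal sketch, assuming the widely used Lawshe formula CVR = (n_e - N/2) / (N/2), where n_e is the number of panelists rating an item essential and N is the panel size, could look like this. The panel size and ratings below are hypothetical.

```python
def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """Lawshe's content validity ratio for a single item."""
    half = n_panelists / 2
    return (n_essential - half) / half


# Hypothetical panel of 10 experts; the study's panel size is not reported here.
for n_essential in (10, 8, 5, 2):
    print(n_essential, content_validity_ratio(n_essential, 10))
```

    The ratio runs from -1 (no panelist rates the item essential) to +1 (every panelist does), with 0 meaning exactly half the panel.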

  16. Parietal cortex and episodic memory retrieval in schizophrenia.

    PubMed

    Lepage, Martin; Pelletier, Marc; Achim, Amélie; Montoya, Alonso; Menear, Matthew; Lal, Sam

    2010-06-30

    People with schizophrenia consistently show memory impairment on varying tasks including item recognition memory. Relative to the correct rejection of distracter items, the correct recognition of studied items consistently produces an effect termed the old/new effect that is characterized by increased activity in parietal and frontal cortical regions. This effect has received only scant attention in schizophrenia. We examined the old/new effect in 15 people with schizophrenia and 18 controls during an item recognition test, and neural activity was examined with event-related functional magnetic resonance imaging. Both groups performed equally well during the recognition test and showed increased activity in a left dorsolateral prefrontal region and in the precuneus bilaterally during the successful recognition of old items relative to the correct rejection of new items. The control group also exhibited increased activity in the dorsal left parietal cortex. This region has been implicated in the top-down modulation of memory which involves control processes that support memory-retrieval search, monitoring and verification. Although these processes may not be of paramount importance in item recognition memory performance, the present findings suggest that people with schizophrenia may have difficulty with such top-down modulation, a finding consistent with many other studies in information processing.

  17. Encoding and retrieval processes involved in the access of source information in the absence of item memory.

    PubMed

    Ball, B Hunter; DeWitt, Michael R; Knight, Justin B; Hicks, Jason L

    2014-09-01

    The current study sought to examine the relative contributions of encoding and retrieval processes in accessing contextual information in the absence of item memory using an extralist cuing procedure in which the retrieval cues used to query memory for contextual information were related to the target item but never actually studied. In Experiments 1 and 2, participants studied 1 category member (e.g., onion) from a variety of different categories and at test were presented with an unstudied category label (e.g., vegetable) to probe memory for item and source information. In Experiments 3 and 4, 1 member of unidirectional (e.g., credit or card) or bidirectional (e.g., salt or pepper) associates was studied, whereas the other unstudied member served as a test probe. When recall failed, source information was accessible only when items were processed deeply during encoding (Experiments 1 and 2) and when there was strong forward associative strength between the retrieval cue and target (Experiments 3 and 4). These findings suggest that a retrieval probe diagnostic of semantically related item information reinstantiates information bound in memory during encoding that results in reactivation of associated contextual information, contingent upon sufficient learning of the item itself and the association between the item and its context information.

  18. Exercise barriers self-efficacy: development and validation of a subcale for individuals with cancer-related lymphedema.

    PubMed

    Buchan, Jena; Janda, Monika; Box, Robyn; Rogers, Laura; Hayes, Sandi

    2015-03-18

    No tool exists to measure self-efficacy for overcoming lymphedema-related exercise barriers in individuals with cancer-related lymphedema. However, an existing scale measures confidence to overcome general exercise barriers in cancer survivors. Therefore, the purpose of this study was to develop, validate and assess the reliability of a subscale, to be used in conjunction with the general barriers scale, for determining exercise barriers self-efficacy in individuals facing lymphedema-related exercise barriers. A lymphedema-specific exercise barriers self-efficacy subscale was developed and validated using a cohort of 106 cancer survivors with cancer-related lymphedema, from Brisbane, Australia. An initial ten-item lymphedema-specific barrier subscale was developed and tested, with participant feedback and principal components analysis results used to guide development of the final version. Validity and test-retest reliability analyses were conducted on the final subscale. The final lymphedema-specific subscale contained five items. Principal components analysis revealed these items loaded highly (>0.75) on a separate factor when tested with a well-established nine-item general barriers scale. The final five-item subscale demonstrated good construct and criterion validity, high internal consistency (Cronbach's alpha = 0.93) and test-retest reliability (ICC = 0.67, p < 0.01). A valid and reliable lymphedema-specific subscale has been developed to assess exercise barriers self-efficacy in individuals with cancer-related lymphedema. This scale can be used in conjunction with an existing general exercise barriers scale to enhance exercise adherence in this understudied patient group.

  19. Functional and Neuroanatomical Specificity of Episodic Memory Dysfunction in Schizophrenia: An fMRI study of the Relational and Item-Specific Encoding Task

    PubMed Central

    Ragland, J. Daniel; Ranganath, Charan; Harms, Michael P.; Barch, Deanna M.; Gold, James M.; Layher, Evan; Lesh, Tyler A.; MacDonald, Angus W.; Niendam, Tara A.; Phillips, Joshua; Silverstein, Steven M.; Yonelinas, Andrew P.; Carter, Cameron S.

    2015-01-01

    Importance Individuals with schizophrenia (SZ) can encode item-specific information to support familiarity-based recognition, but are disproportionately impaired encoding inter-item relationships (relational encoding) and recollecting information. The Relational and Item-Specific Encoding (RiSE) paradigm has been used to disentangle these encoding and retrieval processes, which may be dependent on specific medial temporal lobe (MTL) and prefrontal cortex (PFC) subregions. Functional imaging during RiSE task performance could help to specify dysfunctional neural circuits in SZ that can be targeted for interventions to improve memory and functioning in the illness. Objectives To use functional magnetic resonance imaging (fMRI) to test the hypothesis that SZ disproportionately affects MTL and PFC subregions during relational encoding and retrieval, relative to item-specific memory processes. Imaging results from healthy comparison subjects (HC) will also be used to establish neural construct validity for RiSE. Design, Setting, and Participants This multi-site, case-control, cross-sectional fMRI study was conducted at five CNTRACS sites. The final sample included 52 clinically stable outpatients with SZ, and 57 demographically matched HC. Main Outcomes and Measures Behavioral performance speed and accuracy (d’) on item recognition and associative recognition tasks. Voxelwise statistical parametric maps for a priori MTL and PFC regions of interest (ROI), testing activation differences between relational and item-specific memory during encoding and retrieval. Results Item recognition was disproportionately impaired in SZ patients relative to controls following relational encoding. The differential deficit was accompanied by reduced dorsolateral prefrontal cortex (DLPFC) activation during relational encoding in SZ, relative to HC. Retrieval success (hits > misses) was associated with hippocampal (HI) activation in HC during relational item recognition and associative recognition conditions, and HI activation was specifically reduced in SZ for recognition of relational but not item-specific information. Conclusions In this unique, multi-site fMRI study, HC results supported RiSE construct validity by revealing expected memory effects in PFC and MTL subregions during encoding and retrieval. Comparison of SZ and HC revealed disproportionate memory deficits in SZ for relational versus item-specific information, accompanied by regionally and functionally specific deficits in DLPFC and HI activation. PMID:26200928
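
    The accuracy measure d’ mentioned in this abstract is conventionally the difference between the z-transformed hit rate and false-alarm rate. The sketch below assumes that standard signal-detection definition; the abstract does not describe how the authors computed or corrected it, and the rates used are invented for illustration.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Both rates must lie strictly between 0 and 1; rates of exactly
    0 or 1 need a correction that is not applied here.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


# Invented rates, not values from the study.
print(round(d_prime(0.80, 0.20), 3))
print(round(d_prime(0.60, 0.40), 3))
```

    A value of 0 means old and new items are not discriminated at all; larger values mean better recognition accuracy.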

  20. Dying to remember, remembering to survive: mortality salience and survival processing.

    PubMed

    Burns, Daniel J; Hart, Joshua; Kramer, Melanie E; Burns, Amy D

    2014-01-01

    Processing items for their relevance to survival improves recall for those items relative to numerous other deep processing encoding techniques. Perhaps related, placing individuals in a mortality salient state has also been shown to enhance retention of items encoded after the mortality salience manipulation (e.g., in a pleasantness rating task), a phenomenon we dubbed the "dying-to-remember" (DTR) effect. The experiments reported here further explored the effect and tested the possibility that the DTR effect is related to survival processing. Experiment 1 replicated the effect using different encoding tasks, demonstrating that the effect is not dependent on the pleasantness task. In Experiment 2 the DTR effect was associated with increases in item-specific processing, not relational processing, according to several indices. Experiment 3 replicated the main results of Experiment 2, and tested the effects of mortality salience and survival processing within the same experiment. The DTR effect and its associated difference in item-specific processing were completely eliminated when the encoding task required survival processing. These results are consistent with the interpretation that the mechanisms responsible for survival processing and DTR effects are overlapping.

  1. A Basic Test Theory Generalizable to Tailored Testing. Technical Report No. 1.

    ERIC Educational Resources Information Center

    Cliff, Norman

    Measures of consistency and completeness of order relations derived from test-type data are proposed. The measures are generalized to apply to incomplete data such as tailored testing. The measures are based on consideration of the items-plus-persons by items-plus-persons matrix as an adjacency matrix in which a 1 means that the row element…

  2. Beneficial effects of semantic memory support on older adults' episodic memory: Differential patterns of support of item and associative information.

    PubMed

    Mohanty, Praggyan Pam; Naveh-Benjamin, Moshe; Ratneshwar, Srinivasan

    2016-02-01

    The effects of two types of semantic memory support (meaningfulness of an item and relatedness between items) in mitigating age-related deficits in item and associative memory are examined in a marketing context. In Experiment 1, participants studied less (vs. more) meaningful brand logo graphics (pictures) paired with meaningful brand names (words) and later were assessed by item (old/new) and associative (intact/recombined) memory recognition tests. Results showed that meaningfulness of items eliminated age deficits in item memory, while equivalently boosting associative memory for older and younger adults. Experiment 2, in which related and unrelated brand logo graphics and brand name pairs served as stimuli, revealed that relatedness between items eliminated age deficits in associative memory, while improving to the same degree item memory in older and younger adults. Experiment 2 also provided evidence for a probable boundary condition that could reconcile seemingly contradictory extant results. Overall, these experiments provided evidence that although the two types of semantic memory support can improve both item and associative memory in older and younger adults, older adults' memory deficits can be eliminated when the type of support provided is compatible with the type of information required to perform well on the test. (c) 2016 APA, all rights reserved.

  3. Home Economics. Sample Test Items. Levels I and II.

    ERIC Educational Resources Information Center

    New York State Education Dept., Albany. Bureau of Elementary and Secondary Educational Testing.

    A sample of behavioral objectives and related test items that could be developed for content modules in Home Economics levels I and II, this book is intended to enable teachers to construct more valid and reliable test materials. Forty-eight one-page modules are presented, and opposite each module are listed two to seven specific behavioral…

  4. Investigating Linguistic Sources of Differential Item Functioning Using Expert Think-Aloud Protocols in Science Achievement Tests

    NASA Astrophysics Data System (ADS)

    Roth, Wolff-Michael; Oliveri, Maria Elena; Dallie Sandilands, Debra; Lyons-Thomas, Juliette; Ercikan, Kadriye

    2013-03-01

    Even if national and international assessments are designed to be comparable, subsequent psychometric analyses often reveal differential item functioning (DIF). Central to achieving comparability is to examine the presence of DIF, and if DIF is found, to investigate its sources to ensure that differentially functioning items do not lead to bias. In this study, sources of DIF were examined using think-aloud protocols. The think-aloud protocols of expert reviewers were conducted for comparing the English and French versions of 40 items previously identified as DIF (N = 20) and non-DIF (N = 20). Three highly trained and experienced experts in verifying and accepting/rejecting multi-lingual versions of curriculum and testing materials for government purposes participated in this study. Although there is a considerable amount of agreement in the identification of differentially functioning items, experts do not consistently identify and distinguish DIF and non-DIF items. Our analyses of the think-aloud protocols identified particular linguistic, general pedagogical, content-related, and cognitive factors related to sources of DIF. Implications are provided for the process of arriving at the identification of DIF, prior to the actual administration of tests at national and international levels.

  5. A Maximin Model for Test Design with Practical Constraints. Project Psychometric Aspects of Item Banking No. 25. Research Report 87-10.

    ERIC Educational Resources Information Center

    van der Linden, Wim J.; Boekkooi-Timminga, Ellen

    A "maximin" model for item response theory based test design is proposed. In this model only the relative shape of the target test information function is specified. It serves as a constraint subject to which a linear programming algorithm maximizes the information in the test. In the practice of test construction there may be several…

  6. Item Analyses of Memory Differences

    PubMed Central

    Salthouse, Timothy A.

    2017-01-01

    Objective Although performance on memory and other cognitive tests is usually assessed with a score aggregated across multiple items, potentially valuable information is also available at the level of individual items. Method The current study illustrates how analyses of variance with item as one of the factors, and memorability analyses in which item accuracy in one group is plotted as a function of item accuracy in another group, can provide a more detailed characterization of the nature of group differences in memory. Data are reported for two memory tasks, word recall and story memory, across age, ability, repetition, delay, and longitudinal contrasts. Results The item-level analyses revealed evidence for largely uniform differences across items in the age, ability, and longitudinal contrasts, but differential patterns across items in the repetition contrast, and unsystematic item relations in the delay contrast. Conclusion Analyses at the level of individual items have the potential to indicate the manner by which group differences in the aggregate test score are achieved. PMID:27618285

  7. The development of a science process assessment for fourth-grade students

    NASA Astrophysics Data System (ADS)

    Smith, Kathleen A.; Welliver, Paul W.

    In this study, a multiple-choice test entitled the Science Process Assessment was developed to measure the science process skills of students in grade four. Based on the Recommended Science Competency Continuum for Grades K to 6 for Pennsylvania Schools, this instrument measured the skills of (1) observing, (2) classifying, (3) inferring, (4) predicting, (5) measuring, (6) communicating, (7) using space/time relations, (8) defining operationally, (9) formulating hypotheses, (10) experimenting, (11) recognizing variables, (12) interpreting data, and (13) formulating models. To prepare the instrument, classroom teachers and science educators were invited to participate in two science education workshops designed to develop an item bank of test questions applicable to measuring process skill learning. Participants formed writing teams and generated 65 test items representing the 13 process skills. After a comprehensive group critique of each item, 61 items were identified for inclusion into the Science Process Assessment item bank. To establish content validity, the item bank was submitted to a select panel of science educators for the purpose of judging item acceptability. This analysis yielded 55 acceptable test items and produced the Science Process Assessment, Pilot 1. Pilot 1 was administered to 184 fourth-grade students. Students were given a copy of the test booklet; teachers read each test aloud to the students. Upon completion of this first administration, data from the item analysis yielded a reliability coefficient of 0.73. Subsequently, 40 test items were identified for the Science Process Assessment, Pilot 2. Using the test-retest method, the Science Process Assessment, Pilot 2 (Test 1 and Test 2) was administered to 113 fourth-grade students. Reliability coefficients of 0.80 and 0.82, respectively, were ascertained. The correlation between Test 1 and Test 2 was 0.77. The results of this study indicate that (1) the Science Process Assessment, Pilot 2, is a valid and reliable instrument applicable to measuring the science process skills of students in grade four, (2) using educational workshops as a means of developing item banks of test questions is viable and productive in the test development process, and (3) involving classroom teachers and science educators in the test development process is educationally efficient and effective.

  8. Contriving transitive conditioned establishing operations to establish derived manding skills in adults with severe developmental disabilities.

    PubMed

    Rosales, Rocio; Rehfeldt, Ruth Anne

    2007-01-01

    The purpose of this study was to demonstrate derived manding skills in 2 adults with severe developmental disabilities and language deficits by contriving transitive conditioned establishing operations. Specifically, we evaluated whether a history of reinforced conditional discrimination learning would ultimately result in a derived mand repertoire, in which participants manded for items that were needed to complete chained tasks. After mastering the first three phases of the picture exchange communication system (PECS), participants were taught to mand for the needed items by exchanging pictures of the items for the items themselves. They were then taught to conditionally relate the dictated names of the items to the corresponding pictures of the items and to relate the dictated names to the corresponding printed words. We then tested, in the absence of reinforcement, whether participants would mand for the items needed to complete the chained tasks using text rather than pictures. Both participants showed the emergence of derived mands and some derived stimulus relations as a result of this instruction. Some of the derived relations were shown to be intact at 1-month follow-up, and scores on derived mand probes were higher at follow-up than before training. In addition, the 2 participants vocally requested the needed items on maintenance test probes, a skill that was never trained and was not previously in their repertoires. These results suggest that a history of reinforced relational responding may facilitate the expansion of a number of verbal skills and emphasize the possibility of a synthesis of Skinner's (1957) analysis of verbal behavior and derived stimulus relations into language-training efforts for persons with significant disabilities.

  9. Development and Initial Validation of Military Deployment-Related TBI Quality-of-Life Item Banks.

    PubMed

    Toyinbo, Peter A; Vanderploeg, Rodney D; Donnell, Alison J; Mutolo, Sandra A; Cook, Karon F; Kisala, Pamela A; Tulsky, David S

    2016-01-01

    The objective was to investigate unique factors that affect health-related quality of life (QOL) in individuals with military deployment-related traumatic brain injury (MDR-TBI) and to develop appropriate assessment tools, consistent with the TBI-QOL/PROMIS/Neuro-QOL systems. Three focus groups from each of the 4 Veterans Administration (VA) Polytrauma Rehabilitation Centers, consisting of 20 veterans with mild to severe MDR-TBI, and 36 VA providers were involved in the early stage of developing the new item banks. The item banks were field tested in a sample (N = 485) of veterans enrolled in VA and diagnosed with an MDR-TBI. The design combined focus groups and a survey. The main measures were the developed item banks and short forms for Guilt, Posttraumatic Stress Disorder/Trauma, and Military-Related Loss. Three new item banks representing unique domains of MDR-TBI health outcomes were created: 15 new Posttraumatic Stress Disorder items plus 16 SCI-QOL legacy Trauma items, 37 new Military-Related Loss items plus 18 TBI-QOL legacy Grief/Loss items, and 33 new Guilt items. Exploratory and confirmatory factor analyses plus bifactor analysis of the items supported sufficient unidimensionality of the new item pools. Results of convergent and discriminant analyses, as well as known group comparisons, provided initial support for the validity and clinical utility of the new item response theory-calibrated item banks and their short forms. This work provides a unique opportunity to identify issues specific to individuals with MDR-TBI and ensure that they are captured in QOL assessment, thus extending the existing TBI-QOL measurement system.

  10. Survey Development to Assess College Students' Perceptions of the Campus Environment.

    PubMed

    Sowers, Morgan F; Colby, Sarah; Greene, Geoffrey W; Pickett, Mackenzie; Franzen-Castle, Lisa; Olfert, Melissa D; Shelnutt, Karla; Brown, Onikia; Horacek, Tanya M; Kidd, Tandalayo; Kattelmann, Kendra K; White, Adrienne A; Zhou, Wenjun; Riggsbee, Kristin; Yan, Wangcheng; Byrd-Bredbenner, Carol

    2017-11-01

    We developed and tested a College Environmental Perceptions Survey (CEPS) to assess college students' perceptions of the healthfulness of their campus. CEPS was developed in 3 stages: questionnaire development, validity testing, and reliability testing. Questionnaire development was based on an extensive literature review and input from an expert panel to establish content validity. Face validity was established with the target population using cognitive interviews with 100 college students. Concurrent-criterion validity was established with in-depth interviews (N = 30) of college students compared to surveys completed by the same 30 students. Surveys completed by college students from 8 universities (N = 1147) were used to test internal structure (factor analysis) and internal consistency (Cronbach's alpha). After development and testing, 15 items remained from the original 48 items. A 5-factor solution emerged: physical activity (4 items, α = .635), water (3 items, α = .773), vending (2 items, α = .680), healthy food (2 items, α = .631), and policy (2 items, α = .573). The mean total score for all universities was 62.71 (±11.16) on a 100-point scale. CEPS appears to be a valid and reliable tool for assessing college students' perceptions of their health-related campus environment.
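
    The internal-consistency values quoted in this record are Cronbach's alpha coefficients. As a minimal sketch of how such a coefficient is computed from raw item scores, the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores) can be written as follows; the function name and the response matrix are hypothetical and are not CEPS data.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses: 5 respondents x 3 items on a 1-5 scale.
demo_scores = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 3],
]
alpha = cronbach_alpha(demo_scores)
```

    With only two or three items per factor, as in several of the CEPS subscales, alpha tends to be lower than for longer scales, because the coefficient depends on the number of items as well as on their intercorrelations.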

  11. Item usage in a multidimensional computerized adaptive test (MCAT) measuring health-related quality of life.

    PubMed

    Paap, Muirne C S; Kroeze, Karel A; Terwee, Caroline B; van der Palen, Job; Veldkamp, Bernard P

    2017-11-01

    Examining item usage is an important step in evaluating the performance of a computerized adaptive test (CAT). We study item usage for a newly developed multidimensional CAT which draws items from three PROMIS domains, as well as a disease-specific one. The multidimensional item bank used in the current study contained 194 items from four domains: the PROMIS domains fatigue, physical function, and ability to participate in social roles and activities, and a disease-specific domain (the COPD-SIB). The item bank was calibrated using the multidimensional graded response model and data of 795 patients with chronic obstructive pulmonary disease. To evaluate the item usage rates of all individual items in our item bank, CAT simulations were performed on responses generated based on a multivariate uniform distribution. The outcome variables included active bank size and item overuse (usage rate larger than the expected item usage rate). For average θ-values, the overall active bank size was 9-10%; this number quickly increased as θ-values became more extreme. For values of -2 and +2, the overall active bank size equaled 39-40%. There was 78% overlap between overused items and active bank size for average θ-values. For more extreme θ-values, the overused items made up a much smaller part of the active bank size: here the overlap was only 35%. Our results strengthen the claim that relatively short item banks may suffice when using polytomous items (and no content constraints/exposure control mechanisms), especially when using MCAT.

  12. Noncompetitive retrieval practice causes retrieval-induced forgetting in cued recall but not in recognition.

    PubMed

    Grundgeiger, Tobias

    2014-04-01

    Retrieving a subset of learned items can lead to the forgetting of related items. Such retrieval-induced forgetting (RIF) can be explained by the inhibition of irrelevant items in order to overcome retrieval competition when the target item is retrieved. According to the retrieval inhibition account, such retrieval competition is a necessary condition for RIF. However, research has indicated that noncompetitive retrieval practice can also cause RIF by strengthening cue-item associations. According to the strength-dependent competition account, the strengthened items interfere with the retrieval of weaker items, resulting in impaired recall of weaker items in the final memory test. The aim of this study was to replicate RIF caused by noncompetitive retrieval practice and to determine whether this forgetting is also observed in recognition tests. In the context of RIF, it has been assumed that recognition tests circumvent interference and, therefore, should not be sensitive to forgetting due to strength-dependent competition. However, this has not been empirically tested, and it has been suggested that participants may reinstate learned cues as retrieval aids during the final test. In the present experiments, competitive practice or noncompetitive practice was followed by either final cued-recall tests or recognition tests. In cued-recall tests, RIF was observed in both competitive and noncompetitive conditions. However, in recognition tests, RIF was observed only in the competitive condition and was absent in the noncompetitive condition. The result underscores the contribution of strength-dependent competition to RIF. However, recognition tests seem to be a reliable way of distinguishing between RIF due to retrieval inhibition or strength-dependent competition.

  13. Medial Temporal Lobe Contributions to Cued Retrieval of Items and Contexts

    PubMed Central

    Hannula, Deborah E.; Libby, Laura A.; Yonelinas, Andrew P.; Ranganath, Charan

    2013-01-01

    Several models have proposed that different regions of the medial temporal lobes contribute to different aspects of episodic memory. For instance, according to one view, the perirhinal cortex represents specific items, parahippocampal cortex represents information regarding the context in which these items were encountered, and the hippocampus represents item-context bindings. Here, we used event-related functional magnetic resonance imaging (fMRI) to test a specific prediction of this model – namely, that successful retrieval of items from context cues will elicit perirhinal recruitment and that successful retrieval of contexts from item cues will elicit parahippocampal cortex recruitment. Retrieval of the bound representation in either case was expected to elicit hippocampal engagement. To test these predictions, we had participants study several item-context pairs (i.e., pictures of objects and scenes, respectively), and then had them attempt to recall items from associated context cues and contexts from associated item cues during a scanned retrieval session. Results based on both univariate and multivariate analyses confirmed a role for hippocampus in content-general relational memory retrieval, and a role for parahippocampal cortex in successful retrieval of contexts from item cues. However, we also found that activity differences in perirhinal cortex were correlated with successful cued recall for both items and contexts. These findings provide partial support for the above predictions and are discussed with respect to several models of medial temporal lobe function. PMID:23466350

  14. Project Physics Tests 1, Concepts of Motion.

    ERIC Educational Resources Information Center

    Harvard Univ., Cambridge, MA. Harvard Project Physics.

    Test items relating to Project Physics Unit 1 are presented in this booklet, consisting of 70 multiple-choice and 20 problem-and-essay questions. Concepts of motion are examined with respect to velocities, acceleration, forces, vectors, Newton's laws, and circular motion. Suggestions are made for time consumption in answering some items. Besides…

  15. General Metals: Grades 7-12.

    ERIC Educational Resources Information Center

    Instructional Objectives Exchange, Los Angeles, CA.

    Ninety objectives and related test items for use in grades 7 through 12 are presented. Each sample contains an objective, test items, and criteria for judging the adequacy of the response. Objectives are organized into the following categories: (1) property of metals; (2) operations and functions; (3) cutting and shearing; (4) filing; (5) cutting…

  16. Fundamentals of Marketing Core Curriculum. Test Items and Assessment Techniques.

    ERIC Educational Resources Information Center

    Smith, Clifton L.; And Others

    This document contains multiple choice test items and assessment techniques for Missouri's fundamentals of marketing core curriculum. The core curriculum is divided into these nine occupational duties: (1) communications in marketing; (2) economics and marketing; (3) employment and advancement; (4) human relations in marketing; (5) marketing…

  17. U.S. History: Grades 7-9. Revised Edition.

    ERIC Educational Resources Information Center

    Instructional Objectives Exchange, Los Angeles, CA.

    Sixty-three behavioral objectives and related test items for United States history in grades seven through nine are presented. Each sample contains the objective, sample test items and directions, and criteria for judging the adequacy of student responses. Fourteen of the 15 categories are content oriented and presented chronologically: (1)…

  18. Aerobic fitness and executive control of relational memory in preadolescent children.

    PubMed

    Chaddock, Laura; Hillman, Charles H; Buck, Sarah M; Cohen, Neal J

    2011-02-01

    The neurocognitive benefits of an active lifestyle in childhood have public health and educational implications, especially as children in today's technological society are becoming increasingly overweight, unhealthy, and unfit. Human and animal studies show that aerobic exercise affects both prefrontal executive control and hippocampal function. This investigation attempts to bridge these research threads by using a cognitive task to examine the relationship between aerobic fitness and executive control of relational memory in preadolescent 9- and 10-year-old children. Higher-fit and lower-fit children studied faces and houses under individual item (i.e., nonrelational) and relational encoding conditions, and the children were subsequently tested with recognition memory trials consisting of previously studied pairs and pairs of completely new items. With each subject participating in both item and relational encoding conditions, and with recognition test trials amenable to the use of both item and relational memory cues, this task afforded a challenge to the flexible use of memory, specifically in the use of appropriate encoding and retrieval strategies. Hence, the task provided a test of both executive control and memory processes. Lower-fit children showed poorer recognition memory performance than higher-fit children, selectively in the relational encoding condition. No association between aerobic fitness and recognition performance was found for faces and houses studied as individual items (i.e., nonrelationally). The findings implicate childhood aerobic fitness as a factor in the ability to use effective encoding and retrieval executive control processes for relational memory material and, possibly, in the strategic engagement of prefrontal- and hippocampal-dependent systems.

  19. Procedures to develop a computerized adaptive test to assess patient-reported physical functioning.

    PubMed

    McCabe, Erin; Gross, Douglas P; Bulut, Okan

    2018-06-07

    The purpose of this paper is to demonstrate the procedures to develop and implement a computerized adaptive patient-reported outcome (PRO) measure using secondary analysis of a dataset and items from fixed-format legacy measures. We conducted secondary analysis of a dataset of responses from 1429 persons with work-related lower extremity impairment. We calibrated three measures of physical functioning on the same metric, based on item response theory (IRT). We evaluated efficiency and measurement precision of various computerized adaptive test (CAT) designs using computer simulations. IRT and confirmatory factor analyses support combining the items from the three scales for a CAT item bank of 31 items. The item parameters for IRT were calculated using the generalized partial credit model. CAT simulations show that reducing the test length from the full 31 items to a maximum of 8 or 20 items is possible without a significant loss of information (95% and 99% correlation with legacy measure scores, respectively). We demonstrated feasibility and efficiency of using CAT for PRO measurement of physical functioning. The procedures we outlined are straightforward, and can be applied to other PRO measures. Additionally, we have included all the information necessary to implement the CAT of physical functioning in the electronic supplementary material of this paper.
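
    The record above names the generalized partial credit model (GPCM) as the calibration model. As a minimal sketch of what that model computes, the function below returns the category probabilities of a single polytomous item for a given trait level; the function name, the discrimination value, and the step parameters are hypothetical and are not taken from the paper's item bank.

```python
import math

def gpcm_probs(theta, a, steps):
    """GPCM category probabilities for one item.

    theta: latent trait level; a: item discrimination;
    steps: step parameters b_1..b_m (categories run from 0 to m).
    """
    # Exponent for category k is the sum over v <= k of a * (theta - b_v);
    # category 0 has an empty sum, i.e. exponent 0.
    exponents = [0.0]
    for b in steps:
        exponents.append(exponents[-1] + a * (theta - b))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(exponents)
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical 3-category item evaluated at an average trait level.
probs = gpcm_probs(theta=0.0, a=1.0, steps=[-1.0, 1.0])
```

    With a single step parameter the model reduces to the two-parameter logistic model, which is a convenient sanity check for any implementation.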

  20. Emotional competencies in geriatric nursing: empirical evidence from a computer based large scale assessment calibration study.

    PubMed

    Kaspar, Roman; Hartig, Johannes

    2016-03-01

    The care of older people was described as involving substantial emotion-related affordances. Scholars in vocational training and nursing disagree whether emotion-related skills could be conceptualized and assessed as a professional competence. Studies on emotion work and empathy regularly neglect the multidimensionality of these phenomena and their relation to the care process, and are rarely conclusive with respect to nursing behavior in practice. To test the status of emotion-related skills as a facet of client-directed geriatric nursing competence, 402 final-year nursing students from 24 German schools responded to a 62-item computer-based test. 14 items were developed to represent emotion-related affordances. Multi-dimensional IRT modeling was employed to assess a potential subdomain structure. Emotion-related test items did not form a separate subdomain, and were found to be discriminating across the whole competence continuum. Tasks concerning emotion work and empathy are reliable indicators for various levels of client-directed nursing competence. Claims for a distinct emotion-related competence in geriatric nursing, however, appear excessive with a process-oriented perspective.

  1. Measuring Ability, Speed, or Both? Challenges, Psychometric Solutions, and What Can Be Gained from Experimental Control

    ERIC Educational Resources Information Center

    Goldhammer, Frank

    2015-01-01

    The main challenge of ability tests relates to the difficulty of items, whereas speed tests demand that test takers complete very easy items quickly. This article proposes a conceptual framework to represent how performance depends on both between-person differences in speed and ability and the speed-ability compromise within persons. Related…

  2. Capuchin monkeys (Cebus apella) treat small and large numbers of items similarly during a relative quantity judgment task.

    PubMed

    Beran, Michael J; Parrish, Audrey E

    2016-08-01

    A key issue in understanding the evolutionary and developmental emergence of numerical cognition is to learn what mechanism(s) support perception and representation of quantitative information. Two such systems have been proposed, one for dealing with approximate representation of sets of items across an extended numerical range and another for highly precise representation of only small numbers of items. Evidence for the first system is abundant across species and in many tests with human adults and children, whereas the second system is primarily evident in research with children and in some tests with non-human animals. A recent paper (Choo & Franconeri, Psychonomic Bulletin & Review, 21, 93-99, 2014) with adult humans also reported "superprecise" representation of small sets of items in comparison to large sets of items, which would provide more support for the presence of a second system in human adults. We first presented capuchin monkeys with a test similar to that of Choo and Franconeri in which small or large sets with the same ratios had to be discriminated. We then presented the same monkeys with an expanded range of comparisons in the small number range (all comparisons of 1-9 items) and the large number range (all comparisons of 10-90 items in 10-item increments). Capuchin monkeys showed no increased precision for small over large sets in making these discriminations in either experiment. These data indicate a difference between the performance of monkeys and that of adult humans, and specifically that monkeys do not show improved discrimination performance for small sets relative to large sets when the relative numerical differences are held constant.

  3. Development of a cross-cultural item bank for measuring quality of life related to mental health in multiple sclerosis patients.

    PubMed

    Michel, Pierre; Auquier, Pascal; Baumstarck, Karine; Pelletier, Jean; Loundou, Anderson; Ghattas, Badih; Boyer, Laurent

    2015-09-01

    Quality of life (QoL) measurements are considered important outcome measures both for research on multiple sclerosis (MS) and in clinical practice. Computerized adaptive testing (CAT) can improve the precision of measurements made using QoL instruments while reducing the burden of testing on patients. Moreover, a cross-cultural approach is also necessary to guarantee the wide applicability of CAT. The aim of this preliminary study was to develop a calibrated item bank that is available in multiple languages and measures QoL related to mental health by combining one generic (SF-36) and one disease-specific questionnaire (MusiQoL). Patients with MS were enrolled in this international, multicenter, cross-sectional study. The psychometric properties of the item bank were evaluated using classical test and item response theories and approaches, including the evaluation of unidimensionality, item response theory model fitting, and analyses of differential item functioning (DIF). Convergent and discriminant validities of the item bank were examined according to socio-demographic, clinical, and QoL features. A total of 1992 patients with MS from 15 countries were enrolled to calibrate the 22-item bank developed in this study. The strict monotonicity of the Cronbach's alpha curve, the high eigenvalue ratio estimator (5.50), and the adequate CFA model fit (RMSEA = 0.07 and CFI = 0.95) indicated that a strong assumption of unidimensionality was warranted. The infit mean square statistic ranged from 0.76 to 1.27, indicating a satisfactory item fit. DIF analyses revealed no item biases across geographical areas, confirming the cross-cultural equivalence of the item bank. External validity testing revealed that the item bank scores correlated significantly with QoL scores but also showed discriminant validity for socio-demographic and clinical characteristics.
    This work demonstrated satisfactory psychometric characteristics for a QoL item bank for MS in multiple languages. This work may offer a common measure for the assessment of QoL in different cultural contexts and for international studies conducted on MS.

  4. Validity of Computer Adaptive Tests of Daily Routines for Youth with Spinal Cord Injury

    PubMed Central

    Haley, Stephen M.

    2013-01-01

    Objective: To evaluate the accuracy of computer adaptive tests (CATs) of daily routines for child- and parent-reported outcomes following pediatric spinal cord injury (SCI) and to evaluate the validity of the scales. Methods: One hundred ninety-six daily routine items were administered to 381 youths and 322 parents. Pearson correlations, intraclass correlation coefficients (ICC), and 95% confidence intervals (CI) were calculated to evaluate the accuracy of simulated 5-item, 10-item, and 15-item CATs against the full-item banks and to evaluate concurrent validity. Independent samples t tests and analysis of variance were used to evaluate the ability of the daily routine scales to discriminate between children with tetraplegia and paraplegia and among 5 motor groups. Results: ICC and 95% CI demonstrated that simulated 5-, 10-, and 15-item CATs accurately represented the full-item banks for both child- and parent-report scales. The daily routine scales demonstrated discriminative validity, except between 2 motor groups of children with paraplegia. Concurrent validity of the daily routine scales was demonstrated through significant relationships with the FIM scores. Conclusion: Child- and parent-reported outcomes of daily routines can be obtained using CATs with the same relative precision of a full-item bank. Five-item, 10-item, and 15-item CATs have discriminative and concurrent validity. PMID:23671380

  5. Test-retest reliability and construct validity of the ENERGY-parent questionnaire on parenting practices, energy balance-related behaviours and their potential behavioural determinants: the ENERGY-project.

    PubMed

    Singh, Amika S; Chinapaw, Mai J M; Uijtdewilligen, Léonie; Vik, Froydis N; van Lippevelde, Wendy; Fernández-Alvira, Juan M; Stomfai, Sarolta; Manios, Yannis; van der Sluijs, Maria; Terwee, Caroline; Brug, Johannes

    2012-08-13

    Insight into parental energy balance-related behaviours, their determinants and parenting practices is important to inform childhood obesity prevention. Therefore, reliable and valid tools to measure these variables in large-scale population research are needed. The objective of the current study was to examine the test-retest reliability and construct validity of the parent questionnaire used in the ENERGY-project, assessing parental energy balance-related behaviours, their determinants, and parenting practices among parents of 10-12 year-old children. We collected data among parents (n = 316 in the test-retest reliability study; n = 109 in the construct validity study) of 10-12 year-old children in six European countries, i.e. Belgium, Greece, Hungary, the Netherlands, Norway, and Spain. Test-retest reliability was assessed using the intra-class correlation coefficient (ICC) and percentage agreement comparing scores from two measurements, administered one week apart. To assess construct validity, the agreement between questionnaire responses and a subsequent interview was assessed using ICC and percentage agreement. All but one item showed good to excellent test-retest reliability as indicated by ICCs > .60 or percentage agreement ≥ 75%. Construct validity appeared to be good to excellent for 92 out of 121 items, as indicated by ICCs > .60 or percentage agreement ≥ 75%. Of the other 29 items, construct validity was moderate for 24 and poor for 5 items. The reliability and construct validity of the items of the ENERGY-parent questionnaire on multiple energy balance-related behaviours, their potential determinants, and parenting practices appear to be good. Based on the results of the validity study, we strongly recommend adapting parts of the ENERGY-parent questionnaire if used in future research.

  6. Evaluation of psychometric properties and differential item functioning of 8-item Child Perceptions Questionnaires using item response theory.

    PubMed

    Yau, David T W; Wong, May C M; Lam, K F; McGrath, Colman

    2015-08-19

    The four-factor structure of the two 8-item short forms of Child Perceptions Questionnaire CPQ11-14 (RSF:8 and ISF:8) has been confirmed. However, the sum scores are typically reported in practice as a proxy of Oral health-related Quality of Life (OHRQoL), which implies a unidimensional structure. This study first assessed the unidimensionality of 8-item short forms of CPQ11-14. Item response theory (IRT) was employed to offer an alternative and complementary approach of validation and to overcome the limitations of classical test theory assumptions. A random sample of 649 12-year-old school children in Hong Kong was analyzed. Unidimensionality of the scale was tested by confirmatory factor analysis (CFA), principal component analysis (PCA) and local dependency (LD) statistic. Graded response model was fitted to the data. Contribution of each item to the scale was assessed by item information function (IIF). Reliability of the scale was assessed by test information function (TIF). Differential item functioning (DIF) across gender was identified by Wald test and expected score functions. Both CPQ11-14 RSF:8 and ISF:8 did not deviate much from the unidimensionality assumption. Results from CFA indicated acceptable fit of the one-factor model. PCA indicated that the first principal component explained >30% of the total variation with high factor loadings for both RSF:8 and ISF:8. Almost all LD statistics were <10, indicating the absence of local dependency. Flat and low IIFs were observed in the oral symptoms items suggesting little contribution of information to the scale and item removal caused little practical impact. Comparing the TIFs, RSF:8 showed slightly better information than ISF:8. In addition to oral symptoms items, the item "Concerned with what other people think" demonstrated a uniform DIF (p < 0.001). The expected score functions were not much different between boys and girls.
    Items related to oral symptoms were not informative to OHRQoL and deletion of these items is suggested. The impact of DIF across gender on the overall score was minimal. CPQ11-14 RSF:8 performed slightly better than ISF:8 in measurement precision. The 6-item short forms suggested by IRT validation should be further investigated to ensure their robustness, responsiveness and discriminative performance.
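
    The record above judges items by their item information functions (IIFs) under the graded response model. As a minimal sketch of that quantity, the function below evaluates the standard IIF for one item, i.e. the sum over categories of the squared derivative of the category probability divided by that probability. The function names and all parameter values are hypothetical; they are not the CPQ11-14 estimates.

```python
import math

def _logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def grm_item_information(theta, a, thresholds):
    """Item information at trait level theta under the graded response model.

    a: item discrimination; thresholds: ascending category thresholds b_1..b_m.
    """
    # Cumulative probabilities P*(X >= k), bracketed by 1 (k = 0) and 0 (k = m + 1).
    cum = [1.0] + [_logistic(a * (theta - b)) for b in thresholds] + [0.0]
    info = 0.0
    for k in range(len(cum) - 1):
        p = cum[k] - cum[k + 1]  # probability of responding in category k
        # Derivative of p with respect to theta.
        dp = a * (cum[k] * (1 - cum[k]) - cum[k + 1] * (1 - cum[k + 1]))
        if p > 0:
            info += dp * dp / p
    return info

# Hypothetical 4-category items sharing thresholds but differing in discrimination.
weak_item = grm_item_information(0.0, a=0.5, thresholds=[-1.0, 0.0, 1.0])
strong_item = grm_item_information(0.0, a=2.0, thresholds=[-1.0, 0.0, 1.0])
```

    A weakly discriminating item yields a low, flat information curve, which is the pattern the abstract reports for the oral symptoms items; summing the item information functions over all items gives the test information function.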

  7. Diabetes knowledge in nursing homes and home-based care services: a validation study of the Michigan Diabetes Knowledge Test adapted for use among nursing personnel.

    PubMed

    Haugstvedt, Anne; Aarflot, Morten; Igland, Jannicke; Landbakk, Tilla; Graue, Marit

    2016-01-01

    Providing high-quality diabetes care in nursing homes and home-based care facilities requires suitable instruments to evaluate the level of diabetes knowledge among the health-care providers. Thus, the aim of this study was to examine the psychometric properties of the Michigan Diabetes Knowledge Test adapted for use among nursing personnel. The study included 127 nursing personnel (32 registered nurses, 69 nursing aides and 26 nursing assistants) at three nursing homes and one home-based care facility in Norway. We examined the reliability and content and construct validity of the Michigan Diabetes Knowledge Test. The items in both the general diabetes subscale and the insulin-use subscale were considered relevant and appropriate. The instrument showed satisfactory properties for distinguishing between groups. Item response theory-based measurements and item information curves indicate maximum information at average or lower knowledge scores. Internal consistency and the item-total correlations were quite weak, indicating that the Michigan Diabetes Knowledge Test measures a set of items related to various relevant knowledge topics but not necessarily related to each other. The Michigan Diabetes Knowledge Test measures a broad range of topics relevant to diabetes care. It is an appropriate instrument for identifying individual and distinct needs for diabetes education among nursing personnel. The knowledge gaps identified by the Michigan Diabetes Knowledge Test could also provide useful input for the content of educational activities. However, some revision of the test should be considered.

  8. Emotional Intelligence in Applicant Selection for Care-Related Academic Programs

    ERIC Educational Resources Information Center

    Zysberg, Leehu; Levy, Anat; Zisberg, Anna

    2011-01-01

    Two studies describe the development of the Audiovisual Test of Emotional Intelligence (AVEI), aimed at candidate selection in educational settings. Study I depicts the construction of the test and the preliminary examination of its psychometric properties in a sample of 92 college students. Item analysis allowed the modification of problem items,…

  9. U.S. History: Grades 10-12. Revised Edition.

    ERIC Educational Resources Information Center

    Instructional Objectives Exchange, Los Angeles, CA.

    Seventy-seven behavioral objectives and related test items for United States history in grades 10 through 12 are presented. Each sample contains the objective, sample test items, and criteria for judging the adequacy of student responses. Fourteen of the 15 categories are content-oriented, and presented in chronological groups: (1) discovery of…

  10. Advancing the efficiency and efficacy of patient reported outcomes with multivariate computer adaptive testing.

    PubMed

    Morris, Scott; Bass, Mike; Lee, Mirinae; Neapolitan, Richard E

    2017-09-01

    The Patient Reported Outcomes Measurement Information System (PROMIS) initiative developed an array of patient reported outcome (PRO) measures. To reduce the number of questions administered, PROMIS utilizes unidimensional item response theory and unidimensional computer adaptive testing (UCAT), which means a separate set of questions is administered for each measured trait. Multidimensional item response theory (MIRT) and multidimensional computer adaptive testing (MCAT) simultaneously assess correlated traits. The objective was to investigate the extent to which MCAT reduces patient burden relative to UCAT in the case of PROs. One MIRT and 3 unidimensional item response theory models were developed using the related traits anxiety, depression, and anger. Using these models, MCAT and UCAT performance was compared with simulated individuals. Surprisingly, the root mean squared error for both methods increased with the number of items. These results were driven by large errors for individuals with low trait levels. A second analysis focused on individuals aligned with item content. For these individuals, both MCAT and UCAT accuracies improved with additional items. Furthermore, MCAT reduced the test length by 50%. For the PROMIS Emotional Distress banks, neither UCAT nor MCAT provided accurate estimates for individuals at low trait levels. Because the items in these banks were designed to detect clinical levels of distress, there is little information for individuals with low trait values. However, trait estimates for individuals targeted by the banks were accurate and MCAT asked substantially fewer questions. By reducing the number of items administered, MCAT can allow clinicians and researchers to assess a wider range of PROs with less patient burden. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved.

  11. Developing a Questionnaire to Evaluate College Students' Knowledge, Attitude, Behavior, Self-efficacy, and Environmental Factors Related to Canned Foods.

    PubMed

    Richards, Rickelle; Brown, Lora Beth; Williams, D Pauline; Eggett, Dennis L

    2017-02-01

    Develop a questionnaire to measure students' knowledge, attitude, behavior, self-efficacy, and environmental factors related to the use of canned foods. The Knowledge-Attitude-Behavior Model, Social Cognitive Theory, and Canned Foods Alliance survey were used as frameworks for questionnaire development. Cognitive interviews were conducted with college students (n = 8). Nutrition and survey experts assessed content validity. Reliability was measured via Cronbach α and 2 rounds (1, n = 81; 2, n = 65) of test-retest statistics. Means and frequencies were used. The 65-item questionnaire had a test-retest reliability of .69. Cronbach α scores were .87 for knowledge (9 items), .86 for attitude (30 items), .80 for self-efficacy (12 items), .68 for canned foods use (8 items), and .30 for environment (6 items). A reliable questionnaire was developed to measure perceptions and use of canned foods. Nutrition educators may find this questionnaire useful to evaluate pretest-posttest changes from canned foods-based interventions among college students. Copyright © 2016 Society for Nutrition Education and Behavior. Published by Elsevier Inc. All rights reserved.
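
    The Cronbach α values reported above are internal-consistency coefficients. As a rough illustration only, the standard formula, α = k/(k − 1) · (1 − Σ item variances / variance of total scores), can be sketched as follows. The function name and the toy data are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Illustrative sketch: uses sample variances (n - 1 denominator) throughout.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Transpose so each tuple holds all respondents' scores on one item.
    items = list(zip(*scores))
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical data: 4 respondents answering 2 items.
toy = [[1, 2], [2, 1], [3, 3], [4, 4]]
print(round(cronbach_alpha(toy), 4))
```

    A low value, such as the .30 reported for the environment subscale, would mean the summed item variances are nearly as large as the variance of the total score.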

  12. Innovative testing of spatial ability: interactive responding and the use of complex stimuli material.

    PubMed

    Jelínek, Martin; Květon, Petr; Vobořil, Dalibor

    2015-02-01

    Despite initial expectations, which have emerged with the advancement of computer technology over the last decade of the twentieth century, scientific literature does not contain many relevant references regarding the development and use of innovative items in psychological testing. Our study presents and evaluates two novel item types. One item type is derived from a standard schematic test item used for the assessment of the spatial perception aspect of spatial ability, enhanced by an interactive response module. The performance on this item type is correlated with the performance on its paper and pencil counterpart. The other innovative item type used complex stimuli in the form of a short video of a ride through a city presented in an on-route perspective, which is intended to measure navigation skills and the ability to keep oneself oriented in space. In this case, the scores were related to the capacity of visuo-spatial working memory and also to the overall score in the paper/pencil test of spatial ability. The second relationship was moderated by gender.

  13. Developing multiple-choices test items as tools for measuring the scientific-generic skills on solar system

    NASA Astrophysics Data System (ADS)

    Bhakti, Satria Seto; Samsudin, Achmad; Chandra, Didi Teguh; Siahaan, Parsaoran

    2017-05-01

    The aim of this research was to develop multiple-choice test items as tools for measuring scientific generic skills on the solar system. To achieve this aim, the researchers used the ADDIE model, consisting of Analysis, Design, Development, Implementation, and Evaluation, as the research method. The scientific generic skills studied were limited to five indicators: (1) indirect observation, (2) awareness of scale, (3) logical inference, (4) causal relations, and (5) mathematical modeling. The participants were 32 students at a junior high school in Bandung. The results showed that the constructed multiple-choice test items were declared valid by the expert validator and that, after testing, the developed items were able to measure scientific generic skills on the solar system.

  14. Medial temporal lobe contributions to cued retrieval of items and contexts.

    PubMed

    Hannula, Deborah E; Libby, Laura A; Yonelinas, Andrew P; Ranganath, Charan

    2013-10-01

    Several models have proposed that different regions of the medial temporal lobes contribute to different aspects of episodic memory. For instance, according to one view, the perirhinal cortex represents specific items, parahippocampal cortex represents information regarding the context in which these items were encountered, and the hippocampus represents item-context bindings. Here, we used event-related functional magnetic resonance imaging (fMRI) to test a specific prediction of this model, namely that successful retrieval of items from context cues will elicit perirhinal recruitment and that successful retrieval of contexts from item cues will elicit parahippocampal cortex recruitment. Retrieval of the bound representation in either case was expected to elicit hippocampal engagement. To test these predictions, we had participants study several item-context pairs (i.e., pictures of objects and scenes, respectively), and then had them attempt to recall items from associated context cues and contexts from associated item cues during a scanned retrieval session. Results based on both univariate and multivariate analyses confirmed a role for hippocampus in content-general relational memory retrieval, and a role for parahippocampal cortex in successful retrieval of contexts from item cues. However, we also found that activity differences in perirhinal cortex were correlated with successful cued recall for both items and contexts. These findings provide partial support for the above predictions and are discussed with respect to several models of medial temporal lobe function. Copyright © 2013 Elsevier Ltd. All rights reserved.

  15. Social desirability in personality inventories: Symptoms, diagnosis and prescribed cure

    PubMed Central

    Bäckström, Martin; Björklund, Fredrik

    2013-01-01

    An analysis of social desirability in personality assessment is presented. Starting with the symptoms, Study 1 showed that mean ratings of graded personality items are moderately to strongly linearly related to social desirability (Self Deception, Impression formation, and the first Principal Component), suggesting that item popularity may be a useful heuristic tool for identifying items which elicit socially desirable responding. We diagnose the cause of socially desirable responding as an interaction between the evaluative content of the item and enhancement motivation in the rater. Study 2 introduced a possible cure; evaluative neutralization of items. To test the feasibility of the method lay psychometricians (undergraduates) reformulated existing personality test items according to written instructions. The new items were indeed lower in social desirability while essentially retaining the five factor structure and reliability of the inventory. We conclude that although neutralization is no miracle cure, it is simple and has beneficial effects. PMID:23252410

  16. Test-retest stability of the Task and Ego Orientation Questionnaire.

    PubMed

    Lane, Andrew M; Nevill, Alan M; Bowes, Neal; Fox, Kenneth R

    2005-09-01

    Establishing stability, defined as observing minimal measurement error in a test-retest assessment, is vital to validating psychometric tools. Correlational methods, such as Pearson product-moment, intraclass, and kappa are tests of association or consistency, whereas stability or reproducibility (regarded here as synonymous) assesses the agreement between test-retest scores. Indexes of reproducibility using the Task and Ego Orientation in Sport Questionnaire (TEOSQ; Duda & Nicholls, 1992) were investigated using correlational (Pearson product-moment, intraclass, and kappa) methods, repeated measures multivariate analysis of variance, and calculating the proportion of agreement within a referent value of +/-1 as suggested by Nevill, Lane, Kilgour, Bowes, and Whyte (2001). Two hundred thirteen soccer players completed the TEOSQ on two occasions, 1 week apart. Correlation analyses indicated a stronger test-retest correlation for the Ego subscale than the Task subscale. Multivariate analysis of variance indicated stability for ego items but with significant increases in four task items. The proportion of test-retest agreement scores indicated that all ego items reported relatively poor stability statistics with test-retest scores within a range of +/-1, ranging from 82.7-86.9%. By contrast, all task items showed test-retest difference scores ranging from 92.5-99%, although further analysis indicated that four task subscale items increased significantly. Findings illustrated that correlational methods (Pearson product-moment, intraclass, and kappa) are influenced by the range in scores, and calculating the proportion of agreement of test-retest differences with a referent value of +/-1 could provide additional insight into the stability of the questionnaire. It is suggested that the item-by-item proportion of agreement method proposed by Nevill et al. (2001) should be used to supplement existing methods and could be especially helpful in identifying rogue items in the initial stages of psychometric questionnaire validation.
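
    The proportion-of-agreement index described in this abstract is the share of respondents whose test and retest scores on an item differ by no more than the referent value (+/-1). A minimal sketch of that calculation, assuming integer item ratings; the function name and the example ratings are hypothetical, and this is not the authors' own code.

```python
def proportion_within(test, retest, tolerance=1):
    """Fraction of paired scores whose absolute test-retest difference
    is at most `tolerance` (the referent value, +/-1 by default)."""
    if len(test) != len(retest) or not test:
        raise ValueError("test and retest must be non-empty and equal in length")
    hits = sum(1 for a, b in zip(test, retest) if abs(a - b) <= tolerance)
    return hits / len(test)


# Hypothetical ratings from five respondents on one item, two occasions.
first = [1, 2, 3, 4, 5]
second = [1, 3, 5, 4, 2]
print(proportion_within(first, second))  # differences are 0, 1, 2, 0, 3
```

    Unlike a correlation coefficient, this index does not depend on the spread of scores across respondents, which is the property the abstract highlights.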

  17. The influence of strategic encoding on false memory in patients with mild cognitive impairment and Alzheimer's disease dementia.

    PubMed

    Tat, Michelle J; Soonsawat, Anothai; Nagle, Corinne B; Deason, Rebecca G; O'Connor, Maureen K; Budson, Andrew E

    2016-11-01

    Patients with Alzheimer's disease (AD) dementia exhibit high rates of memory distortions in addition to their impairments in episodic memory. Several investigations have demonstrated that when healthy individuals (young and old) engaged in an encoding strategy that emphasized the uniqueness of study items (an item-specific encoding strategy), they were able to improve their discrimination between old items and unstudied critical lure items in a false memory task. In the present study we examined if patients with AD could also improve their memory discrimination when engaging in an item-specific encoding strategy. Healthy older adult controls, patients with mild cognitive impairment (MCI) due to AD, and patients with mild AD dementia were asked to study lists of categorized words. In the Item-Specific condition, participants were asked to provide a unique detail or personal experience with each study item. In the Relational condition, they were asked to determine how each item in the list was related to the others. To assess the influence of both strategies, recall and recognition memory tests were administered. Overall, both patient groups exhibited poorer memory in both recall and recognition tests compared to controls. In terms of recognition, healthy older controls and patients with MCI due to AD exhibited improved memory discrimination in the Item-Specific condition compared to the Relational condition, whereas patients with AD dementia did not. We speculate that patients with MCI due to AD use intact frontal networks to effectively engage in this strategy. Published by Elsevier Inc.

  18. The influence of strategic encoding on false memory in patients with mild cognitive impairment and Alzheimer’s disease dementia

    PubMed Central

    Tat, Michelle J.; Soonsawat, Anothai; Nagle, Corinne B.; Deason, Rebecca G.; O’Connor, Maureen K.; Budson, Andrew E.

    2018-01-01

    Patients with Alzheimer’s disease (AD) dementia exhibit high rates of memory distortions in addition to their impairments in episodic memory. Several investigations have demonstrated that when healthy individuals (young and old) engaged in an encoding strategy that emphasized the uniqueness of study items (an item-specific encoding strategy), they were able to improve their discrimination between old items and unstudied critical lure items in a false memory task. In the present study we examined if patients with AD could also improve their memory discrimination when engaging in an item-specific encoding strategy. Healthy older adult controls, patients with mild cognitive impairment (MCI) due to AD, and patients with mild AD dementia were asked to study lists of categorized words. In the Item-Specific condition, participants were asked to provide a unique detail or personal experience with each study item. In the Relational condition, they were asked to determine how each item in the list was related to the others. To assess the influence of both strategies, recall and recognition memory tests were administered. Overall, both patient groups exhibited poorer memory in both recall and recognition tests compared to controls. In terms of recognition, healthy older controls and patients with MCI due to AD exhibited improved memory discrimination in the Item-Specific condition compared to the Relational condition, whereas patients with AD dementia did not. We speculate that patients with MCI due to AD use intact frontal networks to effectively engage in this strategy. PMID:27643951

  19. The missing link? Testing a schema account of unitization.

    PubMed

    Tibon, Roni; Greve, Andrea; Henson, Richard

    2018-05-09

    Unitization refers to the creation of a new unit from previously distinct items. The concept of unitization has been used to explain how novel pairings between items can be remembered without requiring recollection, by virtue of new, item-like representations that enable familiarity-based retrieval. We tested an alternative account of unitization - a schema account - which suggests that associations between items can be rapidly assimilated into a schema. We used a common operationalization of "unitization" as the difference between two unrelated words being linked by a definition, relative to two words being linked by a sentence, during an initial study phase. During the following relearning phase, a studied word was re-paired with a new word, either related or unrelated to the original associate from study. In a final test phase, memory for the relearned associations was tested. We hypothesized that, if unitized representations act like schemas, then we would observe some generalization to related words, such that memory would be better in the definition than sentence condition for related words, but not for unrelated words. Contrary to the schema hypothesis, evidence favored the null hypothesis of no difference between definition and sentence conditions for related words (Experiment 1), even when each cue was associated with multiple associates, indicating that the associations can be generalized (Experiment 2), or when the schematic information was explicitly re-activated during Relearning (Experiment 3). These results suggest that unitized associations do not generalize to accommodate new information, and therefore provide evidence against the schema account.

  20. Development and initial validation of the appropriate antibiotic use self-efficacy scale.

    PubMed

    Hill, Erin M; Watkins, Kaitlin

    2018-06-04

    While there are various medication self-efficacy scales that exist, none assess self-efficacy for appropriate antibiotic use. The Appropriate Antibiotic Use Self-Efficacy Scale (AAUSES) was developed, pilot tested, and its psychometric properties were examined. Following pilot testing of the scale, a 28-item questionnaire was examined using a sample (n = 289) recruited through the Amazon Mechanical Turk platform. Participants also completed other scales and items, which were used in assessing discriminant, convergent, and criterion-related validity. Test-retest reliability was also examined. After examining the scale and removing items that did not assess appropriate antibiotic use, an exploratory factor analysis was conducted on 13 items from the original scale. Three factors were retained that explained 65.51% of the variance. The scale and its subscales had adequate internal consistency. The scale had excellent test-retest reliability, as well as demonstrated convergent, discriminant, and criterion-related validity. The AAUSES is a valid and reliable scale that assesses three domains of appropriate antibiotic use self-efficacy. The AAUSES may have utility in clinical and research settings in understanding individuals' beliefs about appropriate antibiotic use and related behavioral correlates. Future research is needed to examine the scale's utility in these settings. Copyright © 2018 Elsevier B.V. All rights reserved.

  1. Multiple, correlated covariates associated with differential item functioning (DIF): Accounting for language DIF when education levels differ across languages.

    PubMed

    Gibbons, Laura E; Crane, Paul K; Mehta, Kala M; Pedraza, Otto; Tang, Yuxiao; Manly, Jennifer J; Narasimhalu, Kaavya; Teresi, Jeanne; Jones, Richard N; Mungas, Dan

    2011-04-28

    Differential item functioning (DIF) occurs when a test item has different statistical properties in subgroups, controlling for the underlying ability measured by the test. DIF assessment is necessary when evaluating measurement bias in tests used across different language groups. However, other factors such as educational attainment can differ across language groups, and DIF due to these other factors may also exist. How to conduct DIF analyses in the presence of multiple, correlated factors remains largely unexplored. This study assessed DIF related to Spanish versus English language in a 44-item object naming test. Data come from a community-based sample of 1,755 Spanish- and English-speaking older adults. We compared simultaneous accounting, a new strategy for handling differences in educational attainment across language groups, with existing methods. Compared to other methods, simultaneously accounting for language- and education-related DIF yielded salient differences in some object naming scores, particularly for Spanish speakers with at least 9 years of education. Accounting for factors that vary across language groups can be important when assessing language DIF. The use of simultaneous accounting will be relevant to other cross-cultural studies in cognition and in other fields, including health-related quality of life.

  2. Multiple, correlated covariates associated with differential item functioning (DIF): Accounting for language DIF when education levels differ across languages

    PubMed Central

    Gibbons, Laura E.; Crane, Paul K.; Mehta, Kala M.; Pedraza, Otto; Tang, Yuxiao; Manly, Jennifer J.; Narasimhalu, Kaavya; Teresi, Jeanne; Jones, Richard N.; Mungas, Dan

    2012-01-01

    Differential item functioning (DIF) occurs when a test item has different statistical properties in subgroups, controlling for the underlying ability measured by the test. DIF assessment is necessary when evaluating measurement bias in tests used across different language groups. However, other factors such as educational attainment can differ across language groups, and DIF due to these other factors may also exist. How to conduct DIF analyses in the presence of multiple, correlated factors remains largely unexplored. This study assessed DIF related to Spanish versus English language in a 44-item object naming test. Data come from a community-based sample of 1,755 Spanish- and English-speaking older adults. We compared simultaneous accounting, a new strategy for handling differences in educational attainment across language groups, with existing methods. Compared to other methods, simultaneously accounting for language- and education-related DIF yielded salient differences in some object naming scores, particularly for Spanish speakers with at least 9 years of education. Accounting for factors that vary across language groups can be important when assessing language DIF. The use of simultaneous accounting will be relevant to other cross-cultural studies in cognition and in other fields, including health-related quality of life. PMID:22900138

  3. Analyzing force concept inventory with item response theory

    NASA Astrophysics Data System (ADS)

    Wang, Jing; Bao, Lei

    2010-10-01

    Item response theory is a popular assessment method used in education. It rests on the assumption of a probability framework that relates students' innate ability and their performance on test questions. Item response theory transforms students' raw test scores into a scaled proficiency score, which can be used to compare results obtained with different test questions. The scaled score also addresses the issues of ceiling effects and guessing, which commonly exist in quantitative assessment. We used item response theory to analyze the force concept inventory (FCI). Our results show that item response theory can be useful for analyzing physics concept surveys such as the FCI and produces results about the individual questions and student performance that are beyond the capability of classical statistics. The theory yields detailed measurement parameters regarding the difficulty, discrimination features, and probability of correct guess for each of the FCI questions.
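
    The three parameters named in this abstract (difficulty, discrimination, and probability of a correct guess) correspond to the commonly used three-parameter logistic (3PL) model. The sketch below shows that standard form, P(θ) = c + (1 − c) / (1 + exp(−a(θ − b))). The abstract does not state the exact parameterization used, so the absence of a scaling constant and the example parameter values are assumptions.

```python
import math


def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under a 3PL item response model.

    theta: examinee proficiency
    a: item discrimination
    b: item difficulty
    c: lower asymptote (probability of a correct guess)
    """
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))


# Hypothetical item: discrimination 1.2, difficulty 0.5, guessing 0.2.
for theta in (-2.0, 0.5, 3.0):
    print(theta, round(p_correct_3pl(theta, a=1.2, b=0.5, c=0.2), 3))
```

    When proficiency equals the item difficulty, the probability is halfway between the guessing level and 1, which is 0.6 for the hypothetical item above.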

  4. The Multidimensional Assessment of Interoceptive Awareness (MAIA)

    PubMed Central

    Mehling, Wolf E.; Price, Cynthia; Daubenmier, Jennifer J.; Acree, Mike; Bartmess, Elizabeth; Stewart, Anita

    2012-01-01

    This paper describes the development of a multidimensional self-report measure of interoceptive body awareness. The systematic mixed-methods process involved reviewing the current literature, specifying a multidimensional conceptual framework, evaluating prior instruments, developing items, and analyzing focus group responses to scale items by instructors and patients of body awareness-enhancing therapies. Following refinement by cognitive testing, items were field-tested in students and instructors of mind-body approaches. Final item selection was achieved by submitting the field test data to an iterative process using multiple validation methods, including exploratory cluster and confirmatory factor analyses, comparison between known groups, and correlations with established measures of related constructs. The resulting 32-item multidimensional instrument assesses eight concepts. The psychometric properties of these final scales suggest that the Multidimensional Assessment of Interoceptive Awareness (MAIA) may serve as a starting point for research and further collaborative refinement. PMID:23133619

  5. Methodology for Developing and Evaluating the PROMIS® Smoking Item Banks

    PubMed Central

    Cai, Li; Stucky, Brian D.; Tucker, Joan S.; Shadel, William G.; Edelen, Maria Orlando

    2014-01-01

    Introduction: This article describes the procedures used in the PROMIS® Smoking Initiative for the development and evaluation of item banks, short forms (SFs), and computerized adaptive tests (CATs) for the assessment of 6 constructs related to cigarette smoking: nicotine dependence, coping expectancies, emotional and sensory expectancies, health expectancies, psychosocial expectancies, and social motivations for smoking. Methods: Analyses were conducted using response data from a large national sample of smokers. Items related to each construct were subjected to extensive item factor analyses and evaluation of differential item functioning (DIF). Final item banks were calibrated, and SF assessments were developed for each construct. The performance of the SFs and the potential use of the item banks for CAT administration were examined through simulation study. Results: Item selection based on dimensionality assessment and DIF analyses produced item banks that were essentially unidimensional in structure and free of bias. Simulation studies demonstrated that the constructs could be accurately measured with a relatively small number of carefully selected items, either through fixed SFs or CAT-based assessment. Illustrative results are presented, and subsequent articles provide detailed discussion of each item bank in turn. Conclusions: The development of the PROMIS smoking item banks provides researchers with new tools for measuring smoking-related constructs. The use of the calibrated item banks and suggested SF assessments will enhance the quality of score estimates, thus advancing smoking research. Moreover, the methods used in the current study, including innovative approaches to item selection and SF construction, may have general relevance to item bank development and evaluation. PMID:23943843

  6. A knowledge-based theory of rising scores on "culture-free" tests.

    PubMed

    Fox, Mark C; Mitchum, Ainsley L

    2013-08-01

    Secular gains in intelligence test scores have perplexed researchers since they were documented by Flynn (1984, 1987). Gains are most pronounced on abstract, so-called culture-free tests, prompting Flynn (2007) to attribute them to problem-solving skills availed by scientifically advanced cultures. We propose that recent-born individuals have adopted an approach to analogy that enables them to infer higher level relations requiring roles that are not intrinsic to the objects that constitute initial representations of items. This proposal is translated into item-specific predictions about differences between cohorts in pass rates and item-response patterns on the Raven's Matrices (Flynn, 1987), a seemingly culture-free test that registers the largest Flynn effect. Consistent with predictions, archival data reveal that individuals born around 1940 are less able to map objects at higher levels of relational abstraction than individuals born around 1990. Polytomous Rasch models verify predicted violations of measurement invariance, as raw scores are found to underestimate the number of analogical rules inferred by members of the earlier cohort relative to members of the later cohort who achieve the same overall score. The work provides a plausible cognitive account of the Flynn effect, furthers understanding of the cognition of matrix reasoning, and underscores the need to consider how test-takers select item responses. PsycINFO Database Record (c) 2013 APA, all rights reserved.

  7. The emotional carryover effect in memory for words.

    PubMed

    Schmidt, Stephen R; Schmidt, Constance R

    2016-08-01

    Emotional material rarely occurs in isolation; rather it is experienced in the spatial and temporal proximity of less emotional items. Some previous researchers have found that emotional stimuli impair memory for surrounding information, whereas others have reported evidence for memory facilitation. Researchers have not determined which types of emotional items or memory tests produce effects that carry over to surrounding items. Six experiments are reported that measured carryover from emotional words varying in arousal to temporally adjacent neutral words. Taboo, non-taboo emotional, and neutral words were compared using different stimulus onset asynchronies (SOAs), recognition and recall tests, and intentional and incidental memory instructions. Strong emotional memory effects were obtained in all six experiments. However, emotional items influenced memory for temporally adjacent words under limited conditions. Words following taboo words were more poorly remembered than words following neutral words when relatively short SOAs were employed. Words preceding taboo words were affected only when recall tests and relatively short retention intervals were used. These results suggest that increased attention to the emotional items sometimes produces emotional carryover effects; however, retrieval processes also contribute to retrograde amnesia and may extend the conditions under which anterograde amnesia is observed.

  8. Biological Science: An Ecological Approach. BSCS Green Version. Teacher's Resource Book and Test Item Bank. Sixth Edition.

    ERIC Educational Resources Information Center

    Biological Sciences Curriculum Study, Colorado Springs.

    This book consists of four sections: (1) "Supplemental Materials"; (2) "Supplemental Investigations"; (3) "Test Item Bank"; and (4) "Blackline Masters." The first section provides additional background material related to selected chapters and investigations in the student book. Included are a periodic table of the elements, genetics problems and…

  9. Development of an assay of seven biochemical items, HbA1c, and hematocrit using a small amount of blood collected from the fingertip.

    PubMed

    Shinya, Sugimoto; Masaru, Akimoto; Akira, Hayakawa; Eisaku, Hokazono; Susumu, Osawa

    2012-01-18

    Lifestyle-related diseases in Japan account for 30% of the entire medical expenditure of the country and cause 60% of all deaths. For the prevention of lifestyle-related diseases, medical examination by laboratory tests on metabolic syndrome is important. To undertake examination by collection of blood from a fingertip, we developed the "Well Kit". About 65 μl of blood collected from a fingertip was diluted with buffer solution, which contained two internal standard materials. The kit also separated corpuscles from the diluted plasma with a special filter. The diluted plasma obtained was measured using the JCA-BM2250. This measurement system was evaluated for the quantitative analysis of 8 items. The uncertainties of tested items of this measurement system were 1.7% to 6.4%. The coefficients of correlation between the values from this measurement system and the venous plasma sample values were 0.876-0.991 for all tested items, and 0.958 for hematocrit. This system for testing blood collected from a fingertip is simple to use and can be applied in testing for metabolic syndrome. In addition, this testing system is useful for personal healthcare and for medical examination of community residents. Copyright © 2011 Elsevier B.V. All rights reserved.

  10. Accounting for Local Dependence with the Rasch Model: The Paradox of Information Increase.

    PubMed

    Andrich, David

    Test theories imply statistical, local independence. Where local independence is violated, models of modern test theory that account for it have been proposed. One violation of local independence occurs when the response to one item governs the response to a subsequent item. Expanding on a formulation of this kind of violation between two items in the dichotomous Rasch model, this paper derives three related implications. First, it formalises how the polytomous Rasch model for an item constituted by summing the scores of the dependent items absorbs the dependence in its threshold structure. Second, it shows that as a consequence the unit when the dependence is accounted for is not the same as if the items had no response dependence. Third, it explains the paradox, known, but not explained in the literature, that the greater the dependence of the constituent items the greater the apparent information in the constituted polytomous item when it should provide less information.

  11. Age-related increases in false recognition: the role of perceptual and conceptual similarity.

    PubMed

    Pidgeon, Laura M; Morcom, Alexa M

    2014-01-01

    Older adults (OAs) are more likely to falsely recognize novel events than young adults, and recent behavioral and neuroimaging evidence points to a reduced ability to distinguish overlapping information due to decline in hippocampal pattern separation. However, other data suggest a critical role for semantic similarity. Koutstaal et al. [(2003) false recognition of abstract vs. common objects in older and younger adults: testing the semantic categorization account, J. Exp. Psychol. Learn. 29, 499-510] reported that OAs were only vulnerable to false recognition of items with pre-existing semantic representations. We replicated Koutstaal et al.'s (2003) second experiment and examined the influence of independently rated perceptual and conceptual similarity between stimuli and lures. At study, young and OAs judged the pleasantness of pictures of abstract (unfamiliar) and concrete (familiar) items, followed by a surprise recognition test including studied items, similar lures, and novel unrelated items. Experiment 1 used dichotomous "old/new" responses at test, while in Experiment 2 participants were also asked to judge lures as "similar," to increase explicit demands on pattern separation. In both experiments, OAs showed a greater increase in false recognition for concrete than abstract items relative to the young, replicating Koutstaal et al.'s (2003) findings. However, unlike in the earlier study, there was also an age-related increase in false recognition of abstract lures when multiple similar images had been studied. In line with pattern separation accounts of false recognition, OAs were more likely to misclassify concrete lures with high and moderate, but not low degrees of rated similarity to studied items. Results are consistent with the view that OAs are particularly susceptible to semantic interference in recognition memory, and with the possibility that this reflects age-related decline in pattern separation.

  12. Age-related increases in false recognition: the role of perceptual and conceptual similarity

    PubMed Central

    Pidgeon, Laura M.; Morcom, Alexa M.

    2014-01-01

    Older adults (OAs) are more likely to falsely recognize novel events than young adults, and recent behavioral and neuroimaging evidence points to a reduced ability to distinguish overlapping information due to decline in hippocampal pattern separation. However, other data suggest a critical role for semantic similarity. Koutstaal et al. [(2003) false recognition of abstract vs. common objects in older and younger adults: testing the semantic categorization account, J. Exp. Psychol. Learn. 29, 499–510] reported that OAs were only vulnerable to false recognition of items with pre-existing semantic representations. We replicated Koutstaal et al.’s (2003) second experiment and examined the influence of independently rated perceptual and conceptual similarity between stimuli and lures. At study, young and OAs judged the pleasantness of pictures of abstract (unfamiliar) and concrete (familiar) items, followed by a surprise recognition test including studied items, similar lures, and novel unrelated items. Experiment 1 used dichotomous “old/new” responses at test, while in Experiment 2 participants were also asked to judge lures as “similar,” to increase explicit demands on pattern separation. In both experiments, OAs showed a greater increase in false recognition for concrete than abstract items relative to the young, replicating Koutstaal et al.’s (2003) findings. However, unlike in the earlier study, there was also an age-related increase in false recognition of abstract lures when multiple similar images had been studied. In line with pattern separation accounts of false recognition, OAs were more likely to misclassify concrete lures with high and moderate, but not low degrees of rated similarity to studied items. Results are consistent with the view that OAs are particularly susceptible to semantic interference in recognition memory, and with the possibility that this reflects age-related decline in pattern separation. PMID:25368576

  13. No Retrieval-Induced Forgetting Using Item-Specific Independent Cues: Evidence against a General Inhibitory Account

    ERIC Educational Resources Information Center

    Camp, Gino; Pecher, Diane; Schmidt, Henk G.

    2007-01-01

    Retrieval practice with particular items from memory can impair the recall of related items on a later memory test. This retrieval-induced forgetting effect has been ascribed to inhibitory processes (M. C. Anderson & B. A. Spellman, 1995). A critical finding that distinguishes inhibitory from interference explanations is that forgetting is found…

  14. Transfer Appropriate Forgetting: The Cue-Dependent Nature of Retrieval-Induced Forgetting

    ERIC Educational Resources Information Center

    Perfect, Timothy J.; Stark, Louisa-Jayne; Tree, Jeremy J.; Moulin, Christopher J. A.; Ahmed, Lubna; Hutter, Russell

    2004-01-01

    Retrieval-induced forgetting is the failure to recall a previously studied word following repeated retrieval of a related item. It has been argued that this is due to retrieval competition between practiced and unpracticed items, which results in inhibition of the non-recalled item, detectable with an independent cue at final test. Three…

  15. The cost of proactive interference is constant across presentation conditions.

    PubMed

    Endress, Ansgar D; Siddique, Aneela

    2016-10-01

    Proactive interference (PI) severely constrains how many items people can remember. For example, Endress and Potter (2014a) presented participants with sequences of everyday objects at 250ms/picture, followed by a yes/no recognition test. They manipulated PI by either using new images on every trial in the unique condition (thus minimizing PI among items), or by re-using images from a limited pool for all trials in the repeated condition (thus maximizing PI among items). In the low-PI unique condition, the probability of remembering an item was essentially independent of the number of memory items, showing no clear memory limitations; more traditional working memory-like memory limitations appeared only in the high-PI repeated condition. Here, we ask whether the effects of PI are modulated by the availability of long-term memory (LTM) and verbal resources. Participants viewed sequences of 21 images, followed by a yes/no recognition test. Items were presented either quickly (250ms/image) or sufficiently slowly (1500ms/image) to produce LTM representations, either with or without verbal suppression. Across conditions, participants performed better in the unique than in the repeated condition, and better for slow than for fast presentations. In contrast, verbal suppression impaired performance only with slow presentations. The relative cost of PI was remarkably constant across conditions: relative to the unique condition, performance in the repeated condition was about 15% lower in all conditions. The cost of PI thus seems to be a function of the relative strength or recency of target items and interfering items, but relatively insensitive to other experimental manipulations. Copyright © 2016 Elsevier B.V. All rights reserved.

  16. Item Response Theory analysis of Fagerström Test for Cigarette Dependence.

    PubMed

    Svicher, Andrea; Cosci, Fiammetta; Giannini, Marco; Pistelli, Francesco; Fagerström, Karl

    2018-02-01

    The Fagerström Test for Cigarette Dependence (FTCD) and the Heaviness of Smoking Index (HSI) are the gold standard measures to assess cigarette dependence. However, FTCD reliability and factor structure have been questioned and HSI psychometric properties are in need of further investigations. The present study examined the psychometrics properties of the FTCD and the HSI via the Item Response Theory. The study was a secondary analysis of data collected in 862 Italian daily smokers. Confirmatory factor analysis was run to evaluate the dimensionality of FTCD. A Grade Response Model was applied to FTCD and HSI to verify the fit to the data. Both item and test functioning were analyzed and item statistics, Test Information Function, and scale reliabilities were calculated. Mokken Scale Analysis was applied to estimate homogeneity and Loevinger's coefficients were calculated. The FTCD showed unidimensionality and homogeneity for most of the items and for the total score. It also showed high sensitivity and good reliability from medium to high levels of cigarette dependence, although problems related to some items (i.e., items 3 and 5) were evident. HSI had good homogeneity, adequate item functioning, and high reliability from medium to high levels of cigarette dependence. Significant Differential Item Functioning was found for items 1, 4, 5 of the FTCD and for both items of HSI. HSI seems highly recommended in clinical settings addressed to heavy smokers while FTCD would be better used in smokers with a level of cigarette dependence ranging between low and high. Copyright © 2017 Elsevier Ltd. All rights reserved.

  17. Development and validation of the Current Opioid Misuse Measure.

    PubMed

    Butler, Stephen F; Budman, Simon H; Fernandez, Kathrine C; Houle, Brian; Benoit, Christine; Katz, Nathaniel; Jamison, Robert N

    2007-07-01

    Clinicians recognize the importance of monitoring aberrant medication-related behaviors of chronic pain patients while being prescribed opioid therapy. The purpose of this study was to develop and validate the Current Opioid Misuse Measure (COMM) for those pain patients already on long-term opioid therapy. An initial pool of 177 items was developed with input from 26 pain management and addiction specialists. Concept mapping identified six primary concepts underlying medication misuse, which were used to develop an initial item pool. Twenty-two pain and addiction specialists rated the items on importance and relevance, resulting in selection of a 40-item alpha COMM. Final item selection was based on empirical evaluation of items with patients taking opioids for chronic, noncancer pain (N=227). One-week test-retest reliability was examined with 55 participants. All participants were administered the alpha version of the COMM, the Prescription Drug Use Questionnaire (PDUQ) interview, and submitted a urine sample for toxicology screening. Physician ratings of patient aberrant behaviors were also obtained. Of the 40 items, 17 items appeared to adequately measure aberrant behavior, demonstrating excellent internal consistency and test-retest reliability. Cutoff scores were examined using ROC curve analysis and reasonable sensitivity and specificity were established. To evaluate the COMM's ability to capture change in patient status, it was tested on a subset of patients (N=86) that were followed and reassessed three months later. The COMM was found to have promise as a brief, self-report measure of current aberrant drug-related behavior. Further cross-validation and replication of these preliminary results is pending.

  18. Ability evaluation by binary tests: Problems, challenges & recent advances

    NASA Astrophysics Data System (ADS)

    Bashkansky, E.; Turetsky, V.

    2016-11-01

    Binary tests designed to measure abilities of objects under test (OUTs) are widely used in different fields of measurement theory and practice. The number of test items in such tests is usually very limited. The response to each test item provides only one bit of information per OUT. The problem of correct ability assessment is even more complicated, when the levels of difficulty of the test items are unknown beforehand. This fact makes the search for effective ways of planning and processing the results of such tests highly relevant. In recent years, there has been some progress in this direction, generated by both the development of computational tools and the emergence of new ideas. The latter are associated with the use of so-called “scale invariant item response models”. Together with maximum likelihood estimation (MLE) approach, they helped to solve some problems of engineering and proficiency testing. However, several issues related to the assessment of uncertainties, replications scheduling, the use of placebo, as well as evaluation of multidimensional abilities still present a challenge for researchers. The authors attempt to outline the ways to solve the above problems.

  19. Validity and Reliability of Persian Version of HIV/AIDS Related Stigma Scale for People Living With HIV/AIDS in Iran.

    PubMed

    Pourmarzi, Davoud; Khoramirad, Ashraf; Ahmari Tehran, Hoda; Abedini, Zahra

    2015-11-01

    To assess the perceived HIV/AIDS related stigma a comprehensive and well developed stigma instrument is necessary. This study aimed to assess validity and reliability of the Persian version of HIV/AIDS related stigma scale which was developed by Kang et al for people living with HIV/AIDS in Iran. The scale was forward translated by two bilingual academic members then both translations were discussed by expert team. Back-translation was done by two other bilingual translators then we carried out discussion with both of them. To evaluate understandability the scale was administered to 10 Persons Living with HIV/AIDS (PLWHA). Final Persian version was administered to 80 PLWHA in Qom, Iran in 2014. Test-retest reliability was assessed in a sample of 20 PLWHA after a week by intra-class correlation coefficient (ICC). Cronbach's alpha coefficient for overall scale was 0.85. Also Cronbach's alpha coefficients for the five subscales were as follows: social rejection (9 items, α = 0.84), negative self-worth (4 items, α = 0.70), perceived interpersonal insecurity (2 items, α = 0.57), financial insecurity (3 items, α = 0.70), discretionary disclosure (2 items, α = 0.83). Test-retest reliability was also approved with ICC = 0.78. Correlation between items and their hypothesized subscale is greater than 0.5. Correlation between an item and its own subscale was significantly higher than its correlation with other subscales. This study demonstrates that the Persian version of HIV/AIDS related stigma scale is valid and reliable to assess HIV/AIDS related stigma perceived by people living with HIV/AIDS in Iran.

  20. The Caregiver Contribution to Heart Failure Self-Care (CACHS): Further Psychometric Testing of a Novel Instrument.

    PubMed

    Buck, Harleah G; Harkness, Karen; Ali, Muhammad Usman; Carroll, Sandra L; Kryworuchko, Jennifer; McGillion, Michael

    2017-04-01

    Caregivers (CGs) contribute important assistance with heart failure (HF) self-care, including daily maintenance, symptom monitoring, and management. Until CGs' contributions to self-care can be quantified, it is impossible to characterize it, account for its impact on patient outcomes, or perform meaningful cost analyses. The purpose of this study was to conduct psychometric testing and item reduction on the recently developed 34-item Caregiver Contribution to Heart Failure Self-care (CACHS) instrument using classical and item response theory methods. Fifty CGs (mean age 63 years ±12.84; 70% female) recruited from a HF clinic completed the CACHS in 2014 and results evaluated using classical test theory and item response theory. Items would be deleted for low (<.05) or high (>.95) endorsement, low (<.3) or high (>.7) corrected item-total correlations, significant pairwise correlation coefficients, floor or ceiling effects, relatively low latent trait and item information function levels (<1.5 and p > .5), and differential item functioning. After analysis, 14 items were excluded, resulting in a 20-item instrument (self-care maintenance eight items; monitoring seven items; and management five items). Most items demonstrated moderate to high discrimination (median 2.13, minimum .77, maximum 5.05), and appropriate item difficulty (-2.7 to 1.4). Internal consistency reliability was excellent (Cronbach α = .94, average inter-item correlation = .41) with no ceiling effects. The newly developed 20-item version of the CACHS is supported by rigorous instrument development and represents a novel instrument to measure CGs' contribution to HF self-care. © 2016 Wiley Periodicals, Inc.

  1. Differential age-related effects on conjunctive and relational visual short-term memory binding.

    PubMed

    Bastin, Christine

    2017-12-28

    An age-related associative deficit has been described in visual short-term binding memory tasks. However, separate studies have suggested that ageing disrupts relational binding (to associate distinct items or item and context) more than conjunctive binding (to integrate features within an object). The current study directly compared relational and conjunctive binding with a short-term memory task for object-colour associations in 30 young and 30 older adults. Participants studied a number of object-colour associations corresponding to their individual object span level in a relational task in which objects were associated to colour patches and a conjunctive task where colour was integrated into the object. Memory for individual items and for associations was tested with a recognition memory test. Evidence for an age-related associative deficit was observed in the relational binding task, but not in the conjunctive binding task. This differential impact of ageing on relational and conjunctive short-term binding is discussed by reference to two underlying age-related cognitive difficulties: diminished hippocampally dependent binding and attentional resources.

  2. Single Event Effect (SEE) Test Planning 101

    NASA Technical Reports Server (NTRS)

    LaBel, Kenneth A.; Pellish, Jonathan; Berg, Melanie D.

    2011-01-01

    This is a course on SEE Test Plan development. It is an introductory discussion of the items that go into planning an SEE test that should complement the SEE test methodology used. Material will only cover heavy ion SEE testing and not proton, LASER, or other though many of the discussed items may be applicable. While standards and guidelines for how-to perform single event effects (SEE) testing have existed almost since the first cyclotron testing, guidance on the development of SEE test plans has not been as easy to find. In this section of the short course, we attempt to rectify this lack. We consider the approach outlined here as a "living" document: mission specific constraints and new technology related issues always need to be taken into account. We note that we will use the term "test planning" in the context of those items being included in a test plan.

  3. International field testing of the psychometric properties of an EORTC quality of life module for oral health: the EORTC QLQ-OH15.

    PubMed

    Hjermstad, Marianne J; Bergenmar, Mia; Bjordal, Kristin; Fisher, Sheila E; Hofmeister, Dirk; Montel, Sébastien; Nicolatou-Galitis, Ourania; Pinto, Monica; Raber-Durlacher, Judith; Singer, Susanne; Tomaszewska, Iwona M; Tomaszewski, Krzysztof A; Verdonck-de Leeuw, Irma; Yarom, Noam; Winstanley, Julie B; Herlofson, Bente B

    2016-09-01

    This international EORTC validation study (phase IV) is aimed at testing the psychometric properties of a quality of life (QoL) module related to oral health problems in cancer patients. The phase III module comprised 17 items with four hypothesized multi-item scales and three single items. In phase IV, patients with mixed cancers, in different treatment phases from 10 countries completed the EORTC QLQ-C30, the QLQ-OH module, and a debriefing interview. The hypothesized structure was tested using combinations of classical test theory and item response theory, following EORTC guidelines. Test-retest assessments and responsiveness to change analysis (RCA) were performed after 2 weeks. Five hundred seventy-two patients (median age 60.3, 54 % females) were analyzed. Completion took <10 min for 84 %, 40 % expressed satisfaction that these issues were addressed. Analyses suggested a revision of the phase III hypothesized scale structure. Two items were deleted based on a high degree of item misfit, together with negative patient feedback. The remaining 15 items formed one eight-item scale named OH-QoL score, a two-item information scale, a two-item scale regarding dentures, and three single items (sticky saliva/mouth soreness/sensitivity to food/drink). Face and convergent validity and internal consistency were confirmed. Test-retest reliability (n = 60) was demonstrated as was RCA for patients undergoing chemotherapy (n = 117; p = 0.06). The resulting QLQ-OH15 discriminated between clinically distinct patient groups, e.g., low performance status vs. higher (p < 0.001), and head-and-neck cancer versus other cancers (p < 0.03). The EORTC module QLQ-OH15 is a short, well-accepted assessment tool focusing on oral problems and QoL to improve clinical management. ClinicalTrials.gov Identifier: NCT01724333.

  4. Developing and testing an instrument for identifying performance incentives in the Greek health care sector.

    PubMed

    Paleologou, Victoria; Kontodimopoulos, Nick; Stamouli, Aggeliki; Aletras, Vassilis; Niakas, Dimitris

    2006-09-13

    In the era of cost containment, managers are constantly pursuing increased organizational performance and productivity by aiming at the obvious target, i.e. the workforce. The health care sector, in which production processes are more complicated compared to other industries, is not an exception. In light of recent legislation in Greece in which efficiency improvement and achievement of specific performance targets are identified as undisputable health system goals, the purpose of this study was to develop a reliable and valid instrument for investigating the attitudes of Greek physicians, nurses and administrative personnel towards job-related aspects, and the extent to which these motivate them to improve performance and increase productivity. A methodological exploratory design was employed in three phases: a) content development and assessment, which resulted in a 28-item instrument, b) pilot testing (N = 74) and c) field testing (N = 353). Internal consistency reliability was tested via Cronbach's alpha coefficient and factor analysis was used to identify the underlying constructs. Tests of scaling assumptions, according to the Multitrait-Multimethod Matrix, were used to confirm the hypothesized component structure. Four components, referring to intrinsic individual needs and external job-related aspects, were revealed and explain 59.61% of the variability. They were subsequently labeled: job attributes, remuneration, co-workers and achievement. Nine items not meeting item-scale criteria were removed, resulting in a 19-item instrument. Scale reliability ranged from 0.782 to 0.901 and internal item consistency and discriminant validity criteria were satisfied. Overall, the instrument appears to be a promising tool for hospital administrations in their attempt to identify job-related factors, which motivate their employees. The psychometric properties were good and warrant administration to a larger sample of employees in the Greek healthcare system.

  5. Developing and testing an instrument for identifying performance incentives in the Greek health care sector

    PubMed Central

    Paleologou, Victoria; Kontodimopoulos, Nick; Stamouli, Aggeliki; Aletras, Vassilis; Niakas, Dimitris

    2006-01-01

    Background: In the era of cost containment, managers are constantly pursuing increased organizational performance and productivity by aiming at the obvious target, i.e. the workforce. The health care sector, in which production processes are more complicated compared to other industries, is not an exception. In light of recent legislation in Greece in which efficiency improvement and achievement of specific performance targets are identified as undisputable health system goals, the purpose of this study was to develop a reliable and valid instrument for investigating the attitudes of Greek physicians, nurses and administrative personnel towards job-related aspects, and the extent to which these motivate them to improve performance and increase productivity. Methods: A methodological exploratory design was employed in three phases: a) content development and assessment, which resulted in a 28-item instrument, b) pilot testing (N = 74) and c) field testing (N = 353). Internal consistency reliability was tested via Cronbach's alpha coefficient and factor analysis was used to identify the underlying constructs. Tests of scaling assumptions, according to the Multitrait-Multimethod Matrix, were used to confirm the hypothesized component structure. Results: Four components, referring to intrinsic individual needs and external job-related aspects, were revealed and explain 59.61% of the variability. They were subsequently labeled: job attributes, remuneration, co-workers and achievement. Nine items not meeting item-scale criteria were removed, resulting in a 19-item instrument. Scale reliability ranged from 0.782 to 0.901 and internal item consistency and discriminant validity criteria were satisfied. Conclusion: Overall, the instrument appears to be a promising tool for hospital administrations in their attempt to identify job-related factors, which motivate their employees. The psychometric properties were good and warrant administration to a larger sample of employees in the Greek healthcare system. PMID:16970823

  6. Methodology for developing and evaluating the PROMIS smoking item banks.

    PubMed

    Hansen, Mark; Cai, Li; Stucky, Brian D; Tucker, Joan S; Shadel, William G; Edelen, Maria Orlando

    2014-09-01

    This article describes the procedures used in the PROMIS Smoking Initiative for the development and evaluation of item banks, short forms (SFs), and computerized adaptive tests (CATs) for the assessment of 6 constructs related to cigarette smoking: nicotine dependence, coping expectancies, emotional and sensory expectancies, health expectancies, psychosocial expectancies, and social motivations for smoking. Analyses were conducted using response data from a large national sample of smokers. Items related to each construct were subjected to extensive item factor analyses and evaluation of differential item functioning (DIF). Final item banks were calibrated, and SF assessments were developed for each construct. The performance of the SFs and the potential use of the item banks for CAT administration were examined through simulation study. Item selection based on dimensionality assessment and DIF analyses produced item banks that were essentially unidimensional in structure and free of bias. Simulation studies demonstrated that the constructs could be accurately measured with a relatively small number of carefully selected items, either through fixed SFs or CAT-based assessment. Illustrative results are presented, and subsequent articles provide detailed discussion of each item bank in turn. The development of the PROMIS smoking item banks provides researchers with new tools for measuring smoking-related constructs. The use of the calibrated item banks and suggested SF assessments will enhance the quality of score estimates, thus advancing smoking research. Moreover, the methods used in the current study, including innovative approaches to item selection and SF construction, may have general relevance to item bank development and evaluation. © The Author 2013. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  7. Evaluating the Wald Test for Item-Level Comparison of Saturated and Reduced Models in Cognitive Diagnosis

    ERIC Educational Resources Information Center

    de la Torre, Jimmy; Lee, Young-Sun

    2013-01-01

    This article used the Wald test to evaluate the item-level fit of a saturated cognitive diagnosis model (CDM) relative to the fits of the reduced models it subsumes. A simulation study was carried out to examine the Type I error and power of the Wald test in the context of the G-DINA model. Results show that when the sample size is small and a…

  8. The Role of Memory Activation in Creating False Memories of Encoding Context

    ERIC Educational Resources Information Center

    Arndt, Jason

    2010-01-01

    Using 3 experiments, I examined false memory for encoding context by presenting Deese-Roediger-McDermott themes (Deese, 1959; Roediger & McDermott, 1995) in usual-looking fonts and by testing related, but unstudied, lure items in a font that was shown during encoding. In 2 of the experiments, testing lure items in the font used to study their…

  9. Lord-Wingersky Algorithm Version 2.0 for Hierarchical Item Factor Models with Applications in Test Scoring, Scale Alignment, and Model Fit Testing. CRESST Report 830

    ERIC Educational Resources Information Center

    Cai, Li

    2013-01-01

    Lord and Wingersky's (1984) recursive algorithm for creating summed score based likelihoods and posteriors has a proven track record in unidimensional item response theory (IRT) applications. Extending the recursive algorithm to handle multidimensionality is relatively simple, especially with fixed quadrature because the recursions can be defined…

  10. Distribution of Reading Time When Questions are Asked about a Restricted Category of Text Information.

    ERIC Educational Resources Information Center

    Reynolds, Ralph E.; And Others

    1979-01-01

    College students read a text either with or without inserted questions. Question groups performed better, relative to controls, on post-test items that repeated inserted questions, and on new post-test items from the same categories as the inserted questions. A selective attention interpretation of the effect of inserted questions was made.…

  11. A Method for the Comparison of Item Selection Rules in Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Barrada, Juan Ramon; Olea, Julio; Ponsoda, Vicente; Abad, Francisco Jose

    2010-01-01

    In a typical study comparing the relative efficiency of two item selection rules in computerized adaptive testing, the common result is that they simultaneously differ in accuracy and security, making it difficult to reach a conclusion on which is the more appropriate rule. This study proposes a strategy to conduct a global comparison of two or…

  12. Learning Factors Transfer Analysis: Using Learning Curve Analysis to Automatically Generate Domain Models

    ERIC Educational Resources Information Center

    Pavlik, Philip I. Jr.; Cen, Hao; Koedinger, Kenneth R.

    2009-01-01

    This paper describes a novel method to create a quantitative model of an educational content domain of related practice item-types using learning curves. By using a pairwise test to search for the relationships between learning curves for these item-types, we show how the test results in a set of pairwise transfer relationships that can be…

  13. Fourteen years of progress testing in radiology residency training: experiences from The Netherlands.

    PubMed

    Rutgers, D R; van Raamt, F; van Lankeren, W; Ravesloot, C J; van der Gijp, A; Ten Cate, Th J; van Schaik, J P J

    2018-05-01

    To describe the development of the Dutch Radiology Progress Test (DRPT) for knowledge testing in radiology residency training in The Netherlands from its start in 2003 up to 2016. We reviewed all DRPTs conducted since 2003. We assessed key changes and events in the test throughout the years, as well as resident participation and dispensation for the DRPT, test reliability and discriminative power of test items. The DRPT has been conducted semi-annually since 2003, except for 2015 when one digital DRPT failed. Key changes in these years were improvements in test analysis and feedback, test digitalization (2013) and inclusion of test items on nuclear medicine (2016). From 2003 to 2016, resident dispensation rates increased (Pearson's correlation coefficient 0.74, P-value <0.01) to maximally 16 %. Cronbach's alpha for test reliability varied between 0.83 and 0.93. The percentage of DRPT test items with negative item-rest-correlations, indicating relatively poor discriminative power, varied between 4 % and 11 %. Progress testing has proven feasible and sustainable in Dutch radiology residency training, keeping up with innovations in the radiological profession. Test reliability and discriminative power of test items have remained fair over the years, while resident dispensation rates have increased. • Progress testing allows for monitoring knowledge development from novice to senior trainee. • In postgraduate medical training, progress testing is used infrequently. • Progress testing is feasible and sustainable in radiology residency training.
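
    The two statistics reported in this abstract, Cronbach's alpha and item-rest correlations, can be sketched as follows. This is a minimal illustration using the standard textbook formulas, not the authors' analysis code; the function names and the small 0/1 score matrix are invented for the example.

```python
import math
from statistics import mean, variance

def cronbach_alpha(scores):
    """scores: list of examinee rows, each a list of item scores."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def item_rest_correlation(scores, j):
    """Correlate item j with the total score on all other items."""
    item = [row[j] for row in scores]
    rest = [sum(row) - row[j] for row in scores]
    return pearson(item, rest)

# Made-up data: six examinees, four dichotomous items.
data = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]
alpha = cronbach_alpha(data)
r_last = item_rest_correlation(data, 3)
```

    In the made-up data the last item runs against the other three, so its item-rest correlation is negative, which is the pattern the abstract treats as a sign of poor discriminative power.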

  14. Associative memory in aging: the effect of unitization on source memory.

    PubMed

    Bastin, Christine; Diana, Rachel A; Simon, Jessica; Collette, Fabienne; Yonelinas, Andrew P; Salmon, Eric

    2013-03-01

    In normal aging, memory for associations declines more than memory for individual items. Unitization is an encoding process defined by creation of a new single entity to represent a new arbitrary association. The current study tested the hypothesis that age-related differences in associative memory can be reduced by encoding instructions that promote unitization. In two experiments, groups of 20 young and 20 older participants learned new associations between a word and a background color under two conditions. In the item detail condition, they had to imagine that the item is the same color as the background, an instruction promoting unitization of the associations. In the context detail condition, which did not promote unitization, they had to imagine that the item interacted with another colored object. At test, they had to retrieve the color that was associated with each word (source memory). In both experiments, the results showed an age-related decrement in source memory performance in the context detail but not in the item detail condition. Moreover, Experiment 2 examined receiver operating characteristics in older participants and indicated that familiarity contributed more to source memory performance in the item detail than in the context detail condition. These findings suggest that unitization of new associations can overcome the associative memory deficit observed in aging, at least for item-color associations.

  15. Uncertainty in BRCA1 cancer susceptibility testing.

    PubMed

    Baty, Bonnie J; Dudley, William N; Musters, Adrian; Kinney, Anita Y

    2006-11-15

    This study investigated uncertainty in individuals undergoing genetic counseling/testing for breast/ovarian cancer susceptibility. Sixty-three individuals from a single kindred with a known BRCA1 mutation rated uncertainty about 12 items on a five-point Likert scale before and 1 month after genetic counseling/testing. Factor analysis identified a five-item total uncertainty scale that was sensitive to changes before and after testing. The items in the scale were related to uncertainty about obtaining health care, positive changes after testing, and coping well with results. The majority of participants (76%) rated reducing uncertainty as an important reason for genetic testing. The importance of reducing uncertainty was stable across time and unrelated to anxiety or demographics. Yet, at baseline, total uncertainty was low and decreased after genetic counseling/testing (P = 0.004). Analysis of individual items showed that after genetic counseling/testing, there was less uncertainty about the participant detecting cancer early (P = 0.005) and coping well with their result (P < 0.001). Our findings support the importance to clients of genetic counseling/testing as a means of reducing uncertainty. Testing may help clients to reduce the uncertainty about items they can control, and it may be important to differentiate the sources of uncertainty that are more or less controllable. Genetic counselors can help clients by providing anticipatory guidance about the role of uncertainty in genetic testing. (c) 2006 Wiley-Liss, Inc.

  16. Development and validation of a new condition-specific instrument for evaluation of smile esthetics-related quality of life.

    PubMed

    Saltovic, Ema; Lajnert, Vlatka; Saltovic, Sabina; Kovacevic Pavicic, Daniela; Pavlic, Andrej; Spalj, Stjepan

    2018-03-01

    Orofacial esthetics raises psychosocial issues. The purpose was to create and validate a new short instrument for psychosocial impacts of altered smile esthetics. A team of an orthodontist, two prosthodontists, a psychologist, and a dental student generated items that could draw up specific hypothetical psychosocial dimensions (69 items initially, 39 in final analysis). The sample consisted of 261 Caucasian subjects attending local high schools and university (26% male) aged 14 to 28 years who self-administered the designed questionnaire. Factorial analysis, Cronbach's alpha, Pearson correlation, paired samples t-test and analysis of variance were used for analyses of internal consistency, construct validity, responsiveness, and test-retest. Three dimensions of psychosocial impacts of altered smile esthetics were identified: dental self-consciousness, dental self-confidence and social contacts, which can be best fitted by 12 items, 4 items in each dimension. Internal consistency was good (α in range 0.85-0.89). Good stability in test-retest was confirmed. In responsiveness testing, tooth whitening induced an increase in dental self-confidence (P = 0.002), but no significant changes in other dimensions. The new instrument, Smile Esthetics-Related Quality of Life (SERQoL), is short and has proven to be a good indicator of psychosocial dimensions related to perception of smile esthetics. The Smile Esthetics-Related Quality of Life questionnaire might have practical validity when applied in esthetic dental clinical procedures. © 2017 Wiley Periodicals, Inc.

  17. Language development and affecting factors in 3- to 6-year-old children.

    PubMed

    Muluk, Nuray Bayar; Bayoğlu, Birgül; Anlar, Banu

    2014-05-01

    The aim of this study was to assess factors affecting language developmental screening test results in 33.0- to 75.0-month-old children. The study group consists of 402 children, 172 (42.8%) boys and 230 (57.2%) girls, aged 33.0-75.0 months who were examined in four age groups: 3 years (33.0-39.0 months), 4 years (45.0-51.0 months), 5 years (57.0-63.0 months) and 6 years (69.0-75.0 months). Demographic data and medical history obtained by a standard questionnaire and Denver II Developmental Test results were evaluated. Maternal factors such as mother's age, educational level, and socioeconomic status (SES) correlated with language items in all age groups. Linear regression analysis indicated a significant effect of mother's education and higher SES on certain expressive and receptive language items at 3 and 4 years. Fine motor items were closely related to language items at all ages examined, while in the younger (3- and 4-year-old) group gross motor items also were related to language development. Maternal and socioeconomic factors influence language development in children: these effects, already discernible with a screening test, can be potential targets for social and educational interventions. The interpretation of screening test results should take into account the interaction between fine motor and language development in preschool children.

  18. The Effect of SSM Grading on Reliability When Residual Items Have No Discriminating Power.

    ERIC Educational Resources Information Center

    Kane, Michael T.; Moloney, James M.

    Gilman and Ferry have shown that when the student's score on a multiple choice test is the total number of responses necessary to get all items correct, substantial increases in reliability can occur. In contrast, similar procedures giving partial credit on multiple choice items have resulted in relatively small gains in reliability. The analysis…

  19. A Primer on the 2- and 3-Parameter Item Response Theory Models.

    ERIC Educational Resources Information Center

    Thornton, Artist

    Item response theory (IRT) is a useful and effective tool for item response measurement if used in the proper context. This paper discusses the sets of assumptions under which responses can be modeled while exploring the framework of the IRT models relative to response testing. The one parameter model, or one parameter logistic model, is perhaps…
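
    For readers unfamiliar with the models named in this title, the following sketch shows the usual logistic item response function, with the 2-parameter and 1-parameter models as special cases of the 3-parameter form. The function name and parameter values are illustrative assumptions, and the optional scaling constant (often written D = 1.7) is omitted.

```python
import math

def irt_prob(theta, a=1.0, b=0.0, c=0.0):
    """P(correct | theta) under the 3-parameter logistic model.

    a: discrimination, b: difficulty, c: lower asymptote (guessing).
    Setting c = 0 gives the 2PL; additionally holding `a` equal across
    items gives the one-parameter logistic model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Illustrative values only: an examinee whose ability equals the item difficulty.
p_2pl = irt_prob(0.0, a=1.3, b=0.0)         # 0.5 when theta == b and c == 0
p_3pl = irt_prob(0.0, a=1.3, b=0.0, c=0.2)  # lifted toward 1 by the guessing floor
```

    When ability equals difficulty, the 2PL probability is exactly one half, while the 3PL probability is c + (1 - c) / 2.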

  20. Development and testing of item response theory-based item banks and short forms for eye, skin and lung problems in sarcoidosis.

    PubMed

    Victorson, David E; Choi, Seung; Judson, Marc A; Cella, David

    2014-05-01

    Sarcoidosis is a multisystem disease that can negatively impact health-related quality of life (HRQL) across generic (e.g., physical, social and emotional wellbeing) and disease-specific (e.g., pulmonary, ocular, dermatologic) domains. Measurement of HRQL in sarcoidosis has largely relied on generic patient-reported outcome tools, with few disease-specific measures available. The purpose of this paper is to present the development and testing of disease-specific item banks and short forms of lung, skin and eye problems, which are a part of a new patient-reported outcome (PRO) instrument called the sarcoidosis assessment tool. After prioritizing and selecting the most important disease-specific domains, we wrote new items to reflect disease-specific problems by drawing from patient focus group and clinician expert survey data that were used to create our conceptual model of HRQL in sarcoidosis. Item pools underwent cognitive interviews by sarcoidosis patients (n = 13), and minor modifications were made. These items were administered in a multi-site study (n = 300) to obtain item calibrations and create calibrated short forms using item response theory (IRT) approaches. From the available item pools, we created four new item banks and short forms: (1) skin problems, (2) skin stigma, (3) lung problems, and (4) eye problems. We also created and tested supplemental forms of the most common constitutional symptoms and negative effects of corticosteroids. Several new sarcoidosis-specific PROs were developed and tested using IRT approaches. These new measures can advance more precise and targeted HRQL assessment in sarcoidosis clinical trials and clinical practice.

  1. Dissociable Effects of Valence and Arousal on Different Subtypes of Old/New Effect: Evidence from Event-Related Potentials

    PubMed Central

    Xu, Huifang; Zhang, Qin; Li, Bingbing; Guo, Chunyan

    2015-01-01

    Here, we utilized the study-test paradigm combined with recognition confidence assessment and behavioral and event-related potential (ERP) measurements to investigate the effects of valence and arousal on the different subtypes of the old-new effect. We also tested the effect of valence and arousal at the encoding stage to investigate the underlying mechanism of the effect of the two emotional dimensions on different retrieval processes. In order to test the effects of valence and arousal on the old/new effect precisely, we used the “subject-oriented orthogonal design”, which manipulated valence and arousal independently according to subjects’ verbal reports. Three subtypes of the old/new effect were obtained in the test phase: FN400, LPC, and late positivity over right frontal sites. According to previous studies, they are thought to be associated with familiarity, recollection, and post-retrieval processes, respectively. For the FN400 component, valence affected mid-frontal negativity from 350–500 ms. Pleasant items evoked an enhanced ERP old/new effect relative to unpleasant items. However, arousal only affected LPC amplitude from 500–800 ms. The old/new effect for high-arousal items was greater than for low-arousal items. Valence also affected the amplitude of a positive-going slow wave at right frontal sites from 800–1000 ms, possibly serving as an index of post-retrieval processing. At the encoding stage, valence and arousal also had dissociable effects on the frontal slow wave between 350–800 ms and the centro-parietal positivity in 500–800 ms. The pleasant items evoked a more positive frontal slow wave relative to unpleasant ones, and the high arousal items evoked a larger centro-parietal positivity relative to low arousal ones. These results suggest that valence and arousal may differentially impact these different memory processes: valence affects familiarity and post-retrieval processing, whereas arousal affects recollection. These effects may be due to the conceptual encoding strategies for pleasant information and sensory encoding strategies for high arousal information. PMID:26696862

  2. The psychometric properties of the "Reading the Mind in the Eyes" Test: an item response theory (IRT) analysis.

    PubMed

    Preti, Antonio; Vellante, Marcello; Petretto, Donatella R

    2017-05-01

    The "Reading the Mind in the Eyes" Test (hereafter: Eyes Test) is considered an advanced task of the Theory of Mind aimed at assessing the performance of the participant in perspective-taking, that is, the ability to sense or understand other people's cognitive and emotional states. In this study, the item response theory analysis was applied to the adult version of the Eyes Test. The Italian version of the Eyes Test was administered to 200 undergraduate students of both genders (males = 46%). Modified parallel analysis (MPA) was used to test unidimensionality. Marginal maximum likelihood estimation was used to fit the 1-, 2-, and 3-parameter logistic (PL) model to the data. Differential Item Functioning (DIF) due to gender was explored with five independent methods. MPA provided evidence in favour of unidimensionality. The Rasch model (1-PL) was superior to the other two models in explaining participants' responses to the Eyes Test. There was no robust evidence of gender-related DIF in the Eyes Test, although some differences may exist for some items as a reflection of real differences by group. The study results support a one-factor model of the Eyes Test. Performance on the Eyes Test is defined by the participant's ability in perspective-taking. Researchers should cease using arbitrarily selected subscores in assessing the performance of participants to the Eyes Test. Lack of gender-related DIF favours the use of the Eyes Test in the investigation of gender differences concerning empathy and social cognition.

  3. Overview and current management of computerized adaptive testing in licensing/certification examinations.

    PubMed

    Seo, Dong Gi

    2017-01-01

    Computerized adaptive testing (CAT) has been implemented in high-stakes examinations such as the National Council Licensure Examination-Registered Nurses in the United States since 1994. Subsequently, the National Registry of Emergency Medical Technicians in the United States adopted CAT for certifying emergency medical technicians in 2007. This was done with the goal of introducing the implementation of CAT for medical health licensing examinations. Most implementations of CAT are based on item response theory, which hypothesizes that both the examinee and items have their own characteristics that do not change. There are 5 steps for implementing CAT: first, determining whether the CAT approach is feasible for a given testing program; second, establishing an item bank; third, pretesting, calibrating, and linking item parameters via statistical analysis; fourth, determining the specification for the final CAT related to the 5 components of the CAT algorithm; and finally, deploying the final CAT after specifying all the necessary components. The 5 components of the CAT algorithm are as follows: item bank, starting item, item selection rule, scoring procedure, and termination criterion. CAT management includes content balancing, item analysis, item scoring, standard setting, practice analysis, and item bank updates. Remaining issues include the cost of constructing CAT platforms and deploying the computer technology required to build an item bank. In conclusion, in order to ensure more accurate estimations of examinees' ability, CAT may be a good option for national licensing examinations. Measurement theory can support its implementation for high-stakes examinations.
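
    The five components of the CAT algorithm listed in this abstract (item bank, starting item, item selection rule, scoring procedure, termination criterion) can be seen together in a toy loop such as the one below. This is only a sketch under stated assumptions, not the procedure used by any licensing examination: the 2PL model, maximum-information selection, grid-based EAP scoring with a standard normal prior, the stopping thresholds, and all names and parameter values are choices made for illustration.

```python
import math

def p2pl(theta, a, b):
    """2PL probability of a correct response (assumed item model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = p2pl(theta, a, b)
    return a * a * p * (1.0 - p)

GRID = [-4.0 + 0.1 * i for i in range(81)]  # ability grid for numerical EAP

def eap(responses):
    """Scoring procedure: EAP estimate and posterior SD under a N(0, 1) prior.

    responses is a list of ((a, b), u) pairs with u in {0, 1}.
    """
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalised prior density
        for (a, b), u in responses:
            p = p2pl(t, a, b)
            w *= p if u else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    theta = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - theta) ** 2 * w for t, w in zip(GRID, weights)) / total
    return theta, math.sqrt(var)

def run_cat(bank, answer, start_theta=0.0, se_target=0.4, max_items=10):
    """Administer items until the SE target or the length limit is reached."""
    remaining = list(bank)              # item bank
    responses = []
    theta, se = start_theta, float("inf")
    # Termination criterion: precision reached, test length cap, or empty bank.
    while remaining and len(responses) < max_items and se > se_target:
        # Selection rule: most informative item at the current estimate.
        # On the first pass this also picks the starting item.
        item = max(remaining, key=lambda ab: information(theta, *ab))
        remaining.remove(item)
        responses.append((item, answer(item)))
        theta, se = eap(responses)
    return theta, se, len(responses)

# Hypothetical bank of 13 items and a deterministic stand-in examinee.
bank = [(1.5, b / 2.0) for b in range(-6, 7)]
theta_hat, se_hat, n_used = run_cat(bank, lambda item: item[1] < 1.0)
```

    The content balancing and exposure control mentioned under CAT management are deliberately left out of this sketch.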

  4. Overview and current management of computerized adaptive testing in licensing/certification examinations

    PubMed Central

    2017-01-01

    Computerized adaptive testing (CAT) has been implemented in high-stakes examinations such as the National Council Licensure Examination-Registered Nurses in the United States since 1994. Subsequently, the National Registry of Emergency Medical Technicians in the United States adopted CAT for certifying emergency medical technicians in 2007. This was done with the goal of introducing the implementation of CAT for medical health licensing examinations. Most implementations of CAT are based on item response theory, which hypothesizes that both the examinee and items have their own characteristics that do not change. There are 5 steps for implementing CAT: first, determining whether the CAT approach is feasible for a given testing program; second, establishing an item bank; third, pretesting, calibrating, and linking item parameters via statistical analysis; fourth, determining the specification for the final CAT related to the 5 components of the CAT algorithm; and finally, deploying the final CAT after specifying all the necessary components. The 5 components of the CAT algorithm are as follows: item bank, starting item, item selection rule, scoring procedure, and termination criterion. CAT management includes content balancing, item analysis, item scoring, standard setting, practice analysis, and item bank updates. Remaining issues include the cost of constructing CAT platforms and deploying the computer technology required to build an item bank. In conclusion, in order to ensure more accurate estimations of examinees’ ability, CAT may be a good option for national licensing examinations. Measurement theory can support its implementation for high-stakes examinations. PMID:28811394

  5. The ontogeny of serial-order behavior in humans (Homo sapiens): representation of a list.

    PubMed

    Guyla, Michelle; Colombo, Michael

    2004-03-01

    The authors trained 3-, 4-, 7-, and 10-year-old children and adults (Homo sapiens) on a nonverbal serial-order task to respond to 5 items in a specific order. Knowledge of each item's sequential position was then examined using pairwise and triplet tests. Adults and 7- and 10-year-olds performed at high levels on both tests, whereas 3- and 4-year-olds did not. The latency to respond to the first item of a test pair or triplet was linearly related to that item's position in the training series for the 7- and 10-year-olds and adults, but not for the 3- and 4-year-olds. These data suggest that older children and adults, but not younger children, developed a well-integrated internal representation of the serial list. ((c) 2004 APA, all rights reserved)

  6. Development and psychometric testing of the Nursing Workplace Relational Environment Scale (NWRES).

    PubMed

    Duddle, Maree; Boughton, Maureen

    2009-03-01

    The aim of this study was to develop and test the psychometric properties of the Nursing Workplace Relational Environment Scale (NWRES). A positive relational environment in the workplace is characterised by a sense of connectedness and belonging, support and cooperation among colleagues, open communication and effectively managed conflict. A poor relational environment in the workplace may contribute to job dissatisfaction and early turnover of staff. Quantitative survey. A three-stage process was used to design and test the NWRES. In Stage 1, an extensive literature review was conducted on professional working relationships and the nursing work environment. Three key concepts (collegiality, workplace conflict and job satisfaction) were identified and defined. In Stage 2, a pool of items was developed from the dimensions of each concept and formulated into a 35-item scale, which was piloted on a convenience sample of 31 nurses. In Stage 3, the newly refined 28-item scale was administered randomly to a convenience sample of 150 nurses. Psychometric testing was conducted to establish the construct validity and reliability of the scale. Exploratory factor analysis resulted in a 22-item scale. The factor analysis indicated a four-factor structure (collegial behaviours, relational atmosphere, outcomes of conflict and job satisfaction), which explained 68.12% of the total variance. Cronbach's alpha coefficient for the NWRES was 0.872 and the subscales ranged from 0.781-0.927. The results of the study confirm the reliability and validity of the NWRES. Replication of this study with a larger sample is indicated to determine relationships among the subscales. The results of this study have implications for health managers in terms of understanding the impact of the relational environment of the workplace on job satisfaction and retention.

  7. How does creating a concept map affect item-specific encoding?

    PubMed

    Grimaldi, Phillip J; Poston, Laurel; Karpicke, Jeffrey D

    2015-07-01

    Concept mapping has become a popular learning tool. However, the processes underlying the task are poorly understood. In the present study, we examined the effect of creating a concept map on the processing of item-specific information. In 2 experiments, subjects learned categorized or ad hoc word lists by making pleasantness ratings, sorting words into categories, or creating a concept map. Memory was tested using a free recall test and a recognition memory test, which is considered to be especially sensitive to item-specific processing. Typically, tasks that promote item-specific processing enhance free recall of categorized lists, relative to category sorting. Concept mapping resulted in lower recall performance than both the pleasantness rating and category sorting condition for categorized words. Moreover, concept mapping resulted in lower recognition memory performance than the other 2 tasks. These results converge on the conclusion that creating a concept map disrupts the processing of item-specific information. (c) 2015 APA, all rights reserved.

  8. The Dysexecutive Questionnaire advanced: item and test score characteristics, 4-factor solution, and severity classification.

    PubMed

    Bodenburg, Sebastian; Dopslaff, Nina

    2008-01-01

    The Dysexecutive Questionnaire (DEX; Behavioral assessment of the dysexecutive syndrome, 1996) is a standardized instrument to measure possible behavioral changes as a result of the dysexecutive syndrome. Although initially intended only as a qualitative instrument, the DEX has also been used increasingly to address quantitative problems. Until now there have not been more fundamental statistical analyses of the questionnaire's testing quality. The present study is based on an unselected sample of 191 patients with acquired brain injury and reports on the data relating to the quality of the items, the reliability and the factorial structure of the DEX. Item 3 displayed too great an item difficulty, whereas item 11 was not sufficiently discriminating. The DEX's reliability in self-rating is r = 0.85. In addition to presenting the statistical values of the tests, a clinical severity classification of the overall scores of the 4 found factors and of the questionnaire as a whole is carried out on the basis of quartile standards.

  9. Using Data Mining to Predict K-12 Students' Performance on Large-Scale Assessment Items Related to Energy

    ERIC Educational Resources Information Center

    Liu, Xiufeng; Ruiz, Miguel E.

    2008-01-01

    This article reports a study on using data mining to predict K-12 students' competence levels on test items related to energy. Data sources are the 1995 Third International Mathematics and Science Study (TIMSS), 1999 TIMSS-Repeat, 2003 Trends in International Mathematics and Science Study (TIMSS), and the National Assessment of Educational…

  10. FamilyPso - a new questionnaire to assess the impact of psoriasis on partners and family of patients.

    PubMed

    Mrowietz, U; Hartmann, A; Weißmann, W; Zschocke, I

    2017-01-01

    Psoriasis is a lifelong disease for which there is no cure. It has been conclusively shown across all ethnicities that patients suffering from psoriasis have a significantly reduced health-related quality of life and a high disease burden. Surprisingly little is known about the impact of a patient's psoriasis on partners or family members. To address this issue, a systematic literature search was conducted and interviews with relatives of psoriasis patients living in the same household were performed. From the collected information, items reflecting commonly mentioned effects on daily living were generated and tested in a large group of relatives before the final item selection was made. A first set of 29 items was selected and tested in a study with 96 patient relatives. After adjustment and statistical analysis, the final FamilyPso questionnaire was condensed to 15 items to assess the burden of partners or family members living together with psoriasis patients. The FamilyPso enables physicians to achieve a better understanding of the impact of psoriasis as a lifelong chronic disease on partners and the family environment. © 2016 European Academy of Dermatology and Venereology.

  11. The associative memory deficit in aging is related to reduced selectivity of brain activity during encoding

    PubMed Central

    Saverino, Cristina; Fatima, Zainab; Sarraf, Saman; Oder, Anita; Strother, Stephen C.; Grady, Cheryl L.

    2016-01-01

    Human aging is characterized by reductions in the ability to remember associations between items, despite intact memory for single items. Older adults also show less selectivity in task-related brain activity, such that patterns of activation become less distinct across multiple experimental tasks. This reduced selectivity, or dedifferentiation, has been found for episodic memory, which is often reduced in older adults, but not for semantic memory, which is maintained with age. We used functional magnetic resonance imaging (fMRI) to investigate whether there is a specific reduction in selectivity of brain activity during associative encoding in older adults, but not during item encoding, and whether this reduction predicts associative memory performance. Healthy young and older adults were scanned while performing an incidental-encoding task for pictures of objects and houses under item or associative instructions. An old/new recognition test was administered outside the scanner. We used agnostic canonical variates analysis and split-half resampling to detect whole brain patterns of activation that predicted item vs. associative encoding for stimuli that were later correctly recognized. Older adults had poorer memory for associations than did younger adults, whereas item memory was comparable across groups. Associative encoding trials, but not item encoding trials, were predicted less successfully in older compared to young adults, indicating less distinct patterns of associative-related activity in the older group. Importantly, higher probability of predicting associative encoding trials was related to better associative memory after accounting for age and performance on a battery of neuropsychological tests. These results provide evidence that neural distinctiveness at encoding supports associative memory and that a specific reduction of selectivity in neural recruitment underlies age differences in associative memory. PMID:27082043

  12. Development of the PROMIS coping expectancies of smoking item banks.

    PubMed

    Shadel, William G; Edelen, Maria Orlando; Tucker, Joan S; Stucky, Brian D; Hansen, Mark; Cai, Li

    2014-09-01

    Smoking is a coping strategy for many smokers who then have difficulty finding new ways to cope with negative affect when they quit. This paper describes analyses conducted to develop and evaluate item banks for assessing the coping expectancies of smoking for daily and nondaily smokers. Using data from a large sample of daily (N = 4,201) and nondaily (N = 1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning (DIF) analyses (according to gender, age, and ethnicity) to arrive at a unidimensional set of items for daily and nondaily smokers. We also evaluated performance of short forms (SFs) and computer adaptive tests (CATs) for assessing coping expectancies of smoking. For both daily and nondaily smokers, the unidimensional Coping Expectancies item banks (21 items) are relatively DIF free and are highly reliable (0.96 and 0.97, respectively). A common 4-item SF for daily and nondaily smokers also showed good reliability (0.85). Adaptive tests required an average of 4.3 and 3.7 items for simulated daily and nondaily respondents, respectively, and achieved reliabilities of 0.91 for both when the maximum test length was 10 items. This research provides a new set of items that can be used to reliably assess coping expectancies of smoking, through a SF, CAT, or a tailored set selected for a specific research purpose. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  13. Development and Application of Methods for Estimating Operating Characteristics of Discrete Test Item Responses without Assuming any Mathematical Form.

    ERIC Educational Resources Information Center

    Samejima, Fumiko

    In latent trait theory the latent space, or space of the hypothetical construct, is usually represented by some unidimensional or multi-dimensional continuum of real numbers. Like the latent space, the item response can either be treated as a discrete variable or as a continuous variable. Latent trait theory relates the item response to the latent…

  14. INTRODUCTION TO PATIENT-REPORTED OUTCOME ITEM BANKS: ISSUES IN MINORITY AGING RESEARCH

    PubMed Central

    Templin, Thomas N; Hays, Ron D; Gershon, Richard C; Rothrock, Nan; Jones, Richard N; Teresi, Jeanne A; Stewart, Anita; Weech-Maldonado, Robert; Wallace, Steve

    2014-01-01

    In 2004 NIH awarded contracts to initiate the development of high quality psychological and neuropsychological outcome measures for improved assessment of health-related outcomes. The workshop introduced these measurement development initiatives, the measures created, and the NIH supported resource (Assessment Center) for internet or tablet-based test administration and scoring. Presentation covered: (a) item response theory (IRT) and assessment of test bias, (b) construction of item banks and computerized adaptive testing, and (c) the different ways in which qualitative analyses contribute to the definition of construct domains and the refinement of outcome constructs. The panel discussion included questions about representativeness of samples, and assessment of cultural bias. PMID:23570428

  15. An approach for estimating item sensitivity to within-person change over time: An illustration using the Alzheimer's Disease Assessment Scale-Cognitive subscale (ADAS-Cog).

    PubMed

    Dowling, N Maritza; Bolt, Daniel M; Deng, Sien

    2016-12-01

    When assessments are primarily used to measure change over time, it is important to evaluate items according to their sensitivity to change, specifically. Items that demonstrate good sensitivity to between-person differences at baseline may not show good sensitivity to change over time, and vice versa. In this study, we applied a longitudinal factor model of change to a widely used cognitive test designed to assess global cognitive status in dementia, and contrasted the relative sensitivity of items to change. Statistically nested models were estimated introducing distinct latent factors related to initial status differences between test-takers and within-person latent change across successive time points of measurement. Models were estimated using all available longitudinal item-level data from the Alzheimer's Disease Assessment Scale-Cognitive subscale, including participants representing the full-spectrum of disease status who were enrolled in the multisite Alzheimer's Disease Neuroimaging Initiative. Five of the 13 Alzheimer's Disease Assessment Scale-Cognitive items demonstrated noticeably higher loadings with respect to sensitivity to change. Attending to performance change on only these 5 items yielded a clearer picture of cognitive decline more consistent with theoretical expectations in comparison to the full 13-item scale. Items that show good psychometric properties in cross-sectional studies are not necessarily the best items at measuring change over time, such as cognitive decline. Applications of the methodological approach described and illustrated in this study can advance our understanding regarding the types of items that best detect fine-grained early pathological changes in cognition. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  16. Developing a model of competence in the operating theatre: psychometric validation of the perceived perioperative competence scale-revised.

    PubMed

    Gillespie, Brigid M; Polit, Denise F; Hamlin, Lois; Chaboyer, Wendy

    2012-01-01

    This paper describes the development and validation of the Revised Perioperative Competence Scale (PPCS-R). There is a lack of psychometrically tested, sound self-assessment tools to measure nurses' perceived competence in the operating room. Content validity was established by a panel of international experts and the original 98-item scale was pilot tested with 345 nurses in Queensland, Australia. Following the removal of several items, a national sample that included all 3209 nurses who were members of the Australian College of Operating Room Nurses was surveyed using the 94-item version. Psychometric testing assessed content validity using exploratory factor analysis, internal consistency using Cronbach's alpha, and construct validity using the "known groups" technique. During item reduction, several preliminary factor analyses were performed on two random halves of the sample (n=550). Usable data for psychometric assessment were obtained from 1122 nurses. The original 94-item scale was reduced to 40 items. The final factor analysis using the entire sample resulted in a 40-item, six-factor solution. Cronbach's alpha for the 40-item scale was .96. Construct validation demonstrated significant differences (p<.0001) in perceived competence scores relative to years of operating room experience and receipt of specialty education. On the basis of these results, the psychometric properties of the PPCS-R were considered encouraging. Further testing of the tool in different samples of operating room nurses is necessary to enable cross-cultural comparisons. Copyright © 2011 Elsevier Ltd. All rights reserved.
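
    The internal-consistency statistic reported in this record, Cronbach's alpha, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis: the function name `cronbach_alpha` and the toy score matrix are hypothetical, and sample (n - 1) variances are assumed.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores: list of rows, one row per respondent, one column per item.
    Uses sample (n - 1) variances, which is an assumption of this sketch.
    """
    k = len(scores[0])                      # number of items
    item_columns = list(zip(*scores))       # transpose to item-wise columns
    item_var_sum = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical toy data: 4 respondents answering 3 items (not from the study).
toy_scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]
alpha = cronbach_alpha(toy_scores)
print(round(alpha, 3))
```

    Alpha rises toward 1 as the items covary more strongly relative to their individual variances, which is why a long, homogeneous scale such as the 40-item version above can reach values near .96.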

  17. Reliability of self-rated tinnitus distress and association with psychological symptom patterns.

    PubMed

    Hiller, W; Goebel, G; Rief, W

    1994-05-01

    Psychological complaints were investigated in two samples of 60 and 138 in-patients suffering from chronic tinnitus. We administered the Tinnitus Questionnaire (TQ), a 52-item self-rating scale which differentiates between dimensions of emotional and cognitive distress, intrusiveness, auditory perceptual difficulties, sleep disturbances and somatic complaints. The test-retest reliability was .94 for the TQ global score and between .86 and .93 for subscales. Three independent analyses were conducted to estimate the split-half reliability (internal consistency) which was only slightly lower than the test-retest values for scales with a relatively small number of items. Reliability was sufficient also on the level of single items. Low correlations between the TQ and the Hopkins Symptom Checklist (SCL-90-R) indicate a distinct quality of tinnitus-related and general psychological disturbances.

  18. Development and Validation of a Novel Generic Health-related Quality of Life Instrument With 20 Items (HINT-20).

    PubMed

    Jo, Min-Woo; Lee, Hyeon-Jeong; Kim, Soo Young; Kim, Seon-Ha; Chang, Hyejung; Ahn, Jeonghoon; Ock, Minsu

    2017-01-01

    Few attempts have been made to develop a generic health-related quality of life (HRQoL) instrument and to examine its validity and reliability in Korea. We aimed to do this in our present study. After a literature review of existing generic HRQoL instruments, a focus group discussion, in-depth interviews, and expert consultations, we selected 30 tentative items for a new HRQoL measure. These items were evaluated by assessing their ceiling effects, difficulty, and redundancy in the first survey. To validate the HRQoL instrument that was developed, known-groups validity and convergent/discriminant validity were evaluated and its test-retest reliability was examined in the second survey. Of the 30 items originally assessed for the HRQoL instrument, four were excluded due to high ceiling effects and six were removed due to redundancy. We ultimately developed a HRQoL instrument with a reduced number of 20 items, known as the Health-related Quality of Life Instrument with 20 items (HINT-20), incorporating physical, mental, social, and positive health dimensions. The results of the HINT-20 for known-groups validity were poorer in women, the elderly, and those with a low income. For convergent/discriminant validity, the correlation coefficients of items (except vitality) in the physical health dimension with the physical component summary of the Short Form 36 version 2 (SF-36v2) were generally higher than the correlations of those items with the mental component summary of the SF-36v2, and vice versa. Regarding test-retest reliability, the intraclass correlation coefficient of the total HINT-20 score was 0.813 (p<0.001). A novel generic HRQoL instrument, the HINT-20, was developed for the Korean general population and showed acceptable validity and reliability.

  19. Preliminary development of an ultrabrief two-item bedside test for delirium.

    PubMed

    Fick, Donna M; Inouye, Sharon K; Guess, Jamey; Ngo, Long H; Jones, Richard N; Saczynski, Jane S; Marcantonio, Edward R

    2015-10-01

    Delirium is common, morbid, and costly, yet is greatly under-recognized among hospitalized older adults. To identify the best single and pair of mental status test items that predict the presence of delirium. Diagnostic test evaluation study that enrolled medicine inpatients aged 75 years or older at an academic medical center. Patients underwent a clinical reference standard assessment involving a patient interview, medical record review, and interviews with family members and nurses to determine the presence or absence of Diagnostic and Statistical Manual of Mental Disorders, 4th Edition defined delirium. Participants also underwent the three-dimensional Confusion Assessment Method (3D-CAM), a brief, validated assessment for delirium. Individual items and pairs of items from the 3D-CAM were evaluated to determine sensitivity and specificity relative to the reference standard delirium diagnosis. Of the 201 participants (mean age 84 years, 62% female), 42 (21%) had delirium based on the clinical reference standard. The single item with the best test characteristics was "months of the year backwards" with a sensitivity of 83% (95% confidence interval [CI]: 69%-93%) and specificity of 69% (95% CI: 61%-76%). The best 2-item screen was the combination of "months of the year backwards" and "what is the day of the week?" with a sensitivity of 93% (95% CI: 81%-99%) and specificity of 64% (95% CI: 56%-70%). We identified a single item with >80% and pair of items with >90% sensitivity for delirium. If validated prospectively, these items will serve as an initial innovative screening step for delirium identification in hospitalized older adults. © 2015 Society of Hospital Medicine.
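
    The sensitivity and specificity figures in this record follow from a 2x2 comparison against the reference standard. The sketch below shows the arithmetic only. The cell counts are assumptions chosen to be consistent with the reported group sizes (42 with delirium, 159 without) and percentages; they are not taken from the paper, and the function name is hypothetical.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from 2x2 diagnostic counts.

    tp/fn: reference-positive cases the screen flags / misses.
    tn/fp: reference-negative cases the screen clears / flags.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Assumed illustrative counts for a single-item screen (not from the paper):
# 42 reference-positive and 159 reference-negative participants.
sens, spec = sensitivity_specificity(tp=35, fn=7, tn=110, fp=49)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

    Adding a second item to the screen, as the record describes, trades some specificity for sensitivity: more true cases are flagged, at the cost of more false positives.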

  20. [Instrument to measure adherence in hypertensive patients: contribution of Item Response Theory].

    PubMed

    Rodrigues, Malvina Thaís Pacheco; Moreira, Thereza Maria Magalhaes; Vasconcelos, Alexandre Meira de; Andrade, Dalton Francisco de; Silva, Daniele Braz da; Barbetta, Pedro Alberto

    2013-06-01

    To analyze, by means of "Item Response Theory", an instrument to measure adherence to treatment for hypertension. Analytical study with 406 hypertensive patients with associated complications seen in primary care in Fortaleza, CE, Northeastern Brazil, 2011 using "Item Response Theory". The stages were: dimensionality test, calibrating the items, processing data and creating a scale, analyzed using the graded response model. A study of the dimensionality of the instrument was conducted by analyzing the polychoric correlation matrix and full-information factor analysis. Multilog software was used to calibrate items and estimate the scores. Items relating to drug therapy are the most directly related to adherence while those relating to drug-free therapy need to be reworked because they have less psychometric information and low discrimination. The independence of items, the small number of levels in the scale and low explained variance in the adjustment of the models show the main weaknesses of the instrument analyzed. The "Item Response Theory" proved to be a relevant analysis technique because it evaluated respondents for adherence to treatment for hypertension, the level of difficulty of the items and their ability to discriminate between individuals with different levels of adherence, which generates a greater amount of information. The instrument analyzed is limited in measuring adherence to hypertension treatment, by analyzing the "Item Response Theory" of the item, and needs adjustment. The proper formulation of the items is important in order to accurately measure the desired latent trait.

  1. The relative price of healthy and less healthy foods available in Australian school canteens.

    PubMed

    Billich, Natassja; Adderley, Marijke; Ford, Laura; Keeton, Isabel; Palermo, Claire; Peeters, Anna; Woods, Julie; Backholer, Kathryn

    2018-04-12

    School canteens have an important role in modelling a healthy food environment. Price is a strong predictor of food and beverage choice. This study compared the relative price of healthy and less healthy lunch and snack items sold within Australian school canteens. A convenience sample of online canteen menus from five Australian states were selected (100 primary and 100 secondary schools). State-specific canteen guidelines were used to classify menu items into 'green' (eat most), 'amber' (select carefully) and 'red' (not recommended in schools). The price of the cheapest 'healthy' lunch (vegetable-based 'green') and snack ('green' fruit) item was compared to the cheapest 'less healthy' ('amber/red') lunch and snack item, respectively, using an un-paired t-test. The relative price of the 'healthy' items and the 'less healthy' items was calculated to determine the proportion of schools that sold the 'less healthy' item cheaper. The mean cost of the 'healthy' lunch items was greater than the 'less healthy' lunch items for both primary (AUD $0.70 greater) and secondary schools ($0.50 greater; p < 0.01). For 75% of primary and 57% of secondary schools, the selected 'less healthy' lunch item was cheaper than the 'healthy' lunch item. For 41% of primary and 48% of secondary schools, the selected 'less healthy' snack was cheaper than the 'healthy' snack. These proportions were greatest for primary schools located in more, compared to less, disadvantaged areas. The relative price of foods sold within Australian school canteens appears to favour less healthy foods. School canteen healthy food policies should consider the price of foods sold.

  2. Modulation of the electrophysiological correlates of retrieval cue processing by the specificity of task demands.

    PubMed

    Johnson, Jeffrey D; Rugg, Michael D

    2006-02-03

    Retrieval orientation refers to the differential processing of retrieval cues according to the type of information sought from memory (e.g., words vs. pictures). In the present study, event-related potentials (ERPs) were employed to investigate whether the neural correlates of differential retrieval orientations are sensitive to the specificity of the retrieval demands of the test task. In separate study-test phases, subjects encoded lists of intermixed words and pictures, and then undertook one of two retrieval tests, in both of which the retrieval cues were exclusively words. In the recognition test, subjects performed 'old/new' discriminations on the test items, and old items corresponded to only one class of studied material (words or pictures). In the exclusion test, old items corresponded to both classes of study material, and subjects were required to respond 'old' only to test items corresponding to a designated class of material. Thus, demands for retrieval specificity were greater in the exclusion test than during recognition. ERPs elicited by correctly classified new items in the two types of test were contrasted according to whether words or pictures were the sought-for material. Material-dependent ERP effects were evident in both tests, but the effects onset earlier and offset later in the exclusion test. The findings suggest that differential processing of retrieval cues, and hence the adoption of differential retrieval orientations, varies according to the specificity of the retrieval goal.

  3. Validity and reliability of a Malay version of the brief illness perception questionnaire for patients with type 2 diabetes mellitus.

    PubMed

    Chew, Boon-How; Vos, Rimke C; Heijmans, Monique; Shariff-Ghazali, Sazlina; Fernandez, Aaron; Rutten, Guy E H M

    2017-08-03

    Illness perceptions involve the personal beliefs that patients have about their illness and may influence health behaviours considerably. Since an instrument to measure these perceptions for the Malay population in Malaysia is lacking, we translated and examined the psychometric properties of the Malay version of the Brief Illness Perception Questionnaire (MBIPQ) in adult patients with type 2 diabetes mellitus. The MBIPQ has nine items; all use a 0-10 response scale except the ninth item about causal factors, which is an open-ended item. A standard procedure was used to translate and adapt the English BIPQ into the Malay language. Construct validity was examined comparing item scores and scores on the Diabetes Management Self-Efficacy Scale, the Morisky Medication Adherence Scale, the World Health Organization Quality of Life-brief, the 9-item Patient Health Questionnaire, the 17-item Diabetes Distress Scale, HbA1c and the presence of complications. In addition, 2-week and 4-week test-retest reliability were studied. A total of 312 patients completed the MBIPQ. Of these, 97 and 215 patients completed the 2-week and 4-week test-retest reliability questionnaires, respectively. Moderate inter-item correlations were observed between illness perception dimensions (r = -0.31 to 0.53). MBIPQ items showed the expected correlations with self-efficacy (r = 0.35), medication adherence (r = 0.29), quality of life (r = -0.17 to 0.31) and depressive symptoms (r = -0.18 to 0.21). People with severe diabetes-related distress were also more concerned (t-test = 4.01, p < 0.001) and experienced lower personal control (t-test = 2.07, p = 0.031). People with any diabetes-related complication perceived the consequences as more serious (t-test = 2.04, p = 0.044). The 2-week and 4-week test-retest reliabilities varied between ICC agreement 0.39 to 0.70 and 0.58 to 0.78, respectively. The psychometric properties of items in the MBIPQ are moderate. The MBIPQ showed good cross-cultural validity and moderate construct validity. Test-retest reliability was moderate. Despite the moderate psychometric properties, the MBIPQ may be useful in clinical practice as it is a useful instrument to elicit and communicate patients' personal thoughts and feelings. Future research is needed to establish its responsiveness and predictive validity. ClinicalTrials.gov NCT02730754 registered on March 29, 2016; NCT02730078 registered on March 29, 2016.

  4. Development and psychometric testing of the Nurse Practitioner Primary Care Organizational Climate Questionnaire.

    PubMed

    Poghosyan, Lusine; Nannini, Angela; Finkelstein, Stacey R; Mason, Emanuel; Shaffer, Jonathan A

    2013-01-01

    Policy makers and healthcare organizations are calling for expansion of the nurse practitioner (NP) workforce in primary care settings to assure timely access and high-quality care for the American public. However, many barriers, including those at the organizational level, exist that may undermine NP workforce expansion and their optimal utilization in primary care. This study developed a new NP-specific survey instrument, Nurse Practitioner Primary Care Organizational Climate Questionnaire (NP-PCOCQ), to measure organizational climate in primary care settings and conducted its psychometric testing. Using instrument development design, the organizational climate domain pertinent for primary care NPs was identified. Items were generated from the evidence and qualitative data. Face and content validity were established through two expert meetings. Content validity index was computed. The 86-item pool was reduced to 55 items, which was pilot tested with 81 NPs using mailed surveys and then field-tested with 278 NPs in New York State. SPSS 18 and Mplus software were used for item analysis, reliability testing, and maximum likelihood exploratory factor analysis. Nurse Practitioner Primary Care Organizational Climate Questionnaire had face and content validity. The content validity index was .90. Twenty-nine items loaded on four subscale factors: professional visibility, NP-administration relations, NP-physician relations, and independent practice and support. The subscales had high internal consistency reliability. Cronbach's alphas ranged from .87 to .95. Having a strong instrument is important to promote future research. Also, administrators can use it to assess organizational climate in their clinics and propose interventions to improve it, thus promoting NP practice and the expansion of NP workforce.

  5. Development of the Attributed Dignity Scale.

    PubMed

    Jacelon, Cynthia S; Dixon, Jane; Knafl, Kathleen A

    2009-07-01

    A sequential, multi-method approach to instrument development beginning with concept analysis, followed by (a) item generation from qualitative data, (b) review of items by expert and lay person panels, (c) cognitive appraisal interviews, (d) pilot testing, and (e) evaluating construct validity was used to develop a measure of attributed dignity in older adults. The resulting positively scored, 23-item scale has three dimensions: Self-Value, Behavioral Respect-Self, and Behavioral Respect-Others. Item-total correlations in the pilot study ranged from 0.39 to 0.85. Correlations between the Attributed Dignity Scale (ADS) and both Rosenberg's Self-Esteem Scale (0.17) and Crowne and Marlowe's Social Desirability Scale (0.36) were modest and in the expected direction, indicating attributed dignity is a related but independent concept. Next steps include testing the ADS with a larger sample to complete factor analysis, test-retest stability, and further study of the relationships between attributed dignity and other concepts.

  6. Relative Validity and Reliability of a 1-Week, Semiquantitative Food Frequency Questionnaire for Women Participating in the Supplemental Nutrition Assistance Program.

    PubMed

    Sanjeevi, Namrata; Freeland-Graves, Jeanne; George, Goldy Chacko

    2017-12-01

    The Supplemental Nutrition Assistance Program (SNAP) plays a critical role in reducing food insecurity by distribution of benefits at a monthly interval to participants. Households that receive assistance from SNAP spend at least three-quarters of benefits within the first 2 weeks of receipt. Because this expenditure pattern may be associated with lower food intake toward the end of the month, it is important to develop a tool that can assess the weekly diets of SNAP participants. The goal of this study was to develop and assess the relative validity and reliability of a semiquantitative 1-week food frequency questionnaire (FFQ) tailored to a population of women participating in SNAP. The FFQ was derived from an existing 195-item FFQ that was based on a reference period of 1 month. This 195-item FFQ has been validated in a population of low-income postpartum women who were recruited from central Texas during 2004. Mean daily servings of each food item in the 195-item FFQ completed by women who took part in the 2004 validation study were calculated to determine the most frequently consumed food items. Emphasis on these items led to the creation of a shorter, 1-week FFQ of only 95 items. This new 1-week instrument was compared with 3-day diet records to evaluate relative validity in a sample of women participating in SNAP. For reliability, the FFQ was administered a second time, separated by a 1-month time interval. The validity study included 70 female SNAP participants who were recruited from the partner agencies of the Central Texas Food Bank from March to June 2015. A subsample of 40 women participated in the reliability study. Outcome measures were mean nutrient intake values obtained from the two tests of the 95-item FFQ and 3-day diet records. Deattenuated Pearson correlation coefficients examined relationships in nutrient intake between the 95-item FFQ and 3-day diet records, and a paired samples t test determined differences in mean nutrient intake. Weighted Cohen's κ indicated agreement in quartile classification of study participants by the 95-item FFQ and 3-day diet records, according to nutrient intake. Test-retest reliability was assessed by intraclass correlations and weighted Cohen's κ. Mean deattenuated Pearson correlation between the FFQ and 3-day diet records was 0.61, and the weighted Cohen's κ was 0.39. Finally, the average test-retest correlation and weighted Cohen's κ of the FFQ were 0.66 and 0.50, respectively. These results suggest that the 1-week, 95-item FFQ demonstrated acceptable relative validity and reliability in low-income women participating in SNAP in the southwestern United States. Copyright © 2017 Academy of Nutrition and Dietetics. Published by Elsevier Inc. All rights reserved.
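
    Agreement in quartile classification, as summarized by weighted Cohen's κ in this record, can be sketched as below. The record does not state which weighting scheme was used, so linear disagreement weights are an assumption here; the function name and the toy quartile labels are hypothetical.

```python
def weighted_kappa(ratings_a, ratings_b, n_categories):
    """Linearly weighted Cohen's kappa for two ordinal classifications.

    ratings_a, ratings_b: category indices (0 .. n_categories - 1) assigned
    to the same participants by two methods.
    """
    n = len(ratings_a)
    k = n_categories
    # Observed cross-classification table.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)   # linear disagreement weight
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical quartile assignments (0 = lowest intake, 3 = highest) for
# eight participants by two methods; not data from the study.
ffq_quartiles = [0, 0, 1, 1, 2, 2, 3, 3]
record_quartiles = [0, 1, 1, 2, 2, 3, 3, 2]
print(round(weighted_kappa(ffq_quartiles, record_quartiles, 4), 3))
```

    With linear weights, a one-quartile disagreement is penalized one third as much as a disagreement between the lowest and highest quartiles, so near misses reduce κ less than gross misclassification does.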

  7. Physical performance testing in mucopolysaccharidosis I: a pilot study.

    PubMed

    Dumas, Helene M; Fragala, Maria A; Haley, Stephen M; Skrinar, Alison M; Wraith, James E; Cox, Gerald F

    2004-01-01

    To develop and field-test a physical performance measure (MPS-PPM) for individuals with Mucopolysaccharidosis I (MPS I), a rare genetic disorder. Motor performance and endurance items were developed based on literature review, clinician feedback, feasibility, and equipment and training needs. A standardized testing protocol and scoring rules were created. The MPS-PPM includes: Arm Function (7 items), Leg Function (5 items), and Endurance (2 items). Pilot data were collected for 10 subjects (ages 5-29 years). We calculated Spearman's rho correlations between age, severity and summary z-scores on the MPS-PPM. Subjects had variable presentations, as correlations among the three sub-test scores were not significant. Increasing age was related to greater severity in physical performance (r = 0.72, p<0.05) and lower scores on the Leg Function (r = -0.67, p<0.05) and Endurance (r = -0.65, p<0.05) sub-tests. The MPS-PPM was sensitive to detecting physical performance deficits, as six subjects could not complete the full battery of Arm Function items and eight subjects were unable to complete all Leg Function items. Subjects walked more slowly and expended more energy than typically developing peers. Individuals with MPS I have difficulty with arm and leg function and reduced endurance. The MPS-PPM is a clinically feasible measure that detects limitations in physical performance and may have potential to quantify changes in function following intervention. Copyright 2004 Taylor and Francis Ltd.

  8. Item response theory - A first approach

    NASA Astrophysics Data System (ADS)

    Nunes, Sandra; Oliveira, Teresa; Oliveira, Amílcar

    2017-07-01

    The Item Response Theory (IRT) has become one of the most popular scoring frameworks for measurement data, frequently used in computerized adaptive testing, cognitively diagnostic assessment and test equating. According to Andrade et al. (2000), IRT can be defined as a set of mathematical models (Item Response Models - IRM) constructed to represent the probability of an individual giving the right answer to an item of a particular test. The number of Item Response Models available for measurement analysis has increased considerably in the last fifteen years due to increasing computer power and due to a demand for accuracy and more meaningful inferences grounded in complex data. The developments in modeling with Item Response Theory were related to developments in estimation theory, most remarkably Bayesian estimation with Markov chain Monte Carlo algorithms (Patz & Junker, 1999). The popularity of Item Response Theory has also implied numerous overviews in books and journals, and many connections between IRT and other statistical estimation procedures, such as factor analysis and structural equation modeling, have been made repeatedly (van der Linden & Hambleton, 1997). As stated before, Item Response Theory covers a variety of measurement models, ranging from basic one-dimensional models for dichotomously and polytomously scored items and their multidimensional analogues to models that incorporate information about cognitive sub-processes which influence the overall item response process. The aim of this work is to introduce the main concepts associated with one-dimensional models of Item Response Theory, to specify the logistic models with one, two and three parameters, to discuss some properties of these models and to present the main estimation procedures.
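
    The one-, two- and three-parameter logistic models named in this record share a single functional form, sketched below. This is a minimal illustration, not code from the paper: the function name and parameter names (`theta` for ability, `b` for difficulty, `a` for discrimination, `c` for the guessing asymptote) are assumptions following common usage, and the optional scaling constant D = 1.7 is omitted.

```python
import math


def item_response_probability(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under the 3PL model.

    P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))

    Setting c = 0 gives the 2PL model; additionally fixing a = 1 gives
    the 1PL model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# A respondent whose ability equals the item difficulty:
p_1pl = item_response_probability(theta=0.0, b=0.0)                # 1PL
p_2pl = item_response_probability(theta=0.0, b=0.0, a=2.0)         # 2PL
p_3pl = item_response_probability(theta=0.0, b=0.0, a=2.0, c=0.2)  # 3PL
print(p_1pl, p_2pl, p_3pl)
```

    When ability equals difficulty, the 1PL and 2PL probabilities are both 0.5 regardless of discrimination, while a nonzero guessing parameter lifts the 3PL probability to c + (1 - c) / 2.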

  9. Three approaches to investigating the multidimensional nature of a science assessment

    NASA Astrophysics Data System (ADS)

    Gokiert, Rebecca Jayne

    The purpose of this study was to investigate a multi-method approach for collecting validity evidence about the underlying knowledge and skills measured by a large-scale science assessment. The three approaches included analysis of dimensionality, differential item functioning (DIF), and think-aloud interviews. The specific research questions addressed were: (1) Does the 4-factor model previously found by Hamilton et al. (1995) for the grade 8 sample explain the data? (2) Do the performances of male and female students systematically differ? Are these performance differences captured in the dimensions? (3) Can think-aloud reports aid in the generation of hypotheses about the underlying knowledge and skills that are measured by this test? A confirmatory factor analysis of the 4-factor model revealed good model-data fit for both the AB and AC tests. Twenty-four of the 83 AB test items and 16 of the 77 AC test items displayed significant DIF; however, items were found, on average, to favour both males and females equally. There were some systematic differences found across the four factors; items favouring males tended to be related to earth and space sciences, stereotypical male related activities, and numerical operations. Conversely, females were found to outperform males on items that required careful reading and attention to detail. Concurrent and retrospective verbal reports (Ericsson & Simon, 1993) were collected from 16 grade 8 students (9 male and 7 female) while they solved 12 DIF items. Four general cognitive processing themes were identified from the student protocols that could be used to explain male and female problem solving. The themes included comprehension (verbal and visual), visualization, background knowledge/experience (school or life), and strategy use. There were systematic differences in cognitive processing between the students that answered the items correctly and the students who answered the items incorrectly; however, this did not always correspond with the statistical gender DIF results. Although the multifaceted approach produced interpretable and meaningful validity evidence about the knowledge and skills, these forms of validity evidence only begin to provide a basic understanding of the underlying construct(s) that are being measured.

  10. An investigation of the measurement properties of the Spot-the-Word test in a community sample.

    PubMed

    Mackinnon, Andrew; Christensen, Helen

    2007-12-01

    Intellectual ability is assessed with the Spot-the-Word (STW) test (A. Baddeley, H. Emslie, & I. Nimmo Smith, 1993) by asking respondents to identify a word in a word-nonword item pair. Results in moderate-sized samples suggest this ability is resistant to decline due to dementia. The authors used a 3-parameter item response theory model to investigate the measurement properties of the STW in a large community-dwelling sample (n=2,480) 60 to 64 years of age. A number of poorly performing items were identified. Substantial guessing was present; however, the number of words correctly identified was found to be an accurate index of ability. Performance was moderately related to a number of tests of cognitive performance and was effectively unrelated to visual acuity and to physical or mental health status. The STW is a promising test of ability that, in the future, may be refined by the deletion or replacement of poorly functioning items.

  11. Probing the Relative Importance of Different Attributes in L2 Reading and Listening Comprehension Items: An Application of Cognitive Diagnostic Models

    ERIC Educational Resources Information Center

    Yi, Yeon-Sook

    2017-01-01

    The present study examines the relative importance of attributes within and across items by applying four cognitive diagnostic assessment models. The current study utilizes the function of the models that can indicate inter-attribute relationships that reflect the response behaviors of examinees to analyze scored test-taker responses to four forms…

  12. Gist-Based Memory for Prices and ‘Better Buys’ in Younger and Older Adults

    PubMed Central

    Flores, Cynthia C.; Hargis, Mary B.; McGillivray, Shannon; Friedman, Michael C.; Castel, Alan D.

    2016-01-01

    Aging typically leads to various memory deficits which results in older adults’ tendency to remember more general information and rely on gist memory. The current study examined if younger and older adults could remember which of two comparable grocery items (e.g., two similar but different jams) was paired with a lower price (the “better buy”). Participants studied lists of grocery items and their prices, in which the two items in each category were presented consecutively (Experiment 1), or separated by intervening items (Experiment 2). At test, participants were asked to identify the “better buy” and recall the price of both items. There were negligible age-related differences for the “better buy” in Experiment 1, but age-related differences were present in Experiment 2 when there were greater memory demands involved in comparing the two items. Together, these findings suggest that when price information of two items can be evaluated and compared within a short period of time, older adults can form stable gist-based memory for prices, but that this is impaired with longer delays. We relate the findings to age-related changes in the use of gist and verbatim memory when remembering prices, as well as the associative deficit account of cognitive aging. PMID:27310613

  13. Gist-based memory for prices and "better buys" in younger and older adults.

    PubMed

    Flores, Cynthia C; Hargis, Mary B; McGillivray, Shannon; Friedman, Michael C; Castel, Alan D

    2017-04-01

    Ageing typically leads to various memory deficits which results in older adults' tendency to remember more general information and rely on gist memory. The current study examined if younger and older adults could remember which of two comparable grocery items (e.g., two similar but different jams) was paired with a lower price (the "better buy"). Participants studied lists of grocery items and their prices, in which the two items in each category were presented consecutively (Experiment 1), or separated by intervening items (Experiment 2). At test, participants were asked to identify the "better buy" and recall the price of both items. There were negligible age-related differences for the "better buy" in Experiment 1, but age-related differences were present in Experiment 2 when there were greater memory demands involved in comparing the two items. Together, these findings suggest that when price information of two items can be evaluated and compared within a short period of time, older adults can form stable gist-based memory for prices, but that this is impaired with longer delays. We relate the findings to age-related changes in the use of gist and verbatim memory when remembering prices, as well as the associative deficit account of cognitive ageing.

  14. Does the Cognitive Reflection Test actually capture heuristic versus analytic reasoning styles in older adults?

    PubMed

    Hertzog, Christopher; Smith, R Marit; Ariel, Robert

    2018-01-01

    Background/Study Context: This study evaluated adult age differences in the original three-item Cognitive Reflection Test (CRT; Frederick, 2005, The Journal of Economic Perspectives, 19, 25-42) and an expanded seven-item version of that test (Toplak et al., 2013, Thinking and Reasoning, 20, 147-168). The CRT is a numerical problem-solving test thought to capture a disposition towards either rapid, intuition-based problem solving (Type I reasoning) or a more thoughtful, analytical problem-solving approach (Type II reasoning). Test items are designed to induce heuristically guided errors that can be avoided if using an appropriate numerical representation of the test problems. We evaluated differences between young adults and old adults in CRT performance and correlates of CRT performance. Older adults (ages 60 to 80) were paid volunteers who participated in experiments assessing age differences in self-regulated learning. Young adults (ages 17 to 35) were students participating for pay as part of a project assessing measures of critical thinking skills or as a young comparison group in the self-regulated learning study. There were age differences in the number of CRT correct responses in two independent samples. Results with the original three-item CRT found older adults to have a greater relative proportion of errors based on providing the intuitive lure. However, younger adults actually had a greater proportion of intuitive errors on the long version of the CRT, relative to older adults. Item analysis indicated a much lower internal consistency of CRT items for older adults. These outcomes do not offer full support for the argument that older adults are higher in the use of a "Type I" cognitive style. The evidence was also consistent with an alternative hypothesis that age differences were due to lower levels of numeracy in the older samples. Alternative process-oriented evaluations of how older adults solve CRT items will probably be needed to determine conditions under which older adults manifest an increase in the Type I dispositional tendency to opt for superficial, heuristically guided problem representations in numerical problem-solving tasks.

  15. Assessing Patients’ Experiences with Communication Across the Cancer Care Continuum

    PubMed Central

    Mazor, Kathleen M.; Street, Richard L.; Sue, Valerie M.; Williams, Andrew E.; Rabin, Borsika A.; Arora, Neeraj K.

    2016-01-01

    Objective: To evaluate the relevance, performance and potential usefulness of the Patient Assessment of cancer Communication Experiences (PACE) items. Methods: Items focusing on specific communication goals related to exchanging information, fostering healing relationships, responding to emotions, making decisions, enabling self-management, and managing uncertainty were tested via a retrospective, cross-sectional survey of adults who had been diagnosed with cancer. Analyses examined response frequencies, inter-item correlations, and coefficient alpha. Results: A total of 366 adults were included in the analyses. Relatively few selected “Does Not Apply”, suggesting that items tap relevant communication experiences. Ratings of whether specific communication goals were achieved were strongly correlated with overall ratings of communication, suggesting item content reflects important aspects of communication. Coefficient alpha was ≥.90 for each item set, indicating excellent reliability. Variations in the percentage of respondents selecting the most positive response across items suggest results can identify strengths and weaknesses. Conclusion: The PACE items tap relevant, important aspects of communication during cancer care, and may be useful to cancer care teams desiring detailed feedback. PMID:26979476

  16. A comparison of Rasch item-fit and Cronbach's alpha item reduction analysis for the development of a Quality of Life scale for children and adolescents.

    PubMed

    Erhart, M; Hagquist, C; Auquier, P; Rajmil, L; Power, M; Ravens-Sieberer, U

    2010-07-01

    This study compares item reduction analysis based on classical test theory (maximizing Cronbach's alpha - approach A), with analysis based on the Rasch Partial Credit Model item-fit (approach B), as applied to children and adolescents' health-related quality of life (HRQoL) items. The reliability and structural, cross-cultural and known-group validity of the measures were examined. Within the European KIDSCREEN project, 3019 children and adolescents (8-18 years) from seven European countries answered 19 HRQoL items of the Physical Well-being dimension of a preliminary KIDSCREEN instrument. The Cronbach's alpha and corrected item total correlation (approach A) were compared with infit mean squares and the Q-index item-fit derived according to a partial credit model (approach B). Cross-cultural differential item functioning (DIF ordinal logistic regression approach), structural validity (confirmatory factor analysis and residual correlation) and relative validity (RV) for socio-demographic and health-related factors were calculated for approaches (A) and (B). Approach (A) led to the retention of 13 items, compared with 11 items with approach (B). The item overlap was 69% for (A) and 78% for (B). The correlation coefficient of the summated ratings was 0.93. The Cronbach's alpha was similar for both versions [0.86 (A); 0.85 (B)]. Both approaches selected some items that are not strictly unidimensional and items displaying DIF. RV ratios favoured (A) with regard to socio-demographic aspects. Approach (B) was superior in RV with regard to health-related aspects. Both types of item reduction analysis should be accompanied by additional analyses. Neither of the two approaches was universally superior with regard to cultural, structural and known-group validity. However, the results support the usability of the Rasch method for developing new HRQoL measures for children and adolescents.
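
    The abstract above does not spell out how "maximizing Cronbach's alpha" was carried out. As a rough sketch, the snippet below computes coefficient alpha from its textbook definition and then recomputes it with each item left out in turn, which is one common way to spot items whose removal would raise alpha. The function names and the response matrix are invented for illustration and are not KIDSCREEN data or the authors' procedure.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Coefficient alpha for rows of item scores (one row per respondent)."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


def alpha_if_deleted(rows):
    """Alpha recomputed with each item removed in turn."""
    k = len(rows[0])
    return [
        cronbach_alpha([[x for i, x in enumerate(r) if i != j] for r in rows])
        for j in range(k)
    ]


# Made-up responses: 5 respondents x 4 items; the last item behaves like noise.
scores = [
    [1, 2, 1, 3],
    [2, 2, 2, 1],
    [3, 3, 3, 3],
    [4, 4, 3, 1],
    [5, 4, 5, 2],
]
print(round(cronbach_alpha(scores), 3))
print([round(a, 3) for a in alpha_if_deleted(scores)])
```

    In this toy matrix, dropping the last item raises alpha, so an alpha-driven reduction would flag it first. A Rasch item-fit analysis (approach B in the abstract) relies on different statistics and is not sketched here.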

  17. Improved Classification of Mammograms Following Idealized Training

    PubMed Central

    Hornsby, Adam N.; Love, Bradley C.

    2014-01-01

    People often make decisions by stochastically retrieving a small set of relevant memories. This limited retrieval implies that human performance can be improved by training on idealized category distributions (Giguère & Love, 2013). Here, we evaluate whether the benefits of idealized training extend to categorization of real-world stimuli, namely classifying mammograms as normal or tumorous. Participants in the idealized condition were trained exclusively on items that, according to a norming study, were relatively unambiguous. Participants in the actual condition were trained on a representative range of items. Despite being exclusively trained on easy items, idealized-condition participants were more accurate than those in the actual condition when tested on a range of item types. However, idealized participants experienced difficulties when test items were very dissimilar from training cases. The benefits of idealization, attributable to reducing noise arising from cognitive limitations in memory retrieval, suggest ways to improve real-world decision making. PMID:24955325

  18. Improved Classification of Mammograms Following Idealized Training.

    PubMed

    Hornsby, Adam N; Love, Bradley C

    2014-06-01

    People often make decisions by stochastically retrieving a small set of relevant memories. This limited retrieval implies that human performance can be improved by training on idealized category distributions (Giguère & Love, 2013). Here, we evaluate whether the benefits of idealized training extend to categorization of real-world stimuli, namely classifying mammograms as normal or tumorous. Participants in the idealized condition were trained exclusively on items that, according to a norming study, were relatively unambiguous. Participants in the actual condition were trained on a representative range of items. Despite being exclusively trained on easy items, idealized-condition participants were more accurate than those in the actual condition when tested on a range of item types. However, idealized participants experienced difficulties when test items were very dissimilar from training cases. The benefits of idealization, attributable to reducing noise arising from cognitive limitations in memory retrieval, suggest ways to improve real-world decision making.

  19. Mechanical Drawing: Grades 7-12.

    ERIC Educational Resources Information Center

    Instructional Objectives Exchange, Los Angeles, CA.

    Eighty-five behavioral objectives and related evaluation items for mechanical drawing in grades 7 through 12 are presented. Each sample contains the objective, test items, and means for judging the adequacy of the response. The following categories are included: (1) basic drafting skills; (2) beginning lettering; (3) drawing; (4) orthographic…

  20. Quality of Life: An Exploratory Study.

    ERIC Educational Resources Information Center

    Lankhorst, Gustaaf J.

    1989-01-01

    A 12-item list of human abilities/activities was developed to measure quality of life of 9 rheumatoid arthritis adults from 2 aspects: "present condition" and "relative importance" of each item. Pilot testing indicated that importance and present condition represent different aspects. Differences between self-assessments and physicians'…

  1. Commercial grade item (CGI) dedication of generators for nuclear safety related applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Das, R.K.; Hajos, L.G.

    1993-03-01

    The number of nuclear safety related equipment suppliers and the availability of spare and replacement parts designed specifically for nuclear safety related application are shrinking rapidly. These have made it necessary for utilities to apply commercial grade spare and replacement parts in nuclear safety related applications after implementing proper acceptance and dedication process to verify that such items conform with the requirements of their use in nuclear safety related application. The general guidelines for the commercial grade item (CGI) acceptance and dedication are provided in US Nuclear Regulatory Commission (NRC) Generic Letters and Electric Power Research Institute (EPRI) Report NP-5652, Guideline for the Utilization of Commercial Grade Items in Nuclear Safety Related Applications. This paper presents an application of these generic guidelines for procurement, acceptance, and dedication of a commercial grade generator for use as a standby generator at Salem Generating Station Units 1 and 2. The paper identifies the critical characteristics of the generator which, once verified, will provide reasonable assurance that the generator will perform its intended safety function. The paper also delineates the method of verification of the critical characteristics through tests and provides acceptance criteria for the test results. The methodology presented in this paper may be used as specific guidelines for reliable and cost effective procurement and dedication of commercial grade generators for use as standby generators at nuclear power plants.

  2. Selecting Items for Criterion-Referenced Tests.

    ERIC Educational Resources Information Center

    Mellenbergh, Gideon J.; van der Linden, Wim J.

    1982-01-01

    Three item selection methods for criterion-referenced tests are examined: the classical theory of item difficulty and item-test correlation; the latent trait theory of item characteristic curves; and a decision-theoretic approach for optimal item selection. Item contribution to the standardized expected utility of mastery testing is discussed. (CM)
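
    Of the three methods listed above, the classical one rests on two per-item statistics: difficulty (the proportion of examinees answering correctly) and the item-test correlation. A minimal sketch of both follows, assuming dichotomously scored (0/1) items. The function names and the response data are hypothetical, and the latent trait and decision-theoretic methods are not covered.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_statistics(responses):
    """Return (difficulty, item-test correlation) for each 0/1 item."""
    totals = [sum(r) for r in responses]
    stats = []
    for j in range(len(responses[0])):
        item = [r[j] for r in responses]
        difficulty = sum(item) / len(item)  # proportion correct
        stats.append((difficulty, pearson(item, totals)))
    return stats


# Made-up scored responses: 6 examinees x 3 items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
]
for difficulty, r_it in item_statistics(responses):
    print(round(difficulty, 2), round(r_it, 2))
```

    The correlation here is uncorrected: each item also contributes to the total score it is correlated with, which inflates the value for short tests.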

  3. Mixed methods evaluation of a quality improvement and audit tool for nurse-to-nurse bedside clinical handover in ward settings.

    PubMed

    Redley, Bernice; Waugh, Rachael

    2018-04-01

    Nurse bedside handover quality is influenced by complex interactions related to the content, processes used and the work environment. Audit tools are seldom tested in 'real' settings. Examine the reliability, validity and usability of a quality improvement tool for audit of nurse bedside handover. Naturalistic, descriptive, mixed-methods. Six inpatient wards at a single large not-for-profit private health service in Victoria, Australia. Five nurse experts and 104 nurses involved in 199 change-of-shift bedside handovers. A focus group with experts and pilot test were used to examine content and face validity, and usability of the handover audit tool. The tool was examined for inter-rater reliability and usability using observation audits of handovers across six wards. Data were collected in 2013-2014. Two independent observers for 72 audits demonstrated acceptable inter-observer agreement for 27 (77%) items. Reliability was weak for items examining the handover environment. Seventeen items were not observed reflecting gaps in practices. Across 199 observation audits, gaps in nurse bedside handover practice most often related to process and environment, rather than content items. Usability was impacted by high observer burden, familiarity and non-specific illustrative behaviours. The reliability and validity of most items to audit handover content was acceptable. Gaps in practices for process and environment items were identified. Context specific exemplars and reducing the items used at each handover audit can enhance usability. Further research is needed to develop context specific exemplars and undertake additional reliability testing using a wide range of handover settings.

  4. Assessing the Life Science Knowledge of Students and Teachers Represented by the K–8 National Science Standards

    PubMed Central

    Sadler, Philip M.; Coyle, Harold; Smith, Nancy Cook; Miller, Jaimie; Mintzes, Joel; Tanner, Kimberly; Murray, John

    2013-01-01

    We report on the development of an item test bank and associated instruments based on the National Research Council (NRC) K–8 life sciences content standards. Utilizing hundreds of studies in the science education research literature on student misconceptions, we constructed 476 unique multiple-choice items that measure the degree to which test takers hold either a misconception or an accepted scientific view. Tested nationally with 30,594 students, following their study of life science, and their 353 teachers, these items reveal a range of interesting results, particularly student difficulties in mastering the NRC standards. Teachers also answered test items and demonstrated a high level of subject matter knowledge reflecting the standards of the grade level at which they teach, but exhibiting few misconceptions of their own. In addition, teachers predicted the difficulty of each item for their students and which of the wrong answers would be the most popular. Teachers were found to generally overestimate their own students’ performance and to have a high level of awareness of the particular misconceptions that their students hold on the K–4 standards, but a low level of awareness of misconceptions related to the 5–8 standards. PMID:24006402

  5. Assessing the life science knowledge of students and teachers represented by the K-8 national science standards.

    PubMed

    Sadler, Philip M; Coyle, Harold; Smith, Nancy Cook; Miller, Jaimie; Mintzes, Joel; Tanner, Kimberly; Murray, John

    2013-01-01

    We report on the development of an item test bank and associated instruments based on the National Research Council (NRC) K-8 life sciences content standards. Utilizing hundreds of studies in the science education research literature on student misconceptions, we constructed 476 unique multiple-choice items that measure the degree to which test takers hold either a misconception or an accepted scientific view. Tested nationally with 30,594 students, following their study of life science, and their 353 teachers, these items reveal a range of interesting results, particularly student difficulties in mastering the NRC standards. Teachers also answered test items and demonstrated a high level of subject matter knowledge reflecting the standards of the grade level at which they teach, but exhibiting few misconceptions of their own. In addition, teachers predicted the difficulty of each item for their students and which of the wrong answers would be the most popular. Teachers were found to generally overestimate their own students' performance and to have a high level of awareness of the particular misconceptions that their students hold on the K-4 standards, but a low level of awareness of misconceptions related to the 5-8 standards.

  6. The revised Generalized Expectancy for Success Scale: a validity and reliability study.

    PubMed

    Hale, W D; Fiedler, L R; Cochran, C D

    1992-07-01

    The Generalized Expectancy for Success Scale (GESS; Fibel & Hale, 1978) was revised and assessed for reliability and validity. The revised version was administered to 199 college students along with other conceptually related measures, including the Rosenberg Self-Esteem Scale, the Life Orientation Test, and Rotter's Internal-External Locus of Control Scale. One subsample of students also completed the Eysenck Personality Inventory, while another subsample performed a criterion-related task that involved risk taking. Item analysis yielded 25 items with correlations of .45 or higher with the total score. Results indicated high internal consistency and test-retest reliability.

  7. Self-Stigma of Mental Illness Scale – Short Form: Reliability and Validity

    PubMed Central

    Corrigan, Patrick W.; Michaels, Patrick J.; Vega, Eduardo; Gause, Michael; Watson, Amy C.; Rüsch, Nicolas

    2012-01-01

    The internalization of public stigma by persons with serious mental illnesses may lead to self-stigma, which harms self-esteem, self-efficacy, and empowerment. Previous research has evaluated a hierarchical model that distinguishes among stereotype awareness, agreement, application to self, and harm to self with the 40-item Self-Stigma of Mental Illness Scale (SSMIS). This study addressed SSMIS critiques (too long, contains offensive items that discourage test completion) by strategically omitting half of the original scale’s items. Here we report reliability and validity of the 20-item short form (SSMIS-SF) based on data from three previous studies. Retained items were rated less offensive by a sample of consumers. Results indicated adequate internal consistencies for each subscale. Repeated measures ANOVAs showed subscale means progressively diminished from awareness to harm. In support of its validity, the harm subscale was found to be inversely and significantly related to self-esteem, self-efficacy, empowerment, and hope. After controlling for level of depression, these relationships remained significant with the exception of the relation between empowerment and harm SSMIS-SF subscale. Future research with the SSMIS-SF should evaluate its sensitivity to change and its stability through test-retest reliability. PMID:22578819

  8. [Effect of variation of lemon intake and walking in daily life on various indicators of muscle mass and blood biochemistry in menopausal middle-aged and elderly women].

    PubMed

    Sato, Kimiko; Domoto, Tokio; Hiramitsu, Masanori; Katagiri, Takao; Kato, Yoji; Miyake, Yukiko; Ishihara, Katsuhide; Umei, Namiko; Takigawa, Atsushi; Harada, Toshihide; Aoi, Satomi; Ikeda, Hiromi

    2014-01-01

    We examined the factors considered to change body composition and blood biochemistry indicators in menopausal middle-aged and elderly women. These changes result from exercise by walking as part of their daily activities and lemon consumption by women who live on the small islands of the Seto Inland Sea, Japan's largest citrus fruit (lemon)-producing region. Between September 2011 and March 2012, we recorded the daily lemon consumption and the number of steps taken by 101 middle-aged and elderly female lemon farmers. We also measured their body dimensions, body compositions, and blood pressure pulse wave velocity and conducted blood tests before and after the survey period. The results before and after the survey period were compared by the t-test and associations were determined on the basis of Pearson's correlation coefficient. Covariance structural analysis was carried out to determine causal associations. From the results of covariance structure analysis, lemon intake did not have a direct impact on each item examined. The third item, i.e., "the factors related to arteriosclerosis," was affected indirectly via citric acid and fatigue, and anticoagulation was shown. The fourth item, i.e., "the factors related to maintenance of muscle mass," which is affected by menopausal years and the change in walking speed, was shown to be associated with the second item, i.e., "the factors related to lipid metabolism." Menopausal years affected the first, third and fourth items. Lemon intake did not have a direct impact on each item. Lemon has been shown to indirectly affect the third item through citric acid. Walking affected the second item, the level of total cholesterol, such as HDL cholesterol, through the fourth item. The importance of providing services that lead to sustained physical activity and a well-balanced metabolism between lipids and carbohydrates has been shown.

  9. Cross-cultural validity of the thyroid-specific quality-of-life patient-reported outcome measure, ThyPRO.

    PubMed

    Watt, Torquil; Barbesino, Giuseppe; Bjorner, Jakob Bue; Bonnema, Steen Joop; Bukvic, Branka; Drummond, Russell; Groenvold, Mogens; Hegedüs, Laszlo; Kantzer, Valeska; Lasch, Kathryn E; Marcocci, Claudio; Mishra, Anjali; Netea-Maier, Romana; Ekker, Merel; Paunovic, Ivan; Quinn, Terence J; Rasmussen, Åse Krogh; Russell, Audrey; Sabaretnam, Mayilvaganan; Smit, Johannes; Törring, Ove; Zivaljevic, Vladan; Feldt-Rasmussen, Ulla

    2015-03-01

    Thyroid diseases are common and often affect quality of life (QoL). No cross-culturally validated patient-reported outcome measuring thyroid-related QoL is available. The purpose of the present study was to test the cross-cultural validity of the newly developed thyroid-related patient-reported outcome ThyPRO, using tests for differential item functioning (DIF) according to language version. The ThyPRO consists of 85 items summarized in 13 multi-item scales and one single item. Scales cover physical and mental symptoms, well-being and function as well as social and daily function and cosmetic concerns. Translation applied standard forward-backward methodology with subsequent cognitive interviews and reviews. Responses (N = 1,810) to the ThyPRO were collected in seven countries: UK (n = 166), The Netherlands (n = 147), Serbia (n = 150), Italy (n = 110), India (n = 148), Denmark (n = 902) and Sweden (n = 187). Translated versions were compared pairwise to the English version by examining uniform and nonuniform DIF, i.e., whether patients from different countries respond differently to a particular item, although they have identical level of the concept measured by the item. Analyses were controlled for thyroid diagnosis. DIF was investigated by ordinal logistic regression, testing for both statistical significance and magnitude (ΔR² > 0.02). Scale level was estimated by the sum score, after purification. For twelve of the 84 tested items, DIF was identified in more than one language. Eight of these were small, but four were indicative of possible low translatability. Twenty-one instances of DIF in single languages were identified, indicating potential problems with the particular translation. However, only seven were of a magnitude which could affect scale scores, most of which could be explained by sample differences not controlled for. The ThyPRO has good cross-cultural validity with only minor cross-cultural invariance and is recommended for use in international multicenter studies.

  10. Measuring the effects of online health information for patients: Item generation for an e-health impact questionnaire

    PubMed Central

    Kelly, Laura; Jenkinson, Crispin; Ziebland, Sue

    2013-01-01

    Objective: The internet is a valuable resource for accessing health information and support. We are developing an instrument to assess the effects of websites with experiential and factual health information. This study aimed to inform an item pool for the proposed questionnaire. Methods: Items were informed through a review of relevant literature and secondary qualitative analysis of 99 narrative interviews relating to patient and carer experiences of health. Statements relating to identified themes were re-cast as questionnaire items and shown for review to an expert panel. Cognitive debrief interviews (n = 21) were used to assess items for face and content validity. Results: Eighty-two generic items were identified following secondary qualitative analysis and expert review. Cognitive interviewing confirmed the questionnaire instructions, 62 items and the response options were acceptable to patients and carers. Conclusion: Using a clear conceptual basis to inform item generation, 62 items have been identified as suitable to undergo further psychometric testing. Practice implications: The final questionnaire will initially be used in a randomized controlled trial examining the effects of online patient's experiences. This will inform recommendations on the best way to present patients’ experiences within health information websites. PMID:23598293

  11. Development and initial validation of a computer-administered health literacy assessment in Spanish and English: FLIGHT/VIDAS.

    PubMed

    Ownby, Raymond L; Acevedo, Amarilis; Waldrop-Valverde, Drenna; Jacobs, Robin J; Caballero, Joshua; Davenport, Rosemary; Homs, Ana-Maria; Czaja, Sara J; Loewenstein, David

    2013-01-01

    Current measures of health literacy have been criticized on a number of grounds, including use of a limited range of content, development on small and atypical patient groups, and poor psychometric characteristics. In this paper, we report the development and preliminary validation of a new computer-administered and -scored health literacy measure addressing these limitations. Items in the measure reflect a wide range of content related to health promotion and maintenance as well as care for diseases. The development process has focused on creating a measure that will be useful in both Spanish and English, while not requiring substantial time for clinician training and individual administration and scoring. The items incorporate several formats, including questions based on brief videos, which allow for the assessment of listening comprehension and the skills related to obtaining information on the Internet. In this paper, we report the interim analyses detailing the initial development and pilot testing of the items (phase 1 of the project) in groups of Spanish and English speakers. We then describe phase 2, which included a second round of testing of the items, in new groups of Spanish and English speakers, and evaluation of the new measure's reliability and validity in relation to other measures. Data are presented that show that four scales (general health literacy, numeracy, conceptual knowledge, and listening comprehension), developed through a process of item and factor analyses, have significant relations to existing measures of health literacy.

  12. Rasch Analysis for Binary Data with Nonignorable Nonresponses

    ERIC Educational Resources Information Center

    Bertoli-Barsotti, Lucio; Punzo, Antonio

    2013-01-01

    This paper introduces a two-dimensional Item Response Theory (IRT) model to deal with nonignorable nonresponses in tests with dichotomous items. One dimension provides information about the omitting behavior, while the other dimension is related to the person's "ability". The idea of embedding an IRT model for missingness into the measurement…

  13. Woodworking: Grades 7-12.

    ERIC Educational Resources Information Center

    Instructional Objectives Exchange, Los Angeles, CA.

    The woodworking collection is composed of 55 objectives and related evaluation items for use in grades 7 through 12. Each sample contains the objective, test items, and criteria for judging the adequacy of the response. Woodworking categories being measured include sharpening, adjusting, using and caring for tools; reading a working drawing; stock…

  14. 77 FR 39519 - Records Schedules; Availability and Request for Comments

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-07-03

    ... compiled from various sources to track and monitor the effects of past nuclear tests on Armed Forces personnel. 3. Department of Energy, Federal Energy Regulatory Commission (N1-138-12-2, 1 temporary item... temporary item). Project files, including working papers relating to product reports. 7. Department of Labor...

  15. The Value of Item Response Theory in Clinical Assessment: A Review

    ERIC Educational Resources Information Center

    Thomas, Michael L.

    2011-01-01

    Item response theory (IRT) and related latent variable models represent modern psychometric theory, the successor to classical test theory in psychological assessment. Although IRT has become prevalent in the measurement of ability and achievement, its contributions to clinical domains have been less extensive. Applications of IRT to clinical…

  16. The value of item response theory in clinical assessment: a review.

    PubMed

    Thomas, Michael L

    2011-09-01

    Item response theory (IRT) and related latent variable models represent modern psychometric theory, the successor to classical test theory in psychological assessment. Although IRT has become prevalent in the measurement of ability and achievement, its contributions to clinical domains have been less extensive. Applications of IRT to clinical assessment are reviewed to appraise its current and potential value. Benefits of IRT include comprehensive analyses and reduction of measurement error, creation of computer adaptive tests, meaningful scaling of latent variables, objective calibration and equating, evaluation of test and item bias, greater accuracy in the assessment of change due to therapeutic intervention, and evaluation of model and person fit. The theory may soon reinvent the manner in which tests are selected, developed, and scored. Although challenges remain to the widespread implementation of IRT, its application to clinical assessment holds great promise. Recommendations for research, test development, and clinical practice are provided.

  17. Evaluation of the Multiple Sclerosis Walking Scale-12 (MSWS-12) in a Dutch sample: Application of item response theory.

    PubMed

    Mokkink, Lidwine Brigitta; Galindo-Garre, Francisca; Uitdehaag, Bernard Mj

    2016-12-01

    The Multiple Sclerosis Walking Scale-12 (MSWS-12) measures walking ability from the patients' perspective. We examined the quality of the MSWS-12 using an item response theory model, the graded response model (GRM). A total of 625 unique Dutch multiple sclerosis (MS) patients were included. After testing for unidimensionality, monotonicity, and absence of local dependence, a GRM was fit and item characteristics were assessed. Differential item functioning (DIF) for the variables gender, age, duration of MS, type of MS, and severity of MS was investigated, as were reliability, total test information, and the standard error of the trait level (θ). Confirmatory factor analysis showed a unidimensional structure of the 12 items of the scale, explaining 88% of the variance. Item 2 did not fit the GRM. Reliability was 0.93. Items 8 and 9 (of the 11- and 12-item versions, respectively) showed DIF on the variable severity, based on the Expanded Disability Status Scale (EDSS). However, the EDSS is strongly related to the content of both items. Our results confirm the good quality of the MSWS-12. The trait level (θ) scores and item parameters of both the 12- and 11-item versions were highly comparable, although we do not suggest changing the content of the MSWS-12.
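
    As a rough illustration of the graded response model named in this record, the sketch below computes category probabilities for one polytomous item. It is not the authors' analysis: the function names and the discrimination and threshold values are hypothetical, and the standard logistic form of the GRM is assumed.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under a graded response model.

    theta      -- person trait level
    a          -- item discrimination (hypothetical value in the demo below)
    thresholds -- increasing thresholds b_1 < ... < b_{K-1} for K categories
    """
    # Cumulative probabilities P(X >= k); P(X >= lowest) = 1, P(X > highest) = 0.
    cum = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    # Each category probability is the difference of adjacent cumulatives.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


if __name__ == "__main__":
    # Illustrative parameters only; they are not estimates from the study.
    probs = grm_category_probs(theta=0.0, a=1.2, thresholds=[-1.0, 0.0, 1.5])
    print([round(p, 3) for p in probs])
```

    With increasing thresholds the cumulative curves never cross, so every category probability is non-negative and the probabilities sum to one.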

  18. Development and reliability testing of a self-report instrument to measure the office layout as a correlate of occupational sitting.

    PubMed

    Duncan, Mitch J; Rashid, Mahbub; Vandelanotte, Corneel; Cutumisu, Nicoleta; Plotnikoff, Ronald C

    2013-02-04

    Spatial configurations of office environments assessed by Space Syntax methodologies are related to employee movement patterns. These methods require analysis of floor plans, which are not readily available in large population-based studies or are otherwise unavailable. Therefore a self-report instrument to assess spatial configurations of office environments using four scales was developed. The scales are: local connectivity (16 items), overall connectivity (11 items), visibility of co-workers (10 items), and proximity of co-workers (5 items). A panel cohort (N = 1154) completed an online survey; only data from individuals employed in office-based occupations (n = 307) were used to assess scale measurement properties. To assess test-retest reliability, a separate sample of 37 office-based workers completed the survey on two occasions 7.7 (±3.2) days apart. Redundant scale items were eliminated using factor analysis; Cronbach's α was used to evaluate internal consistency, and intraclass correlation coefficients (retest-ICC) were used to evaluate test-retest reliability. ANOVA was employed to examine differences between office types (Private, Shared, Open) as a measure of construct validity. Generalized Linear Models were used to examine relationships between spatial configuration scales and the duration of and frequency of breaks in occupational sitting. The number of items on all scales was reduced; Cronbach's α and ICCs indicated good scale internal consistency and test-retest reliability: local connectivity (5 items; α = 0.70; retest-ICC = 0.84), overall connectivity (6 items; α = 0.86; retest-ICC = 0.87), visibility of co-workers (4 items; α = 0.78; retest-ICC = 0.86), and proximity of co-workers (3 items; α = 0.85; retest-ICC = 0.70). Significant (p ≤ 0.001) differences, in theoretically expected directions, were observed for all scales between office types, except overall connectivity. Significant associations were observed between all scales and occupational sitting behaviour (p ≤ 0.05). All scales have good measurement properties, indicating the instrument may be a useful alternative to Space Syntax to examine environmental correlates of occupational sitting in population surveys.
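
    For readers unfamiliar with the internal-consistency statistic reported in this record, the following is a minimal sketch of Cronbach's α computed from a respondents-by-items score matrix. The function name and the toy data are assumptions for illustration; it uses the usual formula α = k/(k−1) · (1 − Σ item variances / variance of total scores) and is not taken from the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Transpose so that each tuple holds all respondents' scores on one item.
    items = list(zip(*responses))
    item_var_sum = sum(variance(col) for col in items)
    # Variance of each respondent's total (summed) score.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1.0 - item_var_sum / total_var)


if __name__ == "__main__":
    # Toy data (hypothetical): 5 respondents answering 3 items.
    scores = [[3, 4, 3], [2, 2, 3], [5, 4, 5], [1, 2, 1], [4, 4, 4]]
    print(round(cronbach_alpha(scores), 3))
```

    The sketch uses sample variances (n − 1 denominator) throughout; using population variances for both the items and the total gives the same ratio and therefore the same α.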

  19. Development and reliability testing of a self-report instrument to measure the office layout as a correlate of occupational sitting

    PubMed Central

    2013-01-01

    Background: Spatial configurations of office environments assessed by Space Syntax methodologies are related to employee movement patterns. These methods require analysis of floor plans, which are not readily available in large population-based studies or are otherwise unavailable. Therefore a self-report instrument to assess spatial configurations of office environments using four scales was developed. Methods: The scales are: local connectivity (16 items), overall connectivity (11 items), visibility of co-workers (10 items), and proximity of co-workers (5 items). A panel cohort (N = 1154) completed an online survey; only data from individuals employed in office-based occupations (n = 307) were used to assess scale measurement properties. To assess test-retest reliability, a separate sample of 37 office-based workers completed the survey on two occasions 7.7 (±3.2) days apart. Redundant scale items were eliminated using factor analysis; Cronbach’s α was used to evaluate internal consistency, and intraclass correlation coefficients (retest-ICC) were used to evaluate test-retest reliability. ANOVA was employed to examine differences between office types (Private, Shared, Open) as a measure of construct validity. Generalized Linear Models were used to examine relationships between spatial configuration scales and the duration of and frequency of breaks in occupational sitting. Results: The number of items on all scales was reduced; Cronbach’s α and ICCs indicated good scale internal consistency and test-retest reliability: local connectivity (5 items; α = 0.70; retest-ICC = 0.84), overall connectivity (6 items; α = 0.86; retest-ICC = 0.87), visibility of co-workers (4 items; α = 0.78; retest-ICC = 0.86), and proximity of co-workers (3 items; α = 0.85; retest-ICC = 0.70). Significant (p ≤ 0.001) differences, in theoretically expected directions, were observed for all scales between office types, except overall connectivity. Significant associations were observed between all scales and occupational sitting behaviour (p ≤ 0.05). Conclusion: All scales have good measurement properties, indicating the instrument may be a useful alternative to Space Syntax to examine environmental correlates of occupational sitting in population surveys. PMID:23379485

  20. Clinical utility of the MMPI-2-RF SUI items and scale in a forensic inpatient setting: Association with interview self-report and future suicidal behaviors.

    PubMed

    Glassmire, David M; Tarescavage, Anthony M; Burchett, Danielle; Martinez, Jennifer; Gomez, Anthony

    2016-11-01

    In this study, we examined whether the 5 Minnesota Multiphasic Personality Inventory-2-Restructured Form (MMPI-2-RF; Ben-Porath & Tellegen, 2008/2011) Suicidal/Death Ideation (SUI) items (93, 120, 164, 251, and 334) would provide incremental suicide-risk assessment information after accounting for information garnered from clinical interview questions. Among 229 forensic inpatients (146 men, 83 women) who were administered the MMPI-2-RF, 34.9% endorsed at least 1 SUI item. We found that patients who endorsed SUI items on the MMPI-2-RF concurrently denied conceptually related suicide-risk information during the clinical interview. For instance, 8% of the sample endorsed Item 93 (indicating recent suicidal ideation), yet denied current suicidal ideation upon interview. Conversely, only 2.2% of the sample endorsed current suicidal ideation during the interview, yet denied recent suicidal ideation on Item 93. The SUI scale, as well as the MMPI-2-RF Demoralization (RCd) and Low Positive Emotions (RC2) scales, correlated significantly and meaningfully with conceptually related suicide-risk information from the interview, including history of suicide attempts, history of suicidal ideation, current suicidal ideation, and months since last suicide attempt. We also found that the SUI scale added incremental variance (after accounting for information garnered from the interview and after accounting for scores on RCd and RC2) to predictions of future suicidal behavior within 1 year of testing. Relative risk ratios indicated that both SUI-item endorsement and the presence of interview-reported risk information significantly and meaningfully increased the risk of suicidal behavior in the year following testing, particularly when endorsement of suicidal ideation occurred for both methods of self-report.

  1. Using Rasch Analysis to Evaluate the Reliability and Validity of the Swallowing Quality of Life Questionnaire: An Item Response Theory Approach.

    PubMed

    Cordier, Reinie; Speyer, Renée; Schindler, Antonio; Michou, Emilia; Heijnen, Bas Joris; Baijens, Laura; Karaduman, Ayşe; Swan, Katina; Clavé, Pere; Joosten, Annette Veronica

    2018-02-01

    The Swallowing Quality of Life questionnaire (SWAL-QOL) is widely used clinically and in research to evaluate quality of life related to swallowing difficulties. It has been described as a valid and reliable tool, but was developed and tested using classical test theory. This study describes the reliability and validity of the SWAL-QOL using item response theory (IRT; Rasch analysis). SWAL-QOL data were gathered from 507 participants at risk of oropharyngeal dysphagia (OD) across four European countries. OD was confirmed in 75.7% of participants via videofluoroscopy and/or fiberoptic endoscopic evaluation, or a clinical diagnosis based on meeting selected criteria. Patients with esophageal dysphagia were excluded. Data were analysed using Rasch analysis. Item and person reliability was good for all the items combined. However, person reliability was poor for 8 subscales and item reliability was poor for one subscale. Eight subscales exhibited poor person separation and two exhibited poor item separation. Overall item and person fit statistics were acceptable. However, at an individual item fit level results indicated unpredictable item responses for 28 items, and item redundancy for 10 items. The item-person dimensionality map confirmed these findings. Results from the overall Rasch model fit and Principal Component Analysis were suggestive of a second dimension. For all the items combined, none of the item categories were 'category', 'threshold' or 'step' disordered; however, all subscales demonstrated category disordered functioning. Findings suggest an urgent need to further investigate the underlying structure of the SWAL-QOL and its psychometric characteristics using IRT.

  2. Head and neck cancer-specific quality of life: instrument validation.

    PubMed

    Terrell, J E; Nanavati, K A; Esclamado, R M; Bishop, J K; Bradford, C R; Wolf, G T

    1997-10-01

    The disfigurement and dysfunction associated with head and neck cancer affect emotional well-being and some of the most basic functions of life. Most cancer-specific quality-of-life assessments give a single composite score for head and neck cancer-related quality of life. The objective was to develop and evaluate an improved multidimensional instrument to assess head and neck cancer-related functional status and well-being. The item selection process included literature review, interviews with health care workers, and patient surveys. A survey with 37 disease-specific questions and the SF-12 survey were administered to 253 patients in 3 large medical centers. Factor analysis was performed to identify disease-specific domains. Domain scores were calculated as the standardized score of the component items. These domains were assessed for construct validity (based on clinical hypotheses) and for test-retest reliability. Four relevant domains were identified: Eating (6 items), Communication (4 items), Pain (4 items), and Emotion (6 items). Each had an internal consistency (Cronbach alpha value) of greater than 0.80. Construct validity was demonstrated by moderate correlations with the SF-12 Physical and Mental component scores (r=0.43-0.60). Test-retest assessment of each domain demonstrated strong reliability between the 2 time points. Correlations were strong for each individual question, ranging from 0.53 to 0.93. Construct validity testing demonstrated that the direction of differences for each domain was as hypothesized. The Head and Neck Quality of Life questionnaire is a promising multidimensional tool with which to assess head and neck cancer-specific quality of life.

  3. Health-related quality of life questionnaire for polycystic ovary syndrome (PCOSQ-50): development and psychometric properties.

    PubMed

    Nasiri-Amiri, Fatemeh; Ramezani Tehrani, Fahimeh; Simbar, Masoumeh; Montazeri, Ali; Mohammadpour, Reza Ali

    2016-07-01

    The determinants of the health-related quality of life of women with polycystic ovary syndrome are not fully understood. The aim of this study was to develop a comprehensive instrument to assess the health-related quality of life of Iranian women with PCOS and to assess its psychometric properties. We used a mixed-method, sequential, exploratory design including both qualitative [in-depth interview to define the components of health-related quality of life questionnaire (PCOSQ)] and quantitative approaches (to assess the psychometric properties of PCOSQ). A preliminary questionnaire was developed including 147 items which emerged from the qualitative phase of the study. Considering the optimum cutoff points for content validity ratio (CVR), content validity index (CVI), and impact score, items of the preliminary questionnaire were reduced from 147 to 88 items. Finally, by excluding highly correlated items using the exploratory factor analysis, a 50-item questionnaire was obtained. The Kaiser criteria (eigenvalues >1) and Scree plot tests demonstrated that six factors were optimum with an estimated 47.3 % of variance. Assessment of the psychometric properties of the questionnaire demonstrated a mean CVI = 0.92, CVR = 0.91, Cronbach's alpha for whole questionnaire = 0.88 (0.61-0.88 for subscales), Spearman's correlation coefficients of test-retest = 0.75, and the intra-class correlation coefficient for the PCOS questionnaire subscales ranging from 0.57 to 0.88. Eventually the final questionnaire included 50 items in six domains, 'psychosocial and emotional,' 'fertility,' 'sexual function,' 'obesity and menstrual disorders,' 'hirsutism,' and 'coping' and rated on a 5-point Likert scale. The PCOSQ-50 is a valid and reliable instrument for the assessment of quality of life of women with PCOS, capable of assessing some obscure aspects overlooked by previous HRQL questionnaires.

  4. Gambling-Related Cognition Scale (GRCS): Are skills-based games at a disadvantage?

    PubMed

    Lévesque, David; Sévigny, Serge; Giroux, Isabelle; Jacques, Christian

    2017-09-01

    The Gambling-Related Cognition Scale (GRCS; Raylu & Oei, 2004) was developed to evaluate gambling-related cognitive distortions for all types of gamblers, regardless of their gambling activities (poker, slot machine, etc.). It is therefore imperative to ascertain the validity of its interpretation across different types of gamblers; however, some skills-related items endorsed by players could be interpreted as a cognitive distortion despite the fact that they play skills-related games. Using an intergroup (168 poker players and 73 video lottery terminal [VLT] players) differential item functioning (DIF) analysis, this study examined the possible manifestation of item biases associated with the GRCS. DIF was analyzed with ordinal logistic regressions (OLRs) and Ramsay's (1991) nonparametric kernel smoothing approach with TestGraf. Results show that half of the items display at least moderate DIF between groups and, depending on the type of analysis used, 3 to 7 items displayed large DIF. The 5 items with the most DIF were more significantly endorsed by poker players (uniform DIF) and were all related to skills, knowledge, learning, or probabilities. Poker players' interpretations of some skills-related items may lead to an overestimation of their cognitive distortions because their total score is inflated by a measurement artifact. Findings indicate that the current structure of the GRCS contains potential biases to be considered when poker players are surveyed. The present study conveys new and important information on bias issues to ponder carefully before using and interpreting the GRCS and other similar wide-range instruments with poker players.

  5. Introducing the Body-QoL®: A New Patient-Reported Outcome Instrument for Measuring Body Satisfaction-Related Quality of Life in Aesthetic and Post-bariatric Body Contouring Patients.

    PubMed

    Danilla, Stefan; Cuevas, Pedro; Aedo, Sócrates; Dominguez, Carlos; Jara, Rocío; Calderón, María E; Al-Himdani, Sarah; Rios, Marco A; Taladriz, Cristián; Rodriguez, Diego; Gonzalez, Rolando; Lazo, Ángel; Erazo, Cristián; Benitez, Susana; Andrades, Patricio; Sepúlveda, Sergio

    2016-02-01

    The objective was to develop a new patient-reported outcome (PRO) instrument to measure body satisfaction-related quality of life (QoL). A standard 3-phase PRO design was followed. In phase 1, a qualitative design was used in 45 patients to develop a conceptual framework and to create preliminary scale domains and items. In phase 2, large-scale population testing on 1340 subjects was performed to reduce items and domains. In phase 3, final testing of the developed instrument on 34 patients was performed. Statistical methods included factor analysis, Rasch analysis, and multivariate regression analysis. Psychometric properties measured were internal reliability, item-rest, item-test, and test-retest correlations. The developed PRO instrument is composed of four domains (satisfaction with the abdomen, sex life, self-esteem and social life, and physical symptoms) and 20 items in total. The score can range from 20 (worst) to 100 (best). Responsiveness was 100%, internal reliability 93.3%, and test-retest concordance 97.7%. Body image-related QoL was higher in men than in women (p < 0.001) and decreased with increasing age (p = 0.004) and BMI (p < 0.001). Post-bariatric body contouring patients scored lower than cosmetic patients in all domains of the Body-QoL instrument (p < 0.001). After surgery, the score improved by 21.9 ± 16.9 on average (effect size 1.8, p < 0.001). Body satisfaction-related QoL can be measured reliably with the Body-QoL instrument. It can be used to quantify the improvement in cosmetic and post-bariatric patients, including after non- or minimally invasive procedures, suction-assisted lipectomy, abdominoplasty, lipoabdominoplasty, and lower body lift, and to give an evidence-based approach to standard practice.

  6. Item Response Theory Modeling of the Philadelphia Naming Test.

    PubMed

    Fergadiotis, Gerasimos; Kellough, Stacey; Hula, William D

    2015-06-01

    In this study, we investigated the fit of the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) to an item-response-theory measurement model, estimated the precision of the resulting scores and item parameters, and provided a theoretical rationale for the interpretation of PNT overall scores by relating explanatory variables to item difficulty. This article describes the statistical model underlying the computer adaptive PNT presented in a companion article (Hula, Kellough, & Fergadiotis, 2015). Using archival data, we evaluated the fit of the PNT to 1- and 2-parameter logistic models and examined the precision of the resulting parameter estimates. We regressed the item difficulty estimates on three predictor variables: word length, age of acquisition, and contextual diversity. The 2-parameter logistic model demonstrated marginally better fit, but the fit of the 1-parameter logistic model was adequate. Precision was excellent for both person ability and item difficulty estimates. Word length, age of acquisition, and contextual diversity all independently contributed to variance in item difficulty. Item-response-theory methods can be productively used to analyze and quantify anomia severity in aphasia. Regression of item difficulty on lexical variables supported the validity of the PNT and interpretation of anomia severity scores in the context of current word-finding models.
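
    The 1- and 2-parameter logistic models compared in this record can be sketched as follows. This is a generic illustration of the standard item response functions, not the authors' code; the function names and parameter values are hypothetical.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def p_correct_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model.

    theta -- person ability
    a     -- item discrimination
    b     -- item difficulty
    """
    return logistic(a * (theta - b))


def p_correct_1pl(theta, b):
    """1PL model: the 2PL with discrimination fixed at 1 for every item."""
    return p_correct_2pl(theta, 1.0, b)


def item_information_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


if __name__ == "__main__":
    # Illustrative values only: an item of difficulty 0.5 and discrimination 1.5.
    for theta in (-2.0, 0.5, 2.0):
        print(theta, round(p_correct_2pl(theta, 1.5, 0.5), 3))
```

    Under either model a person whose ability equals the item difficulty has a 0.5 probability of success, which is also where a 2PL item is most informative.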

  7. A measure of satisfaction with food-related life.

    PubMed

    Grunert, Klaus G; Dean, Moira; Raats, Monique M; Nielsen, Niels Asger; Lumbers, Margaret

    2007-09-01

    A measure of satisfaction with food-related life is developed and tested in three studies in eight European countries. Five items are retained from an original pool of seven; these items exhibit good reliability as measured by Cronbach's alpha, good temporal stability, convergent validity with two related measures, and construct validity as indicated by relationships with other indicators of quality of life, including the Satisfaction With Life and the SF-8 scales. It is concluded that this scale will be useful in studies trying to identify factors contributing to satisfaction with food-related life.

  8. What is the connection between true and false memories? The differential roles of interitem associations in recall and recognition.

    PubMed

    McEvoy, C L; Nelson, D L; Komatsu, T

    1999-09-01

    Veridical memory for presented list words and false memory for nonpresented but related items were tested using the Deese/Roediger and McDermott paradigm. The strength and density of preexisting connections among the list words, and from the list words to the critical items, were manipulated. The likelihood of producing false memories in free recall varied with the strength of connections from the list words to the critical items but was inversely related to the density of the interconnections among the list words. In contrast, veridical recall of list words was positively related to the density of the interconnections. A final recognition test showed that both false and veridical memories were more likely when the list words were more densely interconnected. The results are discussed in terms of an associative model of memory, Processing Implicit and Explicit Representations (PIER 2) that describes the influence of implicitly activated preexisting information on memory performance.

  9. The COPD-SIB: a newly developed disease-specific item bank to measure health-related quality of life in patients with chronic obstructive pulmonary disease.

    PubMed

    Paap, Muirne C S; Lenferink, Lonneke I M; Herzog, Nadine; Kroeze, Karel A; van der Palen, Job

    2016-06-27

    Health-related quality of life (HRQoL) is widely used as an outcome measure in the evaluation of treatment interventions in patients with chronic obstructive pulmonary disease (COPD). In order to address challenges associated with existing fixed-length measures (e.g., too long to be used routinely, too short to ensure both content validity and reliability), a COPD-specific item bank (COPD-SIB) was developed. Items were selected based on literature review and interviews with Dutch COPD patients, with a strong focus on both content validity and item comprehension. The psychometric quality of the item bank was evaluated using Mokken Scale Analysis and parametric Item Response Theory, using data of 666 COPD patients. The final item bank contains 46 items that form a strong scale, tapping into eight important themes that were identified based on literature review and patient interviews: Coping with disease/symptoms, adaptability; Autonomy; Anxiety about the course/end-state of the disease, hopelessness; Positive psychological functioning; Situations triggering or enhancing breathing problems; Symptoms; Activity; Impact. The 46-item COPD-SIB has good psychometric properties and content validity. Items are available in Dutch and English. The COPD-SIB can be used as a stand-alone instrument, or to inform computerised adaptive testing.

  10. Sentimental value and its influence on hedonic adaptation.

    PubMed

    Yang, Yang; Galak, Jeff

    2015-11-01

    Sentimental value is a highly prevalent, yet largely understudied phenomenon. We introduce the construct of sentimental value and investigate how and why sentimental value influences hedonic adaptation. Across 7 studies, we examine the antecedents of sentimental value and demonstrate its effect on hedonic adaptation using both naturally occurring and experimentally manipulated items with sentimental value. We further test the underlying process linking sentimental value and hedonic adaptation by showing that whereas feature-related utility decreases for all items with time, sentimental value typically does not, and that sentimental value moderates the influence of the decrement in feature-related utility on hedonic adaptation. Moreover, this moderating effect of sentimental value is driven by a shift in focus from features of the item to the associations that the item possesses. We conclude with a discussion of related phenomena and implications for individuals.

  11. Specifying the role of the left prefrontal cortex in word selection

    PubMed Central

    Ries, S. K.; Karzmark, C. R.; Navarrete, E.; Knight, R. T.; Dronkers, N. F.

    2015-01-01

    Word selection allows us to choose words during language production. This is often viewed as a competitive process wherein a lexical representation is retrieved among semantically-related alternatives. The left prefrontal cortex (LPFC) is thought to help overcome competition for word selection through top-down control. However, whether the LPFC is always necessary for word selection remains unclear. We tested 6 LPFC-injured patients and controls in two picture naming paradigms varying in terms of item repetition. Both paradigms elicited the expected semantic interference effects (SIE), reflecting interference caused by semantically-related representations in word selection. However, LPFC patients as a group showed a larger SIE than controls only in the paradigm involving item repetition. We argue that item repetition increases interference caused by semantically-related alternatives, resulting in increased LPFC-dependent cognitive control demands. The remaining network of brain regions associated with word selection appears to be sufficient when items are not repeated. PMID:26291289

  12. Electrophysiological distinctions between recognition memory with and without awareness

    PubMed Central

    Ko, Philip C.; Duda, Bryant; Hussey, Erin P.; Ally, Brandon A.

    2013-01-01

    The influence of implicit memory representations on explicit recognition may help to explain cases of accurate recognition decisions made with high uncertainty. During a recognition task, implicit memory may enhance the fluency of a test item, biasing decision processes to endorse it as “old”. This model may help explain recognition-without-identification, a remarkable phenomenon in which participants make highly accurate recognition decisions despite the inability to identify the test item. The current study investigated whether recognition-without-identification for pictures elicits a pattern of neural activity similar to that of other types of accurate recognition decisions made with uncertainty. Further, this study also examined whether recognition-without-identification for pictures could be attained by the use of perceptual and conceptual information from memory. To accomplish this, participants studied pictures and then performed a recognition task under difficult viewing conditions while event-related potentials (ERPs) were recorded. Behavioral results showed that recognition was highly accurate even when test items could not be identified, demonstrating recognition-without-identification. The behavioral performance also indicated that recognition-without-identification was mediated by both perceptual and conceptual information, independently of one another. The ERP results showed dramatically different memory-related activity during the early 300 to 500 ms epoch for identified items that were studied compared to unidentified items that were studied. Similar to previous work highlighting accurate recognition without retrieval awareness, test items that were not identified, but correctly endorsed as “old,” elicited a negative posterior old/new effect (i.e., N300). In contrast, test items that were identified and correctly endorsed as “old,” elicited the classic positive frontal old/new effect (i.e., FN400). Importantly, both of these effects were elicited under conditions when participants used perceptual information to make recognition decisions. Conceptual information elicited very different ERPs than perceptual information, showing that the informational wealth of pictures can evoke multiple routes to recognition even without awareness of memory retrieval. These results are discussed within the context of current theories regarding the N300 and the FN400. PMID:23287567

  13. Unidimensional IRT Item Parameter Estimates across Equivalent Test Forms with Confounding Specifications within Dimensions

    ERIC Educational Resources Information Center

    Matlock, Ki Lynn; Turner, Ronna

    2016-01-01

    When constructing multiple test forms, the number of items and the total test difficulty are often equivalent. Not all test developers match the number of items and/or average item difficulty within subcontent areas. In this simulation study, six test forms were constructed having an equal number of items and average item difficulty overall.…

  14. Sex, Age, and Emotional Valence: Revealing Possible Biases in the 'Reading the Mind in the Eyes' Task.

    PubMed

    Kynast, Jana; Schroeter, Matthias L

    2018-01-01

    The 'Reading the Mind in the Eyes' test (RMET) assesses a specific socio-cognitive ability, i.e., the ability to identify mental states from gaze. The development of this ability in a lifespan perspective is of special interest. Whereas former investigations were limited mainly to childhood and adolescence, the focus has recently shifted towards aging and towards psychiatric and neurodegenerative diseases. Although the RMET is frequently applied in developmental psychology and clinical settings, stimulus characteristics have never been investigated with respect to potential effects on test performance. Here, we analyzed the RMET stimulus set with a special focus on interrelations between sex, age and emotional valence. Forty-three persons rated age and emotional valence of the RMET picture set. Differences in emotional valence and age ratings between male and female items were analyzed. The linear relation between age and emotional valence was tested over all items, and separately for male and female items. Male items were rated older and more negative than female stimuli. Regarding male RMET items, age predicted emotional valence: older age was associated with negative emotions. In contrast, age and valence were not linearly related in female pictures. All ratings were independent of rater characteristics. Our results demonstrate a strong confound between sex, age, and emotional valence in the RMET. Male items presented a greater variability in age ratings compared to female items. Age and emotional valence were negatively associated among male items, but no significant association was found among female stimuli. As personal attributes impact social information processing, our results may add a new perspective on the interpretation of previous findings on interindividual differences in RMET accuracy, particularly in the field of developmental psychology, and age-associated neuropsychiatric diseases. A revision of the RMET might be warranted to overcome confounds identified here.

  15. Sex, Age, and Emotional Valence: Revealing Possible Biases in the ‘Reading the Mind in the Eyes’ Task

    PubMed Central

    Kynast, Jana; Schroeter, Matthias L.

    2018-01-01

    The ‘Reading the Mind in the Eyes’ test (RMET) assesses a specific socio-cognitive ability, i.e., the ability to identify mental states from gaze. The development of this ability in a lifespan perspective is of special interest. Whereas former investigations were limited mainly to childhood and adolescence, the focus has recently shifted towards aging and towards psychiatric and neurodegenerative diseases. Although the RMET is frequently applied in developmental psychology and clinical settings, stimulus characteristics have never been investigated with respect to potential effects on test performance. Here, we analyzed the RMET stimulus set with a special focus on interrelations between sex, age and emotional valence. Forty-three persons rated age and emotional valence of the RMET picture set. Differences in emotional valence and age ratings between male and female items were analyzed. The linear relation between age and emotional valence was tested over all items, and separately for male and female items. Male items were rated older and more negative than female stimuli. Regarding male RMET items, age predicted emotional valence: older age was associated with negative emotions. In contrast, age and valence were not linearly related in female pictures. All ratings were independent of rater characteristics. Our results demonstrate a strong confound between sex, age, and emotional valence in the RMET. Male items presented a greater variability in age ratings compared to female items. Age and emotional valence were negatively associated among male items, but no significant association was found among female stimuli. As personal attributes impact social information processing, our results may add a new perspective on the interpretation of previous findings on interindividual differences in RMET accuracy, particularly in the field of developmental psychology, and age-associated neuropsychiatric diseases. A revision of the RMET might be warranted to overcome confounds identified here. PMID:29755385

  16. The Development and Preliminary Testing of an Instrument for Assessing Fatigue Self-management Outcomes in Patients With Advanced Cancer.

    PubMed

    Chan, Raymond Javan; Yates, Patsy; McCarthy, Alexandra L

    Fatigue is one of the most distressing and commonly experienced symptoms in patients with advanced cancer. Although the self-management (SM) of cancer-related symptoms has received increasing attention, no research instrument assessing fatigue SM outcomes for patients with advanced cancer is available. The aim of this study was to describe the development and preliminary testing of an interviewer-administered instrument for assessing the frequency and perceived levels of effectiveness and self-efficacy associated with fatigue SM behaviors in patients with advanced cancer. The development and testing of the Self-efficacy in Managing Symptoms Scale-Fatigue Subscale for Patients With Advanced Cancer (SMSFS-A) involved a number of procedures: item generation using a comprehensive literature review and semistructured interviews, content validity evaluation using expert panel reviews, and face validity and test-retest reliability evaluation using pilot testing. Initially, 23 items (22 specific behaviors with 1 global item) were generated from the literature review and semistructured interviews. After 2 rounds of expert panel review, the final scale was reduced to 17 items (16 behaviors with 1 global item). Participants in the pilot test (n = 10) confirmed that the questions in this scale were clear and easy to understand. Bland-Altman analysis showed agreement of results over a 1-week interval. The SMSFS-A items were generated using multiple sources. This tool demonstrated preliminary validity and reliability. The SMSFS-A has the potential to be used for clinical and research purposes. Nurses can use this instrument for collecting data to inform the initiation of appropriate fatigue SM support for this population.

  17. Development and validation of the Perceived Food Environment Questionnaire in a French-Canadian population.

    PubMed

    Carbonneau, Elise; Robitaille, Julie; Lamarche, Benoît; Corneau, Louise; Lemieux, Simone

    2017-08-01

    The present study aimed to develop and validate a questionnaire assessing perceived food environment in a French-Canadian population. A questionnaire, the Perceived Food Environment Questionnaire, was developed assessing perceived accessibility to healthy (nine items) and unhealthy foods (three items). A pre-test sample was recruited for a pilot testing of the questionnaire. For the validation study, another sample was recruited and completed the questionnaire twice. Exploratory factor analysis was performed on the items to assess the number of factors (subscales). Cronbach's α was used to measure internal consistency reliability. Test-retest reliability was assessed with Pearson correlations. Online survey. Men and women from the Québec City area (n = 31 in the pre-test sample; n = 150 in the validation study sample). The pilot testing did not lead to any change in the questionnaire. The exploratory factor analysis revealed a two-subscale structure. The first subscale is composed of six items assessing accessibility to healthy foods and the second includes three items related to accessibility to unhealthy foods. Three items were removed from the questionnaire due to low loading on the two subscales. The subscales demonstrated adequate internal consistency (Cronbach's α = 0.77 for healthy foods and 0.62 for unhealthy foods) and test-retest reliability (r = 0.59 and 0.60, respectively; both P < 0.0001). The Perceived Food Environment Questionnaire was developed for a French-Canadian population and demonstrated good psychometric properties. Further validation is recommended if the questionnaire is to be used in other populations.
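
    Several records in this listing report internal consistency as Cronbach's α. As a rough illustration of how that coefficient is usually computed, the sketch below applies the standard formula α = k/(k-1) · (1 - Σ item variances / variance of total scores). The function name and the tiny data set are hypothetical and are not taken from any of the cited studies.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances (n - 1 denominator). Assumes every respondent
    answered all k items and that total scores are not all identical.
    """
    k = len(rows[0])
    # Variance of each item (column) across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: three respondents, two items.
example_rows = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(example_rows)
print(round(alpha, 3))
```

    In practice a statistics package would normally be used; the sketch only makes the arithmetic behind the reported α values explicit.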

  18. Development of a new assessment scale for measuring interaction during staff-assisted transfer of residents in dementia special care units.

    PubMed

    Thunborg, Charlotta; von Heideken Wågert, Petra; Götell, Eva; Ivarsson, Ann-Britt; Söderlund, Anne

    2015-02-10

    Mobility problems and cognitive deficits related to transferring or moving persons suffering from dementia are associated with dependency. Physical assistance provided by staff is an important component of residents' maintenance of mobility in dementia care facilities. Unfortunately, hands-on assistance during transfers is also a source of confusion in persons with dementia, as well as a source of strain in the caregiver. The bidirectional effect of actions in a dementia care dyad involved in transfer is complicated to evaluate. This study aimed to develop an assessment scale for measuring actions related to transferring persons with dementia by dementia care dyads. This study was performed in four phases and guided by the framework of the biopsychosocial model and the approach presented by Social Cognitive Theory. These frameworks provided a starting point for understanding reciprocal effects in dyadic interaction. The four phases were 1) a literature review identifying existing assessment scales; 2) analyses of video-recorded transfer of persons with dementia for further generation of items; 3) computing the item content validity index of the 93 proposed items by 15 experts; and 4) expert opinion on the response scale and feasibility testing of the new assessment scale by video observation of the transfer situations. The development process resulted in a 17-item scale with a seven-point response scale. The scale consists of two sections. One section is related to transfer-related actions (e.g., capability of communication, motor skills performance, and cognitive functioning) of the person with dementia. The other section addresses the caregivers' facilitative actions (e.g., preparedness of transfer aids, interactional skills, and means of communication and interaction). The literature review and video recordings provided ideas for the item pool. Expert opinion decreased the number of items by relevance ratings and qualitative feedback. No further development of items was performed after feasibility testing of the scale. To enable assessment of transfer-related actions in dementia care dyads, our new scale shows potential for bridging the gap in this area. Results from this study could provide health care professionals working in dementia care facilities with a useful tool for assessing transfer-related actions.

  19. Psychometric properties of the Polish version of the Job-related Affective Well-being Scale.

    PubMed

    Basińska, Beata A; Gruszczyńska, Ewa; Schaufeli, Wilmar B

    2014-12-01

    The aim of this study was to verify psychometric properties of the Polish version of the Job-related Affective Well-being Scale (JAWS). Specifically, the theoretical 4-factor structure (based on the dimensions of pleasure and arousal) and the reliability of the original 20-item JAWS (van Katwyk et al., 2000) and the shortened 12-item version (Schaufeli and Van Rhenen, 2006) were tested. Two independent samples were analyzed (police officers, N = 395, and police recruits, N = 202). The Polish version of the original 20-item JAWS was used to measure job-related affective states across the past month (van Katwyk et al., 2000). This version of JAWS includes 2 dimensions, valence and arousal, which allow the assessment of 4 categories of emotions: low-arousal positive emotions, high-arousal positive emotions, low-arousal negative emotions and high-arousal negative emotions. The results of multidimensional scaling analysis showed that the theoretical circumplex model of emotions underlying JAWS was satisfactorily reproduced. The hypothesized 4-factor structure of the Polish version of JAWS was also confirmed. The 12-item version had better fit with the data than the original 20-item version, but the best fit was obtained for the even shorter 8-item version. This version emerged from a multidimensional scaling of the 12-item version. Reliabilities of the 20- and 12-item versions were good, with lower values for the 8-item JAWS version. The findings confirmed satisfactory psychometric properties of both Polish versions of the Job-related Affective Well-being Scale. Thus, when both psychometric properties and relevance for cross-cultural comparisons are considered, the 12-item JAWS is recommended as a version of choice.

  20. The Relation between Test Formats and Kindergarteners' Expressions of Vocabulary Knowledge

    ERIC Educational Resources Information Center

    Christ, Tanya; Chiu, Ming Ming; Currie, Ashelin; Cipielewski, James

    2014-01-01

    This study tested how 53 kindergarteners' expressions of depth of vocabulary knowledge and use in novel contexts were related to in-context and out-of-context test formats for 16 target words. Applying multilevel, multi-categorical Logit to all 1,696 test item responses, the authors found that kindergarteners were more likely to express deep…

  1. Evolution of a Test Item

    ERIC Educational Resources Information Center

    Spaan, Mary

    2007-01-01

    This article follows the development of test items (see "Language Assessment Quarterly", Volume 3 Issue 1, pp. 71-79 for the article "Test and Item Specifications Development"), beginning with a review of test and item specifications, then proceeding to writing and editing of items, pretesting and analysis, and finally selection of an item for a…

  2. Readability Level of Standardized Test Items and Student Performance: The Forgotten Validity Variable

    ERIC Educational Resources Information Center

    Hewitt, Margaret A.; Homan, Susan P.

    2004-01-01

    Test validity issues considered by test developers and school districts rarely include individual item readability levels. In this study, items from a major standardized test were examined for individual item readability level and item difficulty. The Homan-Hewitt Readability Formula was applied to items across three grade levels. Results of…

  3. The development and psychometric properties of a new scale to measure mental illness related stigma by health care providers: The opening minds scale for Health Care Providers (OMS-HC)

    PubMed Central

    2012-01-01

    Background: Research on the attitudes of health care providers towards people with mental illness has repeatedly shown that they may be stigmatizing. Many scales used to measure attitudes towards people with mental illness that exist today are not adequate because they do not have items that relate specifically to the role of the health care provider. Methods: We developed and tested a new scale called the Opening Minds Scale for Health Care Providers (OMS-HC). After item-pool generation, stakeholder consultations and content validation, focus groups were held with 64 health care providers/trainees and six people with lived experience of mental illness to develop the scale. The OMS-HC was then tested with 787 health care providers/trainees across Canada to determine its psychometric properties. Results: The initial testing OMS-HC scale showed good internal consistency, Cronbach’s alpha = 0.82 and satisfactory test-retest reliability, intraclass correlation = 0.66 (95% CI 0.54 to 0.75). The OMS-HC was only weakly correlated with social desirability, indicating that the social desirability bias was not likely to be a major determinant of OMS-HC scores. A factor analysis favoured a two-factor structure which accounted for 45% of the variance using 12 of the 20 items tested. Conclusions: The OMS-HC provides a good starting point for further validation as well as a tool that could be used in the evaluation of programs aimed at reducing mental illness related stigma by health care providers. The OMS-HC incorporates various dimensions of stigma with a modest number of items that can be used with busy health care providers. PMID:22694771

  4. Efficacy of the alcohol use disorders identification test as a screening tool for hazardous alcohol intake and related disorders in primary care: a validity study.

    PubMed Central

    Piccinelli, M.; Tessari, E.; Bortolomasi, M.; Piasere, O.; Semenzin, M.; Garzotto, N.; Tansella, M.

    1997-01-01

    OBJECTIVE: To determine the properties of the alcohol use disorders identification test in screening primary care attenders for alcohol problems. DESIGN: A validity study among consecutive primary care attenders aged 18-65 years. Every third subject completed the alcohol use disorders identification test (a 10 item self report questionnaire on alcohol intake and related problems) and was interviewed by an investigator with the composite international diagnostic interview alcohol use module (a standardised interview for the independent assessment of alcohol intake and related disorders). SETTING: 10 primary care clinics in Verona, north eastern Italy. PATIENTS: 500 subjects were approached and 482 (96.4%) completed evaluation. RESULTS: When the alcohol use disorders identification test was used to detect subjects with alcohol problems the area under the receiver operating characteristic curve was 0.95. The cut off score of 5 was associated with a sensitivity of 0.84, a specificity of 0.90, and a positive predictive value of 0.60. The screening ability of the total score derived from summing the responses to the five items minimising the probability of misclassification between subjects with and without alcohol problems provided an area under the receiver operating characteristic curve of 0.93. A score of 5 or more on the five items was associated with a sensitivity of 0.79, a specificity of 0.95, and a positive predictive value of 0.73. CONCLUSIONS: The alcohol use disorders identification test performs well in detecting subjects with formal alcohol disorders and those with hazardous alcohol intake. Using five of the 10 items on the questionnaire gives reasonable accuracy, and these are recommended as questions of choice to screen patients for alcohol problems. PMID:9040389
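
    The screening record above reports sensitivity, specificity, and positive predictive value together. The sketch below shows how predictive values follow from sensitivity and specificity once a prevalence is fixed, using Bayes' rule. The function name is hypothetical, and the prevalence of 0.15 is an assumption chosen for illustration; it is not stated in the abstract.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a screening test at a given prevalence.

    All arguments are proportions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity as reported for the cut-off score of 5;
# the prevalence value is an assumed figure, not taken from the abstract.
ppv, npv = predictive_values(0.84, 0.90, 0.15)
print(round(ppv, 3), round(npv, 3))
```

    Because the positive predictive value depends on prevalence, the same cut-off score can yield a noticeably different value in a population where alcohol problems are more or less common.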

  5. The development and psychometric properties of a new scale to measure mental illness related stigma by health care providers: the Opening Minds Scale for Health Care Providers (OMS-HC).

    PubMed

    Kassam, Aliya; Papish, Andriyka; Modgill, Geeta; Patten, Scott

    2012-06-13

    Research on the attitudes of health care providers towards people with mental illness has repeatedly shown that they may be stigmatizing. Many scales used to measure attitudes towards people with mental illness that exist today are not adequate because they do not have items that relate specifically to the role of the health care provider. We developed and tested a new scale called the Opening Minds Scale for Health Care Providers (OMS-HC). After item-pool generation, stakeholder consultations and content validation, focus groups were held with 64 health care providers/trainees and six people with lived experience of mental illness to develop the scale. The OMS-HC was then tested with 787 health care providers/trainees across Canada to determine its psychometric properties. The initial testing OMS-HC scale showed good internal consistency, Cronbach's alpha = 0.82 and satisfactory test-retest reliability, intraclass correlation = 0.66 (95% CI 0.54 to 0.75). The OMS-HC was only weakly correlated with social desirability, indicating that the social desirability bias was not likely to be a major determinant of OMS-HC scores. A factor analysis favoured a two-factor structure which accounted for 45% of the variance using 12 of the 20 items tested. The OMS-HC provides a good starting point for further validation as well as a tool that could be used in the evaluation of programs aimed at reducing mental illness related stigma by health care providers. The OMS-HC incorporates various dimensions of stigma with a modest number of items that can be used with busy health care providers.

  6. School Satisfaction among Adolescents: Testing Different Indicators for Its Measurement and Its Relationship with Overall Life Satisfaction and Subjective Well-Being in Romania and Spain

    ERIC Educational Resources Information Center

    Casas, Ferran; Baltatescu, Sergiu; Bertran, Irma; Gonzalez, Monica; Hatos, Adrian

    2013-01-01

    This paper presents results from two samples of adolescents aged 13-16 from Romania and Spain (N = 930 + 1,945 = 2,875). The original 7-item version of the Personal Well-Being Index (PWI) was used, together with an item on overall life satisfaction (OLS) and a set of six items related to satisfaction with school. A confirmatory factor analysis of…

  7. Development of the Facial Skin Care Index: A Health-Related Outcomes Index for Skin Cancer Patients

    PubMed Central

    Matthews, B. Alex; Rhee, John S.; Neuburg, Marcy; Burzynski, Mary L.; Nattinger, Ann B.

    2006-01-01

    BACKGROUND: Existing health-related quality-of-life (HRQOL) tools do not appear to capture patients' specific skin cancer concerns. OBJECTIVE: To describe the conceptual foundation, item generation, reduction process, and reliability testing for the Facial Skin Cancer Index (FSCI), a HRQOL outcomes tool for skin cancer researchers and clinicians. METHODS: Participants in Phases I to III consisted of adult patients (N = 134) diagnosed with biopsy-proven nonmelanoma cervicofacial skin cancer. Data were collected via self-report surveys and clinical records. RESULTS: Seventy-one distinct items were generated in Phase I and rated for their importance by an independent sample during Phase II; 36 items representing six theoretical HRQOL domains were retained. Test–retest I results indicated that four subscales showed adequate reliability coefficients (α = 0.60 to 0.91). Twenty-six items remained for test–retest II. Results indicated excellent internal consistency for emotional, social, appearance, and modified financial/work subscales (range 0.79 to 0.95); test–retest correlation coefficients were consistent across time (range 0.81 to 0.97; lifestyle omitted). CONCLUSION: Pretesting afforded the opportunity to select items that optimally met our a priori conceptual and psychometric criteria for high data quality. Phase IV testing (validity and sensitivity before surgery and 4 months after Mohs micrographic surgery) for the 20-item FSCI is under way. PMID:16875475

  8. Evaluating measurement models in clinical research: covariance structure analysis of latent variable models of self-conception.

    PubMed

    Hoyle, R H

    1991-02-01

    Indirect measures of psychological constructs are vital to clinical research. On occasion, however, the meaning of indirect measures of psychological constructs is obfuscated by statistical procedures that do not account for the complex relations between items and latent variables and among latent variables. Covariance structure analysis (CSA) is a statistical procedure for testing hypotheses about the relations among items that indirectly measure a psychological construct and relations among psychological constructs. This article introduces clinical researchers to the strengths and limitations of CSA as a statistical procedure for conceiving and testing structural hypotheses that are not tested adequately with other statistical procedures. The article is organized around two empirical examples that illustrate the use of CSA for evaluating measurement models with correlated error terms, higher-order factors, and measured and latent variables.

  9. Development and validation of oral health-related early childhood quality of life tool for North Indian preschool children.

    PubMed

    Mathur, Vijay Prakash; Dhillon, Jatinder Kaur; Logani, Ajay; Agarwal, Ramesh

    2014-01-01

    The purpose of this study was to develop a reliable instrument [Oral Health related Early Childhood Quality of Life (OH-ECQOL) scale] for measuring oral health related quality of life (OHrQoL) in preschool children in a North Indian population. Four pediatric dentists evaluated a pool of 65 items from various QoL questionnaires to assess their relevance to the Indian population. These items were discussed with eight independent pediatric dentists and two community dentists who were not a part of this study to assess the relevance of these items to preschool-age children based on their comprehensiveness and clarity. Based on their responses and feedback, a modified pool of items was developed and administered to a convenience sample of 20 parents who rated these items according to their relevance. The test-retest reliability was evaluated on another sample of 20 parents of 2-5 year old children. The final questionnaire comprised 16 items (12 child and 4 family). This was administered to 300 parents of 24-71 months old children divided on the basis of early childhood caries to assess its reliability and validity. OH-ECQOL scores were significantly associated with parental ratings of their child's general and oral health, and the presence of dental disease in the child. Cronbach's alpha was 0.862, and the ICC for test-retest reliability was 0.94. The OH-ECQOL proved to be a reliable and valid tool for assessing the impact of oral disorders on the quality of life of preschool children in Northern India.

  10. Response latencies are alive and well for identifying fakers on a self-report personality inventory: A reconsideration of van Hooft and Born (2012).

    PubMed

    Holden, Ronald R; Lambert, Christine E

    2015-12-01

    Van Hooft and Born (Journal of Applied Psychology 97:301-316, 2012) presented data challenging both the correctness of a congruence model of faking on personality test items and the relative merit (i.e., effect size) of response latencies for identifying fakers. We suggest that their analysis of response times was suboptimal, and that it followed neither from a congruence model of faking nor from published protocols on appropriately filtering the noise in personality test item answering times. Using new data and following recommended analytic procedures, we confirmed the relative utility of response times for identifying personality test fakers, and our obtained results, again, reinforce a congruence model of faking.

  11. Development and validation of the Child Oral Health Impact Profile - Preschool version.

    PubMed

    Ruff, R R; Sischo, L; Chinn, C H; Broder, H L

    2017-09-01

    The Child Oral Health Impact Profile (COHIP) is a validated instrument created to measure the oral health-related quality of life of school-aged children. The purpose of this study was to develop and validate a preschool version of the COHIP (COHIP-PS) for children aged 2-5. The COHIP-PS was developed and validated using a multi-stage process consisting of item selection, face validity testing, item impact testing, reliability and validity testing, and factor analysis. A cross-sectional convenience sample of caregivers having children 2-5 years old from four groups completed item clarity and impact forms. Groups were recruited from pediatric health clinics or preschools/daycare centers, speech clinics, dental clinics, or cleft/craniofacial centers. Participants had a variety of oral health-related conditions, including caries, congenital orofacial anomalies, and speech/language deficiencies such as articulation and language disorders. The COHIP-PS was found to have acceptable internal validity (α = 0.71) and high test-retest reliability (0.87), though internal validity was below the accepted threshold for the community sample. While discriminant validity results indicated significant differences across study groups, the overall magnitude of differences was modest. Results from confirmatory factor analyses support the use of a four-factor model consisting of 11 items across oral health, functional well-being, social-emotional well-being, and self-image domains. Quality of life is an integral factor in understanding and assessing children's well-being. The COHIP-PS is a validated oral health-related quality of life measure for preschool children with cleft or other oral conditions. Copyright © 2017 Dennis Barber Ltd.

  12. PSSA Released Reading Items, 2000-2001. The Pennsylvania System of School Assessment.

    ERIC Educational Resources Information Center

    Pennsylvania State Dept. of Education, Harrisburg. Bureau of Curriculum and Academic Services.

    This document contains materials directly related to the actual reading test of the Pennsylvania System of School Assessment (PSSA), including the reading rubric, released passages, selected-response questions with answer keys, performance tasks, and scored samples of students' responses to the tasks. All of these items may be duplicated to…

  13. GED Items. Volume 5, Numbers 1-6.

    ERIC Educational Resources Information Center

    GED Items, 1988

    1988-01-01

    The first of six issues of the GED Items Newsletter published in 1988 contains articles on General Educational Development (GED) mathematics instruction, suggestions for teaching writing, and public relations and marketing. Issue 2 has articles on GED science instruction, GED for Marines, holistic scoring, and a review of the new GED tests.…

  14. Improving measures of work-related physical functioning.

    PubMed

    McDonough, Christine M; Ni, Pengsheng; Peterik, Kara; Marfeo, Elizabeth E; Marino, Molly E; Meterko, Mark; Rasch, Elizabeth K; Brandt, Diane E; Jette, Alan M; Chan, Leighton

    2017-03-01

    To expand content of the physical function domain of the Work Disability Functional Assessment Battery (WD-FAB), developed for the US Social Security Administration's (SSA) disability determination process. Newly developed questions were administered to 3532 recent SSA applicants for work disability benefits and 2025 US adults. Factor analyses and item response theory (IRT) methods were used to calibrate and link the new items to the existing WD-FAB, and computer-adaptive test simulations were conducted. Factor and IRT analyses supported integration of 44 new items into three existing WD-FAB scales and the addition of a new 11-item scale (Community Mobility). The final physical function domain consisting of: Basic Mobility (56 items), Upper Body Function (34 items), Fine Motor Function (45 items), and Community Mobility (11 items) demonstrated acceptable psychometric properties. The WD-FAB offers an important tool for enhancement of work disability determination. The FAB could provide relevant information about work-related functioning for initial assessment of claimants; identifying denied applicants who may benefit from interventions to improve work and health outcomes; enhancing periodic review of work disability beneficiaries; and assessing outcomes for policies, programs and services targeting people with work disability.

  15. Improving Measures of Work-Related Physical Functioning

    PubMed Central

    McDonough, Christine M.; Ni, Pengsheng; Peterik, Kara; Marfeo, Elizabeth E.; Marino, Molly E.; Meterko, Mark; Rasch, Elizabeth K; Brandt, Diane E.; Jette, Alan M; Chan, Leighton

    2016-01-01

    Purpose: To expand content of the physical function domain of the Work Disability Functional Assessment Battery (WD-FAB), developed for the US Social Security Administration’s (SSA) disability determination process. Methods: Newly developed questions were administered to 3,532 recent SSA applicants for work disability benefits and 2,025 US adults. Factor analyses and item response theory (IRT) methods were used to calibrate and link the new items to the existing WD-FAB, and computer-adaptive test simulations were conducted. Results: Factor and IRT analyses supported integration of 44 new items into 3 existing WD-FAB scales and the addition of a new 11-item scale (Community Mobility). The final physical function domain consisting of: Basic Mobility (56 items), Upper Body Function (34 items), Fine Motor Function (45 items), and Community Mobility (11 items) demonstrated acceptable psychometric properties. Conclusions: The WD-FAB offers an important tool for enhancement of work disability determination. The FAB could provide relevant information about work-related functioning for initial assessment of claimants; identifying denied applicants who may benefit from interventions to improve work and health outcomes; enhancing periodic review of work disability beneficiaries; and assessing outcomes for policies, programs and services targeting people with work disability. PMID:28005243

  16. Assessing patients' experiences with communication across the cancer care continuum.

    PubMed

    Mazor, Kathleen M; Street, Richard L; Sue, Valerie M; Williams, Andrew E; Rabin, Borsika A; Arora, Neeraj K

    2016-08-01

    To evaluate the relevance, performance and potential usefulness of the Patient Assessment of cancer Communication Experiences (PACE) items. Items focusing on specific communication goals related to exchanging information, fostering healing relationships, responding to emotions, making decisions, enabling self-management, and managing uncertainty were tested via a retrospective, cross-sectional survey of adults who had been diagnosed with cancer. Analyses examined response frequencies, inter-item correlations, and coefficient alpha. A total of 366 adults were included in the analyses. Relatively few selected Does Not Apply, suggesting that items tap relevant communication experiences. Ratings of whether specific communication goals were achieved were strongly correlated with overall ratings of communication, suggesting item content reflects important aspects of communication. Coefficient alpha was ≥.90 for each item set, indicating excellent reliability. Variations in the percentage of respondents selecting the most positive response across items suggest results can identify strengths and weaknesses. The PACE items tap relevant, important aspects of communication during cancer care, and may be useful to cancer care teams desiring detailed feedback. The PACE is a new tool for eliciting patients' perspectives on communication during cancer care. It is freely available online for practitioners, researchers and others. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
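
    As a minimal illustration of the reliability statistic reported above, coefficient alpha can be computed from a respondents-by-items table of ratings. The function name `cronbach_alpha` and the ratings below are hypothetical; they are not PACE data, and the sketch assumes complete responses with no missing values.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for a list of respondents, each a list of k item ratings."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: 5 respondents x 3 items on a 1-5 scale.
ratings = [[4, 5, 4], [3, 3, 2], [5, 5, 5], [2, 3, 2], [4, 4, 5]]
alpha = cronbach_alpha(ratings)
```

    Alpha rises as the items covary more strongly relative to their individual variances, which is why it is read as an index of internal consistency.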

  17. Context reinstatement and memory for intrinsic versus extrinsic context: the role of item generation at encoding or retrieval.

    PubMed

    Nieznański, Marek

    2014-10-01

    According to many theoretical accounts, reinstating study context at the time of test creates optimal circumstances for item retrieval. The role of context reinstatement was tested in reference to context memory in several experiments. In the encoding phase, participants were presented with words printed in two different font colors (intrinsic context) or on two different sides of the computer screen (extrinsic context). At test, the context was reinstated or changed and participants were asked to recognize words and recollect their study context. Moreover, a read-generate manipulation was introduced at encoding and retrieval, which was intended to influence the relative salience of item and context information. The results showed that context reinstatement had no effect on memory for extrinsic context but affected memory for intrinsic context when the item was generated at encoding and read at test. These results supported the hypothesis that context information is reconstructed at retrieval only when context was poorly encoded at study. © 2014 Scandinavian Psychological Associations and John Wiley & Sons Ltd.

  18. The development of a computer assisted instruction and assessment system in pharmacology.

    PubMed

    Madsen, B W; Bell, R C

    1977-01-01

    We describe the construction of a computer based system for instruction and assessment in pharmacology, utilizing a large bank of multiple choice questions. Items were collected from many sources, edited and coded for student suitability, topic, taxonomy and difficulty and text references. Students reserve a time during the day and specify the type of test desired; questions are then presented randomly from the subset satisfying their criteria. Answers are scored after each question and a summary given at the end of every test; details on item performance are recorded automatically. The biggest hurdle in implementation was the assembly, review, classification and editing of items, while the programming was relatively straightforward. A number of modifications had to be made to the initial plans and changes will undoubtedly continue with further experience. When fully operational the system will possess a number of advantages including: elimination of test preparation, editing and marking; facilitated item review opportunities; increased objectivity, feedback, flexibility and decreased anxiety in students.

  19. Crows spontaneously exhibit analogical reasoning.

    PubMed

    Smirnova, Anna; Zorina, Zoya; Obozova, Tanya; Wasserman, Edward

    2015-01-19

    Analogical reasoning is vital to advanced cognition and behavioral adaptation. Many theorists deem analogical thinking to be uniquely human and to be foundational to categorization, creative problem solving, and scientific discovery. Comparative psychologists have long been interested in the species generality of analogical reasoning, but they initially found it difficult to obtain empirical support for such thinking in nonhuman animals (for pioneering efforts, see [2, 3]). Researchers have since mustered considerable evidence and argument that relational matching-to-sample (RMTS) effectively captures the essence of analogy, in which the relevant logical arguments are presented visually. In RMTS, choice of test pair BB would be correct if the sample pair were AA, whereas choice of test pair EF would be correct if the sample pair were CD. Critically, no items in the correct test pair physically match items in the sample pair, thus demanding that only relational sameness or differentness is available to support accurate choice responding. Initial evidence suggested that only humans and apes can successfully learn RMTS with pairs of sample and test items; however, monkeys have subsequently done so. Here, we report that crows too exhibit relational matching behavior. Even more importantly, crows spontaneously display relational responding without ever having been trained on RMTS; they had only been trained on identity matching-to-sample (IMTS). Such robust and uninstructed relational matching behavior represents the most convincing evidence yet of analogical reasoning in a nonprimate species, as apes alone have spontaneously exhibited RMTS behavior after only IMTS training. Copyright © 2015 Elsevier Ltd. All rights reserved.

  20. The Effect of the Position of an Item within a Test on the Item Difficulty Value.

    ERIC Educational Resources Information Center

    Rubin, Lois S.; Mott, David E. W.

    An investigation of the effect on the difficulty value of an item due to position placement within a test was made. Using a 60-item operational test comprised of 5 subtests, 60 items were placed as experimental items on a number of spiralled test forms in three different positions (first, middle, last) within the subtest composed of like items.…

  1. Relevance of Item Analysis in Standardizing an Achievement Test in Teaching of Physical Science in B.Ed Syllabus

    ERIC Educational Resources Information Center

    Marie, S. Maria Josephine Arokia; Edannur, Sreekala

    2015-01-01

    This paper focused on the analysis of test items constructed in the paper of teaching Physical Science for B.Ed. class. It involved the analysis of difficulty level and discrimination power of each test item. Item analysis allows selecting or omitting items from the test, but more importantly item analysis is a tool to help the item writer improve…
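
    The two statistics named above can be sketched as follows, assuming dichotomously scored (0/1) items. The difficulty level is taken as the proportion answering correctly, and the discrimination power as the upper-lower index D computed from the top and bottom 27% of examinees by total score. The record does not state which indices the authors used, so both choices, the function name `item_analysis`, and the data are assumptions.

```python
def item_analysis(responses, fraction=0.27):
    """Return (difficulty, discrimination) per item for 0/1 scored responses.

    responses: list of examinees, each a list of 0/1 item scores.
    fraction: share of examinees placed in the upper and lower groups.
    """
    n = len(responses)
    k = len(responses[0])
    # Rank examinees by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    g = max(1, round(n * fraction))
    upper, lower = ranked[:g], ranked[-g:]
    results = []
    for j in range(k):
        # Difficulty: proportion of all examinees answering item j correctly.
        difficulty = sum(row[j] for row in responses) / n
        # Discrimination: difference in proportion correct, upper minus lower group.
        discrimination = (sum(r[j] for r in upper) - sum(r[j] for r in lower)) / g
        results.append((difficulty, discrimination))
    return results


# Hypothetical data: 4 examinees x 3 items, split into halves for illustration.
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
stats = item_analysis(data, fraction=0.5)
```

    Items with very high or very low difficulty values, or with discrimination near zero or negative, are the usual candidates for revision or omission.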

  2. The Replicability of the Negative Testing Effect: Differences across Participant Populations

    ERIC Educational Resources Information Center

    Mulligan, Neil W.; Rawson, Katherine A.; Peterson, Daniel J.; Wissman, Kathryn T.

    2018-01-01

    Although memory retrieval often enhances subsequent memory, Peterson and Mulligan (2013) reported conditions under which retrieval produces poorer subsequent recall--the negative testing effect. The item-specific--relational account proposes that the effect occurs when retrieval disrupts interitem organizational processing relative to the restudy…

  3. Calibration of the Test of Relational Reasoning.

    PubMed

    Dumas, Denis; Alexander, Patricia A

    2016-10-01

    Relational reasoning, or the ability to discern meaningful patterns within a stream of information, is a critical cognitive ability associated with academic and professional success. Importantly, relational reasoning has been described as taking multiple forms, depending on the type of higher order relations being drawn between and among concepts. However, the reliable and valid measurement of such a multidimensional construct of relational reasoning has been elusive. The Test of Relational Reasoning (TORR) was designed to tap 4 forms of relational reasoning (i.e., analogy, anomaly, antinomy, and antithesis). In this investigation, the TORR was calibrated and scored using multidimensional item response theory in a large, representative undergraduate sample. The bifactor model was identified as the best-fitting model, and used to estimate item parameters and construct reliability. To improve the usefulness of the TORR to educators, scaled scores were also calculated and presented. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  4. EEG oscillations and recognition memory: theta correlates of memory retrieval and decision making.

    PubMed

    Jacobs, Joshua; Hwang, Grace; Curran, Tim; Kahana, Michael J

    2006-08-15

    Studies of memory retrieval have identified electroencephalographic (EEG) correlates of a test item's old-new status, reaction time, and memory load. In the current study, we used a multivariate analysis to disentangle the effects of these correlated variables. During retrieval, power of left-parietal theta (4-8 Hz) oscillations increased in proportion to how well a test item was remembered, and theta in central regions correlated with decision making. We also studied how these oscillatory dynamics complemented event-related potentials. These findings are the first to demonstrate that distinct patterns of theta oscillations can simultaneously relate to different aspects of behavior.

  5. Using more different and more familiar targets improves the detection of concealed information.

    PubMed

    Suchotzki, Kristina; De Houwer, Jan; Kleinberg, Bennett; Verschuere, Bruno

    2018-04-01

    When embedded among a number of plausible irrelevant options, the presentation of critical (e.g., crime-related or autobiographical) information is associated with a marked increase in response time (RT). This RT effect crucially depends on the inclusion of a target/non-target discrimination task with targets being a dedicated set of items that require a unique response (press YES; for all other items press NO). Targets may be essential because they share a feature - familiarity - with the critical items. Whereas irrelevant items have not been encountered before, critical items are known from the event or the facts of the investigation. Target items are usually learned before the test, and thereby made familiar to the participants. Hence, familiarity-based responding needs to be inhibited on the critical items and may therefore explain the RT increase on the critical items. This leads to the hypothesis that the more participants rely on familiarity, the more pronounced the RT increase on critical items may be. We explored two ways to increase familiarity-based responding: (1) increasing the number of different target items, and (2) using familiar targets. In two web-based studies (n = 357 and n = 499), both the number of different targets and the use of familiar targets facilitated concealed information detection. The effect of the number of different targets was small yet consistent across both studies; the effect of target familiarity was large in both studies. Our results support the role of familiarity-based responding in the Concealed Information Test and point to ways to improve the validity of the Concealed Information Test. Copyright © 2018 Elsevier B.V. All rights reserved.

  6. Incidental histopathological findings in hearts of control beagle dogs in toxicity studies.

    PubMed

    Bodié, Karen; Decker, Joshua H

    2014-08-01

    In preclinical studies of pharmaceutical agents, the beagle dog is a commonly used model for the detection of cardiotoxicity. Incidental findings, postmortem changes, and artifacts must be distinguished histopathologically from test item-related findings in the heart. In this retrospective analysis, cardiac sections from 88 control beagles (41 male, 47 female; ages 5-18 months) in preclinical studies were examined histopathologically. The most common finding was thickening of the tunica media of intramural coronary arteries, most likely a postmortem change. The second most common finding was the presence of vacuoles within Purkinje fibers. Dilated lymphatic and blood vessels at the insertion of chordae tendineae were noted more commonly in males than in females and were considered a normal anatomic feature. Mesothelial-lined papillary fronds along the epicardial surface of the atria were present in several dogs, as were small infiltrates of inflammatory cells usually within the myocardium. In summary, control beagles' hearts frequently have incidental findings that must be differentiated from test item-related pathologic changes. Historical control data can be useful for the interpretation of incidental and test item-related findings in the beagle heart. © 2013 by The Author(s).

  7. Development of a brief tool for monitoring aberrant behaviours among patients receiving long-term opioid therapy: The Opioid-Related Behaviours In Treatment (ORBIT) scale.

    PubMed

    Larance, Briony; Bruno, Raimondo; Lintzeris, Nicholas; Degenhardt, Louisa; Black, Emma; Brown, Amanda; Nielsen, Suzanne; Dunlop, Adrian; Holland, Rohan; Cohen, Milton; Mattick, Richard P

    2016-02-01

    Early identification of problems is essential in minimising the unintended consequences of opioid therapy. This study aimed to develop a brief scale that identifies and quantifies recent aberrant behaviour among diverse patient populations receiving long-term opioid treatment. 40 scale items were generated via literature review and expert panel (N=19) and tested in surveys of: (i) N=41 key experts, and (ii) N=426 patients prescribed opioids >3 months (222 pain patients and 204 opioid substitution therapy (OST) patients). We employed item and scale psychometrics (exploratory factor analyses, confirmatory factor analyses and item-response theory statistics) to refine items to a brief scale. Following removal of problematic items (poor retest-reliability or wording, semantic redundancy, differential item functioning, collinearity or rarity) iterative factor analytic procedures identified a 10-item unifactorial scale with good model fit in the total sample (N=426; CFI=0.981, TLI=0.975, RMSEA=0.057), and among pain (CFI=0.969, TLI=0.960, RMSEA=0.062) and OST subgroups (CFI=0.989, TLI=0.986, RMSEA=0.051). The 10 items provided good discrimination between groups, demonstrated acceptable test-retest reliability (ICC 0.80, 95% CI 0.60-0.89; Cronbach's alpha=0.89), were moderately correlated with related constructs, including opioid dependence (SDS), depression and stress (DASS subscales) and Social Relationships and Environment domains of the WHO-QoL, and had strong face validity among advising clinicians. The Opioid-Related Behaviours In Treatment (ORBIT) scale is brief, reliable and validated for use in diverse patient groups receiving opioids. The ORBIT has potential applications as a checklist to prompt clinical discussions and as a tool to quantify aberrant behaviour and assess change over time. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  8. Development and Validation of a Novel Generic Health-related Quality of Life Instrument With 20 Items (HINT-20)

    PubMed Central

    2017-01-01

    Objectives: Few attempts have been made to develop a generic health-related quality of life (HRQoL) instrument and to examine its validity and reliability in Korea. We aimed to do this in our present study. Methods: After a literature review of existing generic HRQoL instruments, a focus group discussion, in-depth interviews, and expert consultations, we selected 30 tentative items for a new HRQoL measure. These items were evaluated by assessing their ceiling effects, difficulty, and redundancy in the first survey. To validate the HRQoL instrument that was developed, known-groups validity and convergent/discriminant validity were evaluated and its test-retest reliability was examined in the second survey. Results: Of the 30 items originally assessed for the HRQoL instrument, four were excluded due to high ceiling effects and six were removed due to redundancy. We ultimately developed a HRQoL instrument with a reduced number of 20 items, known as the Health-related Quality of Life Instrument with 20 items (HINT-20), incorporating physical, mental, social, and positive health dimensions. The results of the HINT-20 for known-groups validity were poorer in women, the elderly, and those with a low income. For convergent/discriminant validity, the correlation coefficients of items (except vitality) in the physical health dimension with the physical component summary of the Short Form 36 version 2 (SF-36v2) were generally higher than the correlations of those items with the mental component summary of the SF-36v2, and vice versa. Regarding test-retest reliability, the intraclass correlation coefficient of the total HINT-20 score was 0.813 (p<0.001). Conclusions: A novel generic HRQoL instrument, the HINT-20, was developed for the Korean general population and showed acceptable validity and reliability. PMID:28173686

  9. Influence of inter-item symmetry in visual search.

    PubMed

    Roggeveen, Alexa B; Kingstone, Alan; Enns, James T

    2004-01-01

    Does visual search involve a serial inspection of individual items (Feature Integration Theory) or are items grouped and segregated prior to their consideration as a possible target (Attentional Engagement Theory)? For search items defined by motion and shape there is strong support for prior grouping (Kingstone and Bischof, 1999). The present study tested for grouping based on inter-item shape symmetry. Results showed that target-distractor symmetry strongly influenced search whereas distractor-distractor symmetry influenced search more weakly. This indicates that static shapes are evaluated for similarity to one another prior to their explicit identification as 'target' or 'distractor'. Possible reasons for the unequal contributions of target-distractor and distractor-distractor relations are discussed.

  10. Mixed-Format Test Score Equating: Effect of Item-Type Multidimensionality, Length and Composition of Common-Item Set, and Group Ability Difference

    ERIC Educational Resources Information Center

    Wang, Wei

    2013-01-01

    Mixed-format tests containing both multiple-choice (MC) items and constructed-response (CR) items are now widely used in many testing programs. Mixed-format tests often are considered to be superior to tests containing only MC items although the use of multiple item formats leads to measurement challenges in the context of equating conducted under…

  11. Testing primary-school children's understanding of the nature of science.

    PubMed

    Koerber, Susanne; Osterhaus, Christopher; Sodian, Beate

    2015-03-01

    Understanding the nature of science (NOS) is a critical aspect of scientific reasoning, yet few studies have investigated its developmental beginnings and initial structure. One contributing reason is the lack of an adequate instrument. Two studies assessed NOS understanding among third graders using a multiple-select (MS) paper-and-pencil test. Study 1 investigated the validity of the MS test by presenting the items to 68 third graders (9-year-olds) and subsequently interviewing them on their underlying NOS conception of the items. All items were significantly related between formats, indicating that the test was valid. Study 2 applied the same instrument to a larger sample of 243 third graders, and their performance was compared to a multiple-choice (MC) version of the test. Although the MC format inflated the guessing probability, there was a significant relation between the two formats. In summary, the MS format was a valid method revealing third graders' NOS understanding, thereby representing an economical test instrument. A latent class analysis identified three groups of children with expertise in qualitatively different aspects of NOS, suggesting that there is not a single common starting point for the development of NOS understanding; instead, multiple developmental pathways may exist. © 2014 The British Psychological Society.

  12. Developing an item bank to measure the coping strategies of people with hereditary retinal diseases.

    PubMed

    Prem Senthil, Mallika; Khadka, Jyoti; De Roach, John; Lamey, Tina; McLaren, Terri; Campbell, Isabella; Fenwick, Eva K; Lamoureux, Ecosse L; Pesudovs, Konrad

    2018-05-05

    Our understanding of the coping strategies used by people with visual impairment to manage stress related to visual loss is limited. This study aims to develop a sophisticated coping instrument in the form of an item bank implemented via computerised adaptive testing (CAT) for hereditary retinal diseases. Items on coping were extracted from qualitative interviews with patients which were supplemented by items from a literature review. A systematic multi-stage process of item refinement was carried out followed by expert panel discussion and cognitive interviews. The final coping item bank had 30 items. Rasch analysis was used to assess the psychometric properties. A CAT simulation was carried out to estimate the average number of items required to gain precise measurement of hereditary retinal disease-related coping. One hundred eighty-nine participants answered the coping item bank (median age = 58 years). The coping scale demonstrated good precision and targeting. The standardised residual loadings for items revealed six items grouped together. Removal of the six items reduced the precision of the main coping scale and worsened the variance explained by the measure. Therefore, the six items were retained within the main scale. Our CAT simulation indicated that, on average, fewer than 10 items are required to gain a precise measurement of coping. This is the first study to develop a psychometrically robust coping instrument for hereditary retinal diseases. CAT simulation indicated that, on average, only four and nine items were required to gain measurement at moderate and high precision, respectively.

  13. Who was that masked man? Conjoint representations of intrinsic motions with actor appearance.

    PubMed

    Kersten, Alan W; Earles, Julie L; Negri, Leehe

    2018-09-01

    Motion plays an important role in recognising animate creatures. This research supports a distinction between intrinsic and extrinsic motions in their relationship to identifying information about the characters performing the motions. Participants viewed events involving costumed human characters. Intrinsic motions involved relative movements of a character's body parts, whereas extrinsic motions involved movements with respect to external landmarks. Participants were later tested for recognition of the motions and who had performed them. The critical test items involved familiar characters performing motions that had previously been performed by other characters. Participants falsely recognised extrinsic conjunction items, in which characters followed the paths of other characters, more often than intrinsic conjunction items, in which characters moved in the manner of other characters. In contrast, participants falsely recognised new extrinsic motions less often than new intrinsic motions, suggesting that they remembered extrinsic motions but had difficulty remembering who had performed them. Modelling of receiver operating characteristics indicated that participants discriminated old items from intrinsic conjunction items via familiarity, consistent with conjoint representations of intrinsic motion and identity information. In contrast, participants used recollection to distinguish old items from extrinsic conjunction items, consistent with separate but associated representations of extrinsic motion and identity information.

  14. Does Task Affordance Moderate Age-related Deficits in Strategy Production?

    PubMed Central

    Bottiroli, Sara; Dunlosky, John; Guerini, Kate; Cavallini, Elena; Hertzog, Christopher

    2011-01-01

    According to the task-affordance hypothesis, people will be more likely to use a specific strategy as tasks more readily afford its use. To evaluate this hypothesis, we examined the degree to which older and younger adults used a self-testing strategy to learn items, because previous studies suggest that age-related differences in the use of this powerful strategy vary across tasks. These tasks (words affixed to a board vs. pairs on flashcards) differentially afford the use of the self-testing strategy and may moderate the age-related effects on strategy use. Participants performed a recall-readiness task in which they continued to study items until they were ready for the criterion test. As predicted, self testing was used less often on tasks that least afforded its use. Namely, participants used self testing less when they studied single words affixed to a board than when they studied pairs on flashcards. Most important, age-related deficits in strategy use were greater for the former task and nonexistent for the latter one, suggesting that task affordance moderates age differences in strategy use. PMID:20552461

  15. Does task affordance moderate age-related deficits in strategy production?

    PubMed

    Bottiroli, Sara; Dunlosky, John; Guerini, Kate; Cavallini, Elena; Hertzog, Christopher

    2010-09-01

    According to the task-affordance hypothesis, people will be more likely to use a specific strategy as tasks more readily afford its use. To evaluate this hypothesis, we examined the degree to which older and younger adults used a self-testing strategy to learn items, because previous studies suggest that age-related differences in the use of this powerful strategy vary across tasks. These tasks (words affixed to a board vs. pairs on flashcards) differentially afford the use of the self-testing strategy and may moderate the age-related effects on strategy use. Participants performed a recall-readiness task in which they continued to study items until they were ready for the criterion test. As predicted, self testing was used less often on tasks that least afforded its use. Namely, participants used self testing less when they studied single words affixed to a board than when they studied pairs on flashcards. Most important, age-related deficits in strategy use were greater for the former task and nonexistent for the latter one, suggesting that task affordance moderates age differences in strategy use.

  16. Development and psychometric testing of the Canine Owner-Reported Quality of Life questionnaire, an instrument designed to measure quality of life in dogs with cancer.

    PubMed

    Giuffrida, Michelle A; Brown, Dorothy Cimino; Ellenberg, Susan S; Farrar, John T

    2018-05-01

    OBJECTIVE: To describe development and initial psychometric testing of an owner-reported questionnaire designed to standardize measurement of general quality of life (QOL) in dogs with cancer. DESIGN: Key-informant interviews, questionnaire development, and field trial. SAMPLE: Owners of 25 dogs with cancer for item development and pretesting and owners of 90 dogs with cancer for reliability and validity testing. PROCEDURES: Standard methods for development and testing of questionnaire instruments intended to measure subjective states were used. Items were generated, selected, scaled, and pretested for content, meaning, and readability. Response items were evaluated with exploratory factor analysis and by assessing internal consistency (Cronbach α) and convergence with global QOL as determined with a visual analog scale. Preliminary tests of stability and responsiveness were performed. RESULTS: The final questionnaire, which was named the Canine Owner-Reported Quality of Life (CORQ) questionnaire, contained 17 items related to observable behaviors commonly used by owners to evaluate QOL in their dogs. Several items pertaining to physical symptoms performed poorly and were omitted. The 17 items were assigned to 4 factors (vitality, companionship, pain, and mobility) on the basis of the items they contained. The CORQ questionnaire and its factors had high internal consistency (Cronbach α = 0.68 to 0.90) and moderate to strong correlations (r = 0.49 to 0.71) with global QOL as measured on a visual analog scale. Preliminary testing indicated good test-retest reliability and responsiveness to improvements in overall QOL. CONCLUSIONS AND CLINICAL RELEVANCE: The CORQ questionnaire was a valid, reliable owner-reported questionnaire that measured general QOL in dogs with cancer and showed promise as a clinical trial outcome measure for quantifying changes in individual dog QOL occurring in response to cancer treatment and progression.

  17. Development and psychometric characteristics of the SCI-QOL Pressure Ulcers scale and short form.

    PubMed

    Kisala, Pamela A; Tulsky, David S; Choi, Seung W; Kirshblum, Steven C

    2015-05-01

    To develop a self-reported measure of the subjective impact of pressure ulcers on health-related quality of life (HRQOL) in individuals with spinal cord injury (SCI) as part of the SCI quality of life (SCI-QOL) measurement system. Grounded-theory based qualitative item development methods, large-scale item calibration testing, confirmatory factor analysis (CFA), and item response theory-based psychometric analysis. Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. SCI-QOL Pressure Ulcers scale. 189 individuals with traumatic SCI who experienced a pressure ulcer within the past 7 days completed 30 items related to pressure ulcers. CFA confirmed a unidimensional pool of items. IRT analyses were conducted. A constrained Graded Response Model with a constant slope parameter was used to estimate item thresholds for the 12 retained items. The 12-item SCI-QOL Pressure Ulcers scale is unique in that it is specifically targeted to individuals with spinal cord injury and at every stage of development has included input from individuals with SCI. Furthermore, use of CFA and IRT methods provides flexibility and precision of measurement. The scale may be administered in its entirety or as a 7-item "short form" and is available for both research and clinical practice.

  18. The role of unconscious memory errors in judgments of confidence for sentence recognition.

    PubMed

    Sampaio, Cristina; Brewer, William F

    2009-03-01

    The present experiment tested the hypothesis that unconscious reconstructive memory processing can lead to the breakdown of the relationship between memory confidence and memory accuracy. Participants heard deceptive schema-inference sentences and nondeceptive sentences and were tested with either simple or forced-choice recognition. The nondeceptive items showed a positive relation between confidence and accuracy in both simple and forced-choice recognition. However, the deceptive items showed a strong negative confidence/accuracy relationship in simple recognition and a low positive relationship in forced choice. The mean levels of confidence for erroneous responses for deceptive items were inappropriately high in simple recognition but lower in forced choice. These results suggest that unconscious reconstructive memory processes involved in memory for the deceptive schema-inference items led to inaccurate confidence judgments and that, when participants were made aware of the deceptive nature of the schema-inference items through the use of a forced-choice procedure, they adjusted their confidence accordingly.

  19. Item response theory detects differential item functioning between healthy and ill children in QoL measures

    PubMed Central

    Langer, Michelle M.; Hill, Cheryl D.; Thissen, David; Burwinkle, Tasha M.; Varni, James W.; DeWalt, Darren A.

    2008-01-01

    Objective To demonstrate the value of item response theory (IRT) and differential item functioning (DIF) methods in examining a health-related quality of life (HRQOL) measure in children and adolescents. Study Design and Setting This illustration uses data from 5,429 children using the four subscales of the PedsQL™ 4.0 Generic Core Scales. The IRT model-based likelihood ratio test was used to detect and evaluate DIF between healthy children and children with a chronic condition. Results DIF was detected for a majority of items but cancelled out at the total test score level due to opposing directions of DIF. Post-hoc analysis indicated that this pattern of results may be due to multidimensionality. We discuss issues in detecting and handling DIF. Conclusion This paper describes how to perform DIF analyses in validating a questionnaire to ensure that scores have equivalent meaning across subgroups. It offers insight into ways information gained through the analysis can be used to evaluate an existing scale. PMID:18226750

  20. The Selection of Test Items for Decision Making with a Computer Adaptive Test.

    ERIC Educational Resources Information Center

    Spray, Judith A.; Reckase, Mark D.

    The issue of test-item selection in support of decision making in adaptive testing is considered. The number of items needed to make a decision is compared for two approaches: selecting items from an item pool that are most informative at the decision point or selecting items that are most informative at the examinee's ability level. The first…
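
    The two selection rules compared above can be sketched as follows. This is an illustration only: it assumes a two-parameter logistic (2PL) item model and its Fisher information, which the abstract does not specify, and the item pool, cut score, and ability estimate are hypothetical.

```python
import math

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item (discrimination a, difficulty b) at theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def most_informative(pool, theta):
    """Name of the pool item with the largest information at theta."""
    return max(pool, key=lambda name: info_2pl(theta, *pool[name]))

# Hypothetical pool: name -> (discrimination, difficulty).
pool = {
    "easy": (1.0, -1.5),
    "medium": (1.2, 0.0),
    "hard": (1.1, 1.4),
}

cut_score = 0.0          # hypothetical decision point
ability_estimate = 1.5   # hypothetical current estimate for one examinee

pick_at_cut = most_informative(pool, cut_score)
pick_at_ability = most_informative(pool, ability_estimate)
print(pick_at_cut, pick_at_ability)
```

    For an examinee whose estimated ability is far from the decision point, the two rules pick different items, which is why the number of items needed to reach a decision can differ between them.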

  1. Evaluating HIV Knowledge Questionnaires Among Men Who Have Sex with Men: A Multi-Study Item Response Theory Analysis.

    PubMed

    Janulis, Patrick; Newcomb, Michael E; Sullivan, Patrick; Mustanski, Brian

    2018-01-01

    Knowledge about the transmission, prevention, and treatment of HIV remains a critical element in psychosocial models of HIV risk behavior and is commonly used as an outcome in HIV prevention interventions. However, most HIV knowledge questions have not undergone rigorous psychometric testing such as using item response theory. The current study used data from six studies of men who have sex with men (MSM; n = 3565) to (1) examine the item properties of HIV knowledge questions, (2) test for differential item functioning on commonly studied characteristics (i.e., age, race/ethnicity, and HIV risk behavior), (3) select items with the optimal item characteristics, and (4) leverage this combined dataset to examine the potential moderating effect of age on the relationship between condomless anal sex (CAS) and HIV knowledge. Findings indicated that existing questions tend to poorly differentiate those with higher levels of HIV knowledge, but items were relatively robust across diverse individuals. Furthermore, age moderated the relationship between CAS and HIV knowledge with older MSM having the strongest association. These findings suggest that additional items are required in order to capture a more nuanced understanding of HIV knowledge and that the association between CAS and HIV knowledge may vary by age.

  2. Development and psychometric evaluation of an information literacy self-efficacy survey and an information literacy knowledge test.

    PubMed

    Tepe, Rodger; Tepe, Chabha

    2015-03-01

    To develop and psychometrically evaluate an information literacy (IL) self-efficacy survey and an IL knowledge test. In this test-retest reliability study, a 25-item IL self-efficacy survey and a 50-item IL knowledge test were developed and administered to a convenience sample of 53 chiropractic students. Item analyses were performed on all questions. The IL self-efficacy survey demonstrated good reliability (test-retest correlation = 0.81) and good/very good internal consistency (mean κ = .56 and Cronbach's α = .92). A total of 25 questions with the best item analysis characteristics were chosen from the 50-item IL knowledge test, resulting in a 25-item IL knowledge test that demonstrated good reliability (test-retest correlation = 0.87), very good internal consistency (mean κ = .69, KR20 = 0.85), and good item discrimination (mean point-biserial = 0.48). This study resulted in the development of three instruments: a 25-item IL self-efficacy survey, a 50-item IL knowledge test, and a 25-item IL knowledge test. The information literacy self-efficacy survey and the 25-item version of the information literacy knowledge test have shown preliminary evidence of adequate reliability and validity to justify continuing study with these instruments.
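
    Two of the statistics reported above, KR-20 and the point-biserial item discrimination, can be computed as in the sketch below. The response matrix is hypothetical, the point-biserial is the uncorrected version (the item is included in the total score), and population variances are assumed throughout.

```python
import math
from statistics import mean, pvariance

# Hypothetical dichotomous responses: rows are examinees, columns are items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

def kr20(rows):
    """Kuder-Richardson formula 20 for 0/1 item scores."""
    k = len(rows[0])
    totals = [sum(r) for r in rows]
    sum_pq = 0.0
    for j in range(k):
        p = mean([r[j] for r in rows])  # proportion answering item j correctly
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / pvariance(totals))

def point_biserial(rows, j):
    """Pearson correlation between item j (0/1) and the total score."""
    x = [r[j] for r in rows]
    y = [sum(r) for r in rows]
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    # Undefined if every examinee gives the same answer (zero variance).
    return cov / math.sqrt(pvariance(x) * pvariance(y))

reliability = kr20(responses)
discrimination = [point_biserial(responses, j) for j in range(4)]
print(round(reliability, 3), [round(r, 3) for r in discrimination])
```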

  3. A New Item Selection Procedure for Mixed Item Type in Computerized Classification Testing.

    ERIC Educational Resources Information Center

    Lau, C. Allen; Wang, Tianyou

    This paper proposes a new Information-Time index as the basis for item selection in computerized classification testing (CCT) and investigates how this new item selection algorithm can help improve test efficiency for item pools with mixed item types. It also investigates how practical constraints such as item exposure rate control, test…

  4. A preliminary investigation into the neural basis of the production effect.

    PubMed

    Hassall, Cameron D; Quinlan, Chelsea K; Turk, David J; Taylor, Tracy L; Krigolson, Olave E

    2016-06-01

    Items that are produced (e.g., read aloud) during encoding typically are better remembered than items that are not produced (e.g., read silently). This "production effect" has been explained by distinctiveness: Produced items have more distinct features than nonproduced items, leading to enhanced retrieval. The goal of the current study was to use electroencephalography (EEG) to examine the neural basis of the production effect. During study, participants were presented with words that they were required to read silently, read aloud, or sing while EEG data were recorded. Subsequent memory performance was tested using a yes/no recognition test. Analysis focused on the event-related brain potentials (ERPs) evoked by the encoding instruction cue for each instruction condition. Our data revealed enhanced memory performance for produced items and a greater P300 ERP amplitude for instructions to sing or read aloud compared with instructions to read silently. Our results demonstrate that the amplitude of the P300 is modulated by at least 1 aspect of production, vocalization (singing/reading aloud relative to reading silently), and are consistent with the distinctiveness account of the production effect. The ERP methodology is a viable tool for investigating the production effect.

  5. A Process for Reviewing and Evaluating Generated Test Items

    ERIC Educational Resources Information Center

    Gierl, Mark J.; Lai, Hollis

    2016-01-01

    Testing organizations need large numbers of high-quality items due to the proliferation of alternative test administration methods and modern test designs. But the current demand for items far exceeds the supply. Test items, as they are currently written, evoke a process that is both time-consuming and expensive because each item is written,…

  6. Measurement of self-evaluative motives: a shopping scenario.

    PubMed

    Wajda, Theresa A; Kolbe, Richard; Hu, Michael Y; Cui, Annie Peng

    2008-08-01

    To develop measures of consumers' self-evaluative motives of Self-verification, Self-enhancement, and Self-improvement within the context of a mall shopping environment, an initial set of 49 items was generated by conducting three focus-group sessions. These items were subsequently converted into shopping-dependent motive statements. 250 undergraduate college students responded on a 7-point scale to each statement as these related to the acquisition of recent personal shopping goods. An exploratory factor analysis yielded five factors, accounting for 57.7% of the variance, three of which corresponded to the Self-verification motive (five items), Self-enhancement motive (three items), and Self-improvement motive (six items). These 14 items, along with 9 reconstructed items, yielded 23 items retained and subjected to additional testing. In a final round of data collection, 169 college students provided data for exploratory factor analysis. 11 items were used in confirmatory factor analysis. Analysis indicated that the 11-item scale adequately captured measures of the three self-evaluative motives. However, further data reduction produced a 9-item scale with marked improvement in statistical fit over the 11-item scale.

  7. Developing and evaluating an instrument to measure Recovery After INtensive care: the RAIN instrument.

    PubMed

    Bergbom, Ingegerd; Karlsson, Veronika; Ringdal, Mona

    2018-01-01

    Measuring and evaluating patients' recovery, following intensive care, is essential for assessing their recovery process. By using a questionnaire, which includes spiritual and existential aspects, possibilities for identifying appropriate nursing care activities may be facilitated. The study describes the development and evaluation of a recovery questionnaire and its validity and reliability. A questionnaire consisting of 30 items on a 5-point Likert scale was completed by 169 patients (103 men, 66 women), 18 years or older (m=69, SD 12.5) at 2, 6, 12 or 24 months following discharge from an ICU. An exploratory factor analysis, including a principal component analysis with orthogonal varimax rotation, was conducted. Ten initial items, with loadings below 0.40, were removed. The internal item/scale structure obtained in the principal component analysis was tested in relation to convergent and discriminant validity with a multi-trait analysis. Item consistency and reliability were assessed by Cronbach's alpha and internal item consistency. Tests of scale quality, the proportion of missing values and respondents' scoring at maximum and minimum levels were also conducted. A total of 20 items in six factors (forward looking, supporting relations, existential ruminations, revaluation of life, physical and mental strength, and need of social support) were extracted with eigenvalues above one. Together, they explained 75% of the variance. The half-scale criterion showed that the proportion of incomplete scale scores ranged from 0% to 4.3%. When testing the scale's ability to differentiate between levels of the assessed concept, we found that the observed range of scale scores covered the theoretical range. Substantial proportions of respondents, who scored at the ceiling for forward looking and supporting relations and at floor for the need of social support, were found. These findings should be further investigated. The factor analysis, including discriminant validity and the mean value for the item correlations, was found to be excellent. The RAIN instrument could be used to assess recovery following intensive care. It could provide post-ICU clinics and community/primary healthcare nurses with valuable information on which areas patients may need more support.

  8. What's in a Topic? Exploring the Interaction between Test-Taker Age and Item Content in High-Stakes Testing

    ERIC Educational Resources Information Center

    Banerjee, Jayanti; Papageorgiou, Spiros

    2016-01-01

    The research reported in this article investigates differential item functioning (DIF) in a listening comprehension test. The study explores the relationship between test-taker age and the items' language domains across multiple test forms. The data comprise test-taker responses (N = 2,861) to a total of 133 unique items, 46 items of which were…

  9. Memory deficit in patients with schizophrenia and posttraumatic stress disorder: relational vs item-specific memory

    PubMed Central

    Jung, Wookyoung; Lee, Seung-Hwan

    2016-01-01

    It has been well established that patients with schizophrenia have impairments in cognitive functioning and also that patients who experienced traumatic events suffer from cognitive deficits. Of the cognitive deficits revealed in schizophrenia or posttraumatic stress disorder (PTSD) patients, the current article provides a brief review of deficit in episodic memory, which is highly predictive of patients’ quality of life and global functioning. In particular, we have focused on studies that compared relational and item-specific memory performance in schizophrenia and PTSD, because measures of relational and item-specific memory are considered the most promising constructs for immediate tangible development of clinical trial paradigm. The behavioral findings of schizophrenia are based on the tasks developed by the Cognitive Neuroscience Treatment Research to Improve Cognition in Schizophrenia (CNTRICS) initiative and the Cognitive Neuroscience Test Reliability and Clinical Applications for Schizophrenia (CNTRACS) Consortium. The findings we reviewed consistently showed that schizophrenia and PTSD are closely associated with more severe impairments in relational memory compared to item-specific memory. Candidate brain regions involved in relational memory impairment in schizophrenia and PTSD are also discussed. PMID:27274250

  10. 42 CFR 419.2 - Basis of payment.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... prospective payment system establishes a national payment rate, standardized for geographic wage differences...) Capital-related costs; (9) Implantable items used in connection with diagnostic X-ray tests, diagnostic laboratory tests, and other diagnostic tests; (10) Durable medical equipment that is implantable; (11...

  11. Differential Item Functioning in Primary Healthcare Evaluation Instruments by French/English Version, Educational Level and Urban/Rural Location

    PubMed Central

    Haggerty, Jeannie L.; Bouharaoui, Fatima; Santor, Darcy A.

    2011-01-01

    Evaluating the extent to which groups or subgroups of individuals differ with respect to primary healthcare experience depends on first ruling out the possibility of bias. Objective: To determine whether item or subscale performance differs systematically between French/English, high/low education subgroups and urban/rural residency. Method: A sample of 645 adult users balanced by French/English language (in Quebec and Nova Scotia, respectively), high/low education and urban/rural residency responded to six validated instruments: the Primary Care Assessment Survey (PCAS); the Primary Care Assessment Tool – Short Form (PCAT-S); the Components of Primary Care Index (CPCI); the first version of the EUROPEP (EUROPEP-I); the Interpersonal Processes of Care Survey, version II (IPC-II); and part of the Veterans Affairs National Outpatient Customer Satisfaction Survey (VANOCSS). We normalized subscale scores to a 0-to-10 scale and tested for between-group differences using ANOVA tests. We used a parametric item response model to test for differences between subgroups in item discriminability and item difficulty. We re-examined group differences after removing items with differential item functioning. Results: Experience of care was assessed more positively in the English-speaking (Nova Scotia) than in the French-speaking (Quebec) respondents. We found differential English/French item functioning in 48% of the 153 items: discriminability in 20% and differential difficulty in 28%. English items were more discriminating generally than the French. Removing problematic items did not change the differences in French/English assessments. Differential item functioning by high/low education status affected 27% of items, with items being generally more discriminating in high-education groups. Between-group comparisons were unchanged. In contrast, only 9% of items showed differential item functioning by geography, affecting principally the accessibility attribute. Removing problematic items reversed a previously non-significant finding, revealing poorer first-contact access in rural than in urban areas. Conclusion: Differential item functioning does not bias or invalidate French/English comparisons on subscales, but additional development is required to make French and English items equivalent. These instruments are relatively robust by educational status and geography, but results suggest potential differences in the underlying construct in low-education and rural respondents. PMID:23205035

  12. Development of cultural belief scales for mammography screening.

    PubMed

    Russell, Kathleen M; Champion, Victoria L; Perkins, Susan M

    2003-01-01

    To develop instruments to measure culturally related variables that may influence mammography screening behaviors in African American women. Instrumentation methodology. Community organizations and public housing in the Indianapolis, IN, area. 111 African American women with a mean age of 60.2 years and 64 Caucasian women with a mean age of 60 years. After item development, scales were administered. Data were analyzed by factor analysis, item analysis via internal consistency reliability using Cronbach's alpha, and independent t tests and logistic regression analysis to test theoretical relationships. Personal space preferences, health temporal orientation, and perceived personal control. Space items were factored into interpersonal and physical scales. Temporal orientation items were loaded on one factor, creating a one-dimensional scale. Control items were factored into internal and external control scales. Cronbach's alpha coefficients for the scales ranged from 0.76-0.88. Interpersonal space preference, health temporal orientation, and perceived internal control scales each were predictive of mammography screening adherence. The three tested scales were reliable and valid. Scales, on average, did not differ between African American and Caucasian populations. These scales may be useful in future investigations aimed at increasing mammography screening in African American and Caucasian women.

  13. Item validity vs. item discrimination index: a redundancy?

    NASA Astrophysics Data System (ADS)

    Panjaitan, R. L.; Irawati, R.; Sujana, A.; Hanifah, N.; Djuanda, D.

    2018-03-01

    In the literature on evaluation and test analysis, it is common to find item validity and the item discrimination index (D) calculated separately, each with a different formula. Other sources, however, state that the item discrimination index can be obtained by calculating the correlation between the testee's score on a particular item and the testee's score on the overall test, which is actually the same concept as item validity. Some research reports, especially undergraduate theses, tend to include both item validity and the item discrimination index in the instrument analysis. These concepts may overlap, because both reflect how well the test measures the examinees' ability. In this paper, examples of data-processing results for item validity and the item discrimination index are compared. The paper discusses whether one of the two can stand in for both, or whether both calculations should be presented in a simple test analysis, especially in undergraduate theses that include test analyses.
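
    The two quantities being compared can be sketched side by side. This is an illustration on hypothetical data, not the authors' procedure: D is taken here as the difference in proportion correct between the upper and lower halves of examinees ranked by total score (27% groups are also common), and item validity as the uncorrected item-total Pearson correlation.

```python
import math

# Hypothetical 0/1 responses: rows are examinees, columns are items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
    [0, 0, 0],
]
totals = [sum(r) for r in responses]

def discrimination_index(item_scores, total_scores, fraction=0.5):
    """D = proportion correct in the upper group minus that in the lower group."""
    order = sorted(range(len(total_scores)),
                   key=lambda i: total_scores[i], reverse=True)
    n = max(1, int(len(order) * fraction))
    upper, lower = order[:n], order[-n:]
    p_upper = sum(item_scores[i] for i in upper) / n
    p_lower = sum(item_scores[i] for i in lower) / n
    return p_upper - p_lower

def item_total_correlation(item_scores, total_scores):
    """Pearson correlation between one item's scores and the total scores."""
    n = len(item_scores)
    mx = sum(item_scores) / n
    my = sum(total_scores) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(item_scores, total_scores)) / n
    vx = sum((x - mx) ** 2 for x in item_scores) / n
    vy = sum((y - my) ** 2 for y in total_scores) / n
    return cov / math.sqrt(vx * vy)

columns = [[r[j] for r in responses] for j in range(3)]
d_values = [discrimination_index(c, totals) for c in columns]
r_values = [item_total_correlation(c, totals) for c in columns]
print(d_values, [round(r, 3) for r in r_values])
```

    On this small data set both statistics single out the first item as the most discriminating, while the remaining two items tie on D but differ slightly in their item-total correlation.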

  14. Examining Differential Item Functions of Different Item Ordered Test Forms According to Item Difficulty Levels

    ERIC Educational Resources Information Center

    Çokluk, Ömay; Gül, Emrah; Dogan-Gül, Çilem

    2016-01-01

    The study aims to examine whether differential item function is displayed in three different test forms that have item orders of random and sequential versions (easy-to-hard and hard-to-easy), based on Classical Test Theory (CTT) and Item Response Theory (IRT) methods and bearing item difficulty levels in mind. In the correlational research, the…

  15. On the Status of Cue Independence as a Criterion for Memory Inhibition: Evidence against the Covert Blocking Hypothesis

    ERIC Educational Resources Information Center

    Weller, Peter D.; Anderson, Michael C.; Gómez-Ariza, Carlos J.; Bajo, M. Teresa

    2013-01-01

    Retrieving memories can impair recall of other related traces. Items affected by this retrieval-induced forgetting (RIF) are often less accessible when tested with independent probes, a characteristic known as cue independence. Cue independence has been interpreted as evidence for inhibitory mechanisms that suppress competing items during…

  16. A Mixture Rasch Model with a Covariate: A Simulation Study via Bayesian Markov Chain Monte Carlo Estimation

    ERIC Educational Resources Information Center

    Dai, Yunyun

    2013-01-01

    Mixtures of item response theory (IRT) models have been proposed as a technique to explore response patterns in test data related to cognitive strategies, instructional sensitivity, and differential item functioning (DIF). Estimation proves challenging due to difficulties in identification and questions of effect size needed to recover underlying…

  17. Modeling Skipped and Not-Reached Items Using IRTrees

    ERIC Educational Resources Information Center

    Debeer, Dries; Janssen, Rianne; De Boeck, Paul

    2017-01-01

    When dealing with missing responses, two types of omissions can be discerned: items can be skipped or not reached by the test taker. When the occurrence of these omissions is related to the proficiency process the missingness is nonignorable. The purpose of this article is to present a tree-based IRT framework for modeling responses and omissions…

  18. Influence of the wording of evaluation items on outcome-based evaluation results for large-group teaching in anatomy, biochemistry and legal medicine.

    PubMed

    Anders, Sven; Pyka, Katharina; Mueller, Tjark; von Steinbuechel, Nicole; Raupach, Tobias

    2016-11-01

    Student learning outcome is an important dimension of teaching quality in undergraduate medical education. Measuring an increase in knowledge during teaching requires repetitive objective testing which is usually not feasible. As an alternative, student learning outcome can be calculated from student self-ratings. Comparative self-assessment (CSA) gain reflects the performance difference before and after teaching, adjusted for initial knowledge. It has been shown to be a valid proxy measure of actual learning outcome derived from objective tests. However, student self-ratings are prone to a number of confounding factors. In the context of outcome-based evaluation, the wording of self-rating items is crucial to the validity of evaluation results. This randomized trial assessed whether including qualifiers in these statements impacts on student ratings and CSA gain. First-year medical students self-rated their initial (then-test) and final (post-test) knowledge for lectures in anatomy, biochemistry and legal medicine, respectively, and 659 questionnaires were retrieved. Six-point scales were used for self-ratings with 1 being the most positive option. Qualifier use did not affect then-test ratings but was associated with slightly less favorable post-test ratings. Consequently, mean CSA gain was smaller for items containing qualifiers than for items lacking qualifiers (50.6±15.0% vs. 56.3±14.6%, p=0.079). The effect was more pronounced (Cohen's d=0.82) for items related to anatomy. In order to increase fairness of outcome-based evaluation and increase the comparability of CSA gain data across subjects, medical educators should agree on a consistent approach (qualifiers for all items or no qualifiers at all) when drafting self-rating statements for outcome-based evaluation.
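
    The abstract does not give the CSA gain formula. The sketch below assumes one commonly cited form for scales on which 1 is the most positive rating: the drop in the mean rating from then-test to post-test, divided by the largest drop that was possible given the initial mean. The ratings are hypothetical.

```python
def csa_gain(then_ratings, post_ratings, best=1):
    """CSA gain in percent (assumed formula, not quoted from the abstract).

    gain = (mean_then - mean_post) / (mean_then - best) * 100,
    where `best` is the most positive scale value.
    """
    mean_then = sum(then_ratings) / len(then_ratings)
    mean_post = sum(post_ratings) / len(post_ratings)
    if mean_then == best:
        # No room for improvement: the gain is undefined.
        raise ValueError("initial mean rating is already at the best value")
    return (mean_then - mean_post) / (mean_then - best) * 100

# Hypothetical self-ratings on a six-point scale (1 = most positive).
then_test = [5, 4, 5, 4]
post_test = [2, 3, 2, 3]

gain = csa_gain(then_test, post_test)
print(round(gain, 1))
```

    Dividing by the room for improvement is what "adjusted for initial knowledge" means in this sketch: the same absolute change counts for more when students started closer to the best rating.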

  19. The Effects of Test Length and Sample Size on Item Parameters in Item Response Theory

    ERIC Educational Resources Information Center

    Sahin, Alper; Anil, Duygu

    2017-01-01

    This study investigates the effects of sample size and test length on item-parameter estimation in test development utilizing three unidimensional dichotomous models of item response theory (IRT). For this purpose, a real language test comprised of 50 items was administered to 6,288 students. Data from this test was used to obtain data sets of…

  20. Psychometric properties of the 25-item National Eye Institute Visual Function Questionnaire (NEI VFQ-25), Japanese version.

    PubMed

    Suzukamo, Yoshimi; Oshika, Tetsuro; Yuzawa, Mitsuko; Tokuda, Yoshihiro; Tomidokoro, Atsuo; Oki, Kotaro; Mangione, Carol M; Green, Joseph; Fukuhara, Shunichi

    2005-10-26

    The importance of evaluating the outcomes of health care from the standpoint of the patient is now widely recognized. The purpose of this study is to develop and test a Japanese version of the National Eye Institute Visual Function Questionnaire (NEI VFQ-25). A Japanese version was developed with a previously standardized method. The questionnaire and optional items were completed by 245 patients with cataracts, glaucoma, or age-related macular degeneration, by 110 others before and after cataract surgery, and by a reference group (n = 31). We computed rates of missing data, measured reproducibility and internal consistency reliability, and tested for convergent and discriminant validity, concurrent validity, known-groups validity, factor structure, and responsiveness to change. Based on information from the participants, some items were changed to 2-step items (asking if an activity was done, and if it was done, then asking how difficult it was). The near-vision and distance-vision subscales each had 1 item that was endorsed by very few participants, so these items were replaced with items that were optional in the English version. For example, more than 60% of participants did not drive, so the driving question was excluded. Reliability and validity were adequate for all subscales except driving, ocular pain, color vision, and peripheral vision. With cataract surgery, most scores improved by at least 20 points. With minor modifications from the English version, the Japanese NEI VFQ-25 can give reliable, valid, responsive data on vision-related quality of life, for group-level comparisons or for tracking therapeutic outcomes.

  1. [Perceptions on item disclosure for the Korean medical licensing examination].

    PubMed

    Yang, Eunbae B

    2015-09-01

    This study analyzed the perceptions of medical students and faculty regarding disclosure of test items on the Korean medical licensing examination. I conducted a survey of medical students from medical colleges and professional medical schools nationwide. Responses were analyzed from 718 participants as well as 69 faculty members who participated in creating the medical licensing examination item sets. Data were analyzed using descriptive statistics and the chi-square test. It is important to maintain test quality and to keep the test items unavailable to the public. There are also concerns among students that disclosure of test items would prompt increasing difficulty of test items (48.3%). Further, few students found it desirable to disclose test items regardless of any considerations (28.5%). The professors, who had experience in designing the test items, also expressed their opposition to test item disclosure (60.9%). It is desirable not to disclose the test items of the Korean medical licensing examination to the public on the condition that students are provided with a sufficient amount of information regarding the examination. This is so that the exam can appropriately identify candidates with the required qualifications.

  2. A Review of Classical Methods of Item Analysis.

    ERIC Educational Resources Information Center

    French, Christine L.

    Item analysis is a very important consideration in the test development process. It is a statistical procedure to analyze test items that combines methods used to evaluate the important characteristics of test items, such as difficulty, discrimination, and distractibility of the items in a test. This paper reviews some of the classical methods for…

  3. Modeling Item-Position Effects within an IRT Framework

    ERIC Educational Resources Information Center

    Debeer, Dries; Janssen, Rianne

    2013-01-01

    Changing the order of items between alternate test forms to prevent copying and to enhance test security is a common practice in achievement testing. However, these changes in item order may affect item and test characteristics. Several procedures have been proposed for studying these item-order effects. The present study explores the use of…

  4. ACER Chemistry Test Item Collection. ACER Chemtic Year 12.

    ERIC Educational Resources Information Center

    Australian Council for Educational Research, Hawthorn.

    The chemistry test item bank contains 225 multiple-choice questions suitable for diagnostic and achievement testing; a three-page teacher's guide; answer key with item facilities; an answer sheet; and a 45-item sample achievement test. Although written for the new grade 12 chemistry course in Victoria, Australia, the items are widely applicable.…

  5. Moving Knowledge Acquisition From the Lecture Hall to the Student Home: A Prospective Intervention Study.

    PubMed

    Raupach, Tobias; Grefe, Clemens; Brown, Jamie; Meyer, Katharina; Schuelper, Nikolai; Anders, Sven

    2015-09-28

    Podcasts are popular with medical students, but the impact of podcast use on learning outcomes in undergraduate medical education has not been studied in detail. Our aim was to assess the impact of podcasts accompanied by quiz questions and lecture attendance on short- and medium-term knowledge retention. Students enrolled for a cardio-respiratory teaching module were asked to prepare for 10 specific lectures by watching podcasts and submitting answers to related quiz questions before attending live lectures. Performance on the same questions was assessed in a surprise test and a retention test. Watching podcasts and submitting answers to quiz questions (versus no podcast/quiz use) was associated with significantly better test performance in all items in the surprise test and 7 items in the retention test. Lecture attendance (versus no attendance) was associated with higher test performance in 3 items and 1 item, respectively. In a linear regression analysis adjusted for age, gender, and overall performance levels, both podcast/quiz use and lecture attendance were significant predictors of student performance. However, the variance explained by podcast/quiz use was greater than the variance explained by lecture attendance in the surprise test (38.7% vs. 2.2%) and retention test (19.1% vs. 4.0%). When used in conjunction with quiz questions, podcasts have the potential to foster knowledge acquisition and retention over and above the effect of live lectures.

  6. The role of relational binding in item memory: evidence from face recognition in a case of developmental amnesia.

    PubMed

    Olsen, Rosanna K; Lee, Yunjo; Kube, Jana; Rosenbaum, R Shayna; Grady, Cheryl L; Moscovitch, Morris; Ryan, Jennifer D

    2015-04-01

    Current theories state that the hippocampus is responsible for the formation of memory representations regarding relations, whereas extrahippocampal cortical regions support representations for single items. However, findings of impaired item memory in hippocampal amnesics suggest a more nuanced role for the hippocampus in item memory. The hippocampus may be necessary when the item elements need to be bound within and across episodes to form a lasting representation that can be used flexibly. The current investigation was designed to test this hypothesis in face recognition. H.C., an individual who developed with a compromised hippocampal system, and control participants incidentally studied individual faces that either varied in presentation viewpoint across study repetitions or remained in a fixed viewpoint across the study repetitions. Eye movements were recorded during encoding and participants then completed a surprise recognition memory test. H.C. demonstrated altered face viewing during encoding. Although the overall number of fixations made by H.C. was not significantly different from that of controls, the distribution of her viewing was primarily directed to the eye region. Critically, H.C. was significantly impaired in her ability to subsequently recognize faces studied from variable viewpoints, but demonstrated spared performance in recognizing faces she encoded from a fixed viewpoint, implicating a relationship between eye movement behavior in the service of a hippocampal binding function. These findings suggest that a compromised hippocampal system disrupts the ability to bind item features within and across study repetitions, ultimately disrupting recognition when it requires access to flexible relational representations.

  7. Prediction of true test scores from observed item scores and ancillary data.

    PubMed

    Haberman, Shelby J; Yao, Lili; Sinharay, Sandip

    2015-05-01

    In many educational tests which involve constructed responses, a traditional test score is obtained by adding together item scores obtained through holistic scoring by trained human raters. For example, this practice was used until 2008 in the case of GRE® General Analytical Writing and until 2009 in the case of TOEFL® iBT Writing. With use of natural language processing, it is possible to obtain additional information concerning item responses from computer programs such as e-rater®. In addition, available information relevant to examinee performance may include scores on related tests. We suggest application of standard results from classical test theory to the available data to obtain best linear predictors of true traditional test scores. In performing such analysis, we require estimation of variances and covariances of measurement errors, a task which can be quite difficult in the case of tests with limited numbers of items and with multiple measurements per item. As a consequence, a new estimation method is suggested based on samples of examinees who have taken an assessment more than once. Such samples are typically not random samples of the general population of examinees, so that we apply statistical adjustment methods to obtain the needed estimated variances and covariances of measurement errors. To examine practical implications of the suggested methods of analysis, applications are made to GRE General Analytical Writing and TOEFL iBT Writing. Results obtained indicate that substantial improvements are possible both in terms of reliability of scoring and in terms of assessment reliability.
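
    As a rough illustration of what a "best linear predictor of a true score" means in classical test theory, the sketch below implements only the simplest single-predictor case (Kelley's formula), which shrinks an observed score toward the group mean in proportion to the reliability. It is not the authors' method: their predictors also use ancillary data and estimated error covariances. The function name and the example numbers are hypothetical.

```python
def kelley_true_score(observed: float, group_mean: float, reliability: float) -> float:
    """Single-predictor best linear estimate of a true score (Kelley's formula).

    reliability is the score reliability in [0, 1]; lower reliability
    pulls the estimate further toward the group mean.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return group_mean + reliability * (observed - group_mean)


# Hypothetical example: observed score 5.0, group mean 4.0, reliability 0.8.
estimate = kelley_true_score(5.0, 4.0, 0.8)
```

    With perfect reliability the estimate equals the observed score; with zero reliability it collapses to the group mean.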

  8. Test-retest reliability at the item level and total score level of the Norwegian version of the Spinal Cord Injury Falls Concern Scale (SCI-FCS).

    PubMed

    Roaldsen, Kirsti Skavberg; Måøy, Åsa Blad; Jørgensen, Vivien; Stanghelle, Johan Kvalvik

    2016-05-01

    Translation of the Spinal Cord Injury Falls Concern Scale (SCI-FCS), and investigation of test-retest reliability on item-level and total-score-level. Translation, adaptation and test-retest study. A specialized rehabilitation setting in Norway. Fifty-four wheelchair users with a spinal cord injury. The median age of the cohort was 49 years, and the median number of years after injury was 13. Interventions/measurements: The SCI-FCS was translated and back-translated according to guidelines. Individuals answered the SCI-FCS twice over the course of one week. We investigated item-level test-retest reliability using Svensson's rank-based statistical method for disagreement analysis of paired ordinal data. For relative reliability, we analyzed the total-score-level test-retest reliability with intraclass correlation coefficients (ICC2.1), the standard error of measurement (SEM), and the smallest detectable change (SDC) for absolute reliability/measurement-error assessment and Cronbach's alpha for internal consistency. All items showed satisfactory percentage agreement (≥69%) between test and retest. There were small but non-negligible systematic disagreements among three items; we recovered an 11-13% higher chance for a lower second score. There was no disagreement due to random variance. The test-retest agreement (ICC2.1) was excellent (0.83). The SEM was 2.6 (12%), and the SDC was 7.1 (32%). The Cronbach's alpha was high (0.88). The Norwegian SCI-FCS is highly reliable for wheelchair users with chronic spinal cord injuries.
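
    The relation between the SEM and the SDC quoted above can be sketched with the conventional formulas SEM = SD * sqrt(1 - ICC) and SDC = 1.96 * sqrt(2) * SEM. Whether the authors used exactly these formulas is an assumption; the abstract does not say. Function names are hypothetical.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from the score SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - icc)


def smallest_detectable_change(sem: float, z: float = 1.96) -> float:
    """Individual-level SDC: z * sqrt(2) * SEM (sqrt(2) because two measurements are compared)."""
    return z * math.sqrt(2.0) * sem


# Using the rounded SEM of 2.6 reported in the abstract.
sdc = smallest_detectable_change(2.6)
```

    With the rounded SEM of 2.6 this gives roughly 7.2, close to the reported 7.1; the small gap is consistent with the SEM having been rounded before reporting, though that is a guess.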

  9. Constructing three emotion knowledge tests from the invariant measurement approach

    PubMed Central

    Prieto, Gerardo; Burin, Debora I.

    2017-01-01

    Background: Psychological constructionist models like the Conceptual Act Theory (CAT) postulate that complex states such as emotions are composed of basic psychological ingredients that are more clearly respected by the brain than basic emotions. The objective of this study was the construction and initial validation of Emotion Knowledge (EK) measures from the CAT frame by means of an invariant measurement approach, the Rasch Model (RM). Psychological distance theory was used to inform item generation. Methods: Three EK tests, namely emotion vocabulary (EV), close emotional situations (CES) and far emotional situations (FES), were constructed and tested with the RM in a community sample of 100 females and 100 males (age range: 18–65), both separately and conjointly. Results: It was corroborated that data-RM fit was sufficient. Then, the effect of type of test and emotion on Rasch-modelled item difficulty was tested. Significant effects of emotion on EK item difficulty were found, but the only statistically significant difference was that between “happiness” and the remaining emotions; neither type of test nor interaction effects on EK item difficulty were statistically significant. The testing of gender differences was carried out after corroborating that differential item functioning (DIF) would not be a plausible alternative hypothesis for the results. No statistically significant sex-related differences were found in EV, CES, FES, or total EK. However, the sign of d indicates that female participants were consistently better than male ones, a result that will be of interest for future meta-analyses. Discussion: The three EK tests are ready to be used as components of a higher-level measurement process. PMID:28929013
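
    For readers unfamiliar with the Rasch Model mentioned above, a minimal sketch of its dichotomous form follows: the probability of a correct response depends only on the difference between person ability (theta) and item difficulty (b), both on a logit scale. The function name is hypothetical, and this shows only the response function, not the estimation or fit procedures the study used.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """P(correct) under the dichotomous Rasch model: 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# A person whose ability equals the item difficulty has a 50% chance of success.
p_equal = rasch_probability(0.0, 0.0)
# A harder item (larger b) lowers the probability for the same person.
p_harder = rasch_probability(0.0, 1.0)
```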

  10. The knowledge, efficacy, and practices instrument for oral health providers: a validity study with dental students.

    PubMed

    Behar-Horenstein, Linda S; Garvan, Cyndi W; Moore, Thomas E; Catalanotto, Frank A

    2013-08-01

    Valid and reliable instruments to measure and assess cultural competence for oral health care providers are scarce in the literature, and most published scales have been contested due to a lack of item analysis and internal estimates of reliability. The purposes of this study were, first, to develop a standardized instrument to measure dental students' knowledge of diversity, skills in culturally competent patient-centered communication, and use of culture-centered practices in patient care and, second, to provide preliminary validity support for this instrument. The initial instrument used in this study was a thirty-six-item Likert-scale survey entitled the Knowledge, Efficacy, and Practices Instrument for Oral Health Providers (KEPI-OHP). This instrument is an adaptation of an initially thirty-three-item version of the Multicultural Awareness, Knowledge, and Skills Scale-Counselor Edition (MAKSS-CE), a scale that assesses factors related to social justice, cultural differences among clients, and cross-cultural client management. After the authors conducted cognitive and expert interviews, focus groups, pilot testing, and item analysis, their initial instrument was reduced to twenty-eight items. The KEPI-OHP was then distributed to 916 dental students (response rate=48.6 percent) across the United States to measure its reliability and assess its validity. Both exploratory and confirmatory factor analyses were conducted to test the scale's validity. The modification of the survey into a sensible instrument with a relatively clear factor structure using factor analysis resulted in twenty items. A scree test suggested three expressive factors, which were retained for rotation. Bentler's comparative fit and Bentler and Bonett's non-normed indices were 0.95 and 0.92, respectively. A three-factor solution, including efficacy of assessment, knowledge of diversity, and culture-centered practice subscales and comprising twenty items, was identified. The KEPI-OHP was found to have reasonable internal consistency reliability to warrant its use for baseline and repeated measures in assessing changes in dental students' growth in cultural competence across four-year dental curricula.

  11. Measurement Equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Pain Interference Short Form Items: Application to Ethnically Diverse Cancer and Palliative Care Populations.

    PubMed

    Teresi, Jeanne A; Ocepek-Welikson, Katja; Cook, Karon F; Kleinman, Marjorie; Ramirez, Mildred; Reid, M Carrington; Siu, Albert

    2016-01-01

    Reducing the response burden of standardized pain measures is desirable, particularly for individuals who are frail or live with chronic illness, e.g., those suffering from cancer and those in palliative care. The Patient Reported Outcome Measurement Information System® (PROMIS®) project addressed this issue with the provision of computerized adaptive tests (CAT) and short form measures that can be used clinically and in research. Although there has been substantial evaluation of PROMIS item banks, little is known about the performance of PROMIS short forms, particularly in ethnically diverse groups. Reviewed in this article are findings related to the differential item functioning (DIF) and reliability of the PROMIS pain interference short forms across diverse sociodemographic groups. DIF hypotheses were generated for the PROMIS short form pain interference items. Initial analyses tested item response theory (IRT) model assumptions of unidimensionality and local independence. Dimensionality was evaluated using factor analytic methods; local dependence (LD) was tested using IRT-based LD indices. Wald tests were used to examine group differences in IRT parameters, and to test DIF hypotheses. A second DIF-detection method used in sensitivity analyses was based on ordinal logistic regression with a latent IRT-derived conditioning variable. Magnitude and impact of DIF were investigated, and reliability and item and scale information statistics were estimated. The reliability of the short form item set was excellent. However, there were a few items with high local dependency, which affected the estimation of the final discrimination parameters. As a result, the item, "How much did pain interfere with enjoyment of social activities?" was excluded in the DIF analyses for all subgroup comparisons. No items were hypothesized to show DIF for race and ethnicity; however, five items showed DIF after adjustment for multiple comparisons in both primary and sensitivity analyses: ability to concentrate, enjoyment of recreational activities, tasks away from home, participation in social activities, and socializing with others. The magnitude of DIF was small and the impact negligible. Three items were consistently identified with DIF for education: enjoyment of life, ability to concentrate, and enjoyment of recreational activities. No item showed DIF above the magnitude threshold and the impact of DIF on the overall measure was minimal. No item showed gender DIF after correction for multiple comparisons in the primary analyses. Four items showed consistent age DIF: enjoyment of life, ability to concentrate, day to day activities, and enjoyment of recreational activities, none with primary magnitude values above threshold. Conditional on the pain state, Spanish speakers were hypothesized to report less pain interference on one item, enjoyment of life. The DIF findings confirmed the hypothesis; however, the magnitude was small. Using an arbitrary cutoff point of theta (θ) ≥ 1.0 to classify respondents with acute pain interference, the highest number of changes were for the education groups analyses. There were 231 respondents (4% of the total sample) who changed from the designation of no acute pain interference to acute interference after the DIF adjustment. There was no change in the designations for race/ethnic subgroups, and a small number of changes for respondents aged 65 to 84. Although significant DIF was observed after correction for multiple comparisons, all DIF was of low magnitude and impact. However, some individual-level impact was observed for low education groups. Reliability estimates were high. Thus, the PROMIS short form pain items examined in this ethnically diverse sample performed relatively well; although one item was problematic and removed from the analyses. It is concluded that the majority of the PROMIS pain interference short form items can be recommended for use among ethnically diverse groups, including those in palliative care and with cancer and chronic illness.

  12. Measurement Equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Pain Interference Short Form Items: Application to Ethnically Diverse Cancer and Palliative Care Populations

    PubMed Central

    Teresi, Jeanne A.; Ocepek-Welikson, Katja; Cook, Karon F.; Kleinman, Marjorie; Ramirez, Mildred; Reid, M. Carrington; Siu, Albert

    2017-01-01

    Reducing the response burden of standardized pain measures is desirable, particularly for individuals who are frail or live with chronic illness, e.g., those suffering from cancer and those in palliative care. The Patient Reported Outcome Measurement Information System® (PROMIS®) project addressed this issue with the provision of computerized adaptive tests (CAT) and short form measures that can be used clinically and in research. Although there has been substantial evaluation of PROMIS item banks, little is known about the performance of PROMIS short forms, particularly in ethnically diverse groups. Reviewed in this article are findings related to the differential item functioning (DIF) and reliability of the PROMIS pain interference short forms across diverse sociodemographic groups. Methods: DIF hypotheses were generated for the PROMIS short form pain interference items. Initial analyses tested item response theory (IRT) model assumptions of unidimensionality and local independence. Dimensionality was evaluated using factor analytic methods; local dependence (LD) was tested using IRT-based LD indices. Wald tests were used to examine group differences in IRT parameters, and to test DIF hypotheses. A second DIF-detection method used in sensitivity analyses was based on ordinal logistic regression with a latent IRT-derived conditioning variable. Magnitude and impact of DIF were investigated, and reliability and item and scale information statistics were estimated. Results: The reliability of the short form item set was excellent. However, there were a few items with high local dependency, which affected the estimation of the final discrimination parameters. As a result, the item, “How much did pain interfere with enjoyment of social activities?” was excluded in the DIF analyses for all subgroup comparisons. No items were hypothesized to show DIF for race and ethnicity; however, five items showed DIF after adjustment for multiple comparisons in both primary and sensitivity analyses: ability to concentrate, enjoyment of recreational activities, tasks away from home, participation in social activities, and socializing with others. The magnitude of DIF was small and the impact negligible. Three items were consistently identified with DIF for education: enjoyment of life, ability to concentrate, and enjoyment of recreational activities. No item showed DIF above the magnitude threshold and the impact of DIF on the overall measure was minimal. No item showed gender DIF after correction for multiple comparisons in the primary analyses. Four items showed consistent age DIF: enjoyment of life, ability to concentrate, day to day activities, and enjoyment of recreational activities, none with primary magnitude values above threshold. Conditional on the pain state, Spanish speakers were hypothesized to report less pain interference on one item, enjoyment of life. The DIF findings confirmed the hypothesis; however, the magnitude was small. Using an arbitrary cutoff point of theta (θ) ≥ 1.0 to classify respondents with acute pain interference, the highest number of changes were for the education groups analyses. There were 231 respondents (4% of the total sample) who changed from the designation of no acute pain interference to acute interference after the DIF adjustment. There was no change in the designations for race/ethnic subgroups, and a small number of changes for respondents aged 65 to 84. Conclusions: Although significant DIF was observed after correction for multiple comparisons, all DIF was of low magnitude and impact. However, some individual-level impact was observed for low education groups. Reliability estimates were high. Thus, the PROMIS short form pain items examined in this ethnically diverse sample performed relatively well; although one item was problematic and removed from the analyses. It is concluded that the majority of the PROMIS pain interference short form items can be recommended for use among ethnically diverse groups, including those in palliative care and with cancer and chronic illness. PMID:28983449

  13. Selective attention and recognition: effects of congruency on episodic learning.

    PubMed

    Rosner, Tamara M; D'Angelo, Maria C; MacLellan, Ellen; Milliken, Bruce

    2015-05-01

    Recent research on cognitive control has focused on the learning consequences of high selective attention demands in selective attention tasks (e.g., Botvinick, Cognit Affect Behav Neurosci 7(4):356-366, 2007; Verguts and Notebaert, Psychol Rev 115(2):518-525, 2008). The current study extends these ideas by examining the influence of selective attention demands on remembering. In Experiment 1, participants read aloud the red word in a pair of red and green spatially interleaved words. Half of the items were congruent (the interleaved words had the same identity), and the other half were incongruent (the interleaved words had different identities). Following the naming phase, participants completed a surprise recognition memory test. In this test phase, recognition memory was better for incongruent than for congruent items. In Experiment 2, context was only partially reinstated at test, and again recognition memory was better for incongruent than for congruent items. In Experiment 3, all of the items contained two different words, but in one condition the words were presented close together and interleaved, while in the other condition the two words were spatially separated. Recognition memory was better for the interleaved than for the separated items. This result rules out an interpretation of the congruency effects on recognition in Experiments 1 and 2 that hinges on stronger relational encoding for items that have two different words. Together, the results support the view that selective attention demands for incongruent items lead to encoding that improves recognition.

  14. The development of a test of biodiversity knowledge of high school students

    NASA Astrophysics Data System (ADS)

    Ajayi, Olabisi Modupe

    2002-09-01

    The primary purpose of this study was to develop a valid and reliable test of the knowledge of biodiversity of high school students. The test differentiated students' knowledge on three levels of biodiversity: species, ecosystem and genetics. A secondary purpose was to examine how biodiversity scores were affected by gender, grade point average, and families' socioeconomic status. The initial phase of the instrument development involved the construction of 60 dichotomous items (true/false). To establish content validity, a panel of biodiversity experts reviewed the items for appropriateness and clarity. The items were checked for readability using Flesch-Kincaid Readability Index and the readability was at the fifth grade level. The instrument was subjected to factor analysis. As a result, the final instrument was compiled and named the Ajayi Biodiversity Instrument (ABI). The reliability of ABI was .87. The mean score on the 25-item test was 79%. No significant difference at >0.05 was found in the score of students on each of the three subtests for genetics, species, and ecosystem. No significant difference was found in the score of students relative to their family's socioeconomic status. There was a significant correlation between grade point average and participation in extracurricular activities that related to biodiversity concepts and scores on ABI. Gender differences emerged at the ecosystem level, females scoring higher than males. Differences among ethnic groups also emerged. Anglo-Americans scored significantly higher on the test of knowledge of biodiversity for high school students than the rest of the ethnic groups combined.

  15. A partner-related risk behavior index to identify people at elevated risk for sexually transmitted infections.

    PubMed

    Crosby, Richard; Shrier, Lydia A

    2013-04-01

    The purpose of this study was to develop and test a sexual-partner-related risk behavior index to identify high-risk individuals most likely to have a sexually transmitted infection (STI). Patients from five STI and adolescent medical clinics in three US cities were recruited (N = 928; M age = 29.2 years). Data were collected using audio-computer-assisted self-interviewing. Of seven sexual-partner-related variables, those that were significantly associated with the outcomes were combined into a partner-related risk behavior index. The dependent variables were laboratory-confirmed infection with Chlamydia trachomatis, Neisseria gonorrhoeae, and/or Trichomonas vaginalis. Nearly one-fifth of the sample (169/928; 18.4%) tested positive for an STI. Three of the seven items were significantly associated with having one or more STIs: sex with a newly released prisoner, sex with a person known or suspected of having an STI, and sexual concurrency. In combined form, this three-item index was significantly associated with STI prevalence (p < .001). In the presence of three covariates (gender, race, and age), those classified as being at-risk by the index were 1.8 times more likely than those not classified as such to test positive for an STI (p < .001). Among individuals at risk for STIs, a three-item index predicted testing positive for one or more of three STIs. This index could be used to prioritize and guide intensified clinic-based counseling for high-risk patients of STI and other clinics.

  16. The Influence of Forward and Backward Associative Strength on False Memories for Encoding Context

    PubMed Central

    Arndt, Jason

    2016-01-01

    Two experiments examined the effects of Forward Associative Strength (FAS) and Backward Associative Strength (BAS) on false recollection of unstudied lure items. Themes were constructed such that four associates were strongly related to a lure item in terms of FAS or BAS and four associates were weakly related to a lure item in terms of FAS or BAS. Further, when FAS was manipulated, BAS was controlled across strong and weak associates, while FAS was controlled across strong and weak associates when BAS was manipulated. Strong associates were presented in one font while weak associates were presented in a second font. At test, lure items were disproportionately attributed to the source used to present lures’ strong associates compared to lures’ weak associates, both when BAS was manipulated and when FAS was manipulated. This outcome demonstrates that both BAS and FAS influence lure item false recollection, which favors global-matching models’ explanation of false recollection over the explanation offered by spreading-activation theories. PMID:25312499

  17. Assembling a Computerized Adaptive Testing Item Pool as a Set of Linear Tests

    ERIC Educational Resources Information Center

    van der Linden, Wim J.; Ariel, Adelaide; Veldkamp, Bernard P.

    2006-01-01

    Test-item writing efforts typically result in item pools with an undesirable correlational structure between the content attributes of the items and their statistical information. If such pools are used in computerized adaptive testing (CAT), the algorithm may be forced to select items with less than optimal information, that violate the content…

  18. Evaluation of Northwest University, Kano Post-UTME Test Items Using Item Response Theory

    ERIC Educational Resources Information Center

    Bichi, Ado Abdu; Hafiz, Hadiza; Bello, Samira Abdullahi

    2016-01-01

    High-stakes testing is used for the purposes of providing results that have important consequences. Validity is the cornerstone upon which all measurement systems are built. This study applied the Item Response Theory principles to analyse Northwest University Kano Post-UTME Economics test items. The developed fifty (50) economics test items was…

  19. Item Specifications, Science Grade 8. Blue Prints for Testing Minimum Performance Test.

    ERIC Educational Resources Information Center

    Arkansas State Dept. of Education, Little Rock.

    These item specifications were developed as a part of the Arkansas "Minimum Performance Testing Program" (MPT). There is one item specification for each instructional objective included in the MPT. The purpose of an item specification is to provide an overview of the general content and format of test items used to measure an…

  20. Item Specifications, Science Grade 6. Blue Prints for Testing Minimum Performance Test.

    ERIC Educational Resources Information Center

    Arkansas State Dept. of Education, Little Rock.

    These item specifications were developed as a part of the Arkansas "Minimum Performance Testing Program" (MPT). There is one item specification for each instructional objective included in the MPT. The purpose of an item specification is to provide an overview of the general content and format of test items used to measure an…

  1. Criterion-Referenced Test Items for Welding.

    ERIC Educational Resources Information Center

    Davis, Diane, Ed.

    This test item bank on welding contains test questions based upon competencies found in the Missouri Welding Competency Profile. Some test items are keyed for multiple competencies. These criterion-referenced test items are designed to work with the Vocational Instructional Management System. Questions have been statistically sampled and validated…

  2. Optimal Test Design with Rule-Based Item Generation

    ERIC Educational Resources Information Center

    Geerlings, Hanneke; van der Linden, Wim J.; Glas, Cees A. W.

    2013-01-01

    Optimal test-design methods are applied to rule-based item generation. Three different cases of automated test design are presented: (a) test assembly from a pool of pregenerated, calibrated items; (b) test generation on the fly from a pool of calibrated item families; and (c) test generation on the fly directly from calibrated features defining…

  3. The Negative Testing and Negative Generation Effects Are Eliminated by Delay

    ERIC Educational Resources Information Center

    Mulligan, Neil W.; Peterson, Daniel J.

    2015-01-01

    Although retrieval often enhances subsequent memory (the testing effect), a negative testing effect has recently been documented in which prior retrieval harms later recall compared with restudying. The negative testing effect was predicated on the negative generation effect and the item-specific-relational framework. The present experiments…

  4. [The Maugeri Stress Index: a questionnaire to assess work-related psychological stress].

    PubMed

    Giorgi, Ines; Baiardi, Paola; Tringali, Salvatore; Candura, Stefano Massimo; Gardinali, Francesco; Grignani, Elena; Bertolotti, Giorgio; Imbriani, Marcello

    2011-01-01

    The European directives concerning the evaluation of work-related stress were absorbed into Italian law by means of Legislative Decree No. 81 of 9 April 2008. To develop a new questionnaire to assess the impact of work-related psychological distress and to validate it by testing its factorial structure, its content, its construct and discriminant validity. After critically reviewing the literature, we generated an initial item set to identify the items to be used in a preliminary version of the questionnaire, and then used a focus group to test the comprehensibility of the items. The questionnaire was administered to 329 subjects working in state and private organisation and a small sample of 29 subjects complaining of vexation at work. The Maugeri Stress Index (MSI) is reliable (Cronbach alpha: 0.93). Factorial analysis indicated five factors: Well-being, Adaptation, Support, Irritability and Avoidance. The total and subscale scores were significantly different when comparing subjects with and without vexation at work. The MSI has a multi-factorial structure, good internal reliability and sufficient discriminant power.

  5. Specifying the role of the left prefrontal cortex in word selection.

    PubMed

    Riès, S K; Karzmark, C R; Navarrete, E; Knight, R T; Dronkers, N F

    2015-10-01

    Word selection allows us to choose words during language production. This is often viewed as a competitive process wherein a lexical representation is retrieved among semantically-related alternatives. The left prefrontal cortex (LPFC) is thought to help overcome competition for word selection through top-down control. However, whether the LPFC is always necessary for word selection remains unclear. We tested 6 LPFC-injured patients and controls in two picture naming paradigms varying in terms of item repetition. Both paradigms elicited the expected semantic interference effects (SIE), reflecting interference caused by semantically-related representations in word selection. However, LPFC patients as a group showed a larger SIE than controls only in the paradigm involving item repetition. We argue that item repetition increases interference caused by semantically-related alternatives, resulting in increased LPFC-dependent cognitive control demands. The remaining network of brain regions associated with word selection appears to be sufficient when items are not repeated.

  6. The development and validation of a test of science critical thinking for fifth graders.

    PubMed

    Mapeala, Ruslan; Siew, Nyet Moi

    2015-01-01

    The paper described the development and validation of the Test of Science Critical Thinking (TSCT) to measure three critical thinking skill constructs: comparing and contrasting, sequencing, and identifying cause and effect. The initial TSCT consisted of 55 multiple choice test items, each of which required participants to select a correct response and a correct choice of critical thinking used for their response. Data were obtained from a purposive sample of 30 fifth graders in a pilot study carried out in a primary school in Sabah, Malaysia. Students underwent teaching and learning sessions for 9 weeks using the Thinking Maps-aided Problem-Based Learning Module before they answered the TSCT. Analyses were conducted to check the difficulty index (p) and discrimination index (d), internal consistency reliability, content validity, and face validity. Analysis of the test-retest reliability data was conducted separately for a group of fifth graders with similar ability. Findings of the pilot study showed that, of the initial 55 administered items, only the 30 items with a relatively good difficulty index (p), ranging from 0.40 to 0.60, and a good discrimination index (d), ranging from 0.20 to 1.00, were selected. The Kuder-Richardson reliability value was found to be appropriate and relatively high, at 0.70, 0.73 and 0.92 for identifying cause and effect, sequencing, and comparing and contrasting respectively. The content validity index obtained from three expert judgments equalled or exceeded 0.95. In addition, test-retest reliability showed good, statistically significant correlations ([Formula: see text]). From the above results, the selected 30-item TSCT was found to have sufficient reliability and validity and would therefore represent a useful tool for measuring critical thinking ability among fifth graders in primary science.
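
    The item statistics named in this abstract can be sketched for dichotomous (0/1) data as follows. The definitions used here are the common textbook ones and are an assumption about what the authors computed: difficulty p is the proportion answering correctly, discrimination d is the difference in proportion correct between the top and bottom scoring groups (27% each by convention), and KR-20 is the Kuder-Richardson formula 20 with population variance of total scores. Function names and the tiny data set are hypothetical.

```python
from statistics import pvariance


def item_difficulty(responses):
    """Proportion of examinees answering each item correctly (rows = examinees, 0/1 entries)."""
    n, k = len(responses), len(responses[0])
    return [sum(row[j] for row in responses) / n for j in range(k)]


def item_discrimination(responses, fraction=0.27):
    """Upper-minus-lower group proportion correct for each item."""
    ranked = sorted(responses, key=sum, reverse=True)
    g = max(1, round(len(ranked) * fraction))  # size of each extreme group
    upper, lower = ranked[:g], ranked[-g:]
    k = len(responses[0])
    return [(sum(r[j] for r in upper) - sum(r[j] for r in lower)) / g for j in range(k)]


def kr20(responses):
    """Kuder-Richardson 20: k/(k-1) * (1 - sum(p*q) / variance of total scores)."""
    k = len(responses[0])
    p = item_difficulty(responses)
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(pj * (1 - pj) for pj in p) / total_var)


# Hypothetical data: four examinees, three items.
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
p_values = item_difficulty(data)
d_values = item_discrimination(data)
reliability = kr20(data)
```

    Items would then be screened against cut-offs such as those quoted in the abstract (p between 0.40 and 0.60, d of at least 0.20).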

  7. Criterion-Referenced Test Items for Small Engines.

    ERIC Educational Resources Information Center

    Herd, Amon

    This notebook contains criterion-referenced test items for testing students' knowledge of small engines. The test items are based upon competencies found in the Missouri Small Engine Competency Profile. The test item bank is organized in 18 sections that cover the following duties: shop procedures; tools and equipment; fasteners; servicing fuel…

  8. Multiple choice questions can be designed or revised to challenge learners' critical thinking.

    PubMed

    Tractenberg, Rochelle E; Gushta, Matthew M; Mulroney, Susan E; Weissinger, Peggy A

    2013-12-01

    Multiple choice (MC) questions from a graduate physiology course were evaluated by cognitive-psychology (but not physiology) experts, and analyzed statistically, in order to test the independence of content expertise and cognitive complexity ratings of MC items. Integration of higher order thinking into MC exams is important, but widely known to be challenging, perhaps especially when content experts must think like novices. Expertise in the domain (content) may actually impede the creation of higher-complexity items. Three cognitive psychology experts independently rated cognitive complexity for 252 multiple-choice physiology items using a six-level cognitive complexity matrix that was synthesized from the literature. Rasch modeling estimated item difficulties. The complexity ratings and difficulty estimates were then analyzed together to determine the relative contributions (and independence) of complexity and difficulty to the likelihood of correct answers on each item. Cognitive complexity was found to be statistically independent of difficulty estimates for 88% of items. Using the complexity matrix, modifications were identified to increase some item complexities by one level, without affecting the item's difficulty. Cognitive complexity can effectively be rated by non-content experts. The six-level complexity matrix, if applied by faculty peer groups trained in cognitive complexity and without domain-specific expertise, could lead to improvements in the complexity targeted with item writing and revision. Targeting higher order thinking with MC questions can be achieved without changing item difficulties or other test characteristics, but this may be less likely if the content expert is left to assess items within their domain of expertise.

  9. Cognitive testing of tobacco use items for administration to patients with cancer and cancer survivors in clinical research.

    PubMed

    Land, Stephanie R; Warren, Graham W; Crafts, Jennifer L; Hatsukami, Dorothy K; Ostroff, Jamie S; Willis, Gordon B; Chollette, Veronica Y; Mitchell, Sandra A; Folz, Jasmine N M; Gulley, James L; Szabo, Eva; Brandon, Thomas H; Duffy, Sonia A; Toll, Benjamin A

    2016-06-01

    To the authors' knowledge, there are currently no standardized measures of tobacco use and secondhand smoke exposure in patients diagnosed with cancer, and this gap hinders the conduct of studies examining the impact of tobacco on cancer treatment outcomes. The objective of the current study was to evaluate and refine questionnaire items proposed by an expert task force to assess tobacco use. Trained interviewers conducted cognitive testing with cancer patients aged ≥21 years with a history of tobacco use and a cancer diagnosis of any stage and organ site who were recruited at the National Institutes of Health Clinical Center in Bethesda, Maryland. Iterative rounds of testing and item modification were conducted to identify and resolve cognitive issues (comprehension, memory retrieval, decision/judgment, and response mapping) and instrument navigation issues until no items warranted further significant modification. Thirty participants (6 current cigarette smokers, 1 current cigar smoker, and 23 former cigarette smokers) were enrolled from September 2014 to February 2015. The majority of items functioned well. However, qualitative testing identified wording ambiguities related to cancer diagnosis and treatment trajectory, such as "treatment" and "surgery"; difficulties with lifetime recall; errors in estimating quantities; and difficulties with instrument navigation. Revisions to item wording, format, order, response options, and instructions resulted in a questionnaire that demonstrated navigational ease as well as good question comprehension and response accuracy. The Cancer Patient Tobacco Use Questionnaire (C-TUQ) can be used as a standardized item set to accelerate the investigation of tobacco use in the cancer setting. Cancer 2016;122:1728-34. © 2016 American Cancer Society.

  10. Applications of computerized adaptive testing (CAT) to the assessment of headache impact.

    PubMed

    Ware, John E; Kosinski, Mark; Bjorner, Jakob B; Bayliss, Martha S; Batenhorst, Alice; Dahlöf, Carl G H; Tepper, Stewart; Dowson, Andrew

    2003-12-01

    To evaluate the feasibility of computerized adaptive testing (CAT) and the reliability and validity of CAT-based estimates of headache impact scores in comparison with 'static' surveys. Responses to the 54-item Headache Impact Test (HIT) were re-analyzed for recent headache sufferers (n = 1016) who completed telephone interviews during the National Survey of Headache Impact (NSHI). Item response theory (IRT) calibrations and the computerized dynamic health assessment (DYNHA) software were used to simulate CAT assessments by selecting the most informative items for each person and estimating impact scores according to pre-set precision standards (CAT-HIT). Results were compared with IRT estimates based on all items (total-HIT), computerized 6-item dynamic estimates (CAT-HIT-6), and a developmental version of a 'static' 6-item form (HIT-6-D). Analyses focused on: respondent burden (survey length and administration time), score distributions ('ceiling' and 'floor' effects), reliability and standard errors, and clinical validity (diagnosis, level of severity). A random sample (n = 245) was re-assessed to test responsiveness. A second study (n = 1103) compared actual CAT surveys and an improved 'static' HIT-6 among current headache sufferers sampled on the Internet. Respondents completed measures from the first study and the generic SF-8 Health Survey; some (n = 540) were re-tested on the Internet after 2 weeks. In the first study, simulated CAT-HIT and total-HIT scores were highly correlated (r = 0.92) without 'ceiling' or 'floor' effects and with a substantial reduction (90.8%) in respondent burden. Six of the 54 items accounted for the great majority of item administrations (3603/5028, 77.6%). CAT-HIT reliability estimates were very high (0.975-0.992) in the range where 95% of respondents scored, and relative validity (RV) coefficients were high for diagnosis (RV = 0.87) and severity (RV = 0.89); patient-level classifications were 91.3% accurate for a diagnosis of migraine. For all three criteria of change, CAT-HIT scores were more responsive than all other measures. In the second study, estimates of respondent burden, item usage, reliability and clinical validity were replicated. The test-retest reliability of CAT-HIT was 0.79 and alternate forms coefficients ranged from 0.85 to 0.91. All correlations with the generic SF-8 were negative. CAT-based administrations of headache impact items achieved very large reductions in respondent burden without compromising validity for purposes of patient screening or monitoring changes in headache impact over time. IRT models and CAT-based dynamic health assessments warrant testing among patients with other conditions.
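
    The item-selection step described in the record above ("selecting the most informative items for each person") can be sketched as follows. This is a simplified illustration that assumes a dichotomous two-parameter logistic (2PL) model; the HIT calibration and the DYNHA software may well use a different (for example, polytomous) model, and the function names and item parameters here are hypothetical.

```python
import math


def prob_2pl(theta, a, b):
    """Probability of endorsing an item under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = prob_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def next_item(theta, bank, administered):
    """Pick the not-yet-administered item with maximum information.

    bank: list of (a, b) tuples (discrimination, location).
    administered: set of item positions already used.
    """
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *bank[i]))


def standard_error(theta, bank, administered):
    """Approximate SE of theta from the information accumulated so far."""
    total = sum(item_information(theta, *bank[i]) for i in administered)
    return 1.0 / math.sqrt(total)
```

    In an adaptive session one would typically alternate `next_item` with a re-estimate of theta and stop once `standard_error` falls below a pre-set precision standard; the estimation step is omitted here to keep the sketch short.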

  11. Evaluating the Psychometric Characteristics of Generated Multiple-Choice Test Items

    ERIC Educational Resources Information Center

    Gierl, Mark J.; Lai, Hollis; Pugh, Debra; Touchie, Claire; Boulais, André-Philippe; De Champlain, André

    2016-01-01

    Item development is a time- and resource-intensive process. Automatic item generation integrates cognitive modeling with computer technology to systematically generate test items. To date, however, items generated using cognitive modeling procedures have received limited use in operational testing situations. As a result, the psychometric…

  12. Ruggedness/robustness evaluation and system suitability test on United States Pharmacopoeia XXVI assay ginsenosides in Asian and American ginseng by high-performance liquid chromatography.

    PubMed

    Li, Yong-Guo; Chen, Ming; Chou, Gui-Xin; Wang, Zheng-Tao; Hu, Zhi-Bi

    2004-09-03

    The ruggedness/robustness evaluation and system suitability tests were aimed at a thorough understanding of the practicability of the assay methods issued by the United States Pharmacopoeia (USP XXVI and XXVII) for ginsenosides in Asian ginseng and American ginseng. The items chosen for the method validation included quantitative items, such as recovery of Rg(1) and Rb(1), and qualitative items, such as resolution, theoretical plate number, and relative retention time of two critical band pairs: Rg(1)/Re, and Rb(1) with its neighboring peak. In total, 16 column types were used to compare different vendors, packing materials, sizes, etc., and five LC systems and two laboratories were involved in comparing the data for both quantitative and qualitative items. The results showed that the column packing material might significantly alter separation. The column packing material Hypersil afforded the preferable separation of the ginsenosides. No significant difference was observed between instruments or between laboratories. Our results suggest a modification of the system suitability test as given in USP26-NF21 and the latest version, USP27-NF22, which was not suitable for most systems. We suggest using the resolutions of Rg(1)/Re and of Rb(1) with its neighboring peak as critical parameters for the ginsenosides assay, and omitting the relative retention times of both, as a more reasonable yet practicable system suitability test. Six typical chromatograms obtained from different columns are also presented.

  13. Development and psychometric testing of the Cancer Knowledge Scale for Elders.

    PubMed

    Su, Ching-Ching; Chen, Yuh-Min; Kuo, Bo-Jein

    2009-03-01

    To develop the Cancer Knowledge Scale for Elders and test its validity and reliability. The number of elders suffering from cancer is increasing. To facilitate cancer prevention behaviours among elders, they should be educated about cancer. Prior to designing a programme that would respond to the special needs of elders, understanding the cancer-related knowledge within this population was necessary. However, extensive review of the literature revealed a lack of appropriate instruments for measuring cancer-related knowledge. A valid and reliable cancer knowledge scale for elders is necessary. A non-experimental methodological design was used to test the psychometric properties of the Cancer Knowledge Scale for Elders. Item analysis was first performed to screen out items that had low corrected item-total correlation coefficients. Construct validity was examined with a principal component method of exploratory factor analysis. Cancer-related health behaviour was used as the criterion variable to evaluate criterion-related validity. Internal consistency reliability was assessed by the KR-20. Stability was determined by two-week test-retest reliability. The factor analysis yielded a four-factor solution accounting for 49.5% of the variance. For criterion-related validity, cancer knowledge was positively correlated with cancer-related health behaviour (r = 0.78, p < 0.001). The KR-20 coefficients of the four factors were 0.85, 0.76, 0.79 and 0.67, and the coefficient for the total scale was 0.87. Test-retest reliability over a two-week period was 0.83 (p < 0.001). This study provides evidence for content validity, construct validity, criterion-related validity, internal consistency and stability of the Cancer Knowledge Scale for Elders. The results show that this scale is an easy-to-use instrument for elders and has adequate validity and reliability. The scale can be used as an assessment instrument when implementing cancer education programmes for elders. It can also be used to evaluate the effects of education programmes.
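
    For readers unfamiliar with the KR-20 coefficient reported in the record above, the following minimal sketch computes it for dichotomously scored items using the standard formula KR-20 = k/(k-1) * (1 - sum(p_j * q_j) / var(total)). The function name `kr20` is illustrative, and the use of the population variance (dividing by n) for the total score is an assumption; some texts use the sample variance instead.

```python
def kr20(responses):
    """Kuder-Richardson Formula 20 for 0/1 item scores.

    responses: list of per-respondent lists of 0/1 item scores.
    Uses the population variance of total scores (an assumed convention),
    which matches p*q as the population variance of each item.
    """
    n = len(responses)
    k = len(responses[0])
    totals = [sum(r) for r in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    # Sum of item variances p_j * (1 - p_j).
    pq_sum = 0.0
    for j in range(k):
        p = sum(r[j] for r in responses) / n
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / var_total)
```

    The function is undefined when every respondent has the same total score (zero variance), so real data should be checked for that case before calling it.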

  14. Development and psychometric evaluation of an information literacy self-efficacy survey and an information literacy knowledge test

    PubMed Central

    Tepe, Rodger; Tepe, Chabha

    2015-01-01

    Objective: To develop and psychometrically evaluate an information literacy (IL) self-efficacy survey and an IL knowledge test. Methods: In this test–retest reliability study, a 25-item IL self-efficacy survey and a 50-item IL knowledge test were developed and administered to a convenience sample of 53 chiropractic students. Item analyses were performed on all questions. Results: The IL self-efficacy survey demonstrated good reliability (test–retest correlation = 0.81) and good/very good internal consistency (mean κ = .56 and Cronbach's α = .92). A total of 25 questions with the best item analysis characteristics were chosen from the 50-item IL knowledge test, resulting in a 25-item IL knowledge test that demonstrated good reliability (test–retest correlation = 0.87), very good internal consistency (mean κ = .69, KR20 = 0.85), and good item discrimination (mean point-biserial = 0.48). Conclusions: This study resulted in the development of three instruments: a 25-item IL self-efficacy survey, a 50-item IL knowledge test, and a 25-item IL knowledge test. The information literacy self-efficacy survey and the 25-item version of the information literacy knowledge test have shown preliminary evidence of adequate reliability and validity to justify continuing study with these instruments. PMID:25517736
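
    The point-biserial item discrimination mentioned in the record above is the Pearson correlation between a 0/1 item score and the total test score. The sketch below is illustrative only: the function name is hypothetical, and whether the study used corrected totals (excluding the item itself) is not stated, so the uncorrected form is shown.

```python
import math


def point_biserial(item_scores, total_scores):
    """Pearson correlation between a 0/1 item and the total score.

    item_scores: list of 0/1 scores for one item, one per respondent.
    total_scores: list of total test scores in the same respondent order.
    """
    n = len(item_scores)
    mean_x = sum(item_scores) / n
    mean_y = sum(total_scores) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(item_scores, total_scores))
    var_x = sum((x - mean_x) ** 2 for x in item_scores)
    var_y = sum((y - mean_y) ** 2 for y in total_scores)
    return cov / math.sqrt(var_x * var_y)
```

    Averaging this value over all items gives a summary comparable in kind to the "mean point-biserial" quoted in the abstract.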

  15. Cross-Culture Validation of the HIV/AIDS Stress Scale: The Development of a Revised Chinese Version.

    PubMed

    Niu, Lu; Qiu, Yangyang; Luo, Dan; Chen, Xi; Wang, Min; Pakenham, Kenneth I; Zhang, Xixing; Huang, Zhulin; Xiao, Shuiyuan

    2016-01-01

    Being HIV-infected is a stressful experience for many individuals. To assess HIV-related stress in the Chinese context, a measure with satisfactory psychometric properties has yet to be developed. This study aimed to examine the psychometric characteristics of a simplified Chinese version of the HIV/AIDS Stress Scale (SS-HIV) among people living with HIV/AIDS in central China. A total of 667 people living with HIV (92% were male) were recruited from March 1st 2014 to August 31st 2015 by consecutive sampling. A standard questionnaire package containing the Chinese HIV/AIDS Stress Scale (CSS-HIV), the Chinese Patient Health Questionnaire-9 (PHQ-9), and the Chinese Generalized Anxiety Disorder Scale (GAD-7) was administered to all participants, and 38 of the participants were selected randomly to be re-tested four weeks after the initial testing. Our data indicated that a revised 17-item CSS-HIV had adequate psychometric properties. It consisted of 3 factors: emotional stress (6 items), social stress (6 items) and instrumental stress (5 items). The overall Cronbach's α was 0.906, and the test-retest reliability coefficient was 0.832. The revised CSS-HIV was significantly correlated with the number of HIV-related symptoms, as well as scores on the PHQ-9 and GAD-7, indicating acceptable concurrent validity. The 17-item Chinese version of the SS-HIV has potential research and clinical utility in identifying important stressors among the Chinese HIV-infected population and in understanding the effects of stress on adjustment to HIV.

  16. Development and psychometric characteristics of the SCI-QOL Pressure Ulcers scale and short form

    PubMed Central

    Kisala, Pamela A.; Tulsky, David S.; Choi, Seung W.; Kirshblum, Steven C.

    2015-01-01

    Objective: To develop a self-reported measure of the subjective impact of pressure ulcers on health-related quality of life (HRQOL) in individuals with spinal cord injury (SCI) as part of the SCI quality of life (SCI-QOL) measurement system. Design: Grounded-theory based qualitative item development methods, large-scale item calibration testing, confirmatory factor analysis (CFA), and item response theory-based psychometric analysis. Setting: Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Participants: Adults with traumatic SCI. Main Outcome Measures: SCI-QOL Pressure Ulcers scale. Results: 189 individuals with traumatic SCI who experienced a pressure ulcer within the past 7 days completed 30 items related to pressure ulcers. CFA confirmed a unidimensional pool of items. IRT analyses were conducted. A constrained Graded Response Model with a constant slope parameter was used to estimate item thresholds for the 12 retained items. Conclusions: The 12-item SCI-QOL Pressure Ulcers scale is unique in that it is specifically targeted to individuals with spinal cord injury and at every stage of development has included input from individuals with SCI. Furthermore, use of CFA and IRT methods provides flexibility and precision of measurement. The scale may be administered in its entirety or as a 7-item “short form” and is available for both research and clinical practice. PMID:26010965

  17. Benchmarks for Deeper Learning on Next Generation Tests: A Study of PISA. CRESST Report 855

    ERIC Educational Resources Information Center

    Herman, Joan L.; La Torre, Deborah; Epstein, Scott; Wang, Jia

    2016-01-01

    This report presents the results of expert panels' item-by-item analysis of the 2015 PISA Reading Literacy and Mathematics Literacy assessments and compares study findings on PISA's representation of deeper learning with that of other related studies. Results indicate that about 11% to 14% of PISA's total raw score value for reading and…

  18. Comparison of Autism Screening in Younger and Older Toddlers

    ERIC Educational Resources Information Center

    Sturner, Raymond; Howard, Barbara; Bergmann, Paul; Stewart, Lydia; Afarian, Talin E.

    2017-01-01

    This study examined the effect of age at completion of an autism screening test on item failure rates contrasting older (>20 months) with younger (<20 months) toddlers in a community primary care sample of 73,564 children. Items related to social development were categorized into one of three age sets per criteria from Inada et al.…

  19. Integrating Test-Form Formatting into Automated Test Assembly

    ERIC Educational Resources Information Center

    Diao, Qi; van der Linden, Wim J.

    2013-01-01

    Automated test assembly uses the methodology of mixed integer programming to select an optimal set of items from an item bank. Automated test-form generation uses the same methodology to optimally order the items and format the test form. From an optimization point of view, production of fully formatted test forms directly from the item pool using…

  20. Instructional Topics in Educational Measurement (ITEMS) Module: Using Automated Processes to Generate Test Items

    ERIC Educational Resources Information Center

    Gierl, Mark J.; Lai, Hollis

    2013-01-01

    Changes to the design and development of our educational assessments are resulting in the unprecedented demand for a large and continuous supply of content-specific test items. One way to address this growing demand is with automatic item generation (AIG). AIG is the process of using item models to generate test items with the aid of computer…

  1. Investigating diagnostic bias in autism spectrum conditions: An item response theory analysis of sex bias in the AQ-10.

    PubMed

    Murray, Aja Louise; Allison, Carrie; Smith, Paula L; Baron-Cohen, Simon; Booth, Tom; Auyeung, Bonnie

    2017-05-01

    Diagnostic bias is a concern in autism spectrum conditions (ASC) where prevalence and presentation differ by sex. To ensure that females with ASC are not under-identified, it is important that ASC screening tools do not systematically underestimate autistic traits in females relative to males. We evaluated whether the AQ-10, a brief screen for ASC recommended by the National Institute of Clinical Excellence in cases of suspected ASC, exhibits such a bias. Using an item response theory approach, we evaluated differential item functioning and differential test functioning. We found that although individual items showed some sex bias, these biases at times favored males and at other times favored females. Thus, at the level of test scores the item-level biases cancelled out to give an unbiased overall score. Results support the continued use of the AQ-10 sum score in its current form but suggest that caution should be exercised when interpreting responses to individual items. The nature of the item level biases could serve as a guide for future research into how ASC affects males and females differently. Autism Res 2017, 10: 790-800. © 2016 International Society for Autism Research, Wiley Periodicals, Inc.

  2. Visual search by chimpanzees (Pan): assessment of controlling relations.

    PubMed Central

    Tomonaga, M

    1995-01-01

    Three experimentally sophisticated chimpanzees (Pan), Akira, Chloe, and Ai, were trained on visual search performance using a modified multiple-alternative matching-to-sample task in which a sample stimulus was followed by the search display containing one target identical to the sample and several uniform distractors (i.e., negative comparison stimuli were identical to each other). After they acquired this task, they were tested for transfer of visual search performance to trials in which the sample was not followed by the uniform search display (odd-item search). Akira showed positive transfer of visual search performance to odd-item search even when the display size (the number of stimulus items in the search display) was small, whereas Chloe and Ai showed a transfer only when the display size was large. Chloe and Ai used some nonrelational cues such as perceptual isolation of the target among uniform distractors (so-called pop-out). In addition to the odd-item search test, various types of probe trials were presented to clarify the controlling relations in multiple-alternative matching to sample. Akira showed a decrement of accuracy as a function of the display size when the search display was nonuniform (i.e., each "distractor" stimulus was not the same), whereas Chloe and Ai showed perfect performance. Furthermore, when the sample was identical to the uniform distractors in the search display, Chloe and Ai never selected an odd-item target, but Akira selected it when the display size was large. These results indicated that Akira's behavior was controlled mainly by relational cues of target-distractor oddity, whereas an identity relation between the sample and the target strongly controlled the performance of Chloe and Ai. PMID:7714449

  3. Self-reported walking ability predicts functional mobility performance in frail older adults.

    PubMed

    Alexander, N B; Guire, K E; Thelen, D G; Ashton-Miller, J A; Schultz, A B; Grunawalt, J C; Giordani, B

    2000-11-01

    To determine how self-reported physical function relates to performance in each of three mobility domains: walking, stance maintenance, and rising from chairs. Cross-sectional analysis of older adults. University-based laboratory and community-based congregate housing facilities. Two hundred twenty-one older adults (mean age, 79.9 years; range, 60-102 years) without clinical evidence of dementia (mean Folstein Mini-Mental State score, 28; range, 24-30). We compared the responses of these older adults on a questionnaire battery used by the Established Populations for the Epidemiologic Study of the Elderly (EPESE) project, to performance on mobility tasks of graded difficulty. Responses to the EPESE battery included: (1) whether assistance was required to perform seven Katz activities of daily living (ADL) items, specifically with walking and transferring; (2) three Rosow-Breslau items, including the ability to walk up stairs and walk a half mile; and (3) five Nagi items, including difficulty stooping, reaching, and lifting objects. The performance measures included the ability to perform, and time taken to perform, tasks in three summary score domains: (1) walking ("Walking," seven tasks, including walking with an assistive device, turning, stair climbing, tandem walking); (2) stance maintenance ("Stance," six tasks, including unipedal, bipedal, tandem, and maximum lean); and (3) chair rise ("Chair Rise," six tasks, including rising from a variety of seat heights with and without the use of hands for assistance). A total score combines scores in each Walking, Stance, and Chair Rise domain. We also analyzed how cognitive/behavioral factors such as depression and self-efficacy related to the residuals from the self-report and performance-based ANOVA models. Rosow-Breslau items have the strongest relationship with the three performance domains, Walking, Stance, and Chair Rise (eta-squared ranging from 0.21 to 0.44). These three performance domains are as strongly related to one Katz ADL item, walking (eta-squared ranging from 0.15 to 0.33) as all of the Katz ADL items combined (eta-squared ranging from 0.21 to 0.35). Tests of problem solving and psychomotor speed, the Trails A and Trails B tests, are significantly correlated with the residuals from the self-report and performance-based ANOVA models. Compared with the rest of the EPESE self-report items, self-report items related to walking (such as Katz walking and Rosow-Breslau items) are better predictors of functional mobility performance on tasks involving walking, stance maintenance, and rising from chairs. Compared with other self-report items, self-reported walking ability may be the best predictor of overall functional mobility.

  4. Putting Humpty together and pulling him apart: accessing and unbinding the hippocampal item-context engram.

    PubMed

    Sadeh, Talya; Maril, Anat; Bitan, Tali; Goshen-Gottstein, Yonatan

    2012-03-01

    A remarkable act of memory entails binding different forms of information. We focus on the timeless question of how the bound engram is accessed such that its component features (item and context) are extracted. To shed light on this question, we investigate the dynamics between brain structures that together mediate the binding and extraction of item and context. Converging evidence has implicated the Parahippocampal cortex (PHc) in contextual processing, the Perirhinal cortex (PRc) in item processing, and the hippocampus in item-context binding. Effective connectivity analysis was conducted on fMRI data gathered during retrieval on tests that differ with regard to the to-be-extracted information. Results revealed that recall is initiated by context-related PHc activity, followed by hippocampal item-context engram activation, and completed with retrieval of the study-item by the PRc. The reverse path was found for recognition. We thus provide novel evidence for dissociative patterns of item-context unbinding during retrieval. Copyright © 2011 Elsevier Inc. All rights reserved.

  5. Assessing the associative deficit of older adults in long-term and short-term/working memory.

    PubMed

    Chen, Tina; Naveh-Benjamin, Moshe

    2012-09-01

    Older adults exhibit a deficit in associative long-term memory relative to younger adults. However, the literature is inconclusive regarding whether this deficit is attenuated in short-term/working memory. To elucidate the issue, three experiments assessed younger and older adults' item and interitem associative memory and the effects of several variables that might potentially contribute to the inconsistent pattern of results in previous studies. In Experiment 1, participants were tested on item and associative recognition memory with both long-term and short-term retention intervals in a single, continuous recognition paradigm. There was an associative deficit for older adults in the short-term and long-term intervals. Using only short-term intervals, Experiment 2 utilized mixed and blocked test designs to examine the effect of test event salience. Blocking the test did not attenuate the age-related associative deficit seen in the mixed test blocks. Finally, an age-related associative deficit was found in Experiment 3, under both sequential and simultaneous presentation conditions. Even while accounting for some methodological issues, the associative deficit of older adults is evident in short-term/working memory.

  6. A Procedure To Detect Test Bias Present Simultaneously in Several Items.

    ERIC Educational Resources Information Center

    Shealy, Robin; Stout, William

    A statistical procedure is presented that is designed to test for unidirectional test bias existing simultaneously in several items of an ability test, based on the assumption that test bias is incipient within the two groups' ability differences. The proposed procedure--Simultaneous Item Bias (SIB)--is based on a multidimensional item response…

  7. An Item Response Theory Model for Test Bias.

    ERIC Educational Resources Information Center

    Shealy, Robin; Stout, William

    This paper presents a conceptualization of test bias for standardized ability tests which is based on multidimensional, non-parametric, item response theory. An explanation of how individually-biased items can combine through a test score to produce test bias is provided. It is contended that bias, although expressed at the item level, should be…

  8. Item response theory analyses of the Delis-Kaplan Executive Function System card sorting subtest.

    PubMed

    Spencer, Mercedes; Cho, Sun-Joo; Cutting, Laurie E

    2018-02-02

    In the current study, we examined the dimensionality of the 16-item Card Sorting subtest of the Delis-Kaplan Executive Functioning System assessment in a sample of 264 native English-speaking children between the ages of 9 and 15 years. We also tested for measurement invariance for these items across age and gender groups using item response theory (IRT). Results of the exploratory factor analysis indicated that a two-factor model that distinguished between verbal and perceptual items provided the best fit to the data. Although the items demonstrated measurement invariance across age groups, measurement invariance was violated for gender groups, with two items demonstrating differential item functioning for males and females. Multigroup analysis using all 16 items indicated that the items were more effective for individuals whose IRT scale scores were relatively high. A single-group explanatory IRT model using 14 non-differential item functioning items showed that for perceptual ability, females scored higher than males and that scores increased with age for both males and females; for verbal ability, the observed increase in scores across age differed for males and females. The implications of these findings are discussed.

  9. The emotion dysregulation inventory: Psychometric properties and item response theory calibration in an autism spectrum disorder sample.

    PubMed

    Mazefsky, Carla A; Yu, Lan; White, Susan W; Siegel, Matthew; Pilkonis, Paul A

    2018-06-01

    Individuals with autism spectrum disorder (ASD) often present with prominent emotion dysregulation that requires treatment but can be difficult to measure. The Emotion Dysregulation Inventory (EDI) was created using methods developed by the Patient-Reported Outcomes Measurement Information System (PROMIS®) to capture observable indicators of poor emotion regulation. Caregivers of 1,755 youth with ASD completed 66 candidate EDI items, and the final 30 items were selected based on classical test theory and item response theory (IRT) analyses. The analyses identified two factors: (a) Reactivity, characterized by intense, rapidly escalating, sustained, and poorly regulated negative emotional reactions, and (b) Dysphoria, characterized by anhedonia, sadness, and nervousness. The final items did not show differential item functioning (DIF) based on gender, age, intellectual ability, or verbal ability. Because the final items were calibrated using IRT, even a small number of items offers high precision, minimizing respondent burden. IRT co-calibration of the EDI with related measures demonstrated its superiority in assessing the severity of emotion dysregulation with as few as seven items. Validity of the EDI was supported by expert review, its association with related constructs (e.g., anxiety and depression symptoms, aggression), higher scores in psychiatric inpatients with ASD compared to a community ASD sample, and demonstration of test-retest stability and sensitivity to change. In sum, the EDI provides an efficient and sensitive method to measure emotion dysregulation for clinical assessment, monitoring, and research in youth with ASD of any level of cognitive or verbal ability. Autism Res 2018, 11: 928-941. © 2018 International Society for Autism Research, Wiley Periodicals, Inc. This paper describes a new measure of poor emotional control called the Emotion Dysregulation Inventory (EDI). Caregivers of 1,755 youth with ASD completed candidate items, and advanced statistical techniques were applied to identify the best final items. The EDI is unique because it captures common emotional problems in ASD and is appropriate for both nonverbal and verbal youth. It is an efficient and sensitive measure for use in clinical assessments, monitoring, and research with youth with ASD.

  10. Does relative body fat influence the Movement ABC-2 assessment in children with and without developmental coordination disorder?

    PubMed

    Faught, Brent E; Demetriades, Stephen; Hay, John; Cairney, John

    2013-12-01

    Developmental coordination disorder (DCD) is a condition that results in an impairment of gross and/or fine motor coordination. Compromised motor coordination contributes to lower levels of physical activity, which is associated with elevated body fat. The impact of elevated body fat on motor coordination diagnostic assessments in children with DCD has not been established. The purpose of this study was to determine if relative body fat influences performance on the Movement Assessment Battery for Children, 2nd Edition (MABC-2) test items in children with and without DCD. A nested case-control design was conducted within the Physical Health Activity Study Team longitudinal cohort study. The MABC-2 was used to assess motor coordination to categorize cases and matched controls. Relative body fat was assessed using whole body air displacement plethysmography. Relative body fat was negatively associated with the MABC-2 "balance" subcategory after adjusting for physical activity and DCD status. Relative body fat did not influence the subcategories of "manual dexterity" or "aiming and catching". Item analysis of the three balance tasks indicated that relative body fat significantly influences both "2-board balance" and "zig-zag hopping", but not "walking heel-toe backwards". Children with higher levels of relative body fat do not perform as well on the MABC-2, regardless of whether they have DCD or not. Dynamic balance test items are most negatively influenced by body fat. Health practitioners and researchers should be aware that body fat can influence results when interpreting MABC-2 test scores. Crown Copyright © 2013. Published by Elsevier Ltd. All rights reserved.

  11. Using Reliability and Item Analysis to Evaluate a Teacher-Developed Test in Educational Measurement and Evaluation

    ERIC Educational Resources Information Center

    Quaigrain, Kennedy; Arhin, Ato Kwamina

    2017-01-01

    Item analysis is essential in improving items which will be used again in later tests; it can also be used to eliminate misleading items in a test. The study focused on item and test quality and explored the relationship between difficulty index (p-value) and discrimination index (DI) with distractor efficiency (DE). The study was conducted among…
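The two indices named in this abstract can be illustrated with a minimal sketch. The abstract does not say how the study computed them, so the upper-lower 27% grouping used here is an assumed (though common) convention, and the response data and function names are hypothetical.

```python
def item_difficulty(item_scores):
    """Difficulty index (p-value): proportion of examinees answering correctly."""
    return sum(item_scores) / len(item_scores)


def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-lower discrimination index: p(upper group) - p(lower group).

    The 27% group size is an assumed convention, not taken from the study.
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))
    # Rank examinees by total test score, lowest first.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    return p_upper - p_lower


# Hypothetical data: 1 = correct, 0 = incorrect on one item, plus total scores.
item_scores = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
total_scores = [9, 8, 8, 7, 6, 5, 4, 3, 2, 1]

p = item_difficulty(item_scores)
di = discrimination_index(item_scores, total_scores)
print(f"difficulty p = {p:.2f}, discrimination DI = {di:.2f}")
```

A larger DI means the item separates high scorers from low scorers more sharply; distractor efficiency would need option-level response data and is not sketched here.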

  12. Introducing a short version of the physical self description questionnaire: new strategies, short-form evaluative criteria, and applications of factor analyses.

    PubMed

    Marsh, Herbert W; Martin, Andrew J; Jackson, Susan

    2010-08-01

    Based on the Physical Self Description Questionnaire (PSDQ) normative archive (n = 1,607 Australian adolescents), 40 of 70 items were selected to construct a new short form (PSDQ-S). The PSDQ-S was evaluated in a new cross-validation sample of 708 Australian adolescents and four additional samples: 349 Australian elite-athlete adolescents, 986 Spanish adolescents, 395 Israeli university students, 760 Australian older adults. Across these six groups, the 11 PSDQ-S factors had consistently high reliabilities and invariant factor structures. Study 1, using a missing-by-design variation of multigroup invariance tests, showed invariance across 40 PSDQ-S items and 70 PSDQ items. Study 2 demonstrated factorial invariance over a 1-year interval (test-retest correlations .57-.90; Mdn = .77), and good convergent and discriminant validity in relation to time. Study 3 showed good and nearly identical support for convergent and discriminant validity of PSDQ and PSDQ-S responses in relation to two other physical self-concept instruments.

  13. The Impact of Non-attempted and Dually-Attempted Items on Person Abilities Using Item Response Theory

    PubMed Central

    Sideridis, Georgios D.; Tsaousis, Ioannis; Al Harbi, Khaleel

    2016-01-01

    The purpose of the present study was to relate response strategy with person ability estimates. Two behavioral strategies were examined: (a) the strategy to skip items in order to save time on timed tests, and (b) the strategy to select two responses on an item, with the hope that one of them may be considered correct. Participants were 4,422 individuals who were administered a standardized achievement measure related to math, biology, chemistry, and physics. In the present evaluation, only the physics subscale was employed. Two analyses were conducted: (a) a person-based one to identify differences between groups and potential correlates of those differences, and (b) a measure-based analysis in order to identify the parts of the measure that were responsible for potential group differentiation. For (a) person abilities the 2-PL model was employed and later the 3-PL and 4-PL models in order to estimate upper and lower asymptotes of person abilities. For (b) differential item functioning, differential test functioning, and differential distractor functioning were investigated. Results indicated that there were significant differences between groups with completers having the highest ability compared to both non-attempters and dual responders. There were no significant differences between non-attempters and dual responders. The present findings have implications for response strategy efficacy and measure evaluation, revision, and construction. PMID:27790174

  14. The Impact of Non-attempted and Dually-Attempted Items on Person Abilities Using Item Response Theory.

    PubMed

    Sideridis, Georgios D; Tsaousis, Ioannis; Al Harbi, Khaleel

    2016-01-01

    The purpose of the present study was to relate response strategy with person ability estimates. Two behavioral strategies were examined: (a) the strategy to skip items in order to save time on timed tests, and (b) the strategy to select two responses on an item, with the hope that one of them may be considered correct. Participants were 4,422 individuals who were administered a standardized achievement measure related to math, biology, chemistry, and physics. In the present evaluation, only the physics subscale was employed. Two analyses were conducted: (a) a person-based one to identify differences between groups and potential correlates of those differences, and (b) a measure-based analysis in order to identify the parts of the measure that were responsible for potential group differentiation. For (a) person abilities the 2-PL model was employed and later the 3-PL and 4-PL models in order to estimate upper and lower asymptotes of person abilities. For (b) differential item functioning, differential test functioning, and differential distractor functioning were investigated. Results indicated that there were significant differences between groups with completers having the highest ability compared to both non-attempters and dual responders. There were no significant differences between non-attempters and dual responders. The present findings have implications for response strategy efficacy and measure evaluation, revision, and construction.

  15. Development and psychometric evaluation of a health-related quality of life instrument for individuals with adult-onset hearing loss.

    PubMed

    Stika, Carren J; Hays, Ron D

    2015-07-01

    Self-reports of 'hearing handicap' are available, but a comprehensive measure of health-related quality of life (HRQOL) for individuals with adult-onset hearing loss (AOHL) does not exist. Our objective was to develop and evaluate a multidimensional HRQOL instrument for individuals with AOHL. The Impact of Hearing Loss Inventory Tool (IHEAR-IT) was developed using results of focus groups, a literature review, advisory expert panel input, and cognitive interviews. The 73-item field-test instrument was completed by 409 adults (22-91 years old) with varying degrees of AOHL and from different areas of the USA. Multitrait scaling analysis supported four multi-item scales and five individual items. Internal consistency reliabilities ranged from 0.93 to 0.96 for the scales. Construct validity was supported by correlations between the IHEAR-IT scales and scores on the 36-item Short Form Health Survey, version 2.0 (SF-36v2) mental composite summary (r = 0.32-0.64) and the Hearing Handicap Inventory for the Elderly/Adults (HHIE/HHIA) (r ≥ -0.70). The field test provides initial support for the reliability and construct validity of the IHEAR-IT for evaluating HRQOL of individuals with AOHL. Further research is needed to evaluate the responsiveness to change of the IHEAR-IT scales and identify items for a short-form.

  16. Audio Adapted Assessment Data: Does the Addition of Audio to Written Items Modify the Item Calibration?

    ERIC Educational Resources Information Center

    Snyder, James

    2010-01-01

    This dissertation research examined the changes in item RIT calibration that occurred when adding audio to a set of currently calibrated RIT items and then placing these new items as field test items in the modified assessments on the NWEA MAP test platform. The researcher used test results from over 600 students in the Poway School District in…

  17. RhinAsthma patient perspective: A Rasch validation study.

    PubMed

    Molinengo, Giorgia; Baiardini, Ilaria; Braido, Fulvio; Loera, Barbara

    2018-02-01

    In daily practice, Health-Related Quality of Life (HRQoL) tools are useful for supplementing clinical data with the patient's perspective. To encourage their use by clinicians, the availability of tools that can quickly provide valid results is crucial. A new HRQoL tool has been proposed for patients with asthma and rhinitis: the RhinAsthma Patient Perspective (RAPP). The aim of this study was to evaluate the psychometric robustness of the RAPP using the Item Response Theory (IRT) approach, to evaluate the scalability of items, and to test whether or not patients use the item response scale correctly. 155 patients (53.5% women, mean age 39.1, range 16-76) were recruited during a multicenter study. RAPP metric properties were investigated using IRT models. Differential item functioning (DIF) was examined for gender, age, and asthma control test (ACT). The RAPP adequately fitted the Rating Scale model, demonstrating the equality of the rating scale structure for all items. All statistics on items were satisfactory. The RAPP had adequate internal reliability and showed good ability to discriminate among different groups of participants. DIF analysis indicated that there were no differential item functioning issues for gender. One item showed DIF by age and four items by ACT. The psychometric evaluation performed using IRT models demonstrated that the RAPP met all the criteria to be considered a reliable and valid method of measurement. From a clinical perspective, this will allow physicians to confidently interpret scores as good indicators of Quality of Life of patients with asthma.

  18. Development and validation of the Cancer Exercise Stereotypes Scale.

    PubMed

    Falzon, Charlène; Sabiston, Catherine; Bergamaschi, Alessandro; Corrion, Karine; Chalabaev, Aïna; D'Arripe-Longueville, Fabienne

    2014-01-01

    The objective of this study was to develop and validate a French-language questionnaire measuring stereotypes related to exercise in cancer patients: The Cancer Exercise Stereotypes Scale (CESS). Four successive steps were carried out with 806 participants. First, a preliminary version was developed on the basis of the relevant literature and qualitative interviews. A test of clarity then led to the reformulation of six of the 30 items. Second, based on the modification indices of the first confirmatory factorial analysis, 11 of the 30 initial items were deleted. A new factorial structure analysis showed a good fit and validated a 19-item instrument with five subscales. Third, the stability of the instrument was tested over time. Last, tests of construct validity were conducted to examine convergent validity and discriminant validity. The French-language CESS appears to have good psychometric qualities and can be used to test theoretical tenets and inform intervention strategies on ways to foster exercise in cancer patients.

  19. Student science achievement and the integration of Indigenous knowledge on standardized tests

    NASA Astrophysics Data System (ADS)

    Dupuis, Juliann; Abrams, Eleanor

    2017-09-01

    In this article, we examine how American Indian students in Montana performed on standardized state science assessments when a small number of test items based upon traditional science knowledge from a cultural curriculum, "Indian Education for All", were included. Montana is the first state in the US to mandate the use of a culturally relevant curriculum in all schools and to incorporate this curriculum into a portion of the standardized assessment items. This study compares White and American Indian student test scores on these particular test items to determine how White and American Indian students perform on culturally relevant test items compared to traditional standard science test items. The connections between student achievement on adapted culturally relevant science test items versus traditional items bring valuable insights to the fields of science education, research on student assessments, and Indigenous studies.

  20. Two are not better than one: Combining unitization and relational encoding strategies.

    PubMed

    Tu, Hsiao-Wei; Diana, Rachel A

    2016-01-01

    In recognition memory, recollection is defined as retrieval of the context associated with an event, whereas familiarity is defined as retrieval based on item strength alone. Recent studies have shown that conventional recollection-based tasks, in which context details are manipulated for source memory assessment at test, can also rely on familiarity when context information is "unitized" with the relevant item information at encoding. Unlike naturalistic episodic memories that include many context details encoded in different ways simultaneously, previous studies have focused on unitization and its effect on the recognition of a single context detail. To further understand how various encoding strategies operate on item and context representations, we independently assigned unitization and relational association to 2 context details (size and color) of each item and tested the contribution of recollection and familiarity to source recognition of each detail. The influence of familiarity on retrieval of each context detail was compared as a function of the encoding strategy used for each detail. Receiver operating characteristic curves suggested that the unitization effect was not additive and that similar levels of familiarity occurred for 1 or multiple details when unitization was the only strategy applied during encoding. On the other hand, a detrimental effect was found when relational encoding and unitization were simultaneously applied to 1 item such that a salient nonunitized context detail interfered with the effortful processing required to unitize an accompanying context detail. However, this detrimental effect was not reciprocal and possibly dependent on the nature of individual context details. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  1. An Effect Size Measure for Raju's Differential Functioning for Items and Tests

    ERIC Educational Resources Information Center

    Wright, Keith D.; Oshima, T. C.

    2015-01-01

    This study established an effect size measure for differential functioning for items and tests' noncompensatory differential item functioning (NCDIF). The Mantel-Haenszel parameter served as the benchmark for developing NCDIF's effect size measure for reporting moderate and large differential item functioning in test items. The effect size of…

  2. Influence of Fallible Item Parameters on Test Information During Adaptive Testing.

    ERIC Educational Resources Information Center

    Wetzel, C. Douglas; McBride, James R.

    Computer simulation was used to assess the effects of item parameter estimation errors on different item selection strategies used in adaptive and conventional testing. To determine whether these effects reduced the advantages of certain optimal item selection strategies, simulations were repeated in the presence and absence of item parameter…

  3. A Guide to Item Banking in Education. (Third Edition).

    ERIC Educational Resources Information Center

    Naccarato, Richard W.

    The current status of banks of test items existing across the United States was determined through a survey conducted between September and December 1987. Item "bank" in this context does not imply that the test items are available in computerized form, but simply that "deposited" test items can be withdrawn for use. Emphasis…

  4. Development and validation of an energy-balance knowledge test for fourth- and fifth-grade students.

    PubMed

    Chen, Senlin; Zhu, Xihe; Kang, Minsoo

    2017-05-01

    A valid test measuring children's energy-balance (EB) knowledge is lacking in research. This study developed and validated the energy-balance knowledge test (EBKT) for fourth and fifth grade students. The original EBKT contained 25 items but was reduced to 23 items based on pilot results and intensive expert panel discussion. De-identified data were collected from 468 fourth and fifth grade students enrolled in four schools to examine the psychometric properties of the EBKT items. The Rasch model analysis was conducted using the Winsteps 3.65.0 software. Differential item functioning (DIF) analysis flagged 1 item (item #4) functioning differently between boys and girls, which was deleted. The final 22-item EBKT showed desirable model-data fit indices. The items had large variability ranging from -3.58 logit (item #10, the easiest) to 1.70 logit (item #3, the hardest). The average person ability on the test was 0.28 logit (SD = .78). Additional analyses supported known-group difference validity of the EBKT scores in capturing gender- and grade-based ability differences. The test was overall valid but could be further improved by expanding test items to discern various ability levels. For lack of a better test, researchers and practitioners may use the EBKT to assess fourth- and fifth-grade students' EB knowledge.
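To make the logit values in this abstract concrete, the sketch below evaluates the standard dichotomous Rasch model, P(correct) = 1 / (1 + exp(-(theta - b))), for a student of average ability on the easiest and hardest items. The three logit values come from the abstract; the function name is illustrative, and the calculation is a reading aid under the assumption of a plain dichotomous Rasch model, not a reproduction of the authors' software output.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct response under the dichotomous Rasch model.

    theta: person ability in logits; b: item difficulty in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


mean_ability = 0.28   # average person ability reported in the abstract
easiest_item = -3.58  # item #10
hardest_item = 1.70   # item #3

p_easy = rasch_probability(mean_ability, easiest_item)
p_hard = rasch_probability(mean_ability, hardest_item)
print(f"P(correct | easiest item) = {p_easy:.2f}")
print(f"P(correct | hardest item) = {p_hard:.2f}")
```

Under this model an average student would answer the easiest item correctly almost every time (about 0.98) and the hardest item only about one time in five (about 0.19), which is one way to read the "large variability" the abstract reports.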

  5. Analysis test of understanding of vectors with the three-parameter logistic model of item response theory and item response curves technique

    NASA Astrophysics Data System (ADS)

    Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

    2016-12-01

    This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discrimination, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the PARSCALE program. The TUV ability is an ability parameter, here estimated assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed from item response curves (IRC) that represent simplified IRT. Data were gathered on 2392 science and engineering freshmen, from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information, and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test's distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by the classical analysis methods of tests. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.
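The three-parameter logistic (3PL) model named in this abstract can be written as a short function. This is a sketch of the textbook form of the model, not of the study's estimation procedure: the parameter values are hypothetical, and the optional scaling constant D = 1.7 that some formulations include is left out.

```python
import math


def three_pl(theta, a, b, c):
    """3PL item response function.

    theta: examinee ability
    a: discrimination, b: difficulty, c: guessing (lower asymptote)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: moderate discrimination, five-option guessing floor.
a, b, c = 1.2, 0.5, 0.2

for theta in (-3.0, 0.5, 3.0):
    print(f"theta = {theta:+.1f}  P(correct) = {three_pl(theta, a, b, c):.3f}")
```

At theta = b the probability sits halfway between the guessing floor c and 1, and for very low ability it approaches c rather than 0, which is what distinguishes the 3PL from the one- and two-parameter models.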

  6. Modeling Local Item Dependence in Cloze and Reading Comprehension Test Items Using Testlet Response Theory

    ERIC Educational Resources Information Center

    Baghaei, Purya; Ravand, Hamdollah

    2016-01-01

    In this study the magnitudes of local dependence generated by cloze test items and reading comprehension items were compared and their impact on parameter estimates and test precision was investigated. An advanced English as a foreign language reading comprehension test containing three reading passages and a cloze test was analyzed with a…

  7. Variable-Length Computerized Adaptive Testing: Adaptation of the A-Stratified Strategy in Item Selection with Content Balancing

    ERIC Educational Resources Information Center

    Huo, Yan

    2009-01-01

    Variable-length computerized adaptive testing (CAT) can provide examinees with tailored test lengths. With the fixed standard error of measurement ("SEM") termination rule, variable-length CAT can achieve predetermined measurement precision by using relatively shorter tests compared to fixed-length CAT. To explore the application of…

  8. Machine Shop. Criterion-Referenced Test (CRT) Item Bank.

    ERIC Educational Resources Information Center

    Davis, Diane, Ed.

    This drafting criterion-referenced test item bank is keyed to the machine shop competency profile developed by industry and education professionals in Missouri. The 16 references used for drafting the test items are listed. Test items are arranged under these categories: orientation to machine shop; performing mathematical calculations; performing…

  9. Rescuing Computerized Testing by Breaking Zipf's Law.

    ERIC Educational Resources Information Center

    Wainer, Howard

    2000-01-01

    Suggests that because of the nonlinear relationship between item usage and item security, the problems of test security posed by continuous administration of standardized tests cannot be resolved merely by increasing the size of the item pool. Offers alternative strategies to overcome these problems, distributing test items so as to avoid the…

  10. Development and validation of the positive affect and well-being scale for the neurology quality of life (Neuro-QOL) measurement system.

    PubMed

    Salsman, John M; Victorson, David; Choi, Seung W; Peterman, Amy H; Heinemann, Allen W; Nowinski, Cindy; Cella, David

    2013-11-01

    To develop and validate an item-response theory-based patient-reported outcomes assessment tool of positive affect and well-being (PAW). This is part of a larger NINDS-funded study to develop a health-related quality of life measurement system across major neurological disorders, called Neuro-QOL. Informed by a literature review and qualitative input from clinicians and patients, item pools were created to assess PAW concepts. Items were administered to a general population sample (N = 513) and a group of individuals with a variety of neurologic conditions (N = 581) for calibration and validation purposes, respectively. A 23-item calibrated bank and a 9-item short form of PAW was developed, reflecting components of positive affect, life satisfaction, or an overall sense of purpose and meaning. The Neuro-QOL PAW measure demonstrated sufficient unidimensionality and displayed good internal consistency, test-retest reliability, model fit, convergent and discriminant validity, and responsiveness. The Neuro-QOL PAW measure was designed to aid clinicians and researchers to better evaluate and understand the potential role of positive health processes for individuals with chronic neurological conditions. Further psychometric testing within and between neurological conditions, as well as testing in non-neurologic chronic diseases, will help evaluate the generalizability of this new tool.

  11. Scales for assessing self-efficacy of nurses and assistants for preventing falls

    PubMed Central

    Dykes, Patricia C.; Carroll, Diane; McColgan, Kerry; Hurley, Ann C.; Lipsitz, Stuart R.; Colombo, Lisa; Zuyev, Lyubov; Middleton, Blackford

    2011-01-01

    Aim: This paper is a report of the development and testing of the Self-Efficacy for Preventing Falls Nurse and Assistant scales. Background: Patient falls and fall-related injuries are traumatic ordeals for patients, family members and providers, and carry a toll for hospitals. Self-efficacy is an important factor in determining actions persons take and levels of performance they achieve. Performance of individual caregivers is linked to the overall performance of hospitals. Scales to assess nurses and certified nursing assistants’ self-efficacy to prevent patients from falling would allow for targeting resources to increase SE, resulting in improved individual performance and ultimately decreased numbers of patient falls. Method: Four phases of instrument development were carried out to (1) generate individual items from eight focus groups (four each nurse and assistant conducted in October 2007), (2) develop prototype scales, (3) determine content validity during a second series of four nurse and assistant focus groups (January 2008) and (4) conduct item analysis, paired t-tests, Student’s t-tests and internal consistency reliability to refine and confirm the scales. Data were collected during February–December, 2008. Results: The 11-item Self-Efficacy for Preventing Falls Nurse had an alpha of 0·89 with all items in the range criterion of 0·3–0·7 for item total correlation. The 8-item Self-Efficacy for Preventing Falls Assistant had an alpha of 0·74 and all items had item total correlations in the 0·3–0·7 range. Conclusions: The Self-Efficacy for Preventing Falls Nurse and Self-Efficacy for Preventing Falls Assistant scales demonstrated psychometric adequacy and are recommended to measure bedside staff’s self-efficacy beliefs in preventing patient falls. PMID:21073506
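The two reliability statistics reported in this abstract, coefficient alpha and item-total correlation, can be sketched as follows. The response matrix is hypothetical, and the use of the corrected item-total correlation (each item against the sum of the remaining items) is an assumption, since the abstract does not say which variant was applied.

```python
import math
from statistics import mean, variance


def cronbach_alpha(scores):
    """Cronbach's alpha; scores is a list of respondents, each a list of item scores."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(scores, j):
    """Correlation of item j with the sum of all other items (assumed variant)."""
    item = [row[j] for row in scores]
    rest = [sum(row) - row[j] for row in scores]
    return pearson(item, rest)


# Hypothetical responses: 5 respondents x 3 items on a 1-5 scale.
scores = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 5, 4],
    [1, 2, 2],
    [3, 3, 4],
]

alpha = cronbach_alpha(scores)
r0 = corrected_item_total(scores, 0)
print(f"alpha = {alpha:.3f}, corrected item-total r (item 0) = {r0:.3f}")
```

With only three items and five respondents these values are far higher than a real scale would produce; the data are there only to show the calculation.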

  12. Thyroid-specific questions on work ability showed known-groups validity among Danes with thyroid diseases.

    PubMed

    Nexo, Mette Andersen; Watt, Torquil; Bonnema, Steen Joop; Hegedüs, Laszlo; Rasmussen, Åse Krogh; Feldt-Rasmussen, Ulla; Bjorner, Jakob Bue

    2015-07-01

    We aimed to identify the best approach to work ability assessment in patients with thyroid disease by evaluating the factor structure, measurement equivalence, known-groups validity, and predictive validity of a broad set of work ability items. Based on the literature and interviews with thyroid patients, 24 work ability items were selected from previous questionnaires, revised, or developed anew. Items were tested among 632 patients with thyroid disease (non-toxic goiter, toxic nodular goiter, Graves' disease (with or without orbitopathy), autoimmune hypothyroidism, and other thyroid diseases), 391 of whom had participated in a study 5 years previously. Responses to select items were compared to general population data. We used confirmatory factor analyses for categorical data, logistic regression analyses and tests of differential item function, and head-to-head comparisons of relative validity in distinguishing known groups. Although all work ability items loaded on a common factor, the optimal factor solution included five factors: role physical, role emotional, thyroid-specific limitations, work limitations (without disease attribution), and work performance. The scale on thyroid-specific limitations showed the most power in distinguishing clinical groups and time since diagnosis. A global single item proved useful for comparisons with the general population, and a thyroid-specific item predicted labor market exclusion within the next 5 years (OR 5.0, 95% CI 2.7-9.1). Items on work limitations with attribution to thyroid disease were most effective in detecting impact on work ability and showed good predictive validity. Generic work ability items remain useful for general population comparisons.

  13. An Evaluation of "Intentional" Weighting of Extended-Response or Constructed-Response Items in Tests with Mixed Item Types.

    ERIC Educational Resources Information Center

    Ito, Kyoko; Sykes, Robert C.

    This study investigated the practice of weighting a type of test item, such as constructed response, more than other types of items, such as selected response, to compute student scores for a mixed-item type of test. The study used data from statewide writing field tests in grades 3, 5, and 8 and considered two contexts, that in which a single…

  14. Moving Knowledge Acquisition From the Lecture Hall to the Student Home: A Prospective Intervention Study

    PubMed Central

    Grefe, Clemens; Brown, Jamie; Meyer, Katharina; Schuelper, Nikolai; Anders, Sven

    2015-01-01

    Background: Podcasts are popular with medical students, but the impact of podcast use on learning outcomes in undergraduate medical education has not been studied in detail. Objective: Our aim was to assess the impact of podcasts accompanied by quiz questions and lecture attendance on short- and medium-term knowledge retention. Methods: Students enrolled for a cardio-respiratory teaching module were asked to prepare for 10 specific lectures by watching podcasts and submitting answers to related quiz questions before attending live lectures. Performance on the same questions was assessed in a surprise test and a retention test. Results: Watching podcasts and submitting answers to quiz questions (versus no podcast/quiz use) was associated with significantly better test performance in all items in the surprise test and 7 items in the retention test. Lecture attendance (versus no attendance) was associated with higher test performance in 3 items and 1 item, respectively. In a linear regression analysis adjusted for age, gender, and overall performance levels, both podcast/quiz use and lecture attendance were significant predictors of student performance. However, the variance explained by podcast/quiz use was greater than the variance explained by lecture attendance in the surprise test (38.7% vs 2.2%) and retention test (19.1% vs 4.0%). Conclusions: When used in conjunction with quiz questions, podcasts have the potential to foster knowledge acquisition and retention over and above the effect of live lectures. PMID:26416467

  15. Do the Guideline Violations Influence Test Difficulty of High-Stake Test?: An Investigation on University Entrance Examination in Turkey

    ERIC Educational Resources Information Center

    Atalmis, Erkan Hasan

    2016-01-01

    Multiple-choice (MC) items are commonly used in high-stake tests. Thus, each item of such tests should be meticulously constructed to increase the accuracy of decisions based on test results. Haladyna and his colleagues (2002) addressed the valid item-writing guidelines to construct high quality MC items in order to increase test reliability and…

  16. Support for an auto-associative model of spoken cued recall: evidence from fMRI.

    PubMed

    de Zubicaray, Greig; McMahon, Katie; Eastburn, Mathew; Pringle, Alan J; Lorenz, Lina; Humphreys, Michael S

    2007-03-02

    Cued recall and item recognition are considered the standard episodic memory retrieval tasks. However, only the neural correlates of the latter have been studied in detail with fMRI. Using an event-related fMRI experimental design that permits spoken responses, we tested hypotheses from an auto-associative model of cued recall and item recognition [Chappell, M., & Humphreys, M. S. (1994). An auto-associative neural network for sparse representations: Analysis and application to models of recognition and cued recall. Psychological Review, 101, 103-128]. In brief, the model assumes that cues elicit a network of phonological short term memory (STM) and semantic long term memory (LTM) representations distributed throughout the neocortex as patterns of sparse activations. This information is transferred to the hippocampus which converges upon the item closest to a stored pattern and outputs a response. Word pairs were learned from a study list, with one member of the pair serving as the cue at test. Unstudied words were also intermingled at test in order to provide an analogue of yes/no recognition tasks. Compared to incorrectly rejected studied items (misses) and correctly rejected (CR) unstudied items, correctly recalled items (hits) elicited increased responses in the left hippocampus and neocortical regions including the left inferior prefrontal cortex (LIPC), left mid lateral temporal cortex and inferior parietal cortex, consistent with predictions from the model. This network was very similar to that observed in yes/no recognition studies, supporting proposals that cued recall and item recognition involve common rather than separate mechanisms.

  17. Absolute and Relative Measures of Instructional Sensitivity

    ERIC Educational Resources Information Center

    Naumann, Alexander; Hartig, Johannes; Hochweber, Jan

    2017-01-01

    Valid inferences on teaching drawn from students' test scores require that tests are sensitive to the instruction students received in class. Accordingly, measures of the test items' instructional sensitivity provide empirical support for validity claims about inferences on instruction. In the present study, we first introduce the concepts of…

  18. The Heteroscedastic Graded Response Model with a Skewed Latent Trait: Testing Statistical and Substantive Hypotheses Related to Skewed Item Category Functions

    ERIC Educational Resources Information Center

    Molenaar, Dylan; Dolan, Conor V.; de Boeck, Paul

    2012-01-01

    The Graded Response Model (GRM; Samejima, "Estimation of ability using a response pattern of graded scores," Psychometric Monograph No. 17, Richmond, VA: The Psychometric Society, 1969) can be derived by assuming a linear regression of a continuous variable, Z, on the trait, [theta], to underlie the ordinal item scores (Takane & de Leeuw in…

  19. Full-Information Item Bi-Factor Analysis. ONR Technical Report. [Biometric Lab Report No. 90-2.]

    ERIC Educational Resources Information Center

    Gibbons, Robert D.; And Others

    A plausible "s"-factor solution for many types of psychological and educational tests is one in which there is one general factor and "s - 1" group- or method-related factors. The bi-factor solution results from the constraint that each item has a non-zero loading on the primary dimension "alpha(sub j1)" and at most…

  20. Development of a wheelchair mobility skills test for children and adolescents: combining evidence with clinical expertise.

    PubMed

    Sol, Marleen Elisabeth; Verschuren, Olaf; de Groot, Laura; de Groot, Janke Frederike

    2017-02-13

    Wheelchair mobility skills (WMS) training is regarded by children using a manual wheelchair and their parents as an important factor to improve participation and daily physical activity. Currently, there is no outcome measure available for the evaluation of WMS in children. Several wheelchair mobility outcome measures have been developed for adults, but none of these have been validated in children. Therefore the objective of this study is to develop a WMS outcome measure for children using the current knowledge from literature in combination with the clinical expertise of health care professionals, children and their parents. Mixed methods approach. Phase 1: Item identification of WMS items through a systematic review using the 'COnsensus-based Standards for the selection of health Measurement Instruments' (COSMIN) recommendations. Phase 2: Item selection and validation of relevant WMS items for children, using a focus group and interviews with children using a manual wheelchair, their parents and health care professionals. Phase 3: Feasibility of the newly developed Utrecht Pediatric Wheelchair Mobility Skills Test (UP-WMST) through pilot testing. Phase 1: Data analysis and synthesis of nine WMS related outcome measures showed there is no widely used outcome measure with levels of evidence across all measurement properties. However, four outcome measures showed some levels of evidence on reliability and validity for adults. Twenty-two WMS items with the best clinimetric properties were selected for further analysis in phase 2. Phase 2: Fifteen items were deemed as relevant for children, one item needed adaptation and six items were considered not relevant for assessing WMS in children. Phase 3: Two health care professionals administered the UP-WMST in eight children. The instructions of the UP-WMST were clear, but the scoring method of the height difference items needed adaptation. 
    The outdoor items for rolling over soft surface and the side slope item were excluded in the final version of the UP-WMST for logistical reasons. The newly developed 15-item UP-WMST is a validated outcome measure which is easy to administer in children using a manual wheelchair. More research regarding reliability, construct validity and responsiveness is warranted before the UP-WMST can be used in practice.

  1. Item difficulty and item validity for the Children's Group Embedded Figures Test.

    PubMed

    Rusch, R R; Trigg, C L; Brogan, R; Petriquin, S

    1994-02-01

    The validity and reliability of the Children's Group Embedded Figures Test was reported for students in Grade 2 by Cromack and Stone in 1980; however, a search of the literature indicates no evidence for internal consistency or item analysis. Hence the purpose of this study was to examine the item difficulty and item validity of the test with children in Grades 1 and 2. Confusion in the literature over development and use of this test was seemingly resolved through analysis of these descriptions and through an interview with the test developer. One early-appearing item was unreasonably difficult. Two or three other items were quite difficult and made little contribution to the total score. Caution is recommended, however, in any reordering or elimination of items based on these findings, given the limited number of subjects (n = 84).

  2. Weapon Performance Testing and Analysis: The MODI-PAC Round, the Number 4 Lead-Shot Round, and the Flying Baton

    DTIC Science & Technology

    1976-01-01

    items. The items tested were the MODI-PAC, a proprietary item of Remington Arms Company, a standard 12-gauge round of No. 4 lead shot, and an...to refrain from testing this item. Therefore, the final selection of items for testing were (1) the MODI-PAC, (2) a standard 12-gauge shotgun round of...The first item evaluated was the MODI-PAC. The MODI-PAC, which stands for “modified impact,” is a 12-gauge shotgun shell loaded with approximately 320

  3. "Blue flags", development of a short clinical questionnaire on work-related psychosocial risk factors - a validation study in primary care.

    PubMed

    Post Sennehed, Charlotte; Gard, Gunvor; Holmberg, Sara; Stigmar, Kjerstin; Forsbrand, Malin; Grahn, Birgitta

    2017-07-24

    Working conditions substantially influence health, work ability and sick leave. Useful instruments to help clinicians pay attention to working conditions are lacking in primary care (PC). The aim of this study was to test the validity of a short "Blue flags" questionnaire, which focuses on work-related psychosocial risk factors and any potential need for contacts and/or actions at the workplace. From the original "The General Nordic Questionnaire" (QPS Nordic) the research group identified five content areas, with a total of 51 items, which were considered to be most relevant to work-related psychosocial risk factors. Fourteen items were selected from the identified QPS Nordic content areas and organised in a short questionnaire, "Blue flags". These 14 items were validated against the 51 QPS Nordic items. Content validity was reviewed by a professional panel and a patient panel. Structural and concurrent validity were also tested within a randomised clinical trial. The two panels (n = 111) considered the 14 psychosocial items to be relevant. A four-factor model was extracted with an explained variance of 25.2%, 14.9%, 10.9% and 8.3% respectively. All 14 items showed satisfactory loadings on all factors. Concerning concurrent validity, the overall correlation was very strong, r_s = 0.87 (p < 0.001). Correlations were moderately strong for factor one, r_s = 0.62 (p < 0.001), and factor two, r_s = 0.74 (p < 0.001). Factor three and factor four were weaker, but still fair and significant, at r_s = 0.53 (p < 0.001) and r_s = 0.41 (p < 0.001) respectively. The internal consistency of the whole "Blue flags" was good with Cronbach's alpha of 0.76. The content, structural and concurrent validity were satisfactory in this first step of development of the "Blue flags" questionnaire. In summary, the overall validity is considered acceptable. Testing in clinical contexts and in other patient populations is recommended to ensure predictive validity and usefulness.
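
    The internal consistency figure reported above (Cronbach's alpha) can be illustrated with a minimal sketch. It assumes complete responses and sample variances; the response matrix and the function name cronbach_alpha are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items matrix of scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up data for illustration: 5 respondents answering 3 items.
responses = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 5, 5],
    [1, 2, 1],
    [3, 3, 4],
]
print(round(cronbach_alpha(responses), 3))
```

    Alpha rises as items covary more strongly relative to their individual variances, which is why it is read as a measure of internal consistency.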

  4. Interactions Between Item Content And Group Membership on Achievement Test Items.

    ERIC Educational Resources Information Center

    Linn, Robert L.; Harnisch, Delwyn L.

    The purpose of this investigation was to examine the interaction of item content and group membership on achievement test items. Estimates of the parameters of the three parameter logistic model were obtained on the 46 item math test for the sample of eighth grade students (N = 2055) participating in the Illinois Inventory of Educational Progress,…

  5. Effects of Item Exposure for Conventional Examinations in a Continuous Testing Environment.

    ERIC Educational Resources Information Center

    Hertz, Norman R.; Chinn, Roberta N.

    This study explored the effect of item exposure on two conventional examinations administered as computer-based tests. A principal hypothesis was that item exposure would have little or no effect on average difficulty of the items over the course of an administrative cycle. This hypothesis was tested by exploring conventional item statistics and…

  6. Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies: The PRISMA-DTA Statement.

    PubMed

    McInnes, Matthew D F; Moher, David; Thombs, Brett D; McGrath, Trevor A; Bossuyt, Patrick M; Clifford, Tammy; Cohen, Jérémie F; Deeks, Jonathan J; Gatsonis, Constantine; Hooft, Lotty; Hunt, Harriet A; Hyde, Christopher J; Korevaar, Daniël A; Leeflang, Mariska M G; Macaskill, Petra; Reitsma, Johannes B; Rodin, Rachel; Rutjes, Anne W S; Salameh, Jean-Paul; Stevens, Adrienne; Takwoingi, Yemisi; Tonelli, Marcello; Weeks, Laura; Whiting, Penny; Willis, Brian H

    2018-01-23

    Systematic reviews of diagnostic test accuracy synthesize data from primary diagnostic studies that have evaluated the accuracy of 1 or more index tests against a reference standard, provide estimates of test performance, allow comparisons of the accuracy of different tests, and facilitate the identification of sources of variability in test accuracy. To develop the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) diagnostic test accuracy guideline as a stand-alone extension of the PRISMA statement. Modifications to the PRISMA statement reflect the specific requirements for reporting of systematic reviews and meta-analyses of diagnostic test accuracy studies and the abstracts for these reviews. Established standards from the Enhancing the Quality and Transparency of Health Research (EQUATOR) Network were followed for the development of the guideline. The original PRISMA statement was used as a framework on which to modify and add items. A group of 24 multidisciplinary experts used a systematic review of articles on existing reporting guidelines and methods, a 3-round Delphi process, a consensus meeting, pilot testing, and iterative refinement to develop the PRISMA diagnostic test accuracy guideline. The final version of the PRISMA diagnostic test accuracy guideline checklist was approved by the group. The systematic review (produced 64 items) and the Delphi process (provided feedback on 7 proposed items; 1 item was later split into 2 items) identified 71 potentially relevant items for consideration. The Delphi process reduced these to 60 items that were discussed at the consensus meeting. Following the meeting, pilot testing and iterative feedback were used to generate the 27-item PRISMA diagnostic test accuracy checklist. To reflect specific or optimal contemporary systematic review methods for diagnostic test accuracy, 8 of the 27 original PRISMA items were left unchanged, 17 were modified, 2 were added, and 2 were omitted. 
    The 27-item PRISMA diagnostic test accuracy checklist provides specific guidance for reporting of systematic reviews. The PRISMA diagnostic test accuracy guideline can facilitate the transparent reporting of reviews, and may assist in the evaluation of validity and applicability, enhance replicability of reviews, and make the results from systematic reviews of diagnostic test accuracy studies more useful.

  7. [Development and validation of a questionnaire on knowledge and personal hygiene habits in childhood (HICORIN®)].

    PubMed

    Moreno-Martínez, Francisco José; Ruzafa-Martínez, María; Ramos-Morcillo, Antonio Jesús; Gómez García, Carmen Isabel; Hernández-Susarte, Ana María

    2015-01-01

    To develop and validate a questionnaire for the integral assessment of personal hygiene habits and knowledge in children aged 7 to 12 years in educational, social and health settings. Cross-sectional study for the validation of a questionnaire. One primary and secondary school and one children's home in the Region of Murcia, Spain. A total of 86 children were included (80 from a primary and secondary school; 6 from a children's home), as well as 7 experts. Content validation by experts; qualitative assessment; identification of difficulties related to some questions; item response analysis; and test-retest reliability. After the literature search, 20 tools that included items related to child body hygiene were obtained. The researchers selected 34 items and drafted 48 additional ones. After content validation by the experts, the questionnaire (HICORIN®) was reduced to 63 items and consisted of 7 dimensions of child personal hygiene (skin, hair, hands, oral, feet, ears, and intimate hygiene). After the qualitative assessment with the children, some terms were adapted to improve their understanding. Only two items had non-response rates that exceeded 10%. The test-retest showed that 84.1% of the items had between very good and moderate reliability. HICORIN® is a reliable and valid instrument that integrally assesses personal hygiene habits and knowledge in children aged 7-12 years. It is applicable in educational, social and health settings and to children from different socioeconomic levels. Copyright © 2014 Elsevier España, S.L.U. All rights reserved.

  8. Development and validation of a measure of workplace climate for healthy weight maintenance.

    PubMed

    Sliter, Katherine A

    2013-07-01

    Due to the obesity epidemic, an increasing amount of research is being conducted to better understand the antecedents and consequences of excess employee weight. One construct often of interest to researchers in this area is organizational climate. Unfortunately, a viable measure of climate, as related to employee weight, does not exist. The purpose of this study was to remedy this by developing and validating a concise, psychometrically sound measure of climate for healthy weight. An item pool was developed based on surveys of full-time employees, and a sorting task was used to eliminate ambiguous items. Items were pilot tested by a sample of 338 full-time employees, and the item pool was reduced through item response theory (IRT) and reliability analyses. Finally, the retained 14 items, comprising 3 subscales, were completed by a sample of 360 full-time employees, representing 26 different organizations from across the United States. Multilevel modeling indicated that sufficient variance was explained by group membership to support aggregation, and confirmatory factor analysis (CFA) supported the hypothesized model of 3 subscale factors and an overall climate factor. Nine hypotheses specific to construct validation were tested. Scores on the new scale correlated significantly with individual-level reports of psychological constructs (e.g., health motivation, general leadership support for health) and physiological phenomena (e.g., body mass index [BMI], physical health problems) to which they should theoretically relate, supporting construct validity. Implications for the use of this scale in both applied and research settings are discussed. PsycINFO Database Record (c) 2013 APA, all rights reserved.

  9. Source Memory for Self and Other in Patients With Mild Cognitive Impairment due to Alzheimer’s Disease

    PubMed Central

    Deason, Rebecca G.; Budson, Andrew E.; Gutchess, Angela H.

    2016-01-01

    Objectives. The present study examined the role of enactment in source memory in a cognitively impaired population. As seen in healthy older adults, it was predicted that source memory in people with mild cognitive impairment due to Alzheimer’s disease (MCI-AD) would benefit from the self-reference aspect of enactment. Method. Seventeen participants with MCI-AD and 18 controls worked in small groups to pack a picnic basket and suitcase and were later tested for their source memory for each item. Results. For item memory, self-referencing improved corrected recognition scores for both MCI-AD and control participants. The MCI-AD group did not demonstrate the same benefit as controls in correct source memory for self-related items. However, those with MCI-AD were relatively less likely to misattribute new items to the self and more likely to misattribute new items to others when committing errors, compared with controls. Discussion. The enactment effect and self-referencing did not enhance accurate source memory more than other referencing for patients with MCI-AD. However, people with MCI-AD benefited in item memory and source memory, being less likely to falsely claim new items as their own, indicating some self-reference benefit occurs for people with MCI-AD. PMID:24904049

  10. Source Memory for Self and Other in Patients With Mild Cognitive Impairment due to Alzheimer's Disease.

    PubMed

    Rosa, Nicole M; Deason, Rebecca G; Budson, Andrew E; Gutchess, Angela H

    2016-01-01

    The present study examined the role of enactment in source memory in a cognitively impaired population. As seen in healthy older adults, it was predicted that source memory in people with mild cognitive impairment due to Alzheimer's disease (MCI-AD) would benefit from the self-reference aspect of enactment. Seventeen participants with MCI-AD and 18 controls worked in small groups to pack a picnic basket and suitcase and were later tested for their source memory for each item. For item memory, self-referencing improved corrected recognition scores for both MCI-AD and control participants. The MCI-AD group did not demonstrate the same benefit as controls in correct source memory for self-related items. However, those with MCI-AD were relatively less likely to misattribute new items to the self and more likely to misattribute new items to others when committing errors, compared with controls. The enactment effect and self-referencing did not enhance accurate source memory more than other referencing for patients with MCI-AD. However, people with MCI-AD benefited in item memory and source memory, being less likely to falsely claim new items as their own, indicating some self-reference benefit occurs for people with MCI-AD. Published by Oxford University Press on behalf of the Gerontological Society of America 2014.

  11. An Efficiency Balanced Information Criterion for Item Selection in Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Han, Kyung T.

    2012-01-01

    Successful administration of computerized adaptive testing (CAT) programs in educational settings requires that test security and item exposure control issues be taken seriously. Developing an item selection algorithm that strikes the right balance between test precision and level of item pool utilization is the key to successful implementation…

  12. Using Automatic Item Generation to Meet the Increasing Item Demands of High-Stakes Educational and Occupational Assessment

    ERIC Educational Resources Information Center

    Arendasy, Martin E.; Sommer, Markus

    2012-01-01

    The use of new test administration technologies such as computerized adaptive testing in high-stakes educational and occupational assessments demands large item pools. Classic item construction processes and previous approaches to automatic item generation faced the problems of a considerable loss of items after the item calibration phase. In this…

  13. Item Purification Does Not Always Improve DIF Detection: A Counterexample with Angoff's Delta Plot

    ERIC Educational Resources Information Center

    Magis, David; Facon, Bruno

    2013-01-01

    Item purification is an iterative process that is often advocated as improving the identification of items affected by differential item functioning (DIF). With test-score-based DIF detection methods, item purification iteratively removes the items currently flagged as DIF from the test scores to get purified sets of items, unaffected by DIF. The…
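
    The iterative scheme described in this abstract can be sketched generically: flag items, drop the flagged items from the matching criterion, and repeat until the flagged set stops changing. The sketch below is a minimal illustration under stated assumptions. The names purify, toy_flag, GAPS and THRESHOLD are hypothetical, and the toy flagging rule is a stand-in for a real DIF statistic; it is not Angoff's delta plot and not the authors' procedure.

```python
def purify(n_items, flag_dif, max_iter=20):
    """Generic item purification loop.

    flag_dif(anchor) must return the set of item indices flagged as DIF
    when only the items in `anchor` are used as the matching criterion.
    """
    flagged = set()
    for _ in range(max_iter):
        # Purified anchor set: every item not currently flagged.
        anchor = [i for i in range(n_items) if i not in flagged]
        if not anchor:
            break  # nothing left to match on
        new_flagged = flag_dif(anchor)
        if new_flagged == flagged:
            return flagged  # flagged set is stable: stop
        flagged = new_flagged
    return flagged


# Hypothetical per-item difficulty gaps between two groups (made-up numbers).
GAPS = [0.0, 0.1, -0.1, 1.0, 0.7]
THRESHOLD = 0.5  # assumed cut-off for this toy rule


def toy_flag(anchor):
    """Toy rule: flag items whose gap is far from the anchor items' mean gap."""
    anchor = list(anchor)
    baseline = sum(GAPS[i] for i in anchor) / len(anchor)
    return {i for i, gap in enumerate(GAPS) if abs(gap - baseline) > THRESHOLD}


print(sorted(toy_flag(range(len(GAPS)))))   # single pass, no purification
print(sorted(purify(len(GAPS), toy_flag)))  # after purification
```

    In this toy data the single pass and the purified result differ, because removing a flagged item shifts the baseline used to judge the remaining items. As the title of the record indicates, that shift does not always improve detection.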

  14. [Difference analysis among majors in medical parasitology exam papers by test item bank proposition].

    PubMed

    Jia, Lin-Zhi; Ya-Jun, Ma; Cao, Yi; Qian, Fen; Li, Xiang-Yu

    2012-04-30

    The quality index among "Medical Parasitology" exam papers and measured data for students in three majors from the university in 2010 were compared and analyzed. The exam papers were formed from the test item bank. The alpha reliability coefficients of the three exam papers were above 0.70. The knowledge structure and capacity structure of the exam papers were basically balanced. However, the alpha reliability coefficient for the second major was the lowest, mainly due to the quality of the test items in the exam paper and the failure to revise the index of the test item bank in time. This observation demonstrated that revising the test items and their index in the item bank according to the measured data can improve the quality of test item bank proposition and reduce the difference among exam papers.

  15. Health measurement using the ICF: Test-retest reliability study of ICF codes and qualifiers in geriatric care

    PubMed Central

    Okochi, Jiro; Utsunomiya, Sakiko; Takahashi, Tai

    2005-01-01

    Background: The International Classification of Functioning, Disability and Health (ICF) was published by the World Health Organization (WHO) to standardize descriptions of health and disability. Little is known about the reliability and clinical relevance of measurements using the ICF and its qualifiers. This study examines the test-retest reliability of ICF codes, and the rate of immeasurability in long-term care settings for the elderly, to evaluate the clinical applicability of the ICF and its qualifiers, and the ICF checklist. Methods: Reliability of 85 body function (BF) items and 152 activity and participation (AP) items of the ICF was studied using a test-retest procedure with a sample of 742 elderly persons from 59 institutional and at-home care service centers. Test-retest reliability was estimated using the weighted kappa statistic. The clinical relevance of the ICF was estimated by calculating the immeasurability rate. The effect of the measurement settings and evaluators' experience was analyzed by stratification of these variables. The properties of each item were evaluated using both the kappa statistic and immeasurability rate to assess the clinical applicability of WHO's ICF checklist in the elderly care setting. Results: The medians of the weighted kappa statistics of the 85 BF and 152 AP items were 0.46 and 0.55, respectively. The reproducibility statistics improved when the measurements were performed by experienced evaluators. Some chapters, such as genitourinary and reproductive functions in the BF domain and major life area in the AP domain, contained more items with lower test-retest reliability measures and rated as immeasurable than the other chapters. Some items in the ICF checklist were rated as unreliable and immeasurable. Conclusion: The reliability of the ICF codes when measured with the current ICF qualifiers is relatively low.
    The increase in reliability with evaluators' experience suggests that proper education will help raise reliability. The ICF checklist contains some items that are difficult to apply in geriatric care settings. Improvements should be achieved by selecting the most relevant items for each measurement and by developing appropriate qualifiers for each code according to the interests of the users. PMID:16050960

  16. Initial evaluation of an interactive test of sentence gist recognition.

    PubMed

    Tye-Murray, N; Witt, S; Castelloe, J

    1996-12-01

    The laser videodisc-based Sentence Gist Recognition (SGR) test consists of sets of topically related sentences that are cued by short film clips. Clients respond to test items by selecting picture illustrations and may interact with the talker by using repair strategies when they do not recognize a test item. The two experiments, involving 40 and 35 adult subjects, respectively, indicated that the SGR may better predict subjective measures of speechreading and listening performance than more traditional audiologic sentence and nonsense syllable tests. Data from cochlear implant users indicated that the SGR accounted for a greater percentage of the variance for selected items of the Communication Profile for the Hearing-Impaired and the Speechreading Questionnaire for Cochlear-Implant Users than two other audiologic tests. As in previous work, subjects were more apt to ask the talker to repeat an utterance that they did not recognize than to ask the talker to restructure it. It is suggested that the SGR may reflect the interactive nature of conversation and provide a simulated real-world listening and/or speechreading task. The principles underlying this test are consistent with the development of other computer technologies and concepts, such as compact disc-interactive and virtual reality.

  17. Item Review and the Rearrangement Procedure: Its Process and Its Results

    ERIC Educational Resources Information Center

    Papanastasiou, Elena C.

    2005-01-01

    Permitting item review is to the benefit of the examinees who typically increase their test scores with item review. However, testing companies do not prefer item review since it does not follow the logic on which adaptive tests are based, and since it is prone to cheating strategies. Consequently, item review is not permitted in many adaptive…

  18. A Model-Based Method for Content Validation of Automatically Generated Test Items

    ERIC Educational Resources Information Center

    Zhang, Xinxin; Gierl, Mark

    2016-01-01

    The purpose of this study is to describe a methodology to recover the item model used to generate multiple-choice test items with a novel graph theory approach. Beginning with the generated test items and working backward to recover the original item model provides a model-based method for validating the content used to automatically generate test…

  19. Have a little faith: measuring the impact of illness on positive and negative aspects of faith.

    PubMed

    Salsman, John M; Garcia, Sofia F; Lai, Jin-Shei; Cella, David

    2012-12-01

    The importance of faith and its associations with health are well documented. As part of the Patient Reported Outcomes Measurement Information System, items tapping positive and negative impact of illness (PII and NII) were developed across four content domains: Coping/Stress Response, Self-Concept, Social Connection/Isolation, and Meaning and Spirituality. Faith items were included within the concept of meaning and spirituality. This measurement model was tested on a heterogeneous group of 509 cancer survivors. To evaluate dimensionality, we applied two bi-factor models, specifying a general factor (PII or NII) and four local factors: Coping/Stress Response, Self-Concept, Social Connection/Isolation, and Meaning and Spirituality. Bi-factor analysis supported sufficient unidimensionality within PII and NII item sets. The unidimensionality of both PII and NII item sets was enhanced by extraction of the faith items from the rest of the questions. Of the 10 faith items, nine demonstrated higher local than general factor loadings (range for local factor loadings = 0.402 to 0.876), suggesting utility as a separate but related 'faith' factor. The same was true for only two of the remaining 63 items across the PII and NII item sets. Although conceptually and to a degree empirically related to Meaning and Spirituality, Faith appears to be a distinct subdomain of PII and NII, better handled by distinct assessment. A 10-item measure of the impact of illness upon faith (II-Faith) was therefore assembled. Copyright © 2011 John Wiley & Sons, Ltd.

  20. Evidence against associative blocking as a cause of cue-independent retrieval-induced forgetting.

    PubMed

    Hulbert, Justin C; Shivde, Geeta; Anderson, Michael C

    2012-01-01

    Selectively retrieving an item from long-term memory reduces the accessibility of competing traces, a phenomenon known as retrieval-induced forgetting (RIF). RIF exhibits cue independence, or the tendency for forgetting to generalize to novel test cues, suggesting an inhibitory basis for this phenomenon. An alternative view (Camp, Pecher, & Schmidt, 2007; Camp et al., 2009; Perfect et al., 2004) suggests that using novel test cues to measure cue independence actually engenders associative interference when participants covertly supplement retrieval with practiced cues that then associatively block retrieval. Accordingly, the covert-cueing hypothesis assumes that the relative strength of the practiced items at final test – and not the inhibition levied on the unpracticed items during retrieval practice – underlies cue-independent forgetting. As such, this perspective predicts that strengthening practiced items by any means, even if not via retrieval practice, should induce forgetting. Contrary to these predictions, however, we present clear evidence that cue-independent forgetting is induced by retrieval practice and not by repeated study exposures. This dissociation occurred despite significant, comparable levels of strengthening of practiced items in each case, and despite the use of Anderson and Spellman's original (1995) independent probe method criticized by covert-cueing theorists as being especially conducive to associative blocking. These results demonstrate that cue-independent RIF is unrelated to the strengthening of practiced items, and thereby fail to support a key prediction of the covert-cueing hypothesis. The results, instead, favor a role of inhibition in resolving retrieval interference. © 2011 Hogrefe Publishing

  1. Student certainty answering misconception question: study of Three-Tier Multiple-Choice Diagnostic Test in Acid-Base and Solubility Equilibrium

    NASA Astrophysics Data System (ADS)

    Ardiansah; Masykuri, M.; Rahardjo, S. B.

    2018-04-01

    Students’ concept comprehension in a three-tier multiple-choice diagnostic test is related to their confidence level. The confidence level is related to certainty and students’ self-efficacy. The purpose of this research was to find out students’ certainty in a misconception test. The research used a quantitative-qualitative method that counted students’ confidence levels. The participants were 484 students who were studying acid-base and solubility equilibrium. Data were collected using a three-tier multiple-choice (3TMC) test with thirty questions and a student questionnaire. The findings showed that item #6, on calculating the pH of an ultra-dilute solution, gave the highest misconception percentage together with high student confidence. Other findings were that 1) students’ tendency to choose the misconception answer increased with item number, 2) student certainty in answering the 3TMC decreased, and 3) student self-efficacy and achievement were related to each other. The findings suggest some implications and limitations for further research.

  2. The reliability and validity of the standardized Mensendieck test in relation to disability in patients with chronic pain.

    PubMed

    Keessen, Paul; Maaskant, Jolanda; Visser, Bart

    2018-08-01

    The standardized Mensendieck test (SMT) was developed to quantify posture, movement, gait, and respiration. In the hands of an experienced therapist, the SMT is proven to be a reliable tool. It is unclear whether posture, movement, gait, and respiration are related to the degree of functional disability in patients with chronic pain. The objective of this study was to assess the reliability and convergent validity of the SMT in a heterogeneous sample of 50 patients with chronic pain. Internal consistency was determined by Cronbach's α and interrater reliability by the intraclass correlation coefficient (ICC). Convergent validity was assessed by determining the Spearman rank correlation coefficient between the movement quality measured in the SMT and functional limitation measured on the disability rating index (DRI). The internal consistency was Cronbach's α 0.91. Substantial reliability was found for the items: movement (ICC = 0.68), gait (ICC = 0.69), sitting posture (ICC = 0.63), and respiration (ICC = 0.64). Insufficient reliability was found for standing posture (ICC = 0.23). A moderate correlation was found between average test score SMT and the DRI (r = -0.37) and respiration and DRI (r = -0.45). The SMT is a reasonably reliable tool to assess movement, gait, sitting posture, and respiration. None of the items in the domain standing posture has sufficient reliability. A thorough study of this domain should be considered. The results show little evidence for convergent validity. Several items of the SMT correlated moderately with functional limitation with the DRI. These items were global movement, hip flexion, pelvis rotation, and all respiration items.
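
    The internal-consistency statistic reported above, Cronbach's α, can be illustrated with a minimal sketch. The function name and the toy score matrix are assumptions made for illustration only; they are not the SMT data or the authors' analysis code.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(scores[0])
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: 5 respondents x 4 items (illustrative values only).
toy_scores = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
print(round(cronbach_alpha(toy_scores), 3))
```

    When every item ranks respondents identically, the item variances account for only 1/k of the total-score variance and α reaches 1; uncorrelated items push α toward 0.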

  3. Cross-cultural adaptation and measurement properties testing of the Iconographical Falls Efficacy Scale (Icon-FES).

    PubMed

    Franco, Marcia Rodrigues; Pinto, Rafael Zambelli; Delbaere, Kim; Eto, Bianca Yumie; Faria, Maíra Sgobbi; Aoyagi, Giovana Ayumi; Steffens, Daniel; Pastre, Carlos Marcelo

    2018-02-14

    The Iconographical Falls Efficacy Scale (Icon-FES) is an innovative tool to assess concern about falling that uses pictures as visual cues to provide more complete environmental contexts. Advantages of Icon-FES over previous scales include the addition of more demanding balance-related activities, ability to assess concern about falling in highly functioning older people, and its normal distribution. The aim of this study was to perform a cross-cultural adaptation and to assess the measurement properties of the 30-item and 10-item Icon-FES in a community-dwelling Brazilian older population. The cross-cultural adaptation followed the recommendations of international guidelines. We evaluated the measurement properties (i.e. internal consistency, test-retest reproducibility, standard error of the measurement, minimal detectable change, construct validity, ceiling/floor effect, data distribution and discriminative validity) in 100 community-dwelling people aged ≥60 years. The 30-item and 10-item Icon-FES-Brazil showed good internal consistency (alpha and omega >0.70) and excellent intra-rater reproducibility (ICC(2,1) = 0.96 and 0.93, respectively). According to the standard error of the measurement and minimal detectable change, the magnitude of change needed to exceed the measurement error and variability was 7.2 and 3.4 points for the 30-item and 10-item Icon-FES, respectively. We observed an excellent correlation between both versions of the Icon-FES and Falls Efficacy Scale - International (rho=0.83, p<0.001 [30-item version]; 0.76, p<0.001 [10-item version]). Icon-FES versions showed normal distribution, no floor/ceiling effects and were able to discriminate between groups relating to fall risk factors. Icon-FES-Brazil is a semantically and linguistically appropriate tool with acceptable measurement properties to evaluate concern about falling among the community-dwelling older population. Copyright © 2018 Associação Brasileira de Pesquisa e Pós-Graduação em Fisioterapia. Published by Elsevier Editora Ltda. All rights reserved.

  4. State Assessment Program Item Banks: Model Language for Request for Proposals (RFP) and Contracts

    ERIC Educational Resources Information Center

    Swanson, Leonard C.

    2010-01-01

    This document provides recommendations for request for proposal (RFP) and contract language that state education agencies can use to specify their requirements for access to test item banks. An item bank is a repository for test items and data about those items. Item banks are used by state agency staff to view items and associated data; to…

  5. The Impact of Receiving the Same Items on Consecutive Computer Adaptive Test Administrations.

    ERIC Educational Resources Information Center

    O'Neill, Thomas; Lunz, Mary E.; Thiede, Keith

    2000-01-01

    Studied item exposure in a computerized adaptive test when the item selection algorithm presents examinees with questions they were asked in a previous test administration. Results with 178 repeat examinees on a medical technologists' test indicate that the combined use of an adaptive algorithm to select items and latent trait theory to estimate…

  6. Helping Poor Readers Demonstrate Their Science Competence: Item Characteristics Supporting Text-Picture Integration

    ERIC Educational Resources Information Center

    Saß, Steffani; Schütte, Kerstin

    2016-01-01

    Solving test items might require abilities in test-takers other than the construct the test was designed to assess. Item and student characteristics such as item format or reading comprehension can impact the test result. This experiment is based on cognitive theories of text and picture comprehension. It examines whether integration aids, which…

  7. Uncertainties in the Item Parameter Estimates and Robust Automated Test Assembly

    ERIC Educational Resources Information Center

    Veldkamp, Bernard P.; Matteucci, Mariagiulia; de Jong, Martijn G.

    2013-01-01

    Item response theory parameters have to be estimated, and because of the estimation process, they do have uncertainty in them. In most large-scale testing programs, the parameters are stored in item banks, and automated test assembly algorithms are applied to assemble operational test forms. These algorithms treat item parameters as fixed values,…

  8. Identifying Differential Item Functioning in Multi-Stage Computer Adaptive Testing

    ERIC Educational Resources Information Center

    Gierl, Mark J.; Lai, Hollis; Li, Johnson

    2013-01-01

    The purpose of this study is to evaluate the performance of CATSIB (Computer Adaptive Testing-Simultaneous Item Bias Test) for detecting differential item functioning (DIF) when items in the matching and studied subtest are administered adaptively in the context of a realistic multi-stage adaptive test (MST). MST was simulated using a 4-item…

  9. A Time and Place for Everything: Developmental Differences in the Building Blocks of Episodic Memory

    PubMed Central

    Lee, Joshua K.; Wendelken, J. Carter; Bunge, Silvia A.; Ghetti, Simona

    2015-01-01

    This research investigated whether episodic memory development can be explained by improvements in relational binding processes, involved in forming novel associations between events and the context in which they occurred. Memory for item-space, item-time, and item-item relations was assessed in an ethnically diverse sample of 151 children aged 7 to 11 years and 28 young adults. Item-space memory reached adult performance by 9½ years, whereas item-time and item-item memory improved into adulthood. In path analysis, item-space, but not item-time best explained item-item memory. Across age groups, relational binding related to source memory and performance on standardized memory assessments. In conclusion, relational binding development depends on relation type, but relational binding overall supports episodic memory development. PMID:26493950

  10. Development and evaluation of the Korean Health Literacy Instrument.

    PubMed

    Kang, Soo Jin; Lee, Tae Wha; Paasche-Orlow, Michael K; Kim, Gwang Suk; Won, Hee Kwan

    2014-01-01

    The purpose of this study is to develop and validate the Korean Health Literacy Instrument, which measures the capacity to understand and use health-related information and make informed health decisions in Korean adults. In Phase 1, 33 initial items were generated to measure functional, interactive, and critical health literacy with prose, document, and numeracy tasks. These items included content from health promotion, disease management, and health navigation contexts. Content validity assessment was conducted by an expert panel, and 11 items were excluded. In Phase 2, the 22 remaining items were administered to a convenience sample of 292 adults from community and clinical settings. Exploratory factor and item difficulty and discrimination analyses were conducted and four items with low discrimination were deleted. In Phase 3, the remaining 18 items were administered to a convenience sample of 315 adults 40-64 years of age from community and clinical settings. A confirmatory factor analysis was performed to test the construct validity of the instrument. The Korean Health Literacy Instrument has a range of 0 to 18. The mean score in our validation study was 11.98. The instrument exhibited an internal consistency reliability coefficient of 0.82, and a test-retest reliability of 0.89. The instrument is suitable for screening individuals who have limited health literacy skills. Future studies are needed to further define the psychometric properties and predictive validity of the Korean Health Literacy Instrument.

  11. Multilevel Multidimensional Item Response Model with a Multilevel Latent Covariate

    ERIC Educational Resources Information Center

    Cho, Sun-Joo; Bottge, Brian A.

    2015-01-01

    In a pretest-posttest cluster-randomized trial, one of the methods commonly used to detect an intervention effect involves controlling pre-test scores and other related covariates while estimating an intervention effect at post-test. In many applications in education, the total post-test and pre-test scores that ignores measurement error in the…

  12. First State Fitness Test. A Measurement of Functional Health.

    ERIC Educational Resources Information Center

    Brown, Timothy; And Others

    This test is designed to measure the functional health of young people. Functional health refers to those factors relating to personal health that can be improved with regular exercise. This test is unique in comparison to other physical fitness tests because of the absence of motor skill items which have no relationship to an individual's…

  13. Sensitivity of a computer adaptive assessment for measuring functional mobility changes in children enrolled in a community fitness programme.

    PubMed

    Haley, Stephen M; Fragala-Pinkham, Maria; Ni, Pengsheng

    2006-07-01

    To examine the relative sensitivity to detect functional mobility changes with a full-length parent questionnaire compared with a computerized adaptive testing version of the questionnaire after a 16-week group fitness programme. Prospective, pre- and posttest study with a 16-week group fitness intervention. Three community-based fitness centres. Convenience sample of children (n = 28) with physical or developmental disabilities. A 16-week group exercise programme held twice a week in a community setting. A full-length (161 items) paper version of a mobility parent questionnaire based on the Pediatric Evaluation of Disability Inventory, but expanded to include expected skills of children up to 15 years old was compared with a 15-item computer adaptive testing version. Both measures were administered at pre- and posttest intervals. Both the full-length Pediatric Evaluation of Disability Inventory and the 15-item computer adaptive testing version detected significant changes between pre- and posttest scores, had large effect sizes, and standardized response means, with a modest decrease in the computer adaptive test as compared with the 161-item paper version. Correlations between the computer adaptive and paper formats across pre- and posttest scores ranged from r = 0.76 to 0.86. Both functional mobility test versions were able to detect positive functional changes at the end of the intervention period. Greater variability in score estimates was generated by the computerized adaptive testing version, which led to a relative reduction in sensitivity as defined by the standardized response mean. Extreme scores were generally more difficult for the computer adaptive format to estimate with as much accuracy as scores in the mid-range of the scale. However, the reduction in accuracy and sensitivity, which did not influence the group effect results in this study, is counterbalanced by the large reduction in testing burden.

  14. A Stepwise Test Characteristic Curve Method to Detect Item Parameter Drift

    ERIC Educational Resources Information Center

    Guo, Rui; Zheng, Yi; Chang, Hua-Hua

    2015-01-01

    An important assumption of item response theory is item parameter invariance. Sometimes, however, item parameters are not invariant across different test administrations due to factors other than sampling error; this phenomenon is termed item parameter drift. Several methods have been developed to detect drifted items. However, most of the…

  15. The promise and challenge of including multimedia items in medical licensure examinations: some insights from an empirical trial.

    PubMed

    Shen, Linjun; Li, Feiming; Wattleworth, Roberta; Filipetto, Frank

    2010-10-01

    The Comprehensive Osteopathic Medical Licensing Examination conducted a trial of multimedia items in the 2008-2009 Level 3 testing cycle to determine (1) if multimedia items were able to test additional elements of medical knowledge and skills and (2) how to develop effective multimedia items. Forty-four content-matched multimedia and text multiple-choice items were randomly delivered to Level 3 candidates. Logistic regression and paired-samples t tests were used for pairwise and group-level comparisons, respectively. Nine pairs showed significant differences in difficulty and/or discrimination. Content analysis found that, if text narrations were less direct, multimedia materials could make items easier. When textbook terminologies were replaced by multimedia presentations, multimedia items could become more difficult. Moreover, a multimedia item was found not to be uniformly difficult for candidates at different ability levels, possibly because multimedia and text items tested different elements of the same concept. Multimedia items may be capable of measuring some constructs different from what text items can measure. Effective multimedia items with reasonable psychometric properties can be intentionally developed.

  16. Varying levels of difficulty index of skills-test items randomly selected by examinees on the Korean emergency medical technician licensing examination.

    PubMed

    Koh, Bongyeun; Hong, Sunggi; Kim, Soon-Sim; Hyun, Jin-Sook; Baek, Milye; Moon, Jundong; Kwon, Hayran; Kim, Gyoungyong; Min, Seonggi; Kang, Gu-Hyun

    2016-01-01

    The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination.

  17. A comparison of item response models for accuracy and speed of item responses with applications to adaptive testing.

    PubMed

    van Rijn, Peter W; Ali, Usama S

    2017-05-01

    We compare three modelling frameworks for accuracy and speed of item responses in the context of adaptive testing. The first framework is based on modelling scores that result from a scoring rule that incorporates both accuracy and speed. The second framework is the hierarchical modelling approach developed by van der Linden (2007, Psychometrika, 72, 287) in which a regular item response model is specified for accuracy and a log-normal model for speed. The third framework is the diffusion framework in which the response is assumed to be the result of a Wiener process. Although the three frameworks differ in the relation between accuracy and speed, one commonality is that the marginal model for accuracy can be simplified to the two-parameter logistic model. We discuss both conditional and marginal estimation of model parameters. Models from all three frameworks were fitted to data from a mathematics and spelling test. Furthermore, we applied a linear and adaptive testing mode to the data off-line in order to determine differences between modelling frameworks. It was found that a model from the scoring rule framework outperformed a hierarchical model in terms of model-based reliability, but the results were mixed with respect to correlations with external measures. © 2017 The British Psychological Society.
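
    The two-parameter logistic (2PL) model named above as the common marginal model for accuracy can be sketched in a few lines. The maximum-information selection rule shown here is a common adaptive-testing heuristic and is an assumption for illustration, not necessarily the procedure used in the study; the item names and parameter values are hypothetical.

```python
import math


def p_correct_2pl(theta, a, b):
    """2PL probability of a correct response.

    theta: examinee ability, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def pick_most_informative(theta, items):
    """Return the name of the item with maximum information at theta.

    items: dict mapping item name -> (a, b).
    """
    return max(items, key=lambda name: item_information(theta, *items[name]))


# Hypothetical item bank: name -> (discrimination, difficulty).
bank = {"easy": (1.0, -2.0), "medium": (1.0, 0.0), "hard": (1.0, 2.0)}
print(pick_most_informative(0.0, bank))
```

    With equal discriminations, information peaks where item difficulty matches the current ability estimate, which is why the middle item is chosen for an examinee at theta = 0.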

  18. Expanding the basic science debate: the role of physics knowledge in interpreting clinical findings.

    PubMed

    Goldszmidt, Mark; Minda, John Paul; Devantier, Sarah L; Skye, Aimee L; Woods, Nicole N

    2012-10-01

    Current research suggests a role for biomedical knowledge in learning and retaining concepts related to medical diagnosis. However, learning may be influenced by other, non-biomedical knowledge. We explored this idea using an experimental design and examined the effects of causal knowledge on the learning, retention, and interpretation of medical information. Participants studied a handout about several respiratory disorders and how to interpret respiratory exam findings. The control group received the information in standard "textbook" format and the experimental group was presented with the same information as well as a causal explanation about how sound travels through lungs in both the normal and disease states. Comprehension and memory of the information was evaluated with a multiple-choice exam. Several questions that were not related to the causal knowledge served as control items. Questions related to the interpretation of physical exam findings served as the critical test items. The experimental group outperformed the control group on the critical test items, and our study shows that a causal explanation can improve a student's memory for interpreting clinical details. We suggest an expansion of which basic sciences are considered fundamental to medical education.

  19. Initial retrieval shields against retrieval-induced forgetting.

    PubMed

    Racsmány, Mihály; Keresztes, Attila

    2015-01-01

    Testing, as a form of retrieval, can enhance learning but it can also induce forgetting of related memories, a phenomenon known as retrieval-induced forgetting (RIF). In four experiments we explored whether selective retrieval and selective restudy of target memories induce forgetting of related memories with or without initial retrieval of the entire learning set. In Experiment 1, subjects studied category-exemplar associations, some of which were then either restudied or retrieved. RIF occurred on a delayed final test only when memories were retrieved and not when they were restudied. In Experiment 2, following the study phase of category-exemplar associations, subjects attempted to recall all category-exemplar associations, then they selectively retrieved or restudied some of the exemplars. We found that, despite the huge impact on practiced items, selective retrieval/restudy caused no decrease in final recall of related items. In Experiment 3, we replicated the main result of Experiment 2 by manipulating initial retrieval as a within-subject variable. In Experiment 4 we replicated the main results of the previous experiments with non-practiced (Nrp) baseline items. These findings suggest that initial retrieval of the learning set shields against the forgetting effect of later selective retrieval. Together, our results support the context shift theory of RIF.

  20. Item Analysis in Introductory Economics Testing.

    ERIC Educational Resources Information Center

    Tinari, Frank D.

    1979-01-01

    Computerized analysis of multiple choice test items is explained. Examples of item analysis applications in the introductory economics course are discussed with respect to three objectives: to evaluate learning; to improve test items; and to help improve classroom instruction. Problems, costs and benefits of the procedures are identified. (JMD)
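
    As a rough illustration of what a computerized item analysis computes, the sketch below derives two classical statistics from scored responses: the difficulty index (proportion correct) and an upper-lower discrimination index. The function names, the 27% grouping default, and the toy data are assumptions for illustration; the record does not specify which statistics the author's procedure used.

```python
def item_difficulty(responses, item):
    """Proportion of examinees answering the item correctly (0/1 scoring)."""
    return sum(row[item] for row in responses) / len(responses)


def item_discrimination(responses, item, fraction=0.27):
    """Upper-lower discrimination index.

    Examinees are ranked by total score; the index is the difference in
    proportion correct between the top and bottom `fraction` of examinees.
    """
    ranked = sorted(responses, key=sum, reverse=True)
    n = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:n], ranked[-n:]
    return (sum(r[item] for r in upper) - sum(r[item] for r in lower)) / n


# Hypothetical scored responses: 4 examinees x 3 items (1 = correct).
toy_responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
for i in range(3):
    print(i, item_difficulty(toy_responses, i),
          item_discrimination(toy_responses, i, fraction=0.5))
```

    Items with a difficulty index near 0 or 1, or with a discrimination index near zero or negative, are the usual candidates for revision.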

  1. Differential Item Functioning (DIF) among Spanish-Speaking English Language Learners (ELLs) in State Science Tests

    NASA Astrophysics Data System (ADS)

    Ilich, Maria O.

    Psychometricians and test developers evaluate standardized tests for potential bias against groups of test-takers by using differential item functioning (DIF). English language learners (ELLs) are a diverse group of students whose native language is not English. While they are still learning the English language, they must take their standardized tests for their school subjects, including science, in English. In this study, linguistic complexity was examined as a possible source of DIF that may result in test scores that confound science knowledge with a lack of English proficiency among ELLs. Two years of fifth-grade state science tests were analyzed for evidence of DIF using two DIF methods, Simultaneous Item Bias Test (SIBTest) and logistic regression. The tests presented a unique challenge in that the test items were grouped together into testlets, that is, groups of items referring to a scientific scenario to measure knowledge of different science content or skills. Very large samples of 10,256 students in 2006 and 13,571 students in 2007 were examined. Half of each sample was composed of Spanish-speaking ELLs; the balance consisted of native English speakers. The two DIF methods were in agreement about the items that favored non-ELLs and the items that favored ELLs. Logistic regression effect sizes were all negligible, while SIBTest flagged items with low to high DIF. A decrease in socioeconomic status and Spanish-speaking ELL diversity may have led to inconsistent SIBTest effect sizes for items used in both testing years. The DIF results for the testlets suggested that ELLs lacked sufficient opportunity to learn science content. The DIF results further suggest that those constructed response test items requiring the student to draw a conclusion about a scientific investigation or to plan a new investigation tended to favor ELLs.

  2. Strategic management and performance differences: nonprofit versus for-profit health organizations.

    PubMed

    Reeves, Terrie C; Ford, Eric W

    2004-01-01

    Despite mixed and contradictory findings, for-profits (FPs) and nonprofits (NPs) are assumed to be similar health services organizations (HSOs). In this study, a fifteen-item scale assessing HSOs' strategic management capacity was developed and tested using fifty-seven FP and twenty NP organizations. Then, using item response theory, the items were hierarchically profiled to produce two strategic profile models, a general and an FP anchored model. We find that deviation from the general profile, but not capability attainment level, is related to two of three financial measures. We conclude that studying FPs and NPs together is appropriate.

  3. Full-Scale Accelerated Pavement Testing of Warm-Mix Asphalt (WMA) for Airfield Pavements

    DTIC Science & Technology

    2014-01-01

    software and Pavement Engineering Utility (PSEVEN) were used [figure: test section layout, with dimensions of 50 ft, 65 ft, 130 ft, and 24 ft; Item 1 HMA, Item 3 Sasobit®, Item 4 Evotherm 3G]... [figure: air, top, mid-depth, and bottom temperatures against a target temperature of 109 ºF] (ERDC/GSL TR-14-3) The target pavement temperature for this study was 109 ºF, and it is...the locations of the I-buttons and their layout in relation to the vents. [figure: average values by mixture: HMA, Foamed Asphalt, Sasobit, Evotherm 3G]

  4. Applying automatic item generation to create cohesive physics testlets

    NASA Astrophysics Data System (ADS)

    Mindyarto, B. N.; Nugroho, S. E.; Linuwih, S.

    2018-03-01

    Computer-based testing has created the demand for large numbers of items. This paper discusses the production of cohesive physics testlets using automatic item generation concepts and procedures. The testlets were composed by restructuring physics problems to reveal deeper understanding of the underlying physical concepts by inserting a qualitative question and its scientific reasoning question. A template-based testlet generator was used to generate the testlet variants. Using this methodology, 1248 testlet variants were effectively generated from 25 testlet templates. Some issues related to the effective application of the generated physics testlets in practical assessments are discussed.

  5. Item-cued directed forgetting of related words and pictures in children and adults: selective rehearsal versus cognitive inhibition.

    PubMed

    Lehman, E B; McKinley-Pace, M; Leonard, A M; Thompson, D; Johns, K

    2001-01-01

    The main purpose of this study was to compare the relative importance of selective rehearsal and cognitive inhibition in accounting for developmental changes in the directed-forgetting paradigm developed by R. A. Bjork (1972). In two experiments, children in Grades 2 and 5 and college students were asked to remember some words or pictures and to forget others when items were categorically related. Their memory for both items and the associated remember or forget cues was then tested with recall and recognition. Fifth graders recognized more of the forget-cued words than college students did. The pattern of results suggested that age differences in rehearsal and source monitoring (i.e., remembering whether a word had been cued remember or forget) were better explanatory mechanisms for children's forgetting inefficiencies than retrieval inhibition was. The results are discussed in terms of a multiple process view of inhibition.

  6. Factorial Structure and Age-Related Psychometrics of the MIDUS Personality Adjective Items across the Lifespan

    PubMed Central

    Zimprich, Daniel; Allemand, Mathias; Lachman, Margie E.

    2014-01-01

    The present study addresses issues of measurement invariance and comparability of factor parameters of Big Five personality adjective items across age. Data from the Midlife in the United States (MIDUS) survey were used to investigate age-related developmental psychometrics of the MIDUS personality adjective items in two large cross-sectional samples (exploratory sample: N = 862; analysis sample: N = 3,000). After having established and replicated a comprehensive five-factor structure of the measure, increasing levels of measurement invariance were tested across ten age groups. Results indicate that the measure demonstrates strict measurement invariance in terms of number of factors and factor loadings. Also, we found that factor variances and covariances were equal across age groups. By contrast, a number of age-related factor mean differences emerged. The practical implications of these results are discussed and future research is suggested. PMID:21910548

  7. Development and evaluation of a thermochemistry concept inventory for college-level general chemistry

    NASA Astrophysics Data System (ADS)

    Wren, David A.

    The research presented in this dissertation culminated in a 10-item Thermochemistry Concept Inventory (TCI). The development of the TCI can be divided into two main phases: qualitative studies and quantitative studies. Both phases focused on the primary stakeholders of the TCI, college-level general chemistry instructors and students. Each phase was designed to collect evidence for the validity of the interpretations and uses of TCI testing data. A central use of TCI testing data is to identify student conceptual misunderstandings, which are represented as incorrect options of multiple-choice TCI items. Therefore, quantitative and qualitative studies focused heavily on collecting evidence at the item-level, where important interpretations may be made by TCI users. Qualitative studies included student interviews (N = 28) and online expert surveys (N = 30). Think-aloud student interviews (N = 12) were used to identify conceptual misunderstandings used by students. Novice response process validity interviews (N = 16) helped provide information on how students interpreted and answered TCI items and were the basis of item revisions. Practicing general chemistry instructors (N = 18), or experts, defined boundaries of thermochemistry content included on the TCI. Once TCI items were in the later stages of development, an online version of the TCI was used in an expert response process validity survey (N = 12) to provide expert feedback on item content, format and consensus of the correct answer for each item. Quantitative studies included three phases: beta testing of TCI items (N = 280), pilot testing of a 12-item TCI (N = 485), and a large data collection using a 10-item TCI (N = 1331). In addition to traditional classical test theory analysis, Rasch model analysis was also used for evaluation of testing data at the test and item level. The TCI was administered in both formative assessment (beta and pilot testing) and summative assessment (large data collection), with items performing well in both. One item, item K, did not have acceptable psychometric properties when the TCI was used as a quiz (summative assessment), but was retained in the final version of the TCI based on the acceptable psychometric properties displayed in pilot testing (formative assessment).

  8. Examining the Impact of Drifted Polytomous Anchor Items on Test Characteristic Curve (TCC) Linking and IRT True Score Equating. Research Report. ETS RR-12-09

    ERIC Educational Resources Information Center

    Li, Yanmei

    2012-01-01

    In a common-item (anchor) equating design, the common items should be evaluated for item parameter drift. Drifted items are often removed. For a test that contains mostly dichotomous items and only a small number of polytomous items, removing some drifted polytomous anchor items may result in anchor sets that no longer resemble mini-versions of…

  9. Measuring Response Styles Across the Big Five: A Multiscale Extension of an Approach Using Multinomial Processing Trees.

    PubMed

    Khorramdel, Lale; von Davier, Matthias

    2014-01-01

    This study shows how to address the problem of trait-unrelated response styles (RS) in rating scales using multidimensional item response theory. The aim is to test and correct data for RS in order to provide fair assessments of personality. Expanding on an approach presented by Böckenholt (2012), observed rating data are decomposed into multiple response processes based on a multinomial processing tree. The data come from a questionnaire consisting of 50 items of the International Personality Item Pool measuring the Big Five dimensions administered to 2,026 U.S. students with a 5-point rating scale. It is shown that this approach can be used to test if RS exist in the data and that RS can be differentiated from trait-related responses. Although the extreme RS appear to be unidimensional after exclusion of only 1 item, a unidimensional measure for the midpoint RS is obtained only after exclusion of 10 items. Both RS measurements show high cross-scale correlations and item response theory-based (marginal) reliabilities. Cultural differences could be found in giving extreme responses. Moreover, it is shown how to score rating data to correct for RS after being proved to exist in the data.

  10. Visual Short-Term Memory Compared in Rhesus Monkeys and Humans

    PubMed Central

    Elmore, L. Caitlin; Ma, Wei Ji; Magnotti, John F.; Leising, Kenneth J.; Passaro, Antony D.; Katz, Jeffrey S.; Wright, Anthony A.

    2011-01-01

    Summary Change detection is a popular task to study visual short-term memory (STM) in humans [1–4]. Much of this work suggests that STM has a fixed capacity of 4 ± 1 items [1–6]. Here we report the first comparison of change detection memory between humans and a species closely related to humans, the rhesus monkey. Monkeys and humans were tested in nearly identical procedures with overlapping display sizes. Although the monkeys’ STM was well fit by a 1-item fixed-capacity memory model, other monkey memory tests with 4-item lists have shown performance impossible to obtain with a 1-item capacity [7]. We suggest that this contradiction can be resolved using a continuous-resource approach more closely tied to the neural basis of memory [8,9]. In this view, items have a noisy memory representation whose noise level depends on display size due to distributed allocation of a continuous resource. In accord with this theory, we show that performance depends on the perceptual distance between items before and after the change, and d′ depends on display size in an approximately power law fashion. Our results open the door to combining the power of psychophysics, computation, and physiology to better understand the neural basis of STM. PMID:21596568
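
    Note: the abstract above reports that d′ depends on display size in an approximately power-law fashion. The following is a minimal sketch of how d′ is conventionally computed from hit and false-alarm rates under the equal-variance Gaussian signal detection model. The function names, the power-law parameterization, and the example values are illustrative assumptions, not taken from the paper.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Sensitivity index d' = z(H) - z(F) (equal-variance Gaussian model).

    Both rates must lie strictly between 0 and 1; inv_cdf raises
    StatisticsError at exactly 0 or 1.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


def power_law(display_size: float, scale: float, exponent: float) -> float:
    """Hypothetical power-law form: d'(N) = scale * N ** (-exponent)."""
    return scale * display_size ** (-exponent)


# Illustrative values only (not data from the study).
example_sensitivity = d_prime(0.80, 0.20)
predicted = [power_law(n, scale=2.0, exponent=0.5) for n in (2, 4, 6)]
```

    Under this assumed form, a larger exponent means sensitivity falls off faster as more items are displayed.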

  11. An electrophysiological signature of summed similarity in visual working memory.

    PubMed

    van Vugt, Marieke K; Sekuler, Robert; Wilson, Hugh R; Kahana, Michael J

    2013-05-01

    Summed-similarity models of short-term item recognition posit that participants base their judgments of an item's prior occurrence on that item's summed similarity to the ensemble of items on the remembered list. We examined the neural predictions of these models in 3 short-term recognition memory experiments using electrocorticographic/depth electrode recordings and scalp electroencephalography. On each experimental trial, participants judged whether a test face had been among a small set of recently studied faces. Consistent with summed-similarity theory, participants' tendency to endorse a test item increased as a function of its summed similarity to the items on the just-studied list. To characterize this behavioral effect of summed similarity, we successfully fit a summed-similarity model to individual participant data from each experiment. Using the parameters determined from fitting the summed-similarity model to the behavioral data, we examined the relation between summed similarity and brain activity. We found that 4-9 Hz theta activity in the medial temporal lobe and 2-4 Hz delta activity recorded from frontal and parietal cortices increased with summed similarity. These findings demonstrate direct neural correlates of the similarity computations that form the foundation of several major cognitive theories of human recognition memory. PsycINFO Database Record (c) 2013 APA, all rights reserved.

  12. Which Statistic Should Be Used to Detect Item Preknowledge When the Set of Compromised Items Is Known?

    PubMed

    Sinharay, Sandip

    2017-09-01

    Benefiting from item preknowledge is a major type of fraudulent behavior during educational assessments. Belov suggested the posterior shift statistic for detection of item preknowledge and showed its performance to be better on average than that of seven other statistics for detection of item preknowledge for a known set of compromised items. Sinharay suggested a statistic based on the likelihood ratio test for detection of item preknowledge; the advantage of the statistic is that its null distribution is known. Results from simulated and real data and adaptive and nonadaptive tests are used to demonstrate that the Type I error rate and power of the statistic based on the likelihood ratio test are very similar to those of the posterior shift statistic. Thus, the statistic based on the likelihood ratio test appears promising in detecting item preknowledge when the set of compromised items is known.

  13. Psychometric properties of the Brisbane Burn Scar Impact Profile in adults with burn scars

    PubMed Central

    Kimble, Roy; McPhail, Steven; Plaza, Anita; Simons, Megan

    2017-01-01

    Objective The aim of the study was to determine the longitudinal validity, reproducibility, responsiveness and interpretability of the adult version of the Brisbane Burn Scar Impact Profile, a patient-report measure of health-related quality of life. Methods A prospective longitudinal cohort study of patients with or at risk of burn scarring was conducted at three assessment points (at baseline around the time of wound healing, one to two weeks post-baseline and 1-month post-baseline). Participants attending a major metropolitan adult burn centre at baseline were recruited. Participants completed the Brisbane Burn Scar Impact Profile and the 36-item Short Form Health Survey and Patient Observer Scar Assessment Scale. Intraclass Correlation Coefficients (ICCs), smallest detectable change, percentage of those who improved, stayed the same or worsened and Area under the Receiver Operating Characteristic Curve (AUC) were used to test the aim. Results Data were included for 118 participants at baseline, 68 participants at one to two weeks and 57 participants at 1-month post-baseline. All groups of items had acceptable reproducibility, except for the overall impact of burn scars (ICC = 0.69), the impact of sensations which was not expected to be stable (ICC = 0.63), mobility and daily activities (ICC = 0.63, 0.67 respectively). The responsiveness of six out of seven groups of items able to be tested against external criterion was supported (AUC = 0.72–0.75). Hypothesised correlations of changes in the Brisbane Burn Scar Impact Profile items with changes in criterion measures generally supported longitudinal validity (e.g., nine out of thirteen hypotheses using the SF-36 as an external criterion were supported). Internal consistency estimates, item-total and inter-item correlations indicated there was likely redundancy of some groups of items, particularly in the relationships and social interaction, appearance and emotional reactions items (Cronbach’s alpha range = 0.94–0.95). Conclusion Support was found for the reproducibility, longitudinal validity, responsiveness and interpretability of most groups of Brisbane Burn Scar Impact Profile items and some individual items in the test population. Potential redundancy of items should be investigated further. PMID:28902874

  14. A Bayesian Method for the Detection of Item Preknowledge in CAT. Law School Admission Council Computerized Testing Report. LSAC Research Report Series.

    ERIC Educational Resources Information Center

    McLeod, Lori D.; Lewis, Charles; Thissen, David.

    With the increased use of computerized adaptive testing, which allows for continuous testing, new concerns about test security have evolved, one being the assurance that items in an item pool are safeguarded from theft. In this paper, the risk of score inflation and procedures to detect test takers using item preknowledge are explored. When test…

  15. Rasch analysis suggested three unidimensional domains for Affiliate Stigma Scale: additional psychometric evaluation.

    PubMed

    Chang, Chih-Cheng; Su, Jian-An; Tsai, Ching-Shu; Yen, Cheng-Fang; Liu, Jiun-Horng; Lin, Chung-Ying

    2015-06-01

    To examine the psychometrics of the Affiliate Stigma Scale using rigorous psychometric analysis: classical test theory (CTT) (traditional) and Rasch analysis (modern). Differential item functioning (DIF) items were also tested using Rasch analysis. Caregivers of relatives with mental illness (n = 453; mean age: 53.29 ± 13.50 years) were recruited from southern Taiwan. Each participant filled out four questionnaires: Affiliate Stigma Scale, Rosenberg Self-Esteem Scale, Beck Anxiety Inventory, and one background information sheet. CTT analyses showed that the Affiliate Stigma Scale had satisfactory internal consistency (α = 0.85-0.94) and concurrent validity (Rosenberg Self-Esteem Scale: r = -0.52 to -0.46; Beck Anxiety Inventory: r = 0.27-0.34). Rasch analyses supported the unidimensionality of three domains in the Affiliate Stigma Scale and indicated four DIF items (affect domain: 1; cognitive domain: 3) across gender. Our findings, based on rigorous statistical analysis, verified the psychometrics of the Affiliate Stigma Scale and reported its DIF items. We conclude that the three domains of the Affiliate Stigma Scale can be separately used and are suitable for measuring the affiliate stigma of caregivers of relatives with mental illness. Copyright © 2015 Elsevier Inc. All rights reserved.

  16. Acquisition of generic memory in amnesia.

    PubMed

    Verfaellie, M; Cermak, L S

    1994-06-01

    Amnesic patients' ability to acquire generic, semantic information was assessed relative to their own level of episodic memory. Patients studied a list of words in which some items were presented twice and others once. Upon each presentation, the words were tagged episodically by presenting them in a unique color. Recall of the colors in which words were presented suggested that individual presentations of repeated items were less likely to be recalled than presentations of nonrepeated items; however, actual recall of repeated items exceeded that of nonrepeated items. This outcome demonstrated that amnesics can recall some items generically without recalling either of their individual presentations. However, amnesics' recall of twice-presented items remained far below that of the control group, even when their recall of once-presented items was matched by testing the control group after a delay. This finding suggests that amnesic patients can acquire new generic knowledge but do so much less efficiently than do normal individuals. Furthermore, this deficit occurs independently of the amnesics' episodic memory impairments, reflecting instead a disruption in semantic learning per se.

  17. Effects of aging on neural connectivity underlying selective memory for emotional scenes

    PubMed Central

    Waring, Jill D.; Addis, Donna Rose; Kensinger, Elizabeth A.

    2012-01-01

    Older adults show age-related reductions in memory for neutral items within complex visual scenes, but just like young adults, older adults exhibit a memory advantage for emotional items within scenes compared with the background scene information. The present study examined young and older adults’ encoding-stage effective connectivity for selective memory of emotional items versus memory for both the emotional item and its background. In a functional magnetic resonance imaging (fMRI) study, participants viewed scenes containing either positive or negative items within neutral backgrounds. Outside the scanner, participants completed a memory test for items and backgrounds. Irrespective of scene content being emotionally positive or negative, older adults had stronger positive connections among frontal regions and from frontal regions to medial temporal lobe structures than did young adults, especially when items and backgrounds were subsequently remembered. These results suggest there are differences between young and older adults’ connectivity accompanying the encoding of emotional scenes. Older adults may require more frontal connectivity to encode all elements of a scene rather than just encoding the emotional item. PMID:22542836

  18. Effects of aging on neural connectivity underlying selective memory for emotional scenes.

    PubMed

    Waring, Jill D; Addis, Donna Rose; Kensinger, Elizabeth A

    2013-02-01

    Older adults show age-related reductions in memory for neutral items within complex visual scenes, but just like young adults, older adults exhibit a memory advantage for emotional items within scenes compared with the background scene information. The present study examined young and older adults' encoding-stage effective connectivity for selective memory of emotional items versus memory for both the emotional item and its background. In a functional magnetic resonance imaging (fMRI) study, participants viewed scenes containing either positive or negative items within neutral backgrounds. Outside the scanner, participants completed a memory test for items and backgrounds. Irrespective of scene content being emotionally positive or negative, older adults had stronger positive connections among frontal regions and from frontal regions to medial temporal lobe structures than did young adults, especially when items and backgrounds were subsequently remembered. These results suggest there are differences between young and older adults' connectivity accompanying the encoding of emotional scenes. Older adults may require more frontal connectivity to encode all elements of a scene rather than just encoding the emotional item. Published by Elsevier Inc.

  19. A Bivariate Generalized Linear Item Response Theory Modeling Framework to the Analysis of Responses and Response Times.

    PubMed

    Molenaar, Dylan; Tuerlinckx, Francis; van der Maas, Han L J

    2015-01-01

    A generalized linear modeling framework to the analysis of responses and response times is outlined. In this framework, referred to as bivariate generalized linear item response theory (B-GLIRT), separate generalized linear measurement models are specified for the responses and the response times that are subsequently linked by cross-relations. The cross-relations can take various forms. Here, we focus on cross-relations with a linear or interaction term for ability tests, and cross-relations with a curvilinear term for personality tests. In addition, we discuss how popular existing models from the psychometric literature are special cases in the B-GLIRT framework depending on restrictions in the cross-relation. This allows us to compare existing models conceptually and empirically. We discuss various extensions of the traditional models motivated by practical problems. We also illustrate the applicability of our approach using various real data examples, including data on personality and cognitive ability.

  20. Effect of Multiple Testing Adjustment in Differential Item Functioning Detection

    ERIC Educational Resources Information Center

    Kim, Jihye; Oshima, T. C.

    2013-01-01

    In a typical differential item functioning (DIF) analysis, a significance test is conducted for each item. As a test consists of multiple items, such multiple testing may increase the possibility of making a Type I error at least once. The goal of this study was to investigate how to control a Type I error rate and power using adjustment…
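
    Note: the concern raised above, that testing each item separately raises the chance of at least one Type I error, can be illustrated with a short calculation. This sketch assumes independent tests, and it uses the Bonferroni correction purely as a familiar example; the truncated abstract does not say which adjustments the study examined. Function names are illustrative.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one Type I error across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests


# A hypothetical 20-item test with one DIF significance test per item.
unadjusted = familywise_error_rate(0.05, 20)
adjusted = familywise_error_rate(bonferroni_alpha(0.05, 20), 20)
```

    With 20 items at a per-item level of 0.05, the unadjusted rate is roughly 0.64, while the Bonferroni-adjusted rate stays just under 0.05, typically at some cost in power.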

  1. Item Response Theory Models for Performance Decline during Testing

    ERIC Educational Resources Information Center

    Jin, Kuan-Yu; Wang, Wen-Chung

    2014-01-01

    Sometimes, test-takers may not be able to attempt all items to the best of their ability (with full effort) due to personal factors (e.g., low motivation) or testing conditions (e.g., time limit), resulting in poor performances on certain items, especially those located toward the end of a test. Standard item response theory (IRT) models fail to…

  2. Differential item functioning analysis of the Vanderbilt Expertise Test for cars.

    PubMed

    Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W; Van Gulick, Ana Beth; Gauthier, Isabel

    2015-01-01

    The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge.
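
    Note: the abstract above refers to a unidimensional three-parameter logistic (3PL) item response model. A minimal sketch of the standard 3PL item response function follows. The parameter values are illustrative assumptions, and the optional scaling constant (often written D = 1.7) is omitted.

```python
import math


def p_correct_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3PL model.

    theta: person ability
    a: item discrimination
    b: item difficulty
    c: lower asymptote (pseudo-guessing parameter)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Illustrative item: moderate discrimination, average difficulty,
# and a guessing floor of 0.2.
curve = [p_correct_3pl(theta, a=1.0, b=0.0, c=0.2) for theta in (-3, 0, 3)]
```

    When ability equals the item difficulty, the probability sits halfway between the guessing floor and 1, here 0.6.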

  3. Samejima Items in Multiple-Choice Tests: Identification and Implications

    ERIC Educational Resources Information Center

    Rahman, Nazia

    2013-01-01

    Samejima hypothesized that non-monotonically increasing item response functions (IRFs) of ability might occur for multiple-choice items (referred to here as "Samejima items") if low ability test takers with some, though incomplete, knowledge or skill are drawn to a particularly attractive distractor, while very low ability test takers…

  4. Computerized Numerical Control Test Item Bank.

    ERIC Educational Resources Information Center

    Reneau, Fred; And Others

    This guide contains 285 test items for use in teaching a course in computerized numerical control. All test items were reviewed, revised, and validated by incumbent workers and subject matter instructors. Items are provided for assessing student achievement in such aspects of programming and planning, setting up, and operating machines with…

  5. The Promise of NLP and Speech Processing Technologies in Language Assessment

    ERIC Educational Resources Information Center

    Chapelle, Carol A.; Chung, Yoo-Ree

    2010-01-01

    Advances in natural language processing (NLP) and automatic speech recognition and processing technologies offer new opportunities for language testing. Despite their potential uses on a range of language test item types, relatively little work has been done in this area, and it is therefore not well understood by test developers, researchers or…

  6. Retrieval Cues on Tests: A Strategy for Helping Students Overcome Retrieval Failure

    ERIC Educational Resources Information Center

    Gallagher, Kristel M.

    2017-01-01

    Students often struggle to recall information on tests, frequently claiming to experience a "retrieval failure" of learned information. Thus, the retrieval of information from memory may be a roadblock to student success. I propose a relatively simple adjustment to the wording of test items to help eliminate this potential barrier.…

  7. Are Learning Disabled Students "Test-Wise?": An Inquiry into Reading Comprehension Test Items.

    ERIC Educational Resources Information Center

    Scruggs, Thomas E.; Lifson, Steve

    1986-01-01

    Two experiments compared the ability of learning disabled (LD) students and more typical age peers to answer reading comprehension questions presented independently of reading passages. Results suggested a relative deficiency on the part of LD students with respect to reasoning strategies and test-taking skills. (Author/LMO)

  8. Bi-Factor MIRT Observed-Score Equating for Mixed-Format Tests

    ERIC Educational Resources Information Center

    Lee, Guemin; Lee, Won-Chan

    2016-01-01

    The main purposes of this study were to develop bi-factor multidimensional item response theory (BF-MIRT) observed-score equating procedures for mixed-format tests and to investigate relative appropriateness of the proposed procedures. Using data from a large-scale testing program, three types of pseudo data sets were formulated: matched samples,…

  9. Real-time analysis system for gas turbine ground test acoustic measurements.

    PubMed

    Johnston, Robert T

    2003-10-01

    This paper provides an overview of a data system upgrade to the Pratt and Whitney facility designed for making acoustic measurements on aircraft gas turbine engines. A data system upgrade was undertaken because the return-on-investment was determined to be extremely high. That is, the savings on the first test series recovered the cost of the hardware. The commercial system selected for this application utilizes 48 input channels, which allows either 1/3 octave and/or narrow-band analyses to be performed real-time. A high-speed disk drive allows raw data from all 48 channels to be stored simultaneously while the analyses are being performed. Results of tests to ensure compliance of the new system with regulations and with existing systems are presented. Test times were reduced from 5 h to 1 h of engine run time per engine configuration by the introduction of this new system. Conservative cost reduction estimates for future acoustic testing are 75% on items related to engine run time and 50% on items related to the overall length of the test.

  10. Development and Psychometric Evaluation of a Health-Related Quality of Life Instrument for Individuals with Adult-Onset Hearing Loss

    PubMed Central

    Stika, Carren J.; Hays, Ron D.

    2016-01-01

    Objective Self-reports of “hearing handicap” are available, but a comprehensive measure of health-related quality of life (HRQOL) for individuals with adult-onset hearing loss (AOHL) does not exist. Our objective was to develop and evaluate a multidimensional HRQOL instrument for individuals with AOHL. Design The Impact of Hearing Loss Inventory Tool (IHEAR-IT) was developed using results of focus groups, a literature review, Advisory Expert Panel input, and cognitive interviews. Study Sample The 73-item field-test instrument was completed by 409 adults (22-91 years old) with varying degrees of AOHL and from different areas of the US. Results Multitrait scaling analysis supported four multi-item scales and five individual items. Internal consistency reliabilities ranged from 0.93 to 0.96 for the scales. Construct validity was supported by correlations between the IHEAR-IT scales and scores on the 36-Item Short Form Health Survey, Version 2.0 (SF-36v2) Mental Composite Summary (r’s = 0.32 – 0.64) and the Hearing Handicap Inventory for the Elderly/Adults (HHIE/HHIA) (r’s > −0.70). Conclusions The field test provides initial support for the reliability and construct validity of the IHEAR-IT for evaluating HRQOL of individuals with AOHL. Further research is needed to evaluate the responsiveness to change of the IHEAR-IT scales and identify items for a short-form. PMID:27104754

  11. Clinical vs. Self-report Versions of the Quick Inventory of Depressive Symptomatology in a Public Sector Sample

    PubMed Central

    Bernstein, Ira H.; Rush, A. John; Carmody, Thomas J.; Woo, Ada; Trivedi, Madhukar H.

    2007-01-01

    Objectives Recent work using classical test theory (CTT) and item response theory (IRT) has found that the self-report (QIDS-SR16) and clinician-rated (QIDS-C16) versions of the 16-item Quick Inventory of Depressive Symptomatology were generally comparable in outpatients with nonpsychotic major depressive disorder (MDD). This report extends this comparison to a less well-educated, more treatment-resistant sample that included more ethnic/racial minorities using IRT and selected classical test analyses. Methods The QIDS-SR16 and QIDS-C16 were obtained in a sample of 441 outpatients with nonpsychotic MDD seen in the public sector in the Texas Medication Algorithm Project (TMAP). The Samejima graded response IRT model was used to compare the QIDS-SR16 and QIDS-C16. Results The nine symptom domains in the QIDS-SR16 and QIDS-C16 related well to overall depression. The slopes of the item response functions, a, which index the strength of relationship between overall depression and each symptom, were extremely similar with the two measures. Likewise, the CTT and IRT indices of symptom frequency (item means and locations of the item response functions, bi) were also similar with these two measures. For example, sad mood and difficulty with concentration/decision making were highly related to the overall depression severity with both the QIDS-C16 and QIDS-SR16. Likewise, sleeping difficulties were commonly reported, even though they were not as strongly related to overall magnitude of depression. Conclusion In this less educated, socially disadvantaged sample, differences between the QIDS-C16 and QIDS-SR16 were minor. The QIDS-SR16 is a satisfactory substitute for the more time-consuming QIDS-C16 in a broad range of adult, nonpsychotic, depressed outpatients. PMID:16716351

  12. Clinical vs. self-report versions of the quick inventory of depressive symptomatology in a public sector sample.

    PubMed

    Bernstein, Ira H; Rush, A John; Carmody, Thomas J; Woo, Ada; Trivedi, Madhukar H

    2007-01-01

    Recent work using classical test theory (CTT) and item response theory (IRT) has found that the self-report (QIDS-SR(16)) and clinician-rated (QIDS-C(16)) versions of the 16-item quick inventory of depressive symptomatology were generally comparable in outpatients with nonpsychotic major depressive disorder (MDD). This report extends this comparison to a less well-educated, more treatment-resistant sample that included more ethnic/racial minorities using IRT and selected classical test analyses. The QIDS-SR(16) and QIDS-C(16) were obtained in a sample of 441 outpatients with nonpsychotic MDD seen in the public sector in the Texas Medication Algorithm Project (TMAP). The Samejima graded response IRT model was used to compare the QIDS-SR(16) and QIDS-C(16). The nine symptom domains in the QIDS-SR(16) and QIDS-C(16) related well to overall depression. The slopes of the item response functions, a, which index the strength of relationship between overall depression and each symptom, were extremely similar with the two measures. Likewise, the CTT and IRT indices of symptom frequency (item means and locations of the item response functions, b(i)) were also similar with these two measures. For example, sad mood and difficulty with concentration/decision making were highly related to the overall depression severity with both the QIDS-C(16) and QIDS-SR(16). Likewise, sleeping difficulties were commonly reported, even though they were not as strongly related to overall magnitude of depression. In this less educated, socially disadvantaged sample, differences between the QIDS-C(16) and QIDS-SR(16) were minor. The QIDS-SR(16) is a satisfactory substitute for the more time-consuming QIDS-C(16) in a broad range of adult, nonpsychotic, depressed outpatients.

  13. Development and psychometric testing of a barriers to HIV testing scale among individuals with HIV infection in Sweden; The Barriers to HIV testing scale-Karolinska version.

    PubMed

    Wiklander, Maria; Brännström, Johanna; Svedhem, Veronica; Eriksson, Lars E

    2015-11-19

    Barriers to HIV testing experienced by individuals at risk for HIV can result in treatment delay and further transmission of the disease. Instruments to systematically measure barriers are scarce, but could contribute to improved strategies for HIV testing. Aims of this study were to develop and test a barriers to HIV testing scale in a Swedish context. An 18-item scale was developed, based on an existing scale with addition of six new items related to fear of the disease or negative consequences of being diagnosed as HIV-infected. Items were phrased as statements about potential barriers with a three-point response format representing not important, somewhat important, and very important. The scale was evaluated regarding missing values, floor and ceiling effects, exploratory factor analysis, and internal consistencies. The questionnaire was completed by 292 adults recently diagnosed with HIV infection, of whom 7 were excluded (≥9 items missing) and 285 were included (≥12 items completed) in the analyses. The participants were 18-70 years old (mean 40.5, SD 11.5), 39 % were females and 77 % born outside Sweden. Routes of transmission were heterosexual transmission 63 %, male to male sex 20 %, intravenous drug use 5 %, blood product/transfusion 2 %, and unknown 9 %. All scale items had <3 % missing values. The data was feasible for factor analysis (KMO = 0.92) and a four-factor solution was chosen, based on level of explained common variance (58.64 %) and interpretability of factor structure. The factors were interpreted as; personal consequences, structural barriers, social and economic security, and confidentiality. Ratings on the minimum level (suggested barrier not important) were common, resulting in substantial floor effects on the scales. The scales were internally consistent (Cronbach's α 0.78-0.91). This study gives preliminary evidence of the scale being feasible, reliable and valid to identify different types of barriers to HIV testing.
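
    Note: the abstract above reports internal consistency as Cronbach's α. A minimal sketch of the usual formula, α = k/(k−1) · (1 − Σ item variances / variance of total scores), is given below. The response matrix is made-up illustrative data on a three-point scale, not data from the study, and the function name is an assumption.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    Each inner list holds one respondent's item scores. Sample variances
    (n - 1 denominator) are used for both the items and the total scores.
    """
    n_items = len(scores[0])
    item_variances = [variance(column) for column in zip(*scores)]
    total_variance = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical responses: 5 respondents x 4 items, scored 1 to 3.
responses = [
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 3, 3, 3],
    [1, 2, 1, 1],
    [2, 3, 3, 2],
]
alpha = cronbach_alpha(responses)
```

    The estimate needs at least two items and some spread in total scores; otherwise the formula divides by zero.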

  14. Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating

    ERIC Educational Resources Information Center

    He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei

    2013-01-01

    Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…

  15. Robust Scale Transformation Methods in IRT True Score Equating under Common-Item Nonequivalent Groups Design

    ERIC Educational Resources Information Center

    He, Yong

    2013-01-01

    Common test items play an important role in equating multiple test forms under the common-item nonequivalent groups design. Inconsistent item parameter estimates among common items can lead to large bias in equated scores for IRT true score equating. Current methods extensively focus on detection and elimination of outlying common items, which…

  16. Using Differential Item Functioning Procedures to Explore Sources of Item Difficulty and Group Performance Characteristics.

    ERIC Educational Resources Information Center

    Scheuneman, Janice Dowd; Gerritz, Kalle

    1990-01-01

    Differential item functioning (DIF) methodology for revealing sources of item difficulty and performance characteristics of different groups was explored. A total of 150 Scholastic Aptitude Test items and 132 Graduate Record Examination general test items were analyzed. DIF was evaluated for males and females and Blacks and Whites. (SLD)

  17. Investigating Item Exposure Control Methods in Computerized Adaptive Testing

    ERIC Educational Resources Information Center

    Ozturk, Nagihan Boztunc; Dogan, Nuri

    2015-01-01

    This study aims to investigate the effects of item exposure control methods on measurement precision and on test security under various item selection methods and item pool characteristics. In this study, the Randomesque (with item group sizes of 5 and 10), Sympson-Hetter, and Fade-Away methods were used as item exposure control methods. Moreover,…
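
    Note: the Randomesque method named above is commonly described as choosing the next item at random from the few most informative items at the current ability estimate, rather than always administering the single most informative one. The sketch below assumes a two-parameter logistic item pool and maximum-information ranking. The pool, parameter values, and function names are hypothetical and are not taken from the study.

```python
import math
import random


def item_information_2pl(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def randomesque_select(theta, pool, administered, group_size, rng):
    """Pick one item at random from the group_size most informative unused items.

    pool maps item id -> (a, b); administered is a set of item ids already used.
    """
    candidates = [item for item in pool if item not in administered]
    ranked = sorted(
        candidates,
        key=lambda item: item_information_2pl(theta, *pool[item]),
        reverse=True,
    )
    return rng.choice(ranked[:group_size])


# Hypothetical pool: 21 items with equal discrimination and difficulties
# spread from -2.0 to 2.0.
pool = {i: (1.0, -2.0 + 0.2 * i) for i in range(21)}
next_item = randomesque_select(0.0, pool, set(), group_size=5, rng=random.Random(0))
```

    A group size of 1 reduces to plain maximum-information selection; larger groups spread exposure across more items at some cost in measurement precision.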

  18. Detecting Differential Item Discrimination (DID) and the Consequences of Ignoring DID in Multilevel Item Response Models

    ERIC Educational Resources Information Center

    Lee, Woo-yeol; Cho, Sun-Joo

    2017-01-01

    Cross-level invariance in a multilevel item response model can be investigated by testing whether the within-level item discriminations are equal to the between-level item discriminations. Testing the cross-level invariance assumption is important to understand constructs in multilevel data. However, in most multilevel item response model…

  19. Item Pool Design for an Operational Variable-Length Computerized Adaptive Test

    ERIC Educational Resources Information Center

    He, Wei; Reckase, Mark D.

    2014-01-01

    For computerized adaptive tests (CATs) to work well, they must have an item pool with sufficient numbers of good quality items. Many researchers have pointed out that, in developing item pools for CATs, not only is the item pool size important but also the distribution of item parameters and practical considerations such as content distribution…

  20. Measurement characteristics for two health-related quality of life measures in older adults: The SF-36 and the CDC Healthy Days items.

    PubMed

    Barile, John P; Horner-Johnson, Willi; Krahn, Gloria; Zack, Matthew; Miranda, David; DeMichele, Kimberly; Ford, Derek; Thompson, William W

    2016-10-01

    The Short Form Health Survey (SF-36) and the Centers for Disease Control and Prevention (CDC) Healthy Days items are well known measures of health-related quality of life. The validity of the SF-36 for older adults and those with disabilities has been questioned. This study assessed the extent to which the SF-36 and the Centers for Disease Control and Prevention (CDC) Healthy Days items measure the same aspects of health; whether the SF-36 and the CDC unhealthy days items are invariant across gender, functional status, or the presence of chronic health conditions of older adults; and whether each of the SF-36's eight subscales is independently associated with the CDC Healthy Days items. We analyzed data from 66,269 adult Medicare Advantage members age 65 and older. We used confirmatory factor analyses and regression modeling to test associations between the CDC Healthy Days items and subscales of the SF-36. The CDC Healthy Days items were associated with the SF-36 global measures of physical and mental health. The CDC physically unhealthy days item was associated with the SF-36 subscales for bodily pain, physical role limitations, and general health, while the CDC mentally unhealthy days item was associated with the SF-36 subscales for mental health, emotional role limitations, vitality and social functioning. The SF-36 physical functioning subscale was not independently associated with either of the CDC Healthy Days items. The CDC Healthy Days items measure similar domains as the SF-36 but appear to assess HRQOL without regard to limitations in functioning. Copyright © 2016 Elsevier Inc. All rights reserved.

  1. Measurement characteristics for two health-related quality of life measures in older adults: The SF-36 and the CDC Healthy Days items

    PubMed Central

    Barile, John P.; Horner-Johnson, Willi; Krahn, Gloria; Zack, Matthew; Miranda, David; DeMichele, Kimberly; Ford, Derek; Thompson, William W.

    2017-01-01

    Background: The Short Form Health Survey (SF-36) and the Centers for Disease Control and Prevention (CDC) Healthy Days items are well known measures of health-related quality of life. The validity of the SF-36 for older adults and those with disabilities has been questioned. Objective: Assess the extent to which the SF-36 and the Centers for Disease Control and Prevention (CDC) Healthy Days items measure the same aspects of health; whether the SF-36 and the CDC unhealthy days items are invariant across gender, functional status, or the presence of chronic health conditions of older adults; and whether each of the SF-36’s eight subscales is independently associated with the CDC Healthy Days items. Methods: We analyzed data from 66,269 adult Medicare Advantage members age 65 and older. We used confirmatory factor analyses and regression modeling to test associations between the CDC Healthy Days items and subscales of the SF-36. Results: The CDC Healthy Days items were associated with the SF-36 global measures of physical and mental health. The CDC physically unhealthy days item was associated with the SF-36 subscales for bodily pain, physical role limitations, and general health, while the CDC mentally unhealthy days item was associated with the SF-36 subscales for mental health, emotional role limitations, vitality and social functioning. The SF-36 physical functioning subscale was not independently associated with either of the CDC Healthy Days items. Conclusions: The CDC Healthy Days items measure similar domains as the SF-36 but appear to assess HRQOL without regard to limitations in functioning. PMID:27259343

  2. Analyzing Item Generation with Natural Language Processing Tools for the "TOEIC"® Listening Test. Research Report. ETS RR-17-52

    ERIC Educational Resources Information Center

    Yoon, Su-Youn; Lee, Chong Min; Houghton, Patrick; Lopez, Melissa; Sakano, Jennifer; Loukina, Anastasia; Krovetz, Bob; Lu, Chi; Madani, Nitin

    2017-01-01

    In this study, we developed assistive tools and resources to support TOEIC® Listening test item generation. There has recently been an increased need for a large pool of items for these tests. This need has, in turn, inspired efforts to increase the efficiency of item generation while maintaining the quality of the created items. We aimed to…

  3. An Analysis of Factors Affecting the Difficulty of Dialogue Items in TOEFL Listening Comprehension. TOEFL Research Reports, 51.

    ERIC Educational Resources Information Center

    Nissan, Susan; And Others

    One of the item types in the Listening Comprehension section of the Test of English as a Foreign Language (TOEFL) test is the dialogue. Because the dialogue item pool needs to have an appropriate balance of items at a range of difficulty levels, test developers have examined items at various difficulty levels in an attempt to identify their…

  4. Working memory and inhibitory control across the life span: Intrusion errors in the Reading Span Test.

    PubMed

    Robert, Christelle; Borella, Erika; Fagot, Delphine; Lecerf, Thierry; de Ribaupierre, Anik

    2009-04-01

    The aim of this study was to examine to what extent inhibitory control and working memory capacity are related across the life span. Intrusion errors committed by children and younger and older adults were investigated in two versions of the Reading Span Test. In Experiment 1, a mixed Reading Span Test with items of various list lengths was administered. Older adults and children recalled fewer correct words and produced more intrusions than did young adults. Also, age-related differences were found in the type of intrusions committed. In Experiment 2, an adaptive Reading Span Test was administered, in which the list length of items was adapted to each individual's working memory capacity. Age groups differed neither on correct recall nor on the rate of intrusions, but they differed on the type of intrusions. Altogether, these findings indicate that the availability of attentional resources influences the efficiency of inhibition across the life span.

  5. Item development process and analysis of 50 case-based items for implementation on the Korean Nursing Licensing Examination.

    PubMed

    Park, In Sook; Suh, Yeon Ok; Park, Hae Sook; Kang, So Young; Kim, Kwang Sung; Kim, Gyung Hee; Choi, Yeon-Hee; Kim, Hyun-Ju

    2017-01-01

    The purpose of this study was to improve the quality of items on the Korean Nursing Licensing Examination by developing and evaluating case-based items that reflect integrated nursing knowledge. We conducted a cross-sectional observational study to develop new case-based items. The methods for developing test items included expert workshops, brainstorming, and verification of content validity. After a mock examination of undergraduate nursing students using the newly developed case-based items, we evaluated the appropriateness of the items through classical test theory and item response theory. A total of 50 case-based items were developed for the mock examination, and content validity was evaluated. The question items integrated 34 discrete elements of integrated nursing knowledge. The mock examination was taken by 741 baccalaureate students in their fourth year of study at 13 universities. Their average score on the mock examination was 57.4, and the examination showed a reliability of 0.40. According to classical test theory, the average item difficulty was 57.4% (80%-100% for 12 items; 60%-80% for 13 items; and less than 60% for 25 items). The mean discrimination index was 0.19, and was above 0.30 for 11 items and 0.20 to 0.29 for 15 items. According to item response theory, the item discrimination parameter (in the logistic model) was none for 10 items (0.00), very low for 20 items (0.01 to 0.34), low for 12 items (0.35 to 0.64), moderate for 6 items (0.65 to 1.34), high for 1 item (1.35 to 1.69), and very high for 1 item (above 1.70). The item difficulty was very easy for 24 items (below -2.0), easy for 8 items (-2.0 to -0.5), medium for 6 items (-0.5 to 0.5), hard for 3 items (0.5 to 2.0), and very hard for 9 items (2.0 or above). The goodness-of-fit test in terms of the 2-parameter item response model between the range of 2.0 to 0.5 revealed that 12 items had an ideal correct answer rate. We surmised that the low reliability of the mock examination was influenced by the timing of the test for the examinees and the inappropriate difficulty of the items. Our study suggested a methodology for the development of future case-based items for the Korean Nursing Licensing Examination.
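
    The classical test theory quantities reported in this abstract can be illustrated with a short sketch. Item difficulty is taken here as the proportion of examinees answering correctly. The abstract does not say which discrimination index was used, so the upper-lower index with 27% groups shown below is an assumption, and the function names are hypothetical.

```python
def item_difficulty(responses, item):
    """Proportion of examinees who answered `item` correctly (classical p-value).

    responses: list of examinees, each a list of 0/1 item scores.
    """
    return sum(row[item] for row in responses) / len(responses)


def discrimination_index(responses, item, fraction=0.27):
    """Upper-lower discrimination: difficulty in the top group minus the bottom group.

    Groups are the top and bottom `fraction` of examinees ranked by total score.
    The 27% cut is a common convention, assumed here rather than taken from the study.
    """
    ranked = sorted(responses, key=sum)
    k = max(1, round(len(ranked) * fraction))
    lower, upper = ranked[:k], ranked[-k:]
    return item_difficulty(upper, item) - item_difficulty(lower, item)
```

    Under this convention an item that every examinee answers correctly has difficulty 1.0 and discrimination 0.0, which is why very easy items contribute little to reliability.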

  6. Assessing the capacity of ministries of health to use research in decision-making: conceptual framework and tool.

    PubMed

    Rodríguez, Daniela C; Hoe, Connie; Dale, Elina M; Rahman, M Hafizur; Akhter, Sadika; Hafeez, Assad; Irava, Wayne; Rajbangshi, Preety; Roman, Tamlyn; Ţîrdea, Marcela; Yamout, Rouham; Peters, David H

    2017-08-01

    The capacity to demand and use research is critical for governments if they are to develop policies that are informed by evidence. Existing tools designed to assess how government officials use evidence in decision-making have significant limitations for low- and middle-income countries (LMICs); they are rarely tested in LMICs and focus only on individual capacity. This paper introduces an instrument that was developed to assess Ministry of Health (MoH) capacity to demand and use research evidence for decision-making, which was tested for reliability and validity in eight LMICs (Bangladesh, Fiji, India, Lebanon, Moldova, Pakistan, South Africa, Zambia). Instrument development was based on a new conceptual framework that addresses individual, organisational and systems capacities, and items were drawn from existing instruments and a literature review. After initial item development and pre-testing to address face validity and item phrasing, the instrument was reduced to 54 items for further validation and item reduction. In-country study teams interviewed a systematic sample of 203 MoH officials. Exploratory factor analysis was used in addition to standard reliability and validity measures to further assess the items. Thirty items divided between two factors representing organisational and individual capacity constructs were identified. South Africa and Zambia demonstrated the highest level of organisational capacity to use research, whereas Pakistan and Bangladesh were the lowest two. In contrast, individual capacity was highest in Pakistan, followed by South Africa, whereas Bangladesh and Lebanon were the lowest. The framework and related instrument represent a new opportunity for MoHs to identify ways to understand and improve capacities to incorporate research evidence in decision-making, as well as to provide a basis for tracking change.

  7. Validity and reliability of short forms of parental-caregiver perception and family impact scale in a Telugu speaking population of India.

    PubMed

    Kumar, Santhosh; Kroon, Jeroen; Lalloo, Ratilal; Johnson, Newell W

    2016-03-01

    The Parental-Caregiver Perception Questionnaire (P-CPQ) and the Family Impact Scale (FIS) are commonly used measures to evaluate the parent's perception of the impact of children's oral health on quality of life and family, respectively. Recently, shorter forms of the P-CPQ and FIS have been developed. No study has sought to validate these short forms in other languages and cultures. This study aimed to evaluate the validity and reliability of the FIS and the 8- and 16-item P-CPQ in a Telugu speaking population of India. For this cross-sectional study, a multi-stage random sampling technique was used to recruit 11-13 year-old schoolchildren of Medak district, Telangana, India and their parents (n = 1342). Parents were approached with questionnaires through their children, who underwent clinical examinations for dental caries, fluorosis and malocclusion. The translated versions underwent pilot testing (n = 40); test-retest reliability was also assessed (n = 161). The overall summary scale and subscales of the short forms of P-CPQ and FIS failed to discriminate between the categories of dental caries severity. Also, malocclusion status was not related to the domain or overall scores of either short form of the P-CPQ. There were significant differences in subscale and overall scores of the 16- and 8-item P-CPQ and FIS between the fluorosis categories. Both the 16- and 8-item P-CPQ summary scales were significantly related to the parent's global rating of oral health (16-item, r = 0.30, p < 0.01; 8-item, r = 0.28, p < 0.01) and overall wellbeing (16-item, r = 0.22, p < 0.01; 8-item, r = 0.22, p < 0.01), thereby exhibiting good construct validity. However, the correlation of the emotional and social wellbeing scales of the short forms of P-CPQ and FIS with global ratings was of low strength. Cronbach's alphas for the FIS, 16-item and 8-item P-CPQ scales were 0.78, 0.83 and 0.71 respectively, while the Intra-Class Correlation coefficients were 0.752, 0.812 and 0.816 respectively. Cronbach's alphas for most of the subscales of the short forms of P-CPQ were less than 0.7. The overall 16- and 8-item P-CPQ scales demonstrated good construct validity, while the construct validity of the FIS was questionable. Discriminant validity of all three instruments was good only in relation to fluorosis. Overall scales of all three short forms exhibited acceptable internal consistency and reliability on repeated administrations.
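
    Cronbach's alpha, the internal consistency statistic reported in this abstract, is k/(k-1) times one minus the ratio of the summed item variances to the variance of the total score, where k is the number of items. The sketch below is a minimal illustration using sample variances; the function name and data layout are assumptions made for the example, not the authors' code.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

    With only a few items per subscale, alpha tends to be lower even when the items are reasonably correlated, which may partly explain subscale values below 0.7.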

  8. Calibration of context-specific survey items to assess youth physical activity behaviour.

    PubMed

    Saint-Maurice, Pedro F; Welk, Gregory J; Bartee, R Todd; Heelan, Kate

    2017-05-01

    This study tests calibration models to re-scale context-specific physical activity (PA) items to accelerometer-derived PA. A total of 195 children in grades 4-12 wore an Actigraph monitor and completed the Physical Activity Questionnaire (PAQ) one week later. The relative time spent in moderate-to-vigorous PA (MVPA%) obtained from the Actigraph at recess, PE, lunch, after-school, evening and weekend was matched with the respective item score obtained from the PAQ. Item scores from 145 participants were calibrated against objective MVPA% using multiple linear regression with age and sex as additional predictors. Predicted minutes of MVPA for school, out-of-school and total week were tested in the remaining sample (n = 50) using equivalence testing. The results showed that PAQ β-weights ranged from 0.06 (lunch) to 4.94 (PE) MVPA% (P < 0.05) and model root mean square error ranged from 4.2% (evening) to 20.2% (recess). When applied to an independent sample, differences between PAQ and accelerometer MVPA at school and out-of-school ranged from -15.6 to +3.8 min and the PAQ was within 10-15% of accelerometer-measured activity. This study demonstrated that context-specific items can be calibrated to predict minutes of MVPA in groups of youth during in- and out-of-school periods.

  9. On the Relationship Between Classical Test Theory and Item Response Theory: From One to the Other and Back.

    PubMed

    Raykov, Tenko; Marcoulides, George A

    2016-04-01

    The frequently neglected and often misunderstood relationship between classical test theory and item response theory is discussed for the unidimensional case with binary measures and no guessing. It is pointed out that popular item response models can be directly obtained from classical test theory-based models by accounting for the discrete nature of the observed items. Two distinct observational equivalence approaches are outlined that render the item response models from corresponding classical test theory-based models, and can each be used to obtain the former from the latter models. Similarly, classical test theory models can be furnished using the reverse application of either of those approaches from corresponding item response models.

  10. Anorexia/cachexia-related quality of life for children with cancer.

    PubMed

    Lai, Jin-Shei; Cella, David; Peterman, Amy; Barocas, Joshua; Goldman, Stewart

    2005-10-01

    Anorexia is a common symptom in patients with cancer, which can lead to poor tolerance of treatment and can contribute to cachexia in extreme cases. Children with advanced-stage cancer are especially vulnerable to malnutrition resulting from anorexia and cachexia. Currently, there are no instruments that measure common concerns specifically associated with anorexia and cachexia in children with cancer. The purpose of the current article was to test the psychometric properties of a newly developed pediatric Functional Assessment of Anorexia and Cachexia Therapy (peds-FAACT) for children with cancer. Ninety-six patients (ages 7-17 yrs) receiving cancer treatment and their parents were asked to complete the 12-item peds-FAACT. The authors implemented both classical test theory and item response theory to evaluate the agreement between parents and patients, internal consistency and unidimensionality of the scale, and stability of items across subgroups. As a result, a patient-reported six-item scale was recommended as the core measure for all pediatric patients with cancer and four additional peripheral items were recommended for adolescent patients. The peds-FAACT demonstrated good psychometric properties, differentiated patients with different functional performance status, and was determined to be a useful tool for future clinical trials.

  11. Electrophysiologically dissociating episodic preretrieval processing.

    PubMed

    Bridger, Emma K; Mecklinger, Axel

    2012-06-01

    Contrasts between ERPs elicited by new items from tests with distinct episodic retrieval requirements index preretrieval processing. Preretrieval operations are thought to facilitate the recovery of task-relevant information because they have been shown to correlate with response accuracy in tasks in which prioritizing the retrieval of this information could be a useful strategy. This claim was tested here by contrasting new item ERPs from two retrieval tasks, each designed to explicitly require the recovery of a different kind of mnemonic information. New item ERPs differed from 400 msec poststimulus, but the distribution of these effects varied markedly, depending upon participants' response accuracy: A protracted posteriorly located effect was present for higher performing participants, whereas an anteriorly distributed effect occurred for lower performing participants. The magnitude of the posterior effect from 400 to 800 msec correlated with response accuracy, supporting the claim that preretrieval processes facilitate the recovery of task-relevant information. Additional contrasts between ERPs from these tasks and an old/new recognition task operating as a relative baseline revealed task-specific effects with nonoverlapping scalp topographies, in line with the assumption that these new item ERP effects reflect qualitatively distinct retrieval operations. Similarities in these effects were also used to reason about preretrieval processes related to the general requirement to recover contextual details. These insights, alongside the distinct pattern of effects for the two accuracy groups, reveal the multifarious nature of preretrieval processing while indicating that only some of these classes of operation are systematically related to response accuracy in recognition memory tasks.

  12. Locally Dependent Linear Logistic Test Model with Person Covariates

    ERIC Educational Resources Information Center

    Ip, Edward H.; Smits, Dirk J. M.; De Boeck, Paul

    2009-01-01

    The article proposes a family of item-response models that allow the separate and independent specification of three orthogonal components: item attribute, person covariate, and local item dependence. Special interest lies in extending the linear logistic test model, which is commonly used to measure item attributes, to tests with embedded item…

  13. Applying Bayesian Item Selection Approaches to Adaptive Tests Using Polytomous Items

    ERIC Educational Resources Information Center

    Penfield, Randall D.

    2006-01-01

    This study applied the maximum expected information (MEI) and the maximum posterior-weighted information (MPI) approaches of computer adaptive testing item selection to the case of a test using polytomous items following the partial credit model. The MEI and MPI approaches are described. A simulation study compared the efficiency of ability…

  14. Do Reading Experts Agree with MCAT Verbal Reasoning Item Classifications?

    ERIC Educational Resources Information Center

    Jackson, Evelyn W.; And Others

    1994-01-01

    Examined whether expert raters (n=5) could agree about classification of Medical College Admission Test (MCAT) items and whether they agreed with MCAT student manual in labeling skill being measured by each test item. Results revealed difficulties in replicating authors' labeling of skills for reading items on practice test provided with 1991 MCAT…

  15. Differential Item Functioning: Its Consequences. Research Report. ETS RR-10-01

    ERIC Educational Resources Information Center

    Lee, Yi-Hsuan; Zhang, Jinming

    2010-01-01

    This report examines the consequences of differential item functioning (DIF) using simulated data. Its impact on total score, item response theory (IRT) ability estimate, and test reliability was evaluated in various testing scenarios created by manipulating the following four factors: test length, percentage of DIF items per form, sample sizes of…

  16. Electronics. Criterion-Referenced Test (CRT) Item Bank.

    ERIC Educational Resources Information Center

    Davis, Diane, Ed.

    This document contains 519 criterion-referenced multiple choice and true or false test items for a course in electronics. The test item bank is designed to work with both the Vocational Instructional Management System (VIMS) and the Vocational Administrative Management System (VAMS) in Missouri. The items are grouped into 15 units covering the…

  17. Auto Mechanics. Criterion-Referenced Test (CRT) Item Bank.

    ERIC Educational Resources Information Center

    Tannehill, Dana, Ed.

    This document contains 546 criterion-referenced multiple choice and true or false test items for a course in auto mechanics. The test item bank is designed to work with both the Vocational Instructional Management System (VIMS) and Vocational Administrative Management System (VAMS) in Missouri. The items are grouped into 35 units covering the…

  18. Developing a Strategy for Using Technology-Enhanced Items in Large-Scale Standardized Tests

    ERIC Educational Resources Information Center

    Bryant, William

    2017-01-01

    As large-scale standardized tests move from paper-based to computer-based delivery, opportunities arise for test developers to make use of items beyond traditional selected and constructed response types. Technology-enhanced items (TEIs) have the potential to provide advantages over conventional items, including broadening construct measurement,…

  19. Varying levels of difficulty index of skills-test items randomly selected by examinees on the Korean emergency medical technician licensing examination

    PubMed Central

    2016-01-01

    Purpose: The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. Methods: The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. Results: In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). Conclusion: In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination. PMID:26883810

  20. Reliability of the Client-Centeredness of Goal Setting (C-COGS) Scale in Acquired Brain Injury Rehabilitation.

    PubMed

    Doig, Emmah; Prescott, Sarah; Fleming, Jennifer; Cornwell, Petrea; Kuipers, Pim

    2016-01-01

    To examine the internal reliability and test-retest reliability of the Client-Centeredness of Goal Setting (C-COGS) scale. The C-COGS scale was administered to 42 participants with acquired brain injury after completion of multidisciplinary goal planning. Internal reliability of scale items was examined using item-partial total correlations and Cronbach's α coefficient. The scale was readministered within a 1-mo period to a subsample of 12 participants to examine test-retest reliability by calculating exact and close percentage agreement for each item. After examination of item-partial total correlations, test items were revised. The revised items demonstrated stronger internal consistency than the original items. Preliminary evaluation of test-retest reliability was fair, with an average exact percent agreement across all test items of 67%. Findings support the preliminary reliability of the C-COGS scale as a tool to evaluate and promote client-centered goal planning in brain injury rehabilitation. Copyright © 2016 by the American Occupational Therapy Association, Inc.
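
    The exact and close percentage agreement used here for test-retest reliability can be sketched as the share of paired ratings that match, either exactly or within a tolerance. The abstract does not define "close", so treating it as agreement within one scale point is an assumption, as is the function name.

```python
def percent_agreement(first, second, tolerance=0):
    """Percentage of paired ratings that differ by at most `tolerance` scale points.

    tolerance=0 gives exact agreement; tolerance=1 is one possible reading of
    "close" agreement (an assumption, not the study's stated definition).
    """
    matches = sum(abs(a - b) <= tolerance for a, b in zip(first, second))
    return 100.0 * matches / len(first)
```

    For example, ratings of 1, 2, 3, 4 at the first administration and 1, 3, 3, 2 at the second give 50% exact agreement and 75% agreement within one point.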
