Sample records for activities evaluation items

  1. Evaluation of item candidates for a diabetic retinopathy quality of life item bank.

    PubMed

    Fenwick, Eva K; Pesudovs, Konrad; Khadka, Jyoti; Rees, Gwyn; Wong, Tien Y; Lamoureux, Ecosse L

    2013-09-01

    We are developing an item bank assessing the impact of diabetic retinopathy (DR) on quality of life (QoL) using a rigorous multi-staged process combining qualitative and quantitative methods. We describe here the first two qualitative phases: content development and item evaluation. After a comprehensive literature review, items were generated from four sources: (1) 34 previously validated patient-reported outcome measures; (2) five published qualitative articles; (3) eight focus groups and 18 semi-structured interviews with 57 DR patients; and (4) seven semi-structured interviews with diabetes or ophthalmic experts. Items were then evaluated during 3 stages, namely binning (grouping) and winnowing (reduction) based on key criteria and panel consensus; development of item stems and response options; and pre-testing of items via cognitive interviews with patients. The content development phase yielded 1,165 unique items across 7 QoL domains. After 3 sessions of binning and winnowing, items were reduced to a minimally representative set (n = 312) across 9 domains of QoL: visual symptoms; ocular surface symptoms; activity limitation; mobility; emotional; health concerns; social; convenience; and economic. After 8 cognitive interviews, 42 items were amended resulting in a final set of 314 items. We have employed a systematic approach to develop items for a DR-specific QoL item bank. The psychometric properties of the nine QoL subscales will be assessed using Rasch analysis. The resulting validated item bank will allow clinicians and researchers to better understand the QoL impact of DR and DR therapies from the patient's perspective.

  2. Evaluation of Item Candidates: The PROMIS Qualitative Item Review

    PubMed Central

    DeWalt, Darren A.; Rothrock, Nan; Yount, Susan; Stone, Arthur A.

    2009-01-01

    One of the PROMIS (Patient-Reported Outcome Measurement Information System) network's primary goals is the development of a comprehensive item bank for patient-reported outcomes of chronic diseases. For its first set of item banks, PROMIS chose to focus on pain, fatigue, emotional distress, physical function, and social function. An essential step for the development of an item pool is the identification, evaluation, and revision of extant questionnaire items for the core item pool. In this work, we also describe the systematic process wherein items are classified for subsequent statistical processing by the PROMIS investigators. Six phases of item development are documented: identification of extant items, item classification and selection, item review and revision, focus group input on domain coverage, cognitive interviews with individual items, and final revision before field testing. Identification of items refers to the systematic search for existing items in currently available scales. Expert item review and revision was conducted by trained professionals who reviewed the wording of each item and revised as appropriate for conventions adopted by the PROMIS network. Focus groups were used to confirm domain definitions and to identify new areas of item development for future PROMIS item banks. Cognitive interviews were used to examine individual items. Items successfully screened through this process were sent to field testing and will be subjected to innovative scale construction procedures. PMID:17443114

  3. Evaluating Item Fit for Multidimensional Item Response Models

    ERIC Educational Resources Information Center

    Zhang, Bo; Stone, Clement A.

    2008-01-01

    This research examines the utility of the S-X² statistic proposed by Orlando and Thissen (2000) in evaluating item fit for multidimensional item response models. Monte Carlo simulation was conducted to investigate both the Type I error and statistical power of this fit statistic in analyzing two kinds of multidimensional test…

  4. Developing and evaluating innovative items for the NCLEX: Part 2, item characteristics and cognitive processing.

    PubMed

    Wendt, Anne; Harmes, J Christine

    2009-01-01

    This article is a continuation of the research on the development and evaluation of innovative item formats for the NCLEX examinations that was published in the March/April 2009 edition of Nurse Educator. The authors discuss the innovative item templates and evaluate the statistical characteristics and level of cognitive processing required to answer the examination items.

  5. Measuring everyday functional competence using the Rasch assessment of everyday activity limitations (REAL) item bank.

    PubMed

    Oude Voshaar, Martijn A H; Ten Klooster, Peter M; Vonkeman, Harald E; van de Laar, Mart A F J

    2017-11-01

    Traditional patient-reported physical function instruments often poorly differentiate patients with mild-to-moderate disability. We describe the development and psychometric evaluation of a generic item bank for measuring everyday activity limitations in outpatient populations. Seventy-two items generated from patient interviews and mapped to the International Classification of Functioning, Disability and Health (ICF) domestic life chapter were administered to 1128 adults representative of the Dutch population. The partial credit model was fitted to the item responses and evaluated with respect to its assumptions, model fit, and differential item functioning (DIF). Measurement performance of a computerized adaptive testing (CAT) algorithm was compared with the SF-36 physical functioning scale (PF-10). A final bank of 41 items was developed. All items demonstrated acceptable fit to the partial credit model and measurement invariance across age, sex, and educational level. Five- and ten-item CAT simulations were shown to have high measurement precision, which exceeded that of SF-36 physical functioning scale across the physical function continuum. Floor effects were absent for a 10-item empirical CAT simulation, and ceiling effects were low (13.5%) compared with SF-36 physical functioning (38.1%). CAT also discriminated better than SF-36 physical functioning between age groups, number of chronic conditions, and respondents with or without rheumatic conditions. The Rasch assessment of everyday activity limitations (REAL) item bank will hopefully prove a useful instrument for assessing everyday activity limitations. T-scores obtained using derived measures can be used to benchmark physical function outcomes against the general Dutch adult population.
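    As an illustration of the partial credit model named in this record, the following minimal Python sketch computes category probabilities for a single polytomous item. The function name `pcm_probabilities`, the ability value, and the step difficulties are hypothetical and are not taken from the REAL item bank.

```python
import math

def pcm_probabilities(theta, thresholds):
    """Category probabilities under the partial credit model.

    theta: person ability in logits.
    thresholds: step difficulties delta_1..delta_m in logits;
                the item then has m + 1 response categories.
    """
    # Cumulative sums of (theta - delta_j); the category-0 term is defined as 0.
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Normalise the exponentiated terms; subtract the max for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical three-category item with symmetric step difficulties.
probs = pcm_probabilities(theta=0.0, thresholds=[-1.0, 1.0])
print([round(p, 3) for p in probs])
```

    With step difficulties placed symmetrically around the person's ability, the middle category is the most likely response and the two outer categories are equally likely.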

  6. Methodology for Developing and Evaluating the PROMIS® Smoking Item Banks

    PubMed Central

    Cai, Li; Stucky, Brian D.; Tucker, Joan S.; Shadel, William G.; Edelen, Maria Orlando

    2014-01-01

    Introduction: This article describes the procedures used in the PROMIS® Smoking Initiative for the development and evaluation of item banks, short forms (SFs), and computerized adaptive tests (CATs) for the assessment of 6 constructs related to cigarette smoking: nicotine dependence, coping expectancies, emotional and sensory expectancies, health expectancies, psychosocial expectancies, and social motivations for smoking. Methods: Analyses were conducted using response data from a large national sample of smokers. Items related to each construct were subjected to extensive item factor analyses and evaluation of differential item functioning (DIF). Final item banks were calibrated, and SF assessments were developed for each construct. The performance of the SFs and the potential use of the item banks for CAT administration were examined through simulation study. Results: Item selection based on dimensionality assessment and DIF analyses produced item banks that were essentially unidimensional in structure and free of bias. Simulation studies demonstrated that the constructs could be accurately measured with a relatively small number of carefully selected items, either through fixed SFs or CAT-based assessment. Illustrative results are presented, and subsequent articles provide detailed discussion of each item bank in turn. Conclusions: The development of the PROMIS smoking item banks provides researchers with new tools for measuring smoking-related constructs. The use of the calibrated item banks and suggested SF assessments will enhance the quality of score estimates, thus advancing smoking research. Moreover, the methods used in the current study, including innovative approaches to item selection and SF construction, may have general relevance to item bank development and evaluation. PMID:23943843

  7. Building an Evaluation Scale using Item Response Theory.

    PubMed

    Lalor, John P; Wu, Hao; Yu, Hong

    2016-11-01

    Evaluation of NLP methods requires testing against a previously vetted gold-standard test set and reporting standard metrics (accuracy/precision/recall/F1). The current assumption is that all items in a given test set are equal with regards to difficulty and discriminating power. We propose Item Response Theory (IRT) from psychometrics as an alternative means for gold-standard test-set generation and NLP system evaluation. IRT is able to describe characteristics of individual items - their difficulty and discriminating power - and can account for these characteristics in its estimation of human intelligence or ability for an NLP task. In this paper, we demonstrate IRT by generating a gold-standard test set for Recognizing Textual Entailment. By collecting a large number of human responses and fitting our IRT model, we show that our IRT model compares NLP systems with the performance in a human population and is able to provide more insight into system performance than standard evaluation metrics. We show that a high accuracy score does not always imply a high IRT score, which depends on the item characteristics and the response pattern.
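    The claim that equal accuracy need not imply an equal IRT score can be sketched as follows. The abstract does not name the IRT model used, so a two-parameter logistic (2PL) model is assumed here purely for illustration; the item parameters, the response patterns, and the grid-search estimator are hypothetical.

```python
import math

def irt_2pl(theta, a, b):
    """Probability of a correct response under a 2PL model.

    a: item discrimination, b: item difficulty, theta: ability.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, items, responses):
    """Log-likelihood of a 0/1 response pattern at ability theta."""
    ll = 0.0
    for (a, b), x in zip(items, responses):
        p = irt_2pl(theta, a, b)
        ll += math.log(p) if x == 1 else math.log(1.0 - p)
    return ll

def estimate_ability(items, responses):
    """Maximum-likelihood ability found by grid search on [-4, 4]."""
    grid = [i / 100.0 for i in range(-400, 401)]
    return max(grid, key=lambda t: log_likelihood(t, items, responses))

# Hypothetical items as (discrimination, difficulty): easy, medium, hard.
items = [(0.5, -1.0), (1.0, 0.0), (2.0, 1.0)]

# Two response patterns with the same accuracy (2 of 3 correct).
theta_a = estimate_ability(items, [1, 1, 0])  # misses the hard item
theta_b = estimate_ability(items, [0, 1, 1])  # misses the easy item
print(theta_a, theta_b)
```

    Both patterns score 2 out of 3, yet the pattern that answers the hard, highly discriminating item correctly receives the higher ability estimate.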

  8. Methodology for developing and evaluating the PROMIS smoking item banks.

    PubMed

    Hansen, Mark; Cai, Li; Stucky, Brian D; Tucker, Joan S; Shadel, William G; Edelen, Maria Orlando

    2014-09-01

    This article describes the procedures used in the PROMIS Smoking Initiative for the development and evaluation of item banks, short forms (SFs), and computerized adaptive tests (CATs) for the assessment of 6 constructs related to cigarette smoking: nicotine dependence, coping expectancies, emotional and sensory expectancies, health expectancies, psychosocial expectancies, and social motivations for smoking. Analyses were conducted using response data from a large national sample of smokers. Items related to each construct were subjected to extensive item factor analyses and evaluation of differential item functioning (DIF). Final item banks were calibrated, and SF assessments were developed for each construct. The performance of the SFs and the potential use of the item banks for CAT administration were examined through simulation study. Item selection based on dimensionality assessment and DIF analyses produced item banks that were essentially unidimensional in structure and free of bias. Simulation studies demonstrated that the constructs could be accurately measured with a relatively small number of carefully selected items, either through fixed SFs or CAT-based assessment. Illustrative results are presented, and subsequent articles provide detailed discussion of each item bank in turn. The development of the PROMIS smoking item banks provides researchers with new tools for measuring smoking-related constructs. The use of the calibrated item banks and suggested SF assessments will enhance the quality of score estimates, thus advancing smoking research. Moreover, the methods used in the current study, including innovative approaches to item selection and SF construction, may have general relevance to item bank development and evaluation. © The Author 2013. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco.

  9. Building an Evaluation Scale using Item Response Theory

    PubMed Central

    Lalor, John P.; Wu, Hao; Yu, Hong

    2016-01-01

    Evaluation of NLP methods requires testing against a previously vetted gold-standard test set and reporting standard metrics (accuracy/precision/recall/F1). The current assumption is that all items in a given test set are equal with regards to difficulty and discriminating power. We propose Item Response Theory (IRT) from psychometrics as an alternative means for gold-standard test-set generation and NLP system evaluation. IRT is able to describe characteristics of individual items - their difficulty and discriminating power - and can account for these characteristics in its estimation of human intelligence or ability for an NLP task. In this paper, we demonstrate IRT by generating a gold-standard test set for Recognizing Textual Entailment. By collecting a large number of human responses and fitting our IRT model, we show that our IRT model compares NLP systems with the performance in a human population and is able to provide more insight into system performance than standard evaluation metrics. We show that a high accuracy score does not always imply a high IRT score, which depends on the item characteristics and the response pattern. PMID:28004039

  10. Writing, Evaluating and Assessing Data Response Items in Economics.

    ERIC Educational Resources Information Center

    Trotman-Dickenson, D. I.

    1989-01-01

    Describes some of the problems in writing data response items in economics for use by A Level and General Certificate of Secondary Education (GCSE) students. Examines the experience of two series of workshops on writing items, evaluating them and assessing responses from schools. Offers suggestions for producing packages of data response items as…

  11. Mental health in primary care: an evaluation using the Item Response Theory

    PubMed Central

    da Rocha, Hugo André; dos Santos, Alaneir de Fátima; Reis, Ilka Afonso; Santos, Marcos Antônio da Cunha; Cherchiglia, Mariângela Leal

    2018-01-01

    OBJECTIVE: To determine the items of the Brazilian National Program for Improving Access and Quality of Primary Care that better evaluate the capacity to provide mental health care. METHODS: This is a cross-sectional study carried out using the Graded Response Model of the Item Response Theory using secondary data from the second cycle of the National Program for Improving Access and Quality of Primary Care, which evaluates 30,523 primary care teams in the period from 2013 to 2014 in Brazil. The internal consistency, correlation between items, and correlation between items and the total score were tested using the Cronbach’s alpha, Spearman’s correlation, and point biserial coefficients, respectively. The assumptions of unidimensionality and local independence of the items were tested. Word clouds were used as one way to present the results. RESULTS: The items with the greatest ability to discriminate were scheduling of the agenda according to risk stratification, keeping of records of the most serious cases of users in psychological distress, and provision of group care. The items that required a higher level of mental health care in the parameter of location were the provision of any type of group care and the provision of educational and mental health promotion activities. Total Cronbach’s alpha coefficient was 0.87. The items that obtained the highest correlation with total score were the recording of the most serious cases of users in psychological distress and scheduling of the agenda according to risk stratification. The final scores obtained oscillated between -2.07 (minimum) and 1.95 (maximum). CONCLUSIONS: There are important aspects in the discrimination of the capacity to provide mental health care by primary health care teams: risk stratification for care management, follow-up of the most serious cases, group care, and preventive and health promotion actions. PMID:29489992
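    For readers unfamiliar with the internal-consistency coefficient reported in this record, here is a minimal Python sketch of Cronbach's alpha using the standard variance-based formula. The function name `cronbach_alpha` and the toy score matrix are hypothetical; they are not the study's data or code.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores: list of respondents, each a list of k item scores.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)

# Hypothetical responses: 5 respondents, 3 items.
toy = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 5, 5],
    [1, 2, 1],
    [3, 3, 4],
]
alpha = cronbach_alpha(toy)
print(round(alpha, 3))
```

    Alpha approaches 1 as the items covary more strongly relative to their individual variances, which is why it is commonly read as an index of internal consistency.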

  12. Mental health in primary care: an evaluation using the Item Response Theory.

    PubMed

    Rocha, Hugo André da; Santos, Alaneir de Fátima Dos; Reis, Ilka Afonso; Santos, Marcos Antônio da Cunha; Cherchiglia, Mariângela Leal

    2018-01-01

    OBJECTIVE: To determine the items of the Brazilian National Program for Improving Access and Quality of Primary Care that better evaluate the capacity to provide mental health care. METHODS: This is a cross-sectional study carried out using the Graded Response Model of the Item Response Theory using secondary data from the second cycle of the National Program for Improving Access and Quality of Primary Care, which evaluates 30,523 primary care teams in the period from 2013 to 2014 in Brazil. The internal consistency, correlation between items, and correlation between items and the total score were tested using the Cronbach's alpha, Spearman's correlation, and point biserial coefficients, respectively. The assumptions of unidimensionality and local independence of the items were tested. Word clouds were used as one way to present the results. RESULTS: The items with the greatest ability to discriminate were scheduling of the agenda according to risk stratification, keeping of records of the most serious cases of users in psychological distress, and provision of group care. The items that required a higher level of mental health care in the parameter of location were the provision of any type of group care and the provision of educational and mental health promotion activities. Total Cronbach's alpha coefficient was 0.87. The items that obtained the highest correlation with total score were the recording of the most serious cases of users in psychological distress and scheduling of the agenda according to risk stratification. The final scores obtained oscillated between -2.07 (minimum) and 1.95 (maximum). CONCLUSIONS: There are important aspects in the discrimination of the capacity to provide mental health care by primary health care teams: risk stratification for care management, follow-up of the most serious cases, group care, and preventive and health promotion actions.

  13. Evaluating Common Item Block Options When Faced with Practical Constraints

    ERIC Educational Resources Information Center

    Wolkowitz, Amanda; Davis-Becker, Susan

    2015-01-01

    This study evaluates the impact of common item characteristics on the outcome of equating in credentialing examinations when traditionally recommended representation is not possible. This research used real data sets from several credentialing exams to test the impact of content representation, item statistics, and number of common items on…

  14. Item Selection Techniques and Evaluation of Instructional Objectives.

    ERIC Educational Resources Information Center

    Cox, Richard C.

    The validity of an educational achievement test depends upon the correspondence between specified educational objectives and the extent to which these objectives are measured by the evaluation instrument. This study is designed to evaluate the effect of statistical item selection on the structure of the final evaluation instrument as compared with…

  15. Evaluation of the Patient-Reported Outcomes Information System (PROMIS®) Spanish-language physical functioning items.

    PubMed

    Paz, Sylvia H; Spritzer, Karen L; Morales, Leo S; Hays, Ron D

    2013-09-01

    To evaluate the equivalence of the PROMIS® physical functioning item bank by language of administration (English versus Spanish). The PROMIS® wave 1 English-language physical functioning bank consists of 124 items, and 114 of these were translated into Spanish. Item frequencies, means and standard deviations, item-scale correlations, and internal consistency reliability were calculated. The IRT assumption of unidimensionality was evaluated by fitting a single-factor confirmatory factor analytic model. IRT threshold and discrimination parameters were estimated using Samejima's Graded Response Model. DIF by language of administration was evaluated. Item means ranged from 2.53 (SD = 1.36) to 4.62 (SD = 0.82). Coefficient alpha was 0.99, and item-rest correlations ranged from 0.41 to 0.89. A one-factor model fits the data well (CFI = 0.971, TLI = 0.970, and RMSEA = 0.052). The slope parameters ranged from 0.45 ("Are you able to run 10 miles?") to 4.50 ("Are you able to put on a shirt or blouse?"). The threshold parameters ranged from -1.92 ("How much do physical health problems now limit your usual physical activities (such as walking or climbing stairs)?") to 6.06 ("Are you able to run 10 miles?"). Fifty of the 114 items were flagged for DIF based on an R² of 0.02 or above criterion. The expected total score was higher for Spanish- than English-language respondents. English- and Spanish-speaking subjects with the same level of underlying physical function responded differently to 50 of 114 items. This study has important implications in the study of physical functioning among diverse populations.
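    For orientation, the slope and threshold parameters reported in this record correspond to the usual form of Samejima's Graded Response Model, sketched below. The notation (item i, response category k, slope a_i, thresholds b_ik) is an assumption for illustration, and the abstract does not state whether a scaling constant was applied to the logistic function.

```latex
\[
P^{*}_{ik}(\theta) = P(X_i \ge k \mid \theta)
  = \frac{1}{1 + \exp\!\left[-a_i\,(\theta - b_{ik})\right]},
\qquad
P(X_i = k \mid \theta) = P^{*}_{ik}(\theta) - P^{*}_{i,k+1}(\theta),
\]
\[
\text{with } P^{*}_{i0}(\theta) = 1 \text{ and } P^{*}_{i,m+1}(\theta) = 0
\text{ for an item with categories } 0, \dots, m.
\]
```

    Under this form, a low slope such as 0.45 means the response changes slowly with the underlying trait, and a high threshold such as 6.06 means the corresponding response level is reached only at very high trait values.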

  16. Development and psychometric characteristics of the SCI-QOL Ability to Participate and Satisfaction with Social Roles and Activities item banks and short forms.

    PubMed

    Heinemann, Allen W; Kisala, Pamela A; Hahn, Elizabeth A; Tulsky, David S

    2015-05-01

    To develop a spinal cord injury (SCI)-focused version of PROMIS and Neuro-QOL social domain item banks; evaluate the psychometric properties of items developed for adults with SCI; and report information to facilitate clinical and research use. We used a mixed-methods design to develop and evaluate Ability to Participate in Social Roles and Activities and Satisfaction with Social Roles and Activities items. Focus groups helped define the constructs; cognitive interviews helped revise items; and confirmatory factor analysis and item response theory methods helped calibrate item banks and evaluate differential item functioning related to demographic and injury characteristics. Five SCI Model System sites and one Veterans Administration medical center. The calibration sample consisted of 641 individuals; a reliability sample consisted of 245 individuals residing in the community. A subset of 27 Ability to Participate and 35 Satisfaction items demonstrated good measurement properties and negligible differential item functioning related to demographic and injury characteristics. The SCI-specific measures correlate strongly with the PROMIS and Neuro-QOL versions. Ten item short forms correlate >0.96 with the full banks. Variable-length CATs with a minimum of 4 items, variable-length CATs with a minimum of 8 items, fixed-length CATs of 10 items, and the 10-item short forms demonstrate construct coverage and measurement error that is comparable to the full item bank. The Ability to Participate and Satisfaction with Social Roles and Activities CATs and short forms demonstrate excellent psychometric properties and are suitable for clinical and research applications.

  17. Development of the Oxford Participation and Activities Questionnaire: constructing an item pool

    PubMed Central

    Kelly, Laura; Jenkinson, Crispin; Dummett, Sarah; Dawson, Jill; Fitzpatrick, Ray; Morley, David

    2015-01-01

    Purpose: The Oxford Participation and Activities Questionnaire is a patient-reported outcome measure in development that is grounded on the World Health Organization International Classification of Functioning, Disability, and Health (ICF). The study reported here aimed to inform and generate an item pool for the new measure, which is specifically designed for the assessment of participation and activity in patients experiencing a range of health conditions. Methods: Items were informed through in-depth interviews conducted with 37 participants spanning a range of conditions. Interviews aimed to identify how their condition impacted their ability to participate in meaningful activities. Conditions included arthritis, cancer, chronic back pain, diabetes, motor neuron disease, multiple sclerosis, Parkinson’s disease, and spinal cord injury. Transcripts were analyzed using the framework method. Statements relating to ICF themes were recast as questionnaire items and shown for review to an expert panel. Cognitive debrief interviews (n=13) were used to assess items for face and content validity. Results: ICF themes relevant to activities and participation in everyday life were explored, and a total of 222 items formed the initial item pool. This item pool was refined by the research team and 28 generic items were mapped onto all nine chapters of the ICF construct, detailing activity and participation. Cognitive interviewing confirmed the questionnaire instructions, items, and response options were acceptable to participants. Conclusion: Using a clear conceptual basis to inform item generation, 28 items have been identified as suitable to undergo further psychometric testing. A large-scale postal survey will follow in order to refine the instrument further and to assess its psychometric properties. The final instrument is intended for use in clinical trials and interventions targeted at maintaining or improving activity and participation. PMID:26056503

  18. Development of the Oxford Participation and Activities Questionnaire: constructing an item pool.

    PubMed

    Kelly, Laura; Jenkinson, Crispin; Dummett, Sarah; Dawson, Jill; Fitzpatrick, Ray; Morley, David

    2015-01-01

    The Oxford Participation and Activities Questionnaire is a patient-reported outcome measure in development that is grounded on the World Health Organization International Classification of Functioning, Disability, and Health (ICF). The study reported here aimed to inform and generate an item pool for the new measure, which is specifically designed for the assessment of participation and activity in patients experiencing a range of health conditions. Items were informed through in-depth interviews conducted with 37 participants spanning a range of conditions. Interviews aimed to identify how their condition impacted their ability to participate in meaningful activities. Conditions included arthritis, cancer, chronic back pain, diabetes, motor neuron disease, multiple sclerosis, Parkinson's disease, and spinal cord injury. Transcripts were analyzed using the framework method. Statements relating to ICF themes were recast as questionnaire items and shown for review to an expert panel. Cognitive debrief interviews (n=13) were used to assess items for face and content validity. ICF themes relevant to activities and participation in everyday life were explored, and a total of 222 items formed the initial item pool. This item pool was refined by the research team and 28 generic items were mapped onto all nine chapters of the ICF construct, detailing activity and participation. Cognitive interviewing confirmed the questionnaire instructions, items, and response options were acceptable to participants. Using a clear conceptual basis to inform item generation, 28 items have been identified as suitable to undergo further psychometric testing. A large-scale postal survey will follow in order to refine the instrument further and to assess its psychometric properties. The final instrument is intended for use in clinical trials and interventions targeted at maintaining or improving activity and participation.

  19. Factor- and Item-Level Analyses of the 38-Item Activities Scale for Kids-Performance

    ERIC Educational Resources Information Center

    Bagley, Anita M.; Gorton, George E.; Bjornson, Kristie; Bevans, Katherine; Stout, Jean L.; Narayanan, Unni; Tucker, Carole A.

    2011-01-01

    Aim: Children and adolescents highly value their ability to participate in relevant daily life and recreational activities. The Activities Scale for Kids-performance (ASKp) instrument measures the frequency of performance of 30 common childhood activities, and has been shown to be valid and reliable. A revised and expanded 38-item ASKp (ASKp38)…

  20. Evaluation of adding item-response theory analysis for evaluation of the European Board of Ophthalmology Diploma examination.

    PubMed

    Mathysen, Danny G P; Aclimandos, Wagih; Roelant, Ella; Wouters, Kristien; Creuzot-Garcher, Catherine; Ringens, Peter J; Hawlina, Marko; Tassignon, Marie-José

    2013-11-01

    To investigate whether introduction of item-response theory (IRT) analysis, in parallel to the 'traditional' statistical analysis methods available for performance evaluation of multiple T/F items as used in the European Board of Ophthalmology Diploma (EBOD) examination, has proved beneficial, and secondly, to study whether the overall assessment performance of the current written part of EBOD is sufficiently high (KR-20 ≥ 0.90) to be kept as examination format in future EBOD editions. 'Traditional' analysis methods for individual MCQ item performance comprise P-statistics, Rit-statistics and item discrimination, while overall reliability is evaluated through KR-20 for multiple T/F items. The additional set of statistical analysis methods for the evaluation of EBOD comprises mainly IRT analysis. These analysis techniques are used to monitor whether the introduction of negative marking for incorrect answers (since EBOD 2010) has a positive influence on the statistical performance of EBOD as a whole and its individual test items in particular. Item-response theory analysis demonstrated that item performance parameters should not be evaluated individually, but should be related to one another. Before the introduction of negative marking, the overall EBOD reliability (KR-20) was good though with room for improvement (EBOD 2008: 0.81; EBOD 2009: 0.78). After the introduction of negative marking, the overall reliability of EBOD improved significantly (EBOD 2010: 0.92; EBOD 2011: 0.91; EBOD 2012: 0.91). Although many statistical performance parameters are available to evaluate individual items, our study demonstrates that the overall reliability assessment remains the only crucial parameter to be evaluated allowing comparison. While individual item performance analysis is worthwhile to undertake as secondary analysis, drawing final conclusions seems to be more difficult. Performance parameters need to be related, as shown by IRT analysis. Therefore, IRT analysis has…

  1. Meta-analytic guidelines for evaluating single-item reliabilities of personality instruments.

    PubMed

    Spörrle, Matthias; Bekk, Magdalena

    2014-06-01

    Personality is an important predictor of various outcomes in many social science disciplines. However, when personality traits are not the principal focus of research, for example, in global comparative surveys, it is often not possible to assess them extensively. In this article, we first provide an overview of the advantages and challenges of single-item measures of personality, a rationale for their construction, and a summary of alternative ways of assessing their reliability. Second, using seven diverse samples (Ntotal = 4,263) we develop the SIMP-G, the German adaptation of the Single-Item Measures of Personality, an instrument assessing the Big Five with one item per trait, and evaluate its validity and reliability. Third, we integrate previous research and our data into a first meta-analysis of single-item reliabilities of personality measures, and provide researchers with guidelines and recommendations for the evaluation of single-item reliabilities. © The Author(s) 2013.

  2. Evaluation of a preliminary physical function item bank supported the expected advantages of the Patient-Reported Outcomes Measurement Information System (PROMIS).

    PubMed

    Rose, M; Bjorner, J B; Becker, J; Fries, J F; Ware, J E

    2008-01-01

    The Patient-Reported Outcomes Measurement Information System (PROMIS) was initiated to improve precision, reduce respondent burden, and enhance the comparability of health outcomes measures. We used item response theory (IRT) to construct and evaluate a preliminary item bank for physical function assuming four subdomains. Data from seven samples (N=17,726) using 136 items from nine questionnaires were evaluated. A generalized partial credit model was used to estimate item parameters, which were normed to a mean of 50 (SD=10) in the US population. Item bank properties were evaluated through Computerized Adaptive Test (CAT) simulations. IRT requirements were fulfilled by 70 items covering activities of daily living, lower extremity, and central body functions. The original item context partly affected parameter stability. Items on upper body function, and need for aid or devices did not fit the IRT model. In simulations, a 10-item CAT eliminated floor and decreased ceiling effects, achieving a small standard error (< 2.2) across scores from 20 to 50 (reliability >0.95 for a representative US sample). This precision was not achieved over a similar range by any comparable fixed length item sets. The methods of the PROMIS project are likely to substantially improve measures of physical function and to increase the efficiency of their administration using CAT.

  3. Item Difficulty in the Evaluation of Computer-Based Instruction: An Example from Neuroanatomy

    PubMed Central

    Chariker, Julia H.; Naaz, Farah; Pani, John R.

    2012-01-01

    This article reports large item effects in a study of computer-based learning of neuroanatomy. Outcome measures of the efficiency of learning, transfer of learning, and generalization of knowledge diverged by a wide margin across test items, with certain sets of items emerging as particularly difficult to master. In addition, the outcomes of comparisons between instructional methods changed with the difficulty of the items to be learned. More challenging items better differentiated between instructional methods. This set of results is important for two reasons. First, it suggests that instruction may be more efficient if sets of consistently difficult items are the targets of instructional methods particularly suited to them. Second, there is wide variation in the published literature regarding the outcomes of empirical evaluations of computer-based instruction. As a consequence, many questions arise as to the factors that may affect such evaluations. The present paper demonstrates that the level of challenge in the material that is presented to learners is an important factor to consider in the evaluation of a computer-based instructional system. PMID:22231801

  4. Item difficulty in the evaluation of computer-based instruction: an example from neuroanatomy.

    PubMed

    Chariker, Julia H; Naaz, Farah; Pani, John R

    2012-01-01

    This article reports large item effects in a study of computer-based learning of neuroanatomy. Outcome measures of the efficiency of learning, transfer of learning, and generalization of knowledge diverged by a wide margin across test items, with certain sets of items emerging as particularly difficult to master. In addition, the outcomes of comparisons between instructional methods changed with the difficulty of the items to be learned. More challenging items better differentiated between instructional methods. This set of results is important for two reasons. First, it suggests that instruction may be more efficient if sets of consistently difficult items are the targets of instructional methods particularly suited to them. Second, there is wide variation in the published literature regarding the outcomes of empirical evaluations of computer-based instruction. As a consequence, many questions arise as to the factors that may affect such evaluations. The present article demonstrates that the level of challenge in the material that is presented to learners is an important factor to consider in the evaluation of a computer-based instructional system. Copyright © 2011 American Association of Anatomists.

  5. The Stanford Leisure-Time Activity Categorical Item (L-Cat): A single categorical item sensitive to physical activity changes in overweight/obese women

    PubMed Central

    Kiernan, Michaela; Schoffman, Danielle E.; Lee, Katherine; Brown, Susan D.; Fair, Joan M.; Perri, Michael G.; Haskell, William L.

    2015-01-01

    Background: Physical activity is essential for chronic disease prevention, yet <40% of overweight/obese adults meet national activity recommendations. For time-efficient counseling, clinicians need a brief easy-to-use tool that reliably and validly assesses a full range of activity levels, and most importantly, is sensitive to clinically meaningful changes in activity. The Stanford Leisure-Time Activity Categorical Item (L-Cat) is a single item comprised of six descriptive categories ranging from inactive to very active. This novel methodological approach assesses national activity recommendations as well as multiple clinically relevant categories below and above recommendations, and incorporates critical methodological principles that enhance psychometrics (reliability, validity, sensitivity to change). Methods: We evaluated the L-Cat’s psychometrics among 267 overweight/obese women asked to meet national activity recommendations in a randomized behavioral weight-loss trial. Results: The L-Cat had excellent test-retest reliability (κ=0.64, P<.001) and adequate concurrent criterion validity; each L-Cat category at 6 months was associated with 1059 more daily pedometer steps (95% CI 712–1407, β=0.38, P<.001) and 1.9% greater initial weight loss at 6 months (95% CI −2.4 to −1.3, β=−0.38, P<.001). Of interest, L-Cat categories differentiated from each other in a dose-response gradient for steps and weight loss (Ps<.05) with excellent face validity. The L-Cat was sensitive to change in response to the trial’s activity component. Women increased one L-Cat category at 6 months (M=1.0±1.4, P<.001); 55.8% met recommendations at 6 months whereas 20.6% did at baseline (P<.001). Even among women not meeting recommendations at both baseline and 6 months (n=106), women who moved ≥1 L-Cat categories at 6 months lost more weight than those who did not (M=−4.6%, 95% CI −6.7 to −2.5, P<.001). Conclusions: Given strong psychometrics, the L-Cat has timely

  6. The Stanford Leisure-Time Activity Categorical Item (L-Cat): a single categorical item sensitive to physical activity changes in overweight/obese women.

    PubMed

    Kiernan, M; Schoffman, D E; Lee, K; Brown, S D; Fair, J M; Perri, M G; Haskell, W L

    2013-12-01

    Physical activity is essential for chronic disease prevention, yet <40% of overweight/obese adults meet the national activity recommendations. For time-efficient counseling, clinicians need a brief, easy-to-use tool that reliably and validly assesses a full range of activity levels, and, most importantly, is sensitive to clinically meaningful changes in activity. The Stanford Leisure-Time Activity Categorical Item (L-Cat) is a single item comprising six descriptive categories ranging from inactive to very active. This novel methodological approach assesses national activity recommendations as well as multiple clinically relevant categories below and above the recommendations, and incorporates critical methodological principles that enhance psychometrics (reliability, validity and sensitivity to change). We evaluated the L-Cat's psychometrics among 267 overweight/obese women who were asked to meet the national activity recommendations in a randomized behavioral weight-loss trial. The L-Cat had excellent test-retest reliability (κ=0.64, P<0.001) and adequate concurrent criterion validity; each L-Cat category at 6 months was associated with 1059 more daily pedometer steps (95% CI 712-1407, β=0.38, P<0.001) and 1.9% greater initial weight loss at 6 months (95% CI -2.4 to -1.3, β=-0.38, P<0.001). Of interest, L-Cat categories differentiated from each other in a dose-response gradient for steps and weight loss (Ps<0.05) with excellent face validity. The L-Cat was sensitive to change in response to the trial's activity component. Women increased one L-Cat category at 6 months (M=1.0±1.4, P<0.001); 55.8% met the recommendations at 6 months whereas 20.6% did at baseline (P<0.001). Even among women not meeting the recommendations at both baseline and 6 months (n=106), women who moved ≥1 L-Cat categories at 6 months lost more weight than those who did not (M=-4.6%, 95% CI -6.7 to -2.5, P<0.001). Given strong psychometrics, the L-Cat has timely potential for clinical

  7. Evaluating innovative items for the NCLEX, part I: usability and pilot testing.

    PubMed

    Wendt, Anne; Harmes, J Christine

    2009-01-01

    National Council of State Boards of Nursing (NCSBN) has recently conducted preliminary research on the feasibility of including various types of innovative test questions (items) on the NCLEX. This article focuses on the participants' reactions to and their strategies for interacting with various types of innovative items. Part 2 in the May/June issue will focus on the innovative item templates and evaluation of the statistical characteristics and the level of cognitive processing required to answer the examination items.

  8. Evaluating the healthiness of chain-restaurant menu items using crowdsourcing: a new method.

    PubMed

    Lesser, Lenard I; Wu, Leslie; Matthiessen, Timothy B; Luft, Harold S

    2017-01-01

    To develop a technology-based method for evaluating the nutritional quality of chain-restaurant menus to increase the efficiency and lower the cost of large-scale data analysis of food items. Using a Modified Nutrient Profiling Index (MNPI), we assessed chain-restaurant items from the MenuStat database with a process involving three steps: (i) testing 'extreme' scores; (ii) crowdsourcing to analyse fruit, nut and vegetable (FNV) amounts; and (iii) analysis of the ambiguous items by a registered dietitian. In applying the approach to assess 22 422 foods, only 3566 could not be scored automatically based on MenuStat data and required further evaluation to determine healthiness. Items for which there was low agreement between trusted crowd workers, or where the FNV amount was estimated to be >40 %, were sent to a registered dietitian. Crowdsourcing was able to evaluate 3199, leaving only 367 to be reviewed by the registered dietitian. Overall, 7 % of items were categorized as healthy. The healthiest category was soups (26 % healthy), while desserts were the least healthy (2 % healthy). An algorithm incorporating crowdsourcing and a dietitian can quickly and efficiently analyse restaurant menus, allowing public health researchers to analyse the healthiness of menu items.

  9. 48 CFR 52.212-2 - Evaluation-Commercial Items.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... compared to price.) (b) Options. The Government will evaluate offers for award purposes by adding the total... provision substantially as follows: Evaluation—Commercial Items (JAN 1999) (a) The Government will award a... solicitation will be most advantageous to the Government, price and other factors considered. The following...

  10. Evaluating construct validity of the second version of the Copenhagen Psychosocial Questionnaire through analysis of differential item functioning and differential item effect.

    PubMed

    Bjorner, Jakob Bue; Pejtersen, Jan Hyld

    2010-02-01

    To evaluate the construct validity of the Copenhagen Psychosocial Questionnaire II (COPSOQ II) by means of tests for differential item functioning (DIF) and differential item effect (DIE). We used a Danish general population postal survey (n = 4,732 with 3,517 wage earners) with a one-year register based follow up for long-term sickness absence. DIF was evaluated against age, gender, education, social class, public/private sector employment, and job type using ordinal logistic regression. DIE was evaluated against job satisfaction and self-rated health (using ordinal logistic regression), against depressive symptoms, burnout, and stress (using multiple linear regression), and against long-term sick leave (using a proportional hazards model). We used a cross-validation approach to counter the risk of significant results due to multiple testing. Out of 1,052 tests, we found 599 significant instances of DIF/DIE, 69 of which showed both practical and statistical significance across two independent samples. Most DIF occurred for job type (in 20 cases), while we found little DIF for age, gender, education, social class and sector. DIE seemed to pertain to particular items, which showed DIE in the same direction for several outcome variables. The results allowed a preliminary identification of items that have a positive impact on construct validity and items that have negative impact on construct validity. These results can be used to develop better shortform measures and to improve the conceptual framework, items and scales of the COPSOQ II. We conclude that tests of DIF and DIE are useful for evaluating construct validity.

  11. Selecting Items for a College Course Evaluation Form

    ERIC Educational Resources Information Center

    Baril, G. L.; Skaggs, C. Thomas

    1976-01-01

    The study describes the implementation of three related suggestions for the development of items for a course evaluation form. High degrees of consistency were evidenced in preferences of students, faculty, and individuals from different academic areas. Significant differences were obtained between student and faculty responses and the basic…

  12. Using the Item Response Theory (IRT) for Educational Evaluation through Games

    ERIC Educational Resources Information Center

    Euzébio Batista, Marcelo Henrique; Victória Barbosa, Jorge Luis; da Rosa Tavares, João Elison; Hackenhaar, Jonathan Luis

    2013-01-01

    This article shows the application of Item Response Theory (IRT) for educational evaluation using games. The article proposes a computational model to create user profiles, called Psychometric Profile Generator (PPG). PPG uses the IRT mathematical model for exploring the levels of skills and behaviors in the form of items and/or stimuli. The model…

  13. A signal detection-item response theory model for evaluating neuropsychological measures.

    PubMed

    Thomas, Michael L; Brown, Gregory G; Gur, Ruben C; Moore, Tyler M; Patt, Virginie M; Risbrough, Victoria B; Baker, Dewleen G

    2018-02-05

    Models from signal detection theory are commonly used to score neuropsychological test data, especially tests of recognition memory. Here we show that certain item response theory models can be formulated as signal detection theory models, thus linking two complementary but distinct methodologies. We then use the approach to evaluate the validity (construct representation) of commonly used research measures, demonstrate the impact of conditional error on neuropsychological outcomes, and evaluate measurement bias. Signal detection-item response theory (SD-IRT) models were fitted to recognition memory data for words, faces, and objects. The sample consisted of U.S. Infantry Marines and Navy Corpsmen participating in the Marine Resiliency Study. Data comprised item responses to the Penn Face Memory Test (PFMT; N = 1,338), Penn Word Memory Test (PWMT; N = 1,331), and Visual Object Learning Test (VOLT; N = 1,249), and self-report of past head injury with loss of consciousness. SD-IRT models adequately fitted recognition memory item data across all modalities. Error varied systematically with ability estimates, and distributions of residuals from the regression of memory discrimination onto self-report of past head injury were positively skewed towards regions of larger measurement error. Analyses of differential item functioning revealed little evidence of systematic bias by level of education. SD-IRT models benefit from the measurement rigor of item response theory-which permits the modeling of item difficulty and examinee ability-and from signal detection theory-which provides an interpretive framework encompassing the experimentally validated constructs of memory discrimination and response bias. We used this approach to validate the construct representation of commonly used research measures and to demonstrate how nonoptimized item parameters can lead to erroneous conclusions when interpreting neuropsychological test data. Future work might include the

  14. Differential Item Functioning Analysis of the 2003-04 NHANES Physical Activity Questionnaire

    ERIC Educational Resources Information Center

    Gao, Yong; Zhu, Weimo

    2011-01-01

    Using differential item functioning (DIF) analyses, this study examined whether there were any DIF items in the National Health and Nutrition Examination Survey (NHANES) physical activity (PA) questionnaire. A subset of adult data from the 2003-04 NHANES study (n = 3,083) was used. PA items related to respondents' occupational, transportation,…

  15. Evaluation of psychometric properties and differential item functioning of 8-item Child Perceptions Questionnaires using item response theory.

    PubMed

    Yau, David T W; Wong, May C M; Lam, K F; McGrath, Colman

    2015-08-19

    Four-factor structure of the two 8-item short forms of Child Perceptions Questionnaire CPQ11-14 (RSF:8 and ISF:8) has been confirmed. However, the sum scores are typically reported in practice as a proxy of Oral health-related Quality of Life (OHRQoL), which implied a unidimensional structure. This study first assessed the unidimensionality of 8-item short forms of CPQ11-14. Item response theory (IRT) was employed to offer an alternative and complementary approach of validation and to overcome the limitations of classical test theory assumptions. A random sample of 649 12-year-old school children in Hong Kong was analyzed. Unidimensionality of the scale was tested by confirmatory factor analysis (CFA), principal component analysis (PCA) and local dependency (LD) statistic. Graded response model was fitted to the data. Contribution of each item to the scale was assessed by item information function (IIF). Reliability of the scale was assessed by test information function (TIF). Differential item functioning (DIF) across gender was identified by Wald test and expected score functions. Both CPQ11-14 RSF:8 and ISF:8 did not deviate much from the unidimensionality assumption. Results from CFA indicated acceptable fit of the one-factor model. PCA indicated that the first principal component explained >30 % of the total variation with high factor loadings for both RSF:8 and ISF:8. Almost all LD statistic <10 indicated the absence of local dependency. Flat and low IIFs were observed in the oral symptoms items suggesting little contribution of information to the scale and item removal caused little practical impact. Comparing the TIFs, RSF:8 showed slightly better information than ISF:8. In addition to oral symptoms items, the item "Concerned with what other people think" demonstrated a uniform DIF (p < 0.001). The expected score functions were not much different between boys and girls. Items related to oral symptoms were not informative to OHRQoL and deletion of these

  16. Measuring the ICF components of impairment, activity limitation and participation restriction: an item analysis using classical test theory and item response theory

    PubMed Central

    Pollard, Beth; Dixon, Diane; Dieppe, Paul; Johnston, Marie

    2009-01-01

    Background: The International Classification of Functioning, Disability and Health (ICF) proposes three main health outcomes, Impairment (I), Activity Limitation (A) and Participation Restriction (P), but good measures of these constructs are needed. The aim of this study was to use both Classical Test Theory (CTT) and Item Response Theory (IRT) methods to carry out an item analysis to improve measurement of these three components in patients having joint replacement surgery mainly for osteoarthritis (OA). Methods: A geographical cohort of patients about to undergo lower limb joint replacement was invited to participate. Five hundred and twenty-four patients completed ICF items that had been previously identified as measuring only a single ICF construct in patients with osteoarthritis. There were 13 I, 26 A and 20 P items. The SF-36 was used to explore the construct validity of the resultant I, A and P measures. The CTT and IRT analyses were run separately to identify items for inclusion or exclusion in the measurement of each construct. The results from both analyses were compared and contrasted. Results: Overall, the item analysis resulted in the removal of 4 I items, 9 A items and 11 P items. CTT and IRT identified the same 14 items for removal, with CTT additionally excluding 3 items, and IRT a further 7 items. In a preliminary exploration of reliability and validity, the new measures appeared acceptable. Conclusion: New measures were developed that reflect the ICF components of Impairment, Activity Limitation and Participation Restriction for patients with advanced arthritis. The resulting Aberdeen IAP measures (Ab-IAP) comprising I (Ab-I, 9 items), A (Ab-A, 17 items), and P (Ab-P, 9 items) met the criteria of conventional psychometric (CTT) analyses and the additional criteria (information and discrimination) of IRT. The use of both methods was more informative than the use of only one of these methods. Thus combining CTT and IRT appears to be a valuable tool in

  17. Assessment of the Item Selection and Weighting in the Birmingham Vasculitis Activity Score for Wegener's Granulomatosis

    PubMed Central

    MAHR, ALFRED D.; NEOGI, TUHINA; LAVALLEY, MICHAEL P.; DAVIS, JOHN C.; HOFFMAN, GARY S.; MCCUNE, W. JOSEPH; SPECKS, ULRICH; SPIERA, ROBERT F.; ST.CLAIR, E. WILLIAM; STONE, JOHN H.; MERKEL, PETER A.

    2013-01-01

    Objective: To assess the Birmingham Vasculitis Activity Score for Wegener's Granulomatosis (BVAS/WG) with respect to its selection and weighting of items. Methods: This study used the BVAS/WG data from the Wegener's Granulomatosis Etanercept Trial. The scoring frequencies of the 34 predefined items and any “other” items added by clinicians were calculated. Using linear regression with generalized estimating equations in which the physician global assessment (PGA) of disease activity was the dependent variable, we computed weights for all predefined items. We also created variables for clinical manifestations frequently added as other items, and computed weights for these as well. We searched for the model that included the items and their generated weights yielding an activity score with the highest R² to predict the PGA. Results: We analyzed 2,044 BVAS/WG assessments from 180 patients; 734 assessments were scored during active disease. The highest R² with the PGA was obtained by scoring WG activity based on the following items: the 25 predefined items rated on ≥5 visits, the 2 newly created fatigue and weight loss variables, the remaining minor other and major other items, and a variable that signified whether new or worse items were present at a specific visit. The weights assigned to the items ranged from 1 to 21. Compared with the original BVAS/WG, this modified score correlated significantly more strongly with the PGA. Conclusion: This study suggests possibilities to enhance the item selection and weighting of the BVAS/WG. These changes may increase this instrument's ability to capture the continuum of disease activity in WG. PMID:18512722

  18. Evaluation of Northwest University, Kano Post-UTME Test Items Using Item Response Theory

    ERIC Educational Resources Information Center

    Bichi, Ado Abdu; Hafiz, Hadiza; Bello, Samira Abdullahi

    2016-01-01

    High-stakes testing is used for the purposes of providing results that have important consequences. Validity is the cornerstone upon which all measurement systems are built. This study applied the Item Response Theory principles to analyse Northwest University Kano Post-UTME Economics test items. The developed fifty (50) economics test items was…

  19. Item Bank Development for a Revised Pediatric Evaluation of Disability Inventory (PEDI)

    ERIC Educational Resources Information Center

    Dumas, Helene; Fragala-Pinkham, Maria; Haley, Stephen; Coster, Wendy; Kramer, Jessica; Kao, Ying-Chia; Moed, Richard

    2010-01-01

    The Pediatric Evaluation of Disability Inventory (PEDI) is a useful clinical and research assessment, but it has limitations in content, age range, and efficiency. The purpose of this article is to describe the development of the item bank for a new computer adaptive testing version of the PEDI (PEDI-CAT). An expanded item set and response options…

  20. Item validity vs. item discrimination index: a redundancy?

    NASA Astrophysics Data System (ADS)

    Panjaitan, R. L.; Irawati, R.; Sujana, A.; Hanifah, N.; Djuanda, D.

    2018-03-01

    In much of the literature on evaluation and test analysis, it is common to find calculations of both item validity and the item discrimination index (D), each with a different formula. Meanwhile, other sources state that the item discrimination index can be obtained by calculating the correlation between the testee’s score on a particular item and the testee’s score on the overall test, which is actually the same concept as item validity. Some research reports, especially undergraduate theses, tend to include both item validity and the item discrimination index in the instrument analysis. These concepts might overlap, since both reflect the quality of the test in measuring the examinees’ ability. In this paper, examples of data-processing results for item validity and the item discrimination index are compared. The paper discusses whether one of the two can stand in for both, or whether it is better to present both calculations for simple test analysis, especially in undergraduate theses that include test analyses.

  1. A Bifactor Multidimensional Item Response Theory Model for Differential Item Functioning Analysis on Testlet-Based Items

    ERIC Educational Resources Information Center

    Fukuhara, Hirotaka; Kamata, Akihito

    2011-01-01

    A differential item functioning (DIF) detection method for testlet-based data was proposed and evaluated in this study. The proposed DIF model is an extension of a bifactor multidimensional item response theory (MIRT) model for testlets. Unlike traditional item response theory (IRT) DIF models, the proposed model takes testlet effects into…

  2. Psychometric evaluation of the PainCAS Interference with Daily Activities, Psychological/Emotional Distress, and Pain scales.

    PubMed

    McCaffrey, Stacey A; Black, Ryan A; Butler, Stephen F

    2018-03-01

    The PainCAS is a web-based clinical tool for assessing and tracking pain and opioid risk in chronic pain patients. Despite evidence for its utility within the clinical setting, the PainCAS scales have never been subject to psychometric evaluation. The current study is the first to evaluate the psychometric properties of the PainCAS Interference with Daily Activities, Psychological/Emotional Distress, and Pain scales. Patients (N = 4797) from treatment centers and hospitals in 16 different states completed the PainCAS as part of routine clinical assessment. A subsample (n = 73) from two hospital-based treatment centers also completed comparator measures. Rasch Rating Scale Models were employed to evaluate the Interference with Daily Activities and Psychological/Emotional Distress scales, and empirical evaluation included assessment of dimensionality, discrimination, item fit, reliability, information, and person-to-item targeting. Additionally, convergent and discriminant validity were evaluated through classical test theory approaches. Convergent validity of the Pain scales was evaluated through correlations with corresponding comparator items. One Interference with Daily Activities item was removed due to poor functioning and discrimination. The retained items from the Interference with Daily Activities and Psychological/Emotional Distress scales conformed to unidimensional Rasch measurement models, yielding satisfactory item fit, reliability, precision, and coverage. Further, results provided support for the convergent and discriminant validity of these two scales. Convergent validity between the PainCAS Pain and BPI Pain items was also strong. Taken together, results provide strong psychometric support for these PainCAS Pain scales. Strengths and limitations of the current study are discussed.

  3. An Evaluation of Three Approximate Item Response Theory Models for Equating Test Scores.

    ERIC Educational Resources Information Center

    Marco, Gary L.; And Others

    Three item response models were evaluated for estimating item parameters and equating test scores. The models, which approximated the traditional three-parameter model, included: (1) the Rasch one-parameter model, operationalized in the BICAL computer program; (2) an approximate three-parameter logistic model based on coarse group data divided…

  4. Item Selection, Evaluation, and Simple Structure in Personality Data

    PubMed Central

    Pettersson, Erik; Turkheimer, Eric

    2010-01-01

    We report an investigation of the genesis and interpretation of simple structure in personality data using two very different self-reported data sets. The first consists of a set of relatively unselected lexical descriptors, whereas the second is based on responses to a carefully constructed instrument. In both data sets, we explore the degree of simple structure by comparing factor solutions to solutions from simulated data constructed to have either strong or weak simple structure. The analysis demonstrates that there is little evidence of simple structure in the unselected items, and a moderate degree among the selected items. In both instruments, however, much of the simple structure that could be observed originated in a strong dimension of positive vs. negative evaluation. PMID:20694168

  5. Item usage in a multidimensional computerized adaptive test (MCAT) measuring health-related quality of life.

    PubMed

    Paap, Muirne C S; Kroeze, Karel A; Terwee, Caroline B; van der Palen, Job; Veldkamp, Bernard P

    2017-11-01

    Examining item usage is an important step in evaluating the performance of a computerized adaptive test (CAT). We study item usage for a newly developed multidimensional CAT which draws items from three PROMIS domains, as well as a disease-specific one. The multidimensional item bank used in the current study contained 194 items from four domains: the PROMIS domains fatigue, physical function, and ability to participate in social roles and activities, and a disease-specific domain (the COPD-SIB). The item bank was calibrated using the multidimensional graded response model and data of 795 patients with chronic obstructive pulmonary disease. To evaluate the item usage rates of all individual items in our item bank, CAT simulations were performed on responses generated based on a multivariate uniform distribution. The outcome variables included active bank size and item overuse (usage rate larger than the expected item usage rate). For average θ-values, the overall active bank size was 9-10%; this number quickly increased as θ-values became more extreme. For values of -2 and +2, the overall active bank size equaled 39-40%. There was 78% overlap between overused items and active bank size for average θ-values. For more extreme θ-values, the overused items made up a much smaller part of the active bank size: here the overlap was only 35%. Our results strengthen the claim that relatively short item banks may suffice when using polytomous items (and no content constraints/exposure control mechanisms), especially when using MCAT.

  6. Separating relational from item load effects in paired recognition: temporoparietal and middle frontal gyral activity with increased associates, but not items during encoding and retention.

    PubMed

    Phillips, Steven; Niki, Kazuhisa

    2002-10-01

    Working memory is affected by items stored and the relations between them. However, separating these factors has been difficult, because increased items usually accompany increased associations/relations. Hence, some have argued, relational effects are reducible to item effects. We overcome this problem by manipulating index length: the fewest number of item positions at which there is a unique item, or tuple of items (if length >1), for every instance in the relational (memory) set. Longer indexes imply greater similarity (number of shared items) between instances and higher load on encoding processes. Subjects were given lists of study pairs and asked to make a recognition judgement. The number of unique items and index length in the three list conditions were: (1) AB, CD: four/one; (2) AB, CD, EF: six/one; and (3) AB, AD, CB: four/two, respectively. Japanese letters were used in Experiments 1 (kanji-ideograms) and 2 (hiragana-phonograms); numbers in Experiment 3; and shapes generated from Fourier descriptors in Experiment 4. Across all materials, right dominant temporoparietal and middle frontal gyral activity was found with increased index length, but not items during study. In Experiment 5, a longer delay was used to isolate retention effects in the absence of visual stimuli. Increased left hemispheric activity was observed in the precuneus, middle frontal gyrus, and superior temporal gyrus with increased index length for the delay period. These results show that relational load is not reducible to item load.
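
    The index-length definition in the abstract above can be made concrete with a short sketch. It is an illustration only, assuming each study pair is written as a string with one character per item position; the function names `index_length` and `unique_items` are hypothetical.

```python
from itertools import combinations

def index_length(instances):
    """Fewest item positions whose items (or tuples of items) are
    unique for every instance in the relational set."""
    n_positions = len(instances[0])
    for k in range(1, n_positions + 1):
        for positions in combinations(range(n_positions), k):
            keys = [tuple(inst[i] for i in positions) for inst in instances]
            if len(set(keys)) == len(keys):
                return k
    return None  # duplicate instances: no unique index exists

def unique_items(instances):
    """Number of distinct items across all instances."""
    return len({item for inst in instances for item in inst})

# The three list conditions described in the abstract.
conditions = {
    1: ["AB", "CD"],
    2: ["AB", "CD", "EF"],
    3: ["AB", "AD", "CB"],
}
for label, pairs in conditions.items():
    print(label, unique_items(pairs), index_length(pairs))
```

    Conditions 1 and 3 both contain four unique items but differ in index length, which is how the design varies relational load while holding item load constant.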

  7. Linking Existing Instruments to Develop an Activity of Daily Living Item Bank.

    PubMed

    Li, Chih-Ying; Romero, Sergio; Bonilha, Heather S; Simpson, Kit N; Simpson, Annie N; Hong, Ickpyo; Velozo, Craig A

    2018-03-01

    This study examined dimensionality and item-level psychometric properties of an item bank measuring activities of daily living (ADL) across inpatient rehabilitation facilities and community living centers. Common person equating method was used in the retrospective veterans data set. This study examined dimensionality, model fit, local independence, and monotonicity using factor analyses and fit statistics, principal component analysis (PCA), and differential item functioning (DIF) using Rasch analysis. Following the elimination of invalid data, 371 veterans who completed both the Functional Independence Measure (FIM) and minimum data set (MDS) within 6 days were retained. The FIM-MDS item bank demonstrated good internal consistency (Cronbach's α = .98) and met three rating scale diagnostic criteria and three of the four model fit statistics (comparative fit index/Tucker-Lewis index = 0.98, root mean square error of approximation = 0.14, and standardized root mean residual = 0.07). PCA of Rasch residuals showed the item bank explained 94.2% variance. The item bank covered the range of θ from -1.50 to 1.26 (item), -3.57 to 4.21 (person) with person strata of 6.3. The findings indicated the ADL physical function item bank constructed from FIM and MDS measured a single latent trait with overall acceptable item-level psychometric properties, suggesting that it is an appropriate source for developing efficient test forms such as short forms and computerized adaptive tests.

  8. Validity and measurement precision of the PROMIS physical function item bank and a content validity-driven 20-item short form in rheumatoid arthritis compared with traditional measures.

    PubMed

    Oude Voshaar, Martijn A H; Ten Klooster, Peter M; Glas, Cees A W; Vonkeman, Harald E; Taal, Erik; Krishnan, Eswar; Bernelot Moens, Hein J; Boers, Maarten; Terwee, Caroline B; van Riel, Piet L C M; van de Laar, Mart A F J

    2015-12-01

    To evaluate the content validity and measurement properties of the Patient-Reported Outcome Measurement Information System (PROMIS) physical function item bank and a 20-item short form in patients with RA in comparison with the HAQ disability index (HAQ-DI) and 36-item Short Form Health Survey (SF-36) physical functioning scale (PF-10). The content validity of the instruments was evaluated by linking their items to the International Classification of Functioning, Disability and Health (ICF) core set for RA. The measures were administered to 690 RA patients enrolled in the Dutch Rheumatoid Arthritis Monitoring registry. Measurement precision was evaluated using item response theory methods and construct validity was evaluated by correlating physical function scores with other clinical and patient-reported outcome measures. All 207 health concepts identified in the physical function measures referred to activities that are featured in the ICF. Twenty-three of 26 ICF RA core set domains are featured in the full PROMIS physical function item bank compared with 13 and 8 for the HAQ-DI and PF-10, respectively. As hypothesized, all three physical function instruments were highly intercorrelated (r 0.74-0.84), moderately correlated with disease activity measures (r 0.44-0.63) and weakly correlated with age (rs 0.07-0.14). Item response theory-based analysis revealed that a 20-item PROMIS physical function short form covered a wider range of physical function levels than the HAQ-DI or PF-10. The PROMIS physical function item bank demonstrated excellent measurement properties in RA. A content-driven 20-item short form may be a useful tool for assessing physical function in RA. © The Author 2015. Published by Oxford University Press on behalf of the British Society for Rheumatology.

  9. Calibration of context-specific survey items to assess youth physical activity behaviour.

    PubMed

    Saint-Maurice, Pedro F; Welk, Gregory J; Bartee, R Todd; Heelan, Kate

    2017-05-01

    This study tests calibration models to re-scale context-specific physical activity (PA) items to accelerometer-derived PA. A total of 195 children in grades 4-12 wore an Actigraph monitor and completed the Physical Activity Questionnaire (PAQ) one week later. The relative time spent in moderate-to-vigorous PA (MVPA%) obtained from the Actigraph at recess, PE, lunch, after-school, evening and weekend was matched with the respective item score obtained from the PAQ. Item scores from 145 participants were calibrated against objective MVPA% using multiple linear regression with age and sex as additional predictors. Predicted minutes of MVPA for school, out-of-school and total week were tested in the remaining sample (n = 50) using equivalence testing. The results showed that PAQ β-weights ranged from 0.06 (lunch) to 4.94 (PE) MVPA% (P < 0.05) and model root mean square error ranged from 4.2% (evening) to 20.2% (recess). When applied to an independent sample, differences between PAQ and accelerometer MVPA at school and out-of-school ranged from -15.6 to +3.8 min and the PAQ was within 10-15% of accelerometer measured activity. This study demonstrated that context-specific items can be calibrated to predict minutes of MVPA in groups of youth during in- and out-of-school periods.

  10. Evaluation of five guidelines for option development in multiple-choice item-writing.

    PubMed

    Martínez, Rafael J; Moreno, Rafael; Martín, Irene; Trigo, M Eva

    2009-05-01

    This paper evaluates certain guidelines for writing multiple-choice test items. The analysis of the responses of 5013 subjects to 630 items from 21 university classroom achievement tests suggests that an option should not differ in terms of heterogeneous content because such error has a slight but harmful effect on item discrimination. This also occurs with the "None of the above" option when it is the correct one. In contrast, results do not show the supposedly negative effects of a different-length option, the use of specific determiners, or the use of the "All of the above" option, which not only decreases difficulty but also improves discrimination when it is the correct option.

  11. Explanation and elaboration of the Standards for UNiversal reporting of patient Decision Aid Evaluations (SUNDAE) guidelines: examples of reporting SUNDAE items from patient decision aid evaluation literature

    PubMed Central

    Hoffman, Aubri S; Abhyankar, Purva; Sheridan, Stacey; Bekker, Hilary; LeBlanc, Annie; Levin, Carrie; Ropka, Mary; Shaffer, Victoria; Stacey, Dawn; Stalmeier, Peep; Vo, Ha; Wills, Celia; Thomson, Richard

    2018-01-01

    This Explanation and Elaboration (E&E) article expands on the 26 items in the Standards for UNiversal reporting of Decision Aid Evaluations guidelines. The E&E provides a rationale for each item and includes examples for how each item has been reported in published papers evaluating patient decision aids. The E&E focuses on items key to reporting studies evaluating patient decision aids and is intended to be illustrative rather than restrictive. Authors and reviewers may wish to use the E&E broadly to inform structuring of patient decision aid evaluation reports, or use it as a reference to obtain details about how to report individual checklist items. PMID:29467235

  12. School Self-Evaluation Instruments and Cognitive Validity. Do Items Capture What They Intend to?

    ERIC Educational Resources Information Center

    Faddar, Jerich; Vanhoof, Jan; De Maeyer, Sven

    2017-01-01

    School self-evaluation (SSE) often makes use of questionnaires in order to sketch a picture of the school. How respondents cognitively process questionnaire items determines the validity of SSE results. Still, one readily assumes that respondents interpret and answer items as intended by the instrument developer (referred to as cognitive…

  13. 17 CFR 229.1205 - (Item 1205) Drilling and other exploratory and development activities.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 17 Commodity and Securities Exchanges 2 2011-04-01 2011-04-01 false (Item 1205) Drilling and other... Registrants Engaged in Oil and Gas Producing Activities § 229.1205 (Item 1205) Drilling and other exploratory..., disclose: (1) The number of net productive and dry exploratory wells drilled; and (2) The number of net...

  14. 17 CFR 229.1205 - (Item 1205) Drilling and other exploratory and development activities.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... Registrants Engaged in Oil and Gas Producing Activities § 229.1205 (Item 1205) Drilling and other exploratory... 17 Commodity and Securities Exchanges 3 2014-04-01 2014-04-01 false (Item 1205) Drilling and other..., disclose: (1) The number of net productive and dry exploratory wells drilled; and (2) The number of net...

  15. 17 CFR 229.1205 - (Item 1205) Drilling and other exploratory and development activities.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... Registrants Engaged in Oil and Gas Producing Activities § 229.1205 (Item 1205) Drilling and other exploratory... 17 Commodity and Securities Exchanges 2 2013-04-01 2013-04-01 false (Item 1205) Drilling and other..., disclose: (1) The number of net productive and dry exploratory wells drilled; and (2) The number of net...

  16. 17 CFR 229.1205 - (Item 1205) Drilling and other exploratory and development activities.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... Registrants Engaged in Oil and Gas Producing Activities § 229.1205 (Item 1205) Drilling and other exploratory... 17 Commodity and Securities Exchanges 2 2012-04-01 2012-04-01 false (Item 1205) Drilling and other..., disclose: (1) The number of net productive and dry exploratory wells drilled; and (2) The number of net...

  17. [Research about re-evaluation of screening of traditional Chinese medicine symptoms item of post-marketing medicine Xuezhikang].

    PubMed

    He, Wei; Xie, Yanming; Wang, Yongyan

    2011-10-01

    The purpose of post-marketing re-evaluation of Chinese medicine is to identify its clinical indications, and the scientific and rational design of Chinese medicine symptom items is important to the result of symptom re-evaluation. This study takes the screening of traditional Chinese medicine (TCM) symptom items for the post-marketing re-evaluation of Xuezhikang as an example, drawing on principles for dyslipidemia clinical research, academic dissertations, the Xuezhikang directions, and the practical experience of clinical experts, while standardizing the symptom names and screening 41 common dyslipidemia symptoms. Furthermore, this paper discusses the basis for, and points requiring attention in, screening symptom items, so as to provide a line of research for producing a PRO chart for post-marketing medicine re-evaluation.

  18. A Process for Reviewing and Evaluating Generated Test Items

    ERIC Educational Resources Information Center

    Gierl, Mark J.; Lai, Hollis

    2016-01-01

    Testing organizations need large numbers of high-quality items due to the proliferation of alternative test administration methods and modern test designs. But the current demand for items far exceeds the supply. Test items, as they are currently written, evoke a process that is both time-consuming and expensive because each item is written,…

  19. Mutagenic activity of south Indian food items.

    PubMed

    Sivaswamy, S N; Balachandran, B; Balanehru, S; Sivaramakrishnan, V M

    1991-08-01

    Dietary components and food dishes commonly consumed in South India were screened for their mutagenic activity. Kesari powder, calamus oil, palm drink, toddy and Kewra essence were found to be strongly mutagenic; garlic, palm oil, arrack, onion and pyrolysed portions of bread toast, chicory powder were weakly mutagenic, while tamarind and turmeric were not. Certain salted, sundried and oil fried food items were also mutagenic. Cissus quadrangularis was mutagenic, while 'decoctions' of cumin seeds, aniseeds and ginger were not. Several perfumes, essential oils and colouring agents, which are commonly used were also screened and many of them exhibited their mutagenic potential by inducing the 'reverse mutation' in Salmonella typhimurium tester strains.

  20. Item response modeling: A psychometric assessment of the children's fruit, vegetable, water, and physical activity self-efficacy scales among Chinese children

    USDA-ARS's Scientific Manuscript database

    This study aimed to evaluate the psychometric properties of four self-efficacy scales (i.e., self-efficacy for fruit (FSE), vegetable (VSE), and water (WSE) intakes, and physical activity (PASE)) and to investigate their differences in item functioning across sex, age, and body weight status groups ...

  1. Evaluation of doctoral nursing programs in Japan by faculty members and their educational and research activities.

    PubMed

    Arimoto, Azusa; Gregg, Misuzu F; Nagata, Satoko; Miki, Yuko; Murashima, Sachiyo

    2012-07-01

    Evaluation of doctoral programs in nursing is becoming more important with the rapid increase in the programs in Japan. This study aimed to evaluate doctoral nursing programs by faculty members and to analyze the relationship of the evaluation with educational and research activities of faculty members in Japan. Target settings were all 46 doctoral nursing programs. Eighty-five faculty members from 28 programs answered the questionnaire, which included 17 items for program evaluation, 12 items for faculty evaluation, 9 items for resource evaluation, 3 items for overall evaluations, and educational and research activities. A majority gave low evaluations for sources of funding, the number of faculty members and support staff, and administrative systems. Faculty members who financially supported a greater number of students gave a higher evaluation for extramural funding support, publication, provision of diverse learning experiences, time of supervision, and research infrastructure. The more time a faculty member spent on advising doctoral students, the higher were their evaluations on the supportive learning environment, administrative systems, time of supervision, and timely feedback on students' research. The findings of this study indicate a need for improvement in research infrastructure, funding sources, and human resources to achieve quality nursing doctoral education in Japan. Copyright © 2011 Elsevier Ltd. All rights reserved.

  2. Using Differential Item Functioning Procedures to Explore Sources of Item Difficulty and Group Performance Characteristics.

    ERIC Educational Resources Information Center

    Scheuneman, Janice Dowd; Gerritz, Kalle

    1990-01-01

    Differential item functioning (DIF) methodology for revealing sources of item difficulty and performance characteristics of different groups was explored. A total of 150 Scholastic Aptitude Test items and 132 Graduate Record Examination general test items were analyzed. DIF was evaluated for males and females and Blacks and Whites. (SLD)

  3. A preliminary psychometric evaluation of the eight-item cognitive load scale.

    PubMed

    Pignatiello, Grant A; Tsivitse, Emily; Hickman, Ronald L

    2018-04-01

    The aim of this article is to report the psychometric properties of the eight-item cognitive load scale. According to cognitive load theory, the formatting and delivery of healthcare education influences the degree to which patients and/or family members can engage their working memory systems for learning. However, despite its relevance, cognitive load has not yet been evaluated among surrogate decision makers exposed to electronic decision support for healthcare decisions. To date, no psychometric analyses of instruments evaluating cognitive load have been reported within healthcare settings. A convenience sample of 62 surrogate decision makers for critically ill patients, each exposed to one of two healthcare decision support interventions, was recruited from four intensive care units at a tertiary medical center in Northeast Ohio. Participants were administered a battery of psychosocial instruments and the eight-item cognitive load scale (CLS). The CLS demonstrated a bidimensional factor structure with acceptable discriminant validity and internal consistency reliability (Cronbach's α = 0.75 and 0.89). The CLS is a psychometrically sound instrument that may be used in the evaluation of decision support among surrogate decision makers of the critically ill. The authors recommend application of the cognitive load scale in the evaluation and development of healthcare education and interventions. Copyright © 2018 Elsevier Inc. All rights reserved.

  4. Construct Validity Evidence for Single-Response Items to Estimate Physical Activity Levels in Large Sample Studies

    ERIC Educational Resources Information Center

    Jackson, Allen W.; Morrow, James R., Jr.; Bowles, Heather R.; FitzGerald, Shannon J.; Blair, Steven N.

    2007-01-01

    Valid measurement of physical activity is important for studying the risks for morbidity and mortality. The purpose of this study was to examine evidence of construct validity of two similar single-response items assessing physical activity via self-report. Both items are based on the stages of change model. The sample was 687 participants (men =…

  5. Ramsay-Curve Item Response Theory for the Three-Parameter Logistic Item Response Model

    ERIC Educational Resources Information Center

    Woods, Carol M.

    2008-01-01

    In Ramsay-curve item response theory (RC-IRT), the latent variable distribution is estimated simultaneously with the item parameters of a unidimensional item response model using marginal maximum likelihood estimation. This study evaluates RC-IRT for the three-parameter logistic (3PL) model with comparisons to the normal model and to the empirical…

  6. Age-related Differential Item Functioning for the Patient-Reported Outcomes Information System (PROMIS®) Physical Functioning Items.

    PubMed

    Paz, Sylvia H; Spritzer, Karen L; Morales, Leo S; Hays, Ron D

    2013-03-29

    To evaluate the equivalence of the PROMIS® wave 1 physical functioning item bank, by age (50 years or older versus 18-49). A total of 114 physical functioning items with 5 response choices were administered to English- (n=1504) and Spanish-language (n=640) adults. Item frequencies, means and standard deviations, item-scale correlations, and internal consistency reliability were estimated. Differential Item Functioning (DIF) by age was evaluated. Thirty of the 114 items were flagged for DIF based on an R-squared of 0.02 or above criterion. The expected total score was higher for those respondents who were 18-49 than those who were 50 or older. Those who were 50 years or older versus 18-49 years old with the same level of physical functioning responded differently to 30 of the 114 items in the PROMIS® physical functioning item bank. This study yields essential information about the equivalence of the physical functioning items in older versus younger individuals.

  7. Assessing the Item Response Theory with Covariate (IRT-C) Procedure for Ascertaining Differential Item Functioning

    ERIC Educational Resources Information Center

    Tay, Louis; Vermunt, Jeroen K.; Wang, Chun

    2013-01-01

    We evaluate the item response theory with covariates (IRT-C) procedure for assessing differential item functioning (DIF) without preknowledge of anchor items (Tay, Newman, & Vermunt, 2011). This procedure begins with a fully constrained baseline model, and candidate items are tested for uniform and/or nonuniform DIF using the Wald statistic.…

  8. Item response modeling: a psychometric assessment of the children's fruit, vegetable, water, and physical activity self-efficacy scales among Chinese children.

    PubMed

    Wang, Jing-Jing; Chen, Tzu-An; Baranowski, Tom; Lau, Patrick W C

    2017-09-16

    This study aimed to evaluate the psychometric properties of four self-efficacy scales (i.e., self-efficacy for fruit (FSE), vegetable (VSE), and water (WSE) intakes, and physical activity (PASE)) and to investigate their differences in item functioning across sex, age, and body weight status groups using item response modeling (IRM) and differential item functioning (DIF). Four self-efficacy scales were administered to 763 Hong Kong Chinese children (55.2% boys) aged 8-13 years. Classical test theory (CTT) was used to examine the reliability and factorial validity of scales. IRM was conducted and DIF analyses were performed to assess the characteristics of item parameter estimates on the basis of children's sex, age and body weight status. All self-efficacy scales demonstrated adequate to excellent internal consistency reliability (Cronbach's α: 0.79-0.91). One FSE misfit item and one PASE misfit item were detected. Small DIF was found for all the scale items across children's age groups. Items with medium to large DIF were detected in different sex and body weight status groups, which will require modification. A Wright map revealed that items covered the range of the distribution of participants' self-efficacy for each scale except VSE. Several self-efficacy scales' items functioned differently by children's sex and body weight status. Additional research is required to modify the four self-efficacy scales to minimize these moderating influences for application.

  9. Using the Bayes Factors to Evaluate Person Fit in the Item Response Theory

    ERIC Educational Resources Information Center

    Pan, Tianshu; Yin, Yue

    2017-01-01

    In this article, we propose using the Bayes factors (BF) to evaluate person fit in item response theory models under the framework of Bayesian evaluation of an informative diagnostic hypothesis. We first discuss the theoretical foundation for this application and how to analyze person fit using BF. To demonstrate the feasibility of this approach,…

  10. Differential item functioning of the patient-reported outcomes information system (PROMIS®) pain interference item bank by language (Spanish versus English).

    PubMed

    Paz, Sylvia H; Spritzer, Karen L; Reise, Steven P; Hays, Ron D

    2017-06-01

    About 70% of Latinos, 5 years old or older, in the United States speak Spanish at home. Measurement equivalence of the PROMIS® pain interference (PI) item bank by language of administration (English versus Spanish) has not been evaluated. A sample of 527 adult Spanish-speaking Latinos completed the Spanish version of the 41-item PROMIS® pain interference item bank. We evaluate dimensionality, monotonicity and local independence of the Spanish-language items. Then we evaluate differential item functioning (DIF) using ordinal logistic regression with item response theory scores estimated from DIF-free "anchor" items. One of the 41 items in the Spanish version of the PROMIS® PI item bank was identified as having significant uniform DIF. English- and Spanish-speaking subjects with the same level of pain interference responded differently to 1 of the 41 items in the PROMIS® PI item bank. This item was not retained due to proprietary issues. The original English language item parameters can be used when estimating PROMIS® PI scores.

  11. Rasch analysis of the Pediatric Evaluation of Disability Inventory-computer adaptive test (PEDI-CAT) item bank for children and young adults with spinal muscular atrophy.

    PubMed

    Pasternak, Amy; Sideridis, Georgios; Fragala-Pinkham, Maria; Glanzman, Allan M; Montes, Jacqueline; Dunaway, Sally; Salazar, Rachel; Quigley, Janet; Pandya, Shree; O'Riley, Susan; Greenwood, Jonathan; Chiriboga, Claudia; Finkel, Richard; Tennekoon, Gihan; Martens, William B; McDermott, Michael P; Fournier, Heather Szelag; Madabusi, Lavanya; Harrington, Timothy; Cruz, Rosangel E; LaMarca, Nicole M; Videon, Nancy M; Vivo, Darryl C De; Darras, Basil T

    2016-12-01

    In this study we evaluated the suitability of a caregiver-reported functional measure, the Pediatric Evaluation of Disability Inventory-Computer Adaptive Test (PEDI-CAT), for children and young adults with spinal muscular atrophy (SMA). PEDI-CAT Mobility and Daily Activities domain item banks were administered to 58 caregivers of children and young adults with SMA. Rasch analysis was used to evaluate test properties across SMA types. Unidimensional content for each domain was confirmed. The PEDI-CAT was most informative for type III SMA, with ability levels distributed close to 0.0 logits in both domains. It was less informative for types I and II SMA, especially for mobility skills. Item and person abilities were not distributed evenly across all types. The PEDI-CAT may be used to measure functional performance in SMA, but additional items are needed to identify small changes in function and best represent the abilities of all types of SMA. Muscle Nerve 54: 1097-1107, 2016. © 2016 Wiley Periodicals, Inc.

  12. Evaluating the Psychometric Characteristics of Generated Multiple-Choice Test Items

    ERIC Educational Resources Information Center

    Gierl, Mark J.; Lai, Hollis; Pugh, Debra; Touchie, Claire; Boulais, André-Philippe; De Champlain, André

    2016-01-01

    Item development is a time- and resource-intensive process. Automatic item generation integrates cognitive modeling with computer technology to systematically generate test items. To date, however, items generated using cognitive modeling procedures have received limited use in operational testing situations. As a result, the psychometric…

  13. Assessment of the psychometrics of a PROMIS item bank: self-efficacy for managing daily activities

    PubMed Central

    Hong, Ickpyo; Li, Chih-Ying; Romero, Sergio; Gruber-Baldini, Ann L.; Shulman, Lisa M.

    2017-01-01

    Purpose: The aim of this study is to investigate the psychometrics of the Patient-Reported Outcomes Measurement Information System self-efficacy for managing daily activities item bank. Methods: The item pool was field tested on a sample of 1087 participants via internet (n = 250) and in-clinic (n = 837) surveys. All participants reported having at least one chronic health condition. The 35-item pool was investigated for dimensionality (confirmatory factor analyses, CFA and exploratory factor analysis, EFA), item-total correlations, local independence, precision, and differential item functioning (DIF) across gender, race, ethnicity, age groups, data collection modes, and neurological chronic conditions (McFadden Pseudo R² less than 10%). Results: The item pool met two of the four CFA fit criteria (CFI = 0.952 and SRMR = 0.07). EFA analysis found a dominant first factor (eigenvalue = 24.34) and the ratio of first to second eigenvalue was 12.4. The item pool demonstrated good item-total correlations (0.59–0.85) and acceptable internal consistency (Cronbach’s alpha = 0.97). The item pool maintained its precision (reliability over 0.90) across a wide range of theta (3.70), and there was no significant DIF. Conclusion: The findings indicated the item pool has sound psychometric properties and the test items are eligible for development of computerized adaptive testing and short forms. PMID:27048495

  14. Assessment of the psychometrics of a PROMIS item bank: self-efficacy for managing daily activities.

    PubMed

    Hong, Ickpyo; Velozo, Craig A; Li, Chih-Ying; Romero, Sergio; Gruber-Baldini, Ann L; Shulman, Lisa M

    2016-09-01

    The aim of this study is to investigate the psychometrics of the Patient-Reported Outcomes Measurement Information System self-efficacy for managing daily activities item bank. The item pool was field tested on a sample of 1087 participants via internet (n = 250) and in-clinic (n = 837) surveys. All participants reported having at least one chronic health condition. The 35-item pool was investigated for dimensionality (confirmatory factor analyses, CFA and exploratory factor analysis, EFA), item-total correlations, local independence, precision, and differential item functioning (DIF) across gender, race, ethnicity, age groups, data collection modes, and neurological chronic conditions (McFadden Pseudo R² less than 10%). The item pool met two of the four CFA fit criteria (CFI = 0.952 and SRMR = 0.07). EFA analysis found a dominant first factor (eigenvalue = 24.34) and the ratio of first to second eigenvalue was 12.4. The item pool demonstrated good item-total correlations (0.59-0.85) and acceptable internal consistency (Cronbach's alpha = 0.97). The item pool maintained its precision (reliability over 0.90) across a wide range of theta (3.70), and there was no significant DIF. The findings indicated the item pool has sound psychometric properties and the test items are eligible for development of computerized adaptive testing and short forms.

  15. Impact of IRT item misfit on score estimates and severity classifications: an examination of PROMIS depression and pain interference item banks.

    PubMed

    Zhao, Yue

    2017-03-01

    In patient-reported outcome research that utilizes item response theory (IRT), using statistical significance tests to detect misfit is usually the focus of IRT model-data fit evaluations. However, such evaluations rarely address the impact/consequence of using misfitting items on the intended clinical applications. This study was designed to evaluate the impact of IRT item misfit on score estimates and severity classifications and to demonstrate a recommended process of model-fit evaluation. Using secondary data sources collected from the Patient-Reported Outcome Measurement Information System (PROMIS) wave 1 testing phase, analyses were conducted based on PROMIS depression (28 items; 782 cases) and pain interference (41 items; 845 cases) item banks. The identification of misfitting items was assessed using Orlando and Thissen's summed-score item-fit statistics and graphical displays. The impact of misfit was evaluated according to the agreement of both IRT-derived T-scores and severity classifications between inclusion and exclusion of misfitting items. The examination of the presence and impact of misfit suggested that item misfit had a negligible impact on the T-score estimates and severity classifications with the general population sample in the PROMIS depression and pain interference item banks, implying that the impact of item misfit was insignificant. Findings support the T-score estimates in the two item banks as robust against item misfit at both the group and individual levels and add confidence to the use of T-scores for severity diagnosis in the studied sample. Recommendations on approaches for identifying item misfit (statistical significance) and assessing the misfit impact (practical significance) are given.

  16. Development and assessment of floor and ceiling items for the PROMIS physical function item bank

    PubMed Central

    2013-01-01

    Introduction: Disability and Physical Function (PF) outcome assessment has had limited ability to measure functional status at the floor (very poor functional abilities) or the ceiling (very high functional abilities). We sought to identify, develop and evaluate new floor and ceiling items to enable broader and more precise assessment of PF outcomes for the NIH Patient-Reported-Outcomes Measurement Information System (PROMIS). Methods: We conducted two cross-sectional studies using NIH PROMIS item improvement protocols with expert review, participant survey and focus group methods. In Study 1, respondents with low PF abilities evaluated new floor items, and those with high PF abilities evaluated new ceiling items for clarity, importance and relevance. In Study 2, we compared difficulty ratings of new floor items by low functioning respondents and ceiling items by high functioning respondents to reference PROMIS PF-10 items. We used frequencies, percentages, means and standard deviations to analyze the data. Results: In Study 1, low (n = 84) and high (n = 90) functioning respondents were mostly White, women, 70 years old, with some college, and disability scores of 0.62 and 0.30. More than 90% of the 31 new floor and 31 new ceiling items were rated as clear, important and relevant, leaving 26 ceiling and 30 floor items for Study 2. Low (n = 246) and high (n = 637) functioning Study 2 respondents were mostly White, women, 70 years old, with some college, and Health Assessment Questionnaire (HAQ) scores of 1.62 and 0.003. Compared to difficulty ratings of reference items, ceiling items were rated to be 10% more to greater than 40% more difficult to do, and floor items were rated to be about 12% to nearly 90% less difficult to do. Conclusions: These new floor and ceiling items considerably extend the measurable range of physical function at either extreme. They will help improve instrument performance in populations with broad functional ranges and those concentrated at

  17. Item development process and analysis of 50 case-based items for implementation on the Korean Nursing Licensing Examination.

    PubMed

    Park, In Sook; Suh, Yeon Ok; Park, Hae Sook; Kang, So Young; Kim, Kwang Sung; Kim, Gyung Hee; Choi, Yeon-Hee; Kim, Hyun-Ju

    2017-01-01

    The purpose of this study was to improve the quality of items on the Korean Nursing Licensing Examination by developing and evaluating case-based items that reflect integrated nursing knowledge. We conducted a cross-sectional observational study to develop new case-based items. The methods for developing test items included expert workshops, brainstorming, and verification of content validity. After a mock examination of undergraduate nursing students using the newly developed case-based items, we evaluated the appropriateness of the items through classical test theory and item response theory. A total of 50 case-based items were developed for the mock examination, and content validity was evaluated. The question items integrated 34 discrete elements of integrated nursing knowledge. The mock examination was taken by 741 baccalaureate students in their fourth year of study at 13 universities. Their average score on the mock examination was 57.4, and the examination showed a reliability of 0.40. According to classical test theory, the average level of item difficulty of the items was 57.4% (80%-100% for 12 items; 60%-80% for 13 items; and less than 60% for 25 items). The mean discrimination index was 0.19, and was above 0.30 for 11 items and 0.20 to 0.29 for 15 items. According to item response theory, the item discrimination parameter (in the logistic model) was none for 10 items (0.00), very low for 20 items (0.01 to 0.34), low for 12 items (0.35 to 0.64), moderate for 6 items (0.65 to 1.34), high for 1 item (1.35 to 1.69), and very high for 1 item (above 1.70). The item difficulty was very easy for 24 items (below -2.0), easy for 8 items (-2.0 to -0.5), medium for 6 items (-0.5 to 0.5), hard for 3 items (0.5 to 2.0), and very hard for 9 items (2.0 or above). The goodness-of-fit test in terms of the 2-parameter item response model between the range of 2.0 to 0.5 revealed that 12 items had an ideal correct answer rate. We surmised that the low reliability of the
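
    The classical test theory quantities mentioned in this abstract (item difficulty as a proportion correct, and a discrimination index) can be sketched in Python as follows. This is only an illustration under stated assumptions: the abstract does not say which discrimination index was used, so the common upper-minus-lower group difference with 27% groups is assumed here, and the response matrix is made up.

```python
def item_difficulty(responses, item):
    """Proportion of examinees answering the item correctly (0/1 scoring)."""
    return sum(row[item] for row in responses) / len(responses)


def discrimination_index(responses, item, fraction=0.27):
    """Upper-minus-lower discrimination index (assumed definition).

    Examinees are ranked by total score; the index is the proportion correct
    in the top `fraction` minus the proportion correct in the bottom `fraction`.
    """
    ranked = sorted(responses, key=sum, reverse=True)
    n = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:n], ranked[-n:]
    p_upper = sum(row[item] for row in upper) / n
    p_lower = sum(row[item] for row in lower) / n
    return p_upper - p_lower


# Hypothetical 10 examinees x 4 items (1 = correct, 0 = incorrect).
toy_responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
difficulty_item0 = item_difficulty(toy_responses, 0)
discrimination_item3 = discrimination_index(toy_responses, 3)
```

    Under this definition an index near zero means that high and low scorers answer the item about equally often, which is the pattern the abstract's low mean discrimination points to.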

  18. Modeling of Word Translation: Activation Flow from Concepts to Lexical Items

    ERIC Educational Resources Information Center

    Roelofs, Ardi; Dijkstra, Ton; Gerakaki, Svetlana

    2013-01-01

    Whereas most theoretical and computational models assume a continuous flow of activation from concepts to lexical items in spoken word production, one prominent model assumes that the mapping of concepts onto words happens in a discrete fashion (Bloem & La Heij, 2003). Semantic facilitation of context pictures on word translation has been taken to…

  19. Development and Evaluation of the PROMIS® Pediatric Positive Affect Item Bank, Child-Report and Parent-Proxy Editions.

    PubMed

    Forrest, Christopher B; Ravens-Sieberer, Ulrike; Devine, Janine; Becker, Brandon D; Teneralli, Rachel; Moon, JeanHee; Carle, Adam; Tucker, Carole A; Bevans, Katherine B

    2018-03-01

    The purpose of this study is to describe the psychometric evaluation and item response theory calibration of the PROMIS Pediatric Positive Affect item bank, child-report and parent-proxy editions. The initial item pool comprising 53 items, previously developed using qualitative methods, was administered to 1,874 children 8-17 years old and 909 parents of children 5-17 years old. Analyses included descriptive statistics, reliability, factor analysis, differential item functioning, and construct validity. A total of 14 items were deleted, because of poor psychometric performance, and an 8-item short form constructed from the remaining 39 items was administered to a national sample of 1,004 children 8-17 years old, and 1,306 parents of children 5-17 years old. The combined sample was used in item response theory (IRT) calibration analyses. The final item bank appeared unidimensional, the items appeared locally independent, and the items were free from differential item functioning. The scales showed excellent reliability and convergent and discriminant validity. Positive affect decreased with children's age and was lower for those with a special health care need. After IRT calibration, we found that 4 and 8 item short forms had a high degree of precision (reliability) across a wide range of the latent trait (>4 SD units). The PROMIS Pediatric Positive Affect item bank and its short forms provide an efficient, precise, and valid assessment of positive affect in children and youth.

  20. Qualitative Development and Content Validation of the PROMIS Pediatric Sleep Health Items.

    PubMed

    Bevans, Katherine B; Meltzer, Lisa J; De La Motte, Anna; Kratchman, Amy; Viél, Dominique; Forrest, Christopher B

    2018-04-25

    To develop the Patient Reported Outcome Measurement Information System (PROMIS) Pediatric Sleep Health item pool and evaluate its content validity. Participants included 8 expert sleep clinician-researchers, 64 children ages 8-17 years, and 54 parents of children ages 5-17 years. We started with item concepts and expressions from the PROMIS Sleep Disturbance and Sleep Related Impairment adult measures. Additional pediatric sleep health concepts were generated by expert (n = 8), child (n = 28), and parent (n = 33) concept elicitation interviews and a systematic review of existing pediatric sleep health questionnaires. Content validity of the item pool was evaluated with item translatability review, readability analysis, and child (n = 36) and parent (n = 21) cognitive interviews. The final pediatric Sleep Health item pool includes 43 items that assess sleep disturbance (children's capacity to fall and stay asleep, sleep quality, dreams, and parasomnias) and sleep-related impairments (daytime sleepiness, low energy, difficulty waking up, and the impact of sleep and sleepiness on cognition, affect, behavior, and daily activities). Items are translatable and relevant and well understood by children ages 8-17 and parents of children ages 5-17. Rigorous qualitative procedures were used to develop and evaluate the content validity of the PROMIS Pediatric Sleep Health item pool. Once the item pool's psychometric properties are established, the scales will be useful for measuring children's subjective experiences of sleep.

  1. Influence of the wording of evaluation items on outcome-based evaluation results for large-group teaching in anatomy, biochemistry and legal medicine.

    PubMed

    Anders, Sven; Pyka, Katharina; Mueller, Tjark; von Streinbuechel, Nicole; Raupach, Tobias

    2016-11-01

    Student learning outcome is an important dimension of teaching quality in undergraduate medical education. Measuring an increase in knowledge during teaching requires repetitive objective testing which is usually not feasible. As an alternative, student learning outcome can be calculated from student self-ratings. Comparative self-assessment (CSA) gain reflects the performance difference before and after teaching, adjusted for initial knowledge. It has been shown to be a valid proxy measure of actual learning outcome derived from objective tests. However, student self-ratings are prone to a number of confounding factors. In the context of outcome-based evaluation, the wording of self-rating items is crucial to the validity of evaluation results. This randomized trial assessed whether including qualifiers in these statements affects student ratings and CSA gain. First-year medical students self-rated their initial (then-test) and final (post-test) knowledge for lectures in anatomy, biochemistry and legal medicine, respectively, and 659 questionnaires were retrieved. Six-point scales were used for self-ratings with 1 being the most positive option. Qualifier use did not affect then-test ratings but was associated with slightly less favorable post-test ratings. Consequently, mean CSA gain was smaller for items containing qualifiers than for items lacking qualifiers (50.6±15.0% vs. 56.3±14.6%, p=0.079). The effect was more pronounced (Cohen's d=0.82) for items related to anatomy. In order to increase fairness of outcome-based evaluation and increase the comparability of CSA gain data across subjects, medical educators should agree on a consistent approach (qualifiers for all items or no qualifiers at all) when drafting self-rating statements for outcome-based evaluation. Copyright © 2016 Elsevier GmbH. All rights reserved.
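
    The abstract describes CSA gain only in words. A minimal Python sketch follows, assuming one commonly cited formulation: the drop in mean self-rating from then-test to post-test, divided by the largest possible drop given the then-test mean, on a scale where 1 is the most positive option. The formula, the function name and the ratings below are assumptions for illustration and are not taken from the abstract.

```python
def csa_gain(then_ratings, post_ratings, best=1.0):
    """CSA gain in percent under the assumed formulation.

    gain = (mean_then - mean_post) / (mean_then - best) * 100
    where lower ratings are better and `best` is the most positive scale point.
    """
    mean_then = sum(then_ratings) / len(then_ratings)
    mean_post = sum(post_ratings) / len(post_ratings)
    if mean_then == best:
        raise ValueError("no room for improvement; gain is undefined")
    return (mean_then - mean_post) / (mean_then - best) * 100.0


# Hypothetical ratings on a six-point scale (1 = most positive).
gain = csa_gain([5, 4, 5, 4], [2, 3, 2, 3])
```

    Because the denominator depends on the then-test mean, two items with the same raw improvement can yield different gains, which is one reason consistent item wording matters for comparisons across subjects.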

  2. Evaluating Increased Effort for Item Disposal to Improve Recycling at a University

    ERIC Educational Resources Information Center

    Fritz, Jennifer N.; Dupuis, Danielle L.; Wu, Wai-Ling; Neal, Ashley E.; Rettig, Lisa A.; Lastrapes, Renée E.

    2017-01-01

    An evaluation of increased response effort to dispose of items was conducted to improve recycling at a university. Signs prompting individuals to recycle and notifying them of the location of trash and recycling receptacles were posted in each phase. During the intervention, trashcans were removed from the classrooms, and one large trashcan was…

  3. Restricted interests and teacher presentation of items.

    PubMed

    Stocco, Corey S; Thompson, Rachel H; Rodriguez, Nicole M

    2011-01-01

    Restricted and repetitive behavior (RRB) is more pervasive, prevalent, frequent, and severe in individuals with autism spectrum disorders (ASDs) than in their typical peers. One subtype of RRB is restricted interests in items or activities, which is evident in the manner in which individuals engage with items (e.g., repetitious wheel spinning), the types of items or activities they select (e.g., preoccupation with a phone book), or the range of items or activities they select (i.e., narrow range of items). We sought to describe the relation between restricted interests and teacher presentation of items. Overall, we observed 5 teachers interacting with 2 pairs of students diagnosed with an ASD. Each pair included 1 student with restricted interests. During these observations, teachers were free to present any items from an array of 4 stimuli selected by experimenters. We recorded student responses to teacher presentation of items and analyzed the data to determine the relation between teacher presentation of items and the consequences for presentation provided by the students. Teacher presentation of items corresponded with differential responses provided by students with ASD, and those with restricted preferences experienced a narrower array of items.

  4. A Non-Parametric Item Response Theory Evaluation of the CAGE Instrument Among Older Adults.

    PubMed

    Abdin, Edimansyah; Sagayadevan, Vathsala; Vaingankar, Janhavi Ajit; Picco, Louisa; Chong, Siow Ann; Subramaniam, Mythily

    2018-02-23

    The validity of the CAGE using item response theory (IRT) has not yet been examined in the older adult population. This study aims to investigate the psychometric properties of the CAGE using both non-parametric and parametric IRT models, assess whether there is any differential item functioning (DIF) by age, gender and ethnicity and examine the measurement precision at the cut-off scores. We used data from the Well-being of the Singapore Elderly study to conduct Mokken scaling analysis (MSA), dichotomous Rasch and 2-parameter logistic IRT models. The measurement precision at the cut-off scores was evaluated using classification accuracy (CA) and classification consistency (CC). The MSA showed the overall scalability H index was 0.459, indicating a medium performing instrument. All items were found to be homogeneous, measuring the same construct and able to discriminate well between respondents with high levels of the construct and the ones with lower levels. The item discrimination ranged from 1.07 to 6.73 while the item difficulty ranged from 0.33 to 2.80. Significant DIF was found for 2 items across ethnic groups. More than 90% (CC and CA ranged from 92.5% to 94.3%) of the respondents were consistently and accurately classified by the CAGE cut-off scores of 2 and 3. The current study provides new evidence on the validity of the CAGE from the IRT perspective. This study provides valuable information on each item in the assessment of the overall severity of alcohol problems and the precision of the cut-off scores in the older adult population.
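
    To make the discrimination and difficulty parameters reported here more concrete, the following Python sketch evaluates the standard 2-parameter logistic item response function. The pairing of parameter values below is hypothetical: the abstract gives only the ranges (1.07 to 6.73 and 0.33 to 2.80), not which discrimination belongs to which difficulty.

```python
import math


def prob_endorse_2pl(theta, a, b):
    """2PL item response function: P(endorse | theta) = 1 / (1 + exp(-a * (theta - b))).

    theta: latent trait level, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical items built from the endpoints of the reported ranges.
low_discrimination = prob_endorse_2pl(theta=1.0, a=1.07, b=0.33)
high_discrimination = prob_endorse_2pl(theta=1.0, a=6.73, b=0.33)
```

    At theta equal to the difficulty the probability is 0.5 for any discrimination; a larger discrimination makes the curve steeper around that point, so the item separates respondents just below and just above it more sharply.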

  5. Identifying group-sensitive physical activities: a differential item functioning analysis of NHANES data.

    PubMed

    Gao, Yong; Zhu, Weimo

    2011-05-01

    The purpose of this study was to identify subgroup-sensitive physical activities (PA) using differential item functioning (DIF) analysis. A sub-unweighted sample of 1857 (men=923 and women=934) from the 2003-2004 National Health and Nutrition Examination Survey PA questionnaire data was used for the analyses. Using the Mantel-Haenszel, the simultaneous item bias test, and the ANOVA DIF methods, 33 specific leisure-time moderate and/or vigorous PA (MVPA) items were analyzed for DIF across race/ethnicity, gender, education, income, and age groups. Many leisure-time MVPA items were identified as large DIF items. When participating in the same amount of leisure-time MVPA, non-Hispanic blacks were more likely to participate in basketball and dance activities than non-Hispanic whites (NHW); NHW were more likely to participate in golf and hiking than non-Hispanic blacks; Hispanics were more likely to participate in dancing, hiking, and soccer than NHW, whereas NHW were more likely to engage in bicycling, golf, swimming, and walking than Hispanics; women were more likely to participate in aerobics, dancing, stretching, and walking than men, whereas men were more likely to engage in basketball, fishing, golf, running, soccer, weightlifting, and hunting than women; educated persons were more likely to participate in jogging and treadmill exercise than less educated persons; persons with higher incomes were more likely to engage in golf than those with lower incomes; and adults (20-59 yr) were more likely to participate in basketball, dancing, jogging, running, and weightlifting than older adults (60+ yr), whereas older adults were more likely to participate in walking and golf than younger adults. DIF methods are able to identify subgroup-sensitive PA and thus provide useful information to help design group-sensitive, targeted interventions for disadvantaged PA subgroups. © 2011 by the American College of Sports Medicine
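
    One of the DIF methods named in this abstract, the Mantel-Haenszel procedure, can be sketched in Python as a common odds ratio pooled over matched strata. The sketch assumes that strata are formed by matching on overall leisure-time MVPA level and that each stratum is a 2x2 table of group by participation; the counts are invented for illustration, and the study's actual matching and software are not described in the abstract.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across matched strata.

    Each stratum is a tuple (a, b, c, d):
      a = reference group, participates    b = reference group, does not
      c = focal group, participates        d = focal group, does not
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio is undefined for these tables")
    return numerator / denominator


def mh_delta(odds_ratio):
    """ETS delta-scale transformation, -2.35 * ln(odds ratio)."""
    return -2.35 * math.log(odds_ratio)


# Hypothetical counts for one activity item in two matched strata.
toy_strata = [(20, 10, 10, 20), (30, 15, 15, 30)]
common_or = mantel_haenszel_odds_ratio(toy_strata)
```

    An odds ratio of 1 (delta of 0) means that, among people with the same overall activity level, both groups are equally likely to report the activity; values far from 1 flag the item as group-sensitive.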

  6. An item response theory evaluation of three depression assessment instruments in a clinical sample.

    PubMed

    Adler, Mats; Hetta, Jerker; Isacsson, Göran; Brodin, Ulf

    2012-06-21

    This study investigates whether an analysis, based on Item Response Theory (IRT), can be used for initial evaluations of depression assessment instruments in a limited patient sample from an affective disorder outpatient clinic, with the aim of finding major advantages and deficiencies of the instruments. Three depression assessment instruments, the depression module from the Patient Health Questionnaire (PHQ9), the depression subscale of Affective Self Rating Scale (AS-18-D) and the Montgomery-Åsberg Depression Rating Scale (MADRS) were evaluated in a sample of 61 patients with affective disorder diagnoses, mainly bipolar disorder. A '3-step IRT strategy' was used. In a first step, the Mokken non-parametric analysis showed that PHQ9 and AS-18-D had strong overall scalabilities of 0.510 [C.I. 0.42, 0.61] and 0.513 [C.I. 0.41, 0.63] respectively, while MADRS had a weak scalability of 0.339 [C.I. 0.25, 0.43]. In a second step, a Rasch model analysis indicated large differences concerning the item discriminating capacity and was therefore considered not suitable for the data. In a third step, applying a more flexible two-parameter model, all three instruments showed large differences in item information and items had a low capacity to reliably measure respondents at low levels of depression severity. We conclude that a stepwise IRT approach, as performed in this study, is a suitable tool for studying assessment instruments at early stages of development. Such an analysis can give useful information, even in small samples, in order to construct more precise measurements or to evaluate existing assessment instruments. The study suggests that the PHQ9 and AS-18-D can be useful for measurement of depression severity in an outpatient clinic for affective disorder, while the MADRS shows weak measurement properties for this type of patient.

  7. Psychometric evaluation of an item bank for computerized adaptive testing of the EORTC QLQ-C30 cognitive functioning dimension in cancer patients.

    PubMed

    Dirven, Linda; Groenvold, Mogens; Taphoorn, Martin J B; Conroy, Thierry; Tomaszewski, Krzysztof A; Young, Teresa; Petersen, Morten Aa

    2017-11-01

    The European Organisation of Research and Treatment of Cancer (EORTC) Quality of Life Group is developing computerized adaptive testing (CAT) versions of all EORTC Quality of Life Questionnaire (QLQ-C30) scales with the aim of enhancing measurement precision. Here we present the results on the field-testing and psychometric evaluation of the item bank for cognitive functioning (CF). In previous phases (I-III), 44 candidate items were developed measuring CF in cancer patients. In phase IV, these items were psychometrically evaluated in a large sample of international cancer patients. This evaluation included an assessment of dimensionality, fit to the item response theory (IRT) model, differential item functioning (DIF), and measurement properties. A total of 1030 cancer patients completed the 44 candidate items on CF. Of these, 34 items could be included in a unidimensional IRT model, showing an acceptable fit. Although several items showed DIF, these had a negligible impact on CF estimation. Measurement precision of the item bank was much higher than the two original QLQ-C30 CF items alone, across the whole continuum. Moreover, CAT measurement may on average reduce study sample sizes by about 35-40% compared to the original QLQ-C30 CF scale, without loss of power. A CF item bank for CAT measurement consisting of 34 items was established, applicable to various cancer patients across countries. This CAT measurement system will facilitate precise and efficient assessment of HRQOL of cancer patients, without loss of comparability of results.

  8. The medial temporal lobes distinguish between within-item and item-context relations during autobiographical memory retrieval.

    PubMed

    Sheldon, Signy; Levine, Brian

    2015-12-01

    During autobiographical memory retrieval, the medial temporal lobes (MTL) relate together multiple event elements, including object (within-item relations) and context (item-context relations) information, to create a cohesive memory. There is consistent support for a functional specialization within the MTL according to these relational processes, much of which comes from recognition memory experiments. In this study, we compared brain activation patterns associated with retrieving within-item relations (i.e., associating conceptual and sensory-perceptual object features) and item-context relations (i.e., spatial relations among objects) with respect to naturalistic autobiographical retrieval. We developed a novel paradigm that cued participants to retrieve information about past autobiographical events, non-episodic within-item relations, and non-episodic item-context relations with the perceptuomotor aspects of retrieval equated across these conditions. We used multivariate analysis techniques to extract common and distinct patterns of activity among these conditions within the MTL and across the whole brain, both in terms of spatial and temporal patterns of activity. The anterior MTL (perirhinal cortex and anterior hippocampus) was preferentially recruited for generating within-item relations later in retrieval whereas the posterior MTL (posterior parahippocampal cortex and posterior hippocampus) was preferentially recruited for generating item-context relations across the retrieval phase. These findings provide novel evidence for functional specialization within the MTL with respect to naturalistic memory retrieval. © 2015 Wiley Periodicals, Inc.

  9. Modeling Item-Level and Step-Level Invariance Effects in Polytomous Items Using the Partial Credit Model

    ERIC Educational Resources Information Center

    Gattamorta, Karina A.; Penfield, Randall D.; Myers, Nicholas D.

    2012-01-01

    Measurement invariance is a common consideration in the evaluation of the validity and fairness of test scores when the tested population contains distinct groups of examinees, such as examinees receiving different forms of a translated test. Measurement invariance in polytomous items has traditionally been evaluated at the item-level,…

  10. Using Classical Test Theory and Item Response Theory to Evaluate the LSCI

    NASA Astrophysics Data System (ADS)

    Schlingman, Wayne M.; Prather, E. E.; Collaboration of Astronomy Teaching Scholars CATS

    2011-01-01

    Analyzing the data from the recent national study using the Light and Spectroscopy Concept Inventory (LSCI), this project uses both Classical Test Theory (CTT) and Item Response Theory (IRT) to investigate the LSCI itself in order to better understand what it is actually measuring. We use Classical Test Theory to form a framework of results that can be used to evaluate the effectiveness of individual questions at measuring differences in student understanding and provide further insight into the prior results presented from this data set. In the second phase of this research, we use Item Response Theory to form a theoretical model that generates parameters accounting for a student's ability, a question's difficulty, and estimate the level of guessing. The combined results from our investigations using both CTT and IRT are used to better understand the learning that is taking place in classrooms across the country. The analysis will also allow us to evaluate the effectiveness of individual questions and determine whether the item difficulties are appropriately matched to the abilities of the students in our data set. These results may require that some questions be revised, motivating the need for further development of the LSCI. This material is based upon work supported by the National Science Foundation under Grant No. 0715517, a CCLI Phase III Grant for the Collaboration of Astronomy Teaching Scholars (CATS). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

  11. Assessing the Straightforwardly-Worded Brief Fear of Negative Evaluation Scale for Differential Item Functioning Across Gender and Ethnicity.

    PubMed

    Harpole, Jared K; Levinson, Cheri A; Woods, Carol M; Rodebaugh, Thomas L; Weeks, Justin W; Brown, Patrick J; Heimberg, Richard G; Menatti, Andrew R; Blanco, Carlos; Schneier, Franklin; Liebowitz, Michael

    2015-06-01

    The Brief Fear of Negative Evaluation Scale (BFNE; Leary, Personality and Social Psychology Bulletin, 9, 371-375, 1983) assesses fear and worry about receiving negative evaluation from others. Rodebaugh et al. (Psychological Assessment, 16, 169-181, 2004) found that the BFNE is composed of a reverse-worded factor (BFNE-R) and straightforwardly-worded factor (BFNE-S). Further, they found the BFNE-S to have better psychometric properties and provide more information than the BFNE-R. Currently there is a lack of research regarding the measurement invariance of the BFNE-S across gender and ethnicity with respect to item thresholds. The present study uses item response theory (IRT) to test the BFNE-S for differential item functioning (DIF) related to gender and ethnicity (White, Asian, and Black). Six data sets consisting of clinical, community, and undergraduate participants were utilized (N = 2,109). The factor structure of the BFNE-S was confirmed using categorical confirmatory factor analysis, IRT model assumptions were tested, and the BFNE-S was evaluated for DIF. Item nine demonstrated significant non-uniform DIF between White and Black participants. No other items showed significant uniform or non-uniform DIF across gender or ethnicity. Results suggest the BFNE-S can be used reliably with men and women and Asian and White participants. More research is needed to understand the implications of using the BFNE-S with Black participants.

  12. The five item Barthel index

    PubMed Central

    Hobart, J; Thompson, A

    2001-01-01

    OBJECTIVES—Routine data collection is now considered mandatory. Therefore, staff rated clinical scales that consist of multiple items should have the minimum number of items necessary for rigorous measurement. This study explores the possibility of developing a short form Barthel index, suitable for use in clinical trials, epidemiological studies, and audit, that satisfies criteria for rigorous measurement and is psychometrically equivalent to the 10 item instrument.
METHODS—Data were analysed from 844 consecutive admissions to a neurological rehabilitation unit in London. Random half samples were generated. Short forms were developed in one sample (n=419), by selecting items with the best measurement properties, and tested in the other (n=418). For each of the 10 items of the BI, item total correlations and effect sizes were computed and rank ordered. The best items were defined as those with the lowest cross product of these rank orderings. The acceptability, reliability, validity, and responsiveness of three short form BIs (five, four, and three item) were determined and compared with the 10 item BI. Agreement between scores generated by short forms and 10 item BI was determined using intraclass correlation coefficients and the method of Bland and Altman.
RESULTS—The five best items in this sample were transfers, bathing, toilet use, stairs, and mobility. Of the three short forms examined, the five item BI had the best measurement properties and was psychometrically equivalent to the 10 item BI. Agreement between scores generated by the two measures for individual patients was excellent (ICC=0.90) but not identical (limits of agreement=1.84±3.84).
CONCLUSIONS—The five item short form BI may be a suitable outcome measure for group comparison studies in comparable samples. Further evaluations are needed. Results demonstrate a fundamental difference between assessment and measurement and the importance of incorporating psychometric methods in the
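
    The item-selection rule described in the methods above (rank items on two statistics, multiply the ranks, keep the items with the lowest product) can be sketched in Python. The sketch assumes that rank 1 goes to the highest value of each statistic; the item labels and numbers are invented placeholders, not Barthel index results.

```python
def rank_descending(values):
    """Map each item name to its rank, with rank 1 for the highest value (assumed)."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


def best_items_by_rank_product(item_total_corr, effect_size, k):
    """Return the k items with the lowest product of the two rank orderings."""
    corr_rank = rank_descending(item_total_corr)
    effect_rank = rank_descending(effect_size)
    product = {name: corr_rank[name] * effect_rank[name] for name in item_total_corr}
    return sorted(product, key=product.get)[:k]


# Hypothetical statistics for four placeholder items.
toy_corr = {"item_a": 0.9, "item_b": 0.8, "item_c": 0.7, "item_d": 0.6}
toy_effect = {"item_a": 0.5, "item_b": 0.9, "item_c": 0.2, "item_d": 0.7}
selected = best_items_by_rank_product(toy_corr, toy_effect, k=2)
```

    Multiplying ranks rewards items that do well on both criteria at once; an item that is best on one statistic but poor on the other can lose out to an item that is second on both.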

  13. [Conceptual, item, and semantic equivalence of a Brazilian version of the Physical Activity Checklist Interview (PACI)].

    PubMed

    Cruciani, Fernanda; Adami, Fernando; Assunção, Nathalia Antiqueira; Bergamaschi, Denise Pimentel

    2011-01-01

    There is a lack of Brazilian questionnaires to assess physical activity in children. The Physical Activity Checklist Interview (PACI) was originally developed for North American children and allows assessing physical activity during the previous day. The objectives of this study were: i) to describe procedures for choosing the PACI for cross-cultural adaptation and ii) to assess conceptual, item, and semantic equivalence of the Brazilian version to be used with 7-to-10-year-old children. PACI was identified from a systematic review of 18 questionnaires. The process of choosing the instrument involved discussions with researchers. The PACI allows assessing the construct and its dimensions. Some kinds of physical activity that are uncommon in the Brazilian population had to be eliminated. The following steps were taken to evaluate semantic equivalence: translation, retranslation, connotative and referential meaning assessment, and a pretest with 24 children aged 7 to 10 years. We present the PACI in its Brazilian adapted version, called Lista de Atividades Físicas (LAF).

  14. An Evaluation of "Intentional" Weighting of Extended-Response or Constructed-Response Items in Tests with Mixed Item Types.

    ERIC Educational Resources Information Center

    Ito, Kyoko; Sykes, Robert C.

    This study investigated the practice of weighting a type of test item, such as constructed response, more than other types of items, such as selected response, to compute student scores for a mixed-item type of test. The study used data from statewide writing field tests in grades 3, 5, and 8 and considered two contexts, that in which a single…

  15. Development and psychometric evaluation of the PROMIS Pediatric Life Satisfaction item banks, child-report, and parent-proxy editions.

    PubMed

    Forrest, Christopher B; Devine, Janine; Bevans, Katherine B; Becker, Brandon D; Carle, Adam C; Teneralli, Rachel E; Moon, JeanHee; Tucker, Carole A; Ravens-Sieberer, Ulrike

    2018-01-01

    To describe the psychometric evaluation and item response theory calibration of the PROMIS Pediatric Life Satisfaction item banks, child-report, and parent-proxy editions. A pool of 55 life satisfaction items was administered to 1992 children 8-17 years old and 964 parents of children 5-17 years old. Analyses included descriptive statistics, reliability, factor analysis, differential item functioning, and assessment of construct validity. Thirteen items were deleted because of poor psychometric performance. An 8-item short form was administered to a national sample of 996 children 8-17 years old, and 1294 parents of children 5-17 years old. The combined sample (2988 children and 2258 parents) was used in item response theory (IRT) calibration analyses. The final item banks were unidimensional, the items were locally independent, and the items were free from impactful differential item functioning. The 8-item and 4-item short form scales showed excellent reliability, convergent validity, and discriminant validity. Life satisfaction decreased with declining socio-economic status, presence of a special health care need, and increasing age for girls, but not boys. After IRT calibration, we found that 4- and 8-item short forms had a high degree of precision (reliability) across a wide range (>4 SD units) of the latent variable. The PROMIS Pediatric Life Satisfaction item banks and their short forms provide efficient, precise, and valid assessments of life satisfaction in children and youth.

  16. The Intuitive Eating Scale-2: item refinement and psychometric evaluation with college women and men.

    PubMed

    Tylka, Tracy L; Kroon Van Diest, Ashley M

    2013-01-01

    The 21-item Intuitive Eating Scale (IES; Tylka, 2006) measures individuals' tendency to follow their physical hunger and satiety cues when determining when, what, and how much to eat. While its scores have demonstrated reliability and validity with college women, the IES-2 was developed to improve upon the original version. Specifically, we added 17 positively scored items to the original IES items (which were predominantly negatively scored), integrated an additional component of intuitive eating (Body-Food Choice Congruence), and evaluated its psychometric properties with 1,405 women and 1,195 men across three studies. After we deleted 15 items (due to low item-factor loadings, high cross-loadings, and redundant content), the results supported the psychometric properties of the IES-2 with women and men. The final 23-item IES-2 contained 11 original items and 12 added items. Exploratory and second-order confirmatory factor analyses upheld its hypothesized 4-factor structure (its original 3 factors, plus Body-Food Choice Congruence) and a higher order factor. The IES-2 was largely invariant across sex, although negligible differences on 1 factor loading and 2 item intercepts were detected. Demonstrating validity, the IES-2 total scores and most IES-2 subscale scores were (a) positively related to body appreciation, self-esteem, and satisfaction with life; (b) inversely related to eating disorder symptomatology, poor interoceptive awareness, body surveillance, body shame, body mass index, and internalization of media appearance ideals; and (c) negligibly related to social desirability. IES-2 scores also garnered incremental validity by predicting psychological well-being above and beyond eating disorder symptomatology. The IES-2's applications for empirical research and clinical work are discussed. PsycINFO Database Record (c) 2013 APA, all rights reserved.

  17. Item Analysis in Introductory Economics Testing.

    ERIC Educational Resources Information Center

    Tinari, Frank D.

    1979-01-01

    Computerized analysis of multiple choice test items is explained. Examples of item analysis applications in the introductory economics course are discussed with respect to three objectives: to evaluate learning; to improve test items; and to help improve classroom instruction. Problems, costs and benefits of the procedures are identified. (JMD)

  18. Using Rasch Analysis to Evaluate the Reliability and Validity of the Swallowing Quality of Life Questionnaire: An Item Response Theory Approach.

    PubMed

    Cordier, Reinie; Speyer, Renée; Schindler, Antonio; Michou, Emilia; Heijnen, Bas Joris; Baijens, Laura; Karaduman, Ayşe; Swan, Katina; Clavé, Pere; Joosten, Annette Veronica

    2018-02-01

    The Swallowing Quality of Life questionnaire (SWAL-QOL) is widely used clinically and in research to evaluate quality of life related to swallowing difficulties. It has been described as a valid and reliable tool, but was developed and tested using classic test theory. This study describes the reliability and validity of the SWAL-QOL using item response theory (IRT; Rasch analysis). SWAL-QOL data were gathered from 507 participants at risk of oropharyngeal dysphagia (OD) across four European countries. OD was confirmed in 75.7% of participants via videofluoroscopy and/or fiberoptic endoscopic evaluation, or a clinical diagnosis based on meeting selected criteria. Patients with esophageal dysphagia were excluded. Data were analysed using Rasch analysis. Item and person reliability was good for all the items combined. However, person reliability was poor for 8 subscales and item reliability was poor for one subscale. Eight subscales exhibited poor person separation and two exhibited poor item separation. Overall item and person fit statistics were acceptable. However, at an individual item fit level results indicated unpredictable item responses for 28 items, and item redundancy for 10 items. The item-person dimensionality map confirmed these findings. Results from the overall Rasch model fit and Principal Component Analysis were suggestive of a second dimension. For all the items combined, none of the item categories were 'category', 'threshold' or 'step' disordered; however, all subscales demonstrated category disordered functioning. Findings suggest an urgent need to further investigate the underlying structure of the SWAL-QOL and its psychometric characteristics using IRT.

  19. Evaluation of Item-Based Top-N Recommendation Algorithms

    DTIC Science & Technology

    2000-09-15

    Furthermore, one of the advantages of the item-based algorithm is that it has much smaller computational requirements ...items, utilized by many e-commerce sites, cannot take advantage of pre-computed user-to-user similarities. Consequently, even though the throughput of...

                                      Non-Zeros
    ecommerce      6667     17491         91222
    catalog       50918     39080        435524
    ccard         42629     68793        398619
    skills         4374      2125         82612
    movielens       943      1682        100000

    Table 1: The
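    The excerpt above refers to precomputed item-to-item similarities. A minimal sketch of that idea follows, assuming cosine similarity between item columns and a summed-similarity score; the excerpt does not state which similarity measure or scoring rule the report used, and the purchase matrix and function names are hypothetical.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length item columns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def item_similarities(matrix):
    """Precompute item-to-item similarities from a user x item matrix."""
    n_items = len(matrix[0])
    cols = [[row[j] for row in matrix] for j in range(n_items)]
    return [
        [cosine(cols[i], cols[j]) if i != j else 0.0 for j in range(n_items)]
        for i in range(n_items)
    ]

def top_n(user_row, sims, n):
    """Rank unseen items by summed similarity to the items the user already has."""
    scores = {}
    for j, seen in enumerate(user_row):
        if seen:
            continue
        scores[j] = sum(sims[j][i] for i, v in enumerate(user_row) if v)
    ranked = sorted(scores, key=lambda j: (-scores[j], j))
    return ranked[:n]

# Hypothetical binary purchase matrix: rows are users, columns are items.
purchases = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],  # target user, who has only item 0
]
sims = item_similarities(purchases)
recommended = top_n(purchases[3], sims, n=2)
print(recommended)
```

    Because the similarity table depends only on items, it can be built once and reused for every user, which is the computational advantage the excerpt points to.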

  20. Students' approaches to learning in a clinical practicum: A psychometric evaluation based on item response theory.

    PubMed

    Zhao, Yue; Kuan, Hoi Kei; Chung, Joyce O K; Chan, Cecilia K Y; Li, William H C

    2018-07-01

    The investigation of learning approaches in the clinical workplace context has remained an under-researched area. Despite the validation of learning approach instruments and their applications in various clinical contexts, little is known about the extent to which an individual item, that reflects a specific learning strategy and motive, effectively contributes to characterizing students' learning approaches. This study aimed to measure nursing students' approaches to learning in a clinical practicum using the Approaches to Learning at Work Questionnaire (ALWQ). Survey research design was used in the study. A sample of year 3 nursing students (n = 208) who undertook a 6-week clinical practicum course participated in the study. Factor analyses were conducted, followed by an item response theory analysis, including model assumption evaluation (unidimensionality and local independence), item calibration and goodness-of-fit assessment. Two subscales, deep and surface, were derived. Findings suggested that: (a) items measuring the deep motive from intrinsic interest and deep strategies of relating new ideas to similar situations, and that of concept mapping served as the strongest discriminating indicators; (b) the surface strategy of memorizing facts and details without an overall picture exhibited the highest discriminating power among all surface items; and, (c) both subscales appeared to be informative in assessing a broad range of the corresponding latent trait. The 21-item ALWQ derived from this study presented an efficient, internally consistent and precise measure. Findings provided a useful psychometric evaluation of the ALWQ in the clinical practicum context, added evidence to the utility of the ALWQ for nursing education practice and research, and echoed the discussions from previous studies on the role of the contextual factors in influencing student choices of different learning strategies. They provided insights for clinical educators to measure

  1. Validating and determining the weight of items used for evaluating clinical governance implementation based on analytic hierarchy process model.

    PubMed

    Hooshmand, Elaheh; Tourani, Sogand; Ravaghi, Hamid; Vafaee Najar, Ali; Meraji, Marziye; Ebrahimipour, Hossein

    2015-04-08

    The purpose of implementing a system such as Clinical Governance (CG) is to integrate, establish and globalize distinct policies in order to improve quality through increasing professional knowledge and the accountability of healthcare professionals toward providing clinical excellence. Since CG is related to change, and change requires money and time, CG implementation has to be focused on priority areas that are in more dire need of change. The purpose of the present study was to validate and determine the significance of items used for evaluating CG implementation. The present study was descriptive-quantitative in method and design. Items used for evaluating CG implementation were first validated by the Delphi method and then compared with one another and ranked based on the Analytical Hierarchy Process (AHP) model. The items that were validated for evaluating CG implementation in Iran include performance evaluation, training and development, personnel motivation, clinical audit, clinical effectiveness, risk management, resource allocation, policies and strategies, external audit, information system management, research and development, CG structure, implementation prerequisites, the management of patients' non-medical needs, complaints and patients' participation in the treatment process. The most important items based on their degree of significance were training and development, performance evaluation, and risk management. The least important items included the management of patients' non-medical needs, patients' participation in the treatment process and research and development. The fundamental requirements of CG implementation included having an effective policy at national level, avoiding perfectionism, using the expertise and potentials of the entire country and the coordination of this model with other models of quality improvement such as accreditation and patient safety. © 2015 by Kerman University of Medical Sciences.
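    The pairwise comparison and ranking step of AHP can be sketched as follows. This is only an illustration: it uses the row geometric-mean approximation to the priority vector rather than whatever software the authors used, and the judgment values are hypothetical, not data from the study.

```python
from math import prod

def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(pairwise)
    geo_means = [prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]

def consistency_index(pairwise, weights):
    """Estimate lambda_max and return CI = (lambda_max - n) / (n - 1)."""
    n = len(pairwise)
    weighted = [sum(a * w for a, w in zip(row, weights)) for row in pairwise]
    lambda_max = sum(v / w for v, w in zip(weighted, weights)) / n
    return (lambda_max - n) / (n - 1)

# Hypothetical judgments: pairwise[i][j] says how much more important
# item i is than item j (reciprocal matrix, 1.0 on the diagonal).
items = ["training and development", "clinical audit", "research and development"]
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(pairwise)
ci = consistency_index(pairwise, weights)
for name, w in zip(items, weights):
    print(f"{name}: {w:.3f}")
```

    For a perfectly consistent matrix such as this one, the consistency index is zero; real expert judgments usually give a positive value that is then compared against a tolerance.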

  2. Further evaluation of leisure items in the attention condition of functional analyses.

    PubMed

    Roscoe, Eileen M; Carreau, Abbey; MacDonald, Jackie; Pence, Sacha T

    2008-01-01

    Research suggests that including leisure items in the attention condition of a functional analysis may produce engagement that masks sensitivity to attention. In this study, 4 individuals' initial functional analyses indicated that behavior was maintained by nonsocial variables (n = 3) or by attention (n = 1). A preference assessment was used to identify items for subsequent functional analyses. Four conditions were compared, attention with and without leisure items and control with and without leisure items. Following this, either high- or low-preference items were included in the attention condition. Problem behavior was more probable during the attention condition when no leisure items or low-preference items were included, and lower levels of problem behavior were observed during the attention condition when high-preference leisure items were included. These findings suggest how preferred items may hinder detection of behavioral function.

  3. Exploring the Relevance of Items in the Communicative Participation Item Bank (CPIB) for Individuals With Hearing Loss

    PubMed Central

    Baylor, Carolyn R.; Birch, Kristen; Yorkston, Kathryn M.

    2017-01-01

    Purpose: The Communicative Participation Item Bank (CPIB) was developed to evaluate participation restrictions in communication situations for individuals with speech and language disorders. This study evaluated the potential relevance of CPIB items for individuals with hearing loss. Method: Cognitive interviews were conducted with 17 adults with a range of treated and untreated hearing loss, who responded to 46 items. Interviews were continued until saturation was reached and prevalent trends emerged. A focus group was also conducted with 3 experienced audiologists to seek their views on the CPIB. Analysis of data included qualitative and quantitative approaches. Results: The majority of the items were applicable to individuals with hearing loss; however, 12 items were identified as potentially not relevant. This was largely attributed to the items' focus on speech production rather than hearing. The results from the focus group were in agreement for a majority of items. Conclusions: The next step in validating the CPIB for individuals with hearing loss is a psychometric analysis on a large sample. Possible outcomes could be that the CPIB is considered valid in its entirety or the creation of a new questionnaire or a hearing loss–specific short form with a subset of items is necessary. PMID:28114665

  4. A leukocyte activation test identifies food items which induce release of DNA by innate immune peripheral blood leucocytes.

    PubMed

    Garcia-Martinez, Irma; Weiss, Theresa R; Yousaf, Muhammad N; Ali, Ather; Mehal, Wajahat Z

    2018-01-01

    Leukocyte activation (LA) testing identifies food items that induce a patient specific cellular response in the immune system, and has recently been shown in a randomized double blinded prospective study to reduce symptoms in patients with irritable bowel syndrome (IBS). We hypothesized that test reactivity to particular food items, and the systemic immune response initiated by these food items, is due to the release of cellular DNA from blood immune cells. We tested this by quantifying total DNA concentration in the cellular supernatant of immune cells exposed to positive and negative foods from 20 healthy volunteers. To establish if the DNA release by positive samples is a specific phenomenon, we quantified myeloperoxidase (MPO) in cellular supernatants. We further assessed if a particular immune cell population (neutrophils, eosinophils, and basophils) was activated by the positive food items by flow cytometry analysis. To identify the signaling pathways that are required for DNA release we tested if specific inhibitors of key signaling pathways could block DNA release. Foods with a positive LA test result gave a higher supernatant DNA content when compared to foods with a negative result. This was specific as MPO levels were not increased by foods with a positive LA test. Protein kinase C (PKC) inhibitors resulted in inhibition of positive food stimulated DNA release. Positive foods resulted in CD63 levels greater than negative foods in eosinophils in 76.5% of tests. LA test identifies food items that result in release of DNA and activation of peripheral blood innate immune cells in a PKC dependent manner, suggesting that this LA test identifies food items that result in release of inflammatory markers and activation of innate immune cells. This may be the basis for the improvement in symptoms in IBS patients who followed an LA test guided diet.

  5. Using classical test theory, item response theory, and Rasch measurement theory to evaluate patient-reported outcome measures: a comparison of worked examples.

    PubMed

    Petrillo, Jennifer; Cano, Stefan J; McLeod, Lori D; Coon, Cheryl D

    2015-01-01

    To provide comparisons and a worked example of item- and scale-level evaluations based on three psychometric methods used in patient-reported outcome development, namely classical test theory (CTT), item response theory (IRT), and Rasch measurement theory (RMT), in an analysis of the National Eye Institute Visual Functioning Questionnaire (VFQ-25). Baseline VFQ-25 data from 240 participants with diabetic macular edema from a randomized, double-masked, multicenter clinical trial were used to evaluate the VFQ at the total score level. CTT, RMT, and IRT evaluations were conducted, and results were assessed in a head-to-head comparison. Results were similar across the three methods, with IRT and RMT providing more detailed diagnostic information on how to improve the scale. CTT led to the identification of two problematic items that threaten the validity of the overall scale score, sets of redundant items, and skewed response categories. IRT and RMT additionally identified poor fit for one item, many locally dependent items, poor targeting, and disordering of over half the response categories. Selection of a psychometric approach depends on many factors. Researchers should justify their evaluation method and consider the intended audience. If the instrument is being developed for descriptive purposes and on a restricted budget, a cursory examination of the CTT-based psychometric properties may be all that is possible. In a high-stakes situation, such as the development of a patient-reported outcome instrument for consideration in pharmaceutical labeling, however, a thorough psychometric evaluation including IRT or RMT should be considered, with final item-level decisions made on the basis of both quantitative and qualitative results. Copyright © 2015. Published by Elsevier Inc.

  6. Using Reliability and Item Analysis to Evaluate a Teacher-Developed Test in Educational Measurement and Evaluation

    ERIC Educational Resources Information Center

    Quaigrain, Kennedy; Arhin, Ato Kwamina

    2017-01-01

    Item analysis is essential in improving items which will be used again in later tests; it can also be used to eliminate misleading items in a test. The study focused on item and test quality and explored the relationship between difficulty index (p-value) and discrimination index (DI) with distractor efficiency (DE). The study was conducted among…
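    The difficulty index (p-value) and discrimination index (DI) named in this abstract can be computed along the following lines. This is a sketch, assuming the common convention of comparing the top and bottom 27% of examinees by total score; the abstract does not say which grouping the study used, and the scores below are hypothetical. Distractor efficiency is not computed here.

```python
def difficulty_index(item_scores):
    """Proportion of examinees who answered the item correctly (p-value)."""
    return sum(item_scores) / len(item_scores)

def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Proportion correct in the upper group minus that in the lower group."""
    k = max(1, round(len(total_scores) * fraction))
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    return p_upper - p_lower

# Hypothetical data for 10 examinees: 1 = item answered correctly, 0 = not.
item = [1, 1, 0, 1, 1, 0, 1, 0, 1, 0]
totals = [48, 45, 44, 40, 38, 33, 30, 25, 22, 18]

p = difficulty_index(item)
di = discrimination_index(item, totals)
print(f"difficulty p = {p:.2f}, discrimination DI = {di:.2f}")
```

    A DI near zero or below zero would flag an item that fails to separate stronger from weaker examinees.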

  7. A 67-Item Stress Resilience item bank showing high content validity was developed in a psychosomatic sample.

    PubMed

    Obbarius, Nina; Fischer, Felix; Obbarius, Alexander; Nolte, Sandra; Liegl, Gregor; Rose, Matthias

    2018-04-10

    To develop the first item bank to measure Stress Resilience (SR) in clinical populations. Qualitative item development resulted in an initial pool of 131 items covering a broad theoretical SR concept. These items were tested in n=521 patients at a psychosomatic outpatient clinic. Exploratory and Confirmatory Factor Analysis (CFA), as well as other state-of-the-art item analyses and IRT were used for item evaluation and calibration of the final item bank. Out of the initial item pool of 131 items, we excluded 64 items (54 factor loading <.5, 4 residual correlations >.3, 2 non-discriminative Item Response Curves, 4 Differential Item Functioning). The final set of 67 items indicated sufficient model fit in CFA and IRT analyses. Additionally, a 10-item short form with high measurement precision (SE≤.32 in a theta range between -1.8 and +1.5) was derived. Both the SR item bank and the SR short form were highly correlated with an existing static legacy tool (Connor-Davidson Resilience Scale). The final SR item bank and 10-item short form showed good psychometric properties. When further validated, they will be ready to be used within a framework of Computer-Adaptive Tests for a comprehensive assessment of the Stress-Construct. Copyright © 2018. Published by Elsevier Inc.

  8. Statistical evaluation of synchronous spike patterns extracted by frequent item set mining

    PubMed Central

    Torre, Emiliano; Picado-Muiño, David; Denker, Michael; Borgelt, Christian; Grün, Sonja

    2013-01-01

    We recently proposed frequent itemset mining (FIM) as a method to perform an optimized search for patterns of synchronous spikes (item sets) in massively parallel spike trains. This search outputs the occurrence count (support) of individual patterns that are not trivially explained by the counts of any superset (closed frequent item sets). The number of patterns found by FIM makes direct statistical tests infeasible due to severe multiple testing. To overcome this issue, we proposed to test the significance not of individual patterns, but instead of their signatures, defined as the pairs of pattern size z and support c. Here, we derive in detail a statistical test for the significance of the signatures under the null hypothesis of full independence (pattern spectrum filtering, PSF) by means of surrogate data. As a result, injected spike patterns that mimic assembly activity are well detected, yielding a low false negative rate. However, this approach is prone to additionally classify patterns resulting from chance overlap of real assembly activity and background spiking as significant. These patterns represent false positives with respect to the null hypothesis of having one assembly of given signature embedded in otherwise independent spiking activity. We propose the additional method of pattern set reduction (PSR) to remove these false positives by conditional filtering. By employing stochastic simulations of parallel spike trains with correlated activity in the form of injected spike synchrony in subsets of the neurons, we demonstrate for a range of parameter settings that the analysis scheme composed of FIM, PSF and PSR allows reliable detection of active assemblies in massively parallel spike trains. PMID:24167487
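    The notions of support, closed frequent item sets, and signatures (z, c) used in this abstract can be illustrated with a brute-force sketch. It assumes spikes have already been discretized into time bins and enumerates every candidate set, which is only feasible for tiny examples; it is not the optimized FIM search the authors describe, and the surrogate-based significance test (PSF) and PSR are not shown. The bin contents are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support, min_size=2):
    """Map each item set occurring in at least min_support transactions to its support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(min_size, len(items) + 1):
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            support = sum(1 for t in transactions if cset <= t)
            if support >= min_support:
                result[cset] = support
    return result

def closed_only(itemsets):
    """Keep item sets that have no proper superset with the same support."""
    return {
        s: c
        for s, c in itemsets.items()
        if not any(s < other and c == c2 for other, c2 in itemsets.items())
    }

def signatures(itemsets):
    """Count patterns per signature (size z, support c)."""
    spectrum = {}
    for s, c in itemsets.items():
        key = (len(s), c)
        spectrum[key] = spectrum.get(key, 0) + 1
    return spectrum

# Hypothetical time bins; each set holds the ids of neurons that spiked in that bin.
bins = [
    frozenset({1, 2, 3}),
    frozenset({1, 2, 3}),
    frozenset({1, 2}),
    frozenset({2, 4}),
    frozenset({1, 2, 3}),
]
frequent = frequent_itemsets(bins, min_support=2)
closed = closed_only(frequent)
spectrum = signatures(closed)
print(closed)
print(spectrum)
```

    In this toy example the pairs {1, 3} and {2, 3} are dropped because their counts are fully explained by the triple {1, 2, 3}, while {1, 2} is kept because it occurs once more than the triple.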

  9. Item-focussed Trees for the Identification of Items in Differential Item Functioning.

    PubMed

    Tutz, Gerhard; Berger, Moritz

    2016-09-01

    A novel method for the identification of differential item functioning (DIF) by means of recursive partitioning techniques is proposed. We assume an extension of the Rasch model that allows for DIF being induced by an arbitrary number of covariates for each item. Recursive partitioning on the item level results in one tree for each item and leads to simultaneous selection of items and variables that induce DIF. For each item, it is possible to detect groups of subjects with different item difficulties, defined by combinations of characteristics that are not pre-specified. The way a DIF item is determined by covariates is visualized in a small tree and therefore easily accessible. An algorithm is proposed that is based on permutation tests. Various simulation studies, including the comparison with traditional approaches to identify items with DIF, show the applicability and the competitive performance of the method. Two applications illustrate the usefulness and the advantages of the new method.
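    To make concrete what "groups of subjects with different item difficulties" means under a Rasch model, here is a minimal sketch with a single binary grouping variable. The shift parameter `gamma` and the function names are hypothetical; the recursive partitioning and permutation tests proposed in the paper are not reproduced.

```python
from math import exp

def rasch_probability(theta, beta):
    """Rasch model: probability of a correct response given ability theta and difficulty beta."""
    return 1.0 / (1.0 + exp(-(theta - beta)))

def dif_probability(theta, beta, gamma, in_focal_group):
    """Rasch item whose difficulty is shifted by gamma for the focal group only."""
    shift = gamma if in_focal_group else 0.0
    return rasch_probability(theta, beta + shift)

# Two persons with the same ability but different group membership (hypothetical values).
theta = 0.5
p_reference = dif_probability(theta, beta=0.5, gamma=0.8, in_focal_group=False)
p_focal = dif_probability(theta, beta=0.5, gamma=0.8, in_focal_group=True)
print(f"reference group: {p_reference:.3f}, focal group: {p_focal:.3f}")
```

    An item shows DIF when `gamma` is nonzero, that is, when equally able persons from different groups have different chances of solving it.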

  10. Science Library of Test Items. Volume Nineteen. A Collection of Multiple Choice Test Items Relating Mainly to Geology.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  11. Science Library of Test Items. Volume Seventeen. A Collection of Multiple Choice Test Items Relating Mainly to Biology.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  12. Science Library of Test Items. Volume Eighteen. A Collection of Multiple Choice Test Items Relating Mainly to Chemistry.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  13. Conditional recall and the frequency effect in the serial recall task: an examination of item-to-item associativity.

    PubMed

    Miller, Leonie M; Roodenrys, Steven

    2012-11-01

    The frequency effect in short-term serial recall is influenced by the composition of lists. In pure lists, a robust advantage in the recall of high-frequency (HF) words is observed, yet in alternating mixed lists, HF and low-frequency (LF) words are recalled equally well. It has been argued that the preexisting associations between all list items determine a single, global level of supportive activation that assists item recall. Preexisting associations between items are assumed to be a function of language co-occurrence; HF-HF associations are high, LF-LF associations are low, and mixed associations are intermediate in activation strength. This account, however, is based on results when alternating lists with equal numbers of HF and LF words were used. It is possible that directional association between adjacent list items is responsible for the recall patterns reported. In the present experiment, the recall of three forms of mixed lists-those with equal numbers of HF and LF items and pure lists-was examined to test the extent to which item-to-item associations are present in serial recall. Furthermore, conditional probabilities were used to examine more closely the evidence for a contribution, since correct-in-position scoring may mask recall that is dependent on the recall of prior items. The results suggest that an item-to-item effect is clearly present for early but not late list items, and they implicate an additional factor, perhaps the availability of resources at output, in the recall of late list items.

  14. Calibration of the Spanish PROMIS Smoking Item Banks.

    PubMed

    Huang, Wenjing; Stucky, Brian D; Edelen, Maria O; Tucker, Joan S; Shadel, William G; Hansen, Mark; Cai, Li

    2016-07-01

    The Patient-Reported Outcomes Measurement Information System (PROMIS) Smoking Initiative has developed item banks for assessing six smoking behaviors and biopsychosocial correlates of smoking among adult cigarette smokers. The goal of this study is to evaluate the performance of the Spanish version of the PROMIS smoking item banks as compared to the original banks developed in English. The six PROMIS banks for daily smokers were translated into Spanish and administered to a sample of Spanish-speaking adult daily smokers in the United States (N = 302). We first evaluated the unidimensionality of each bank using confirmatory factor analysis. We then conducted a two-group item response theory calibration, including an item response theory-based Differential Item Functioning (DIF) analysis by language of administration (Spanish vs. English). Finally, we generated full bank and short form scores for the translated banks and evaluated their psychometric performance. Unidimensionality of the Spanish smoking item banks was supported by confirmatory factor analysis results. Out of a total of 109 items that were evaluated for language DIF, seven items in three of the six banks were identified as having levels of DIF that exceeded an established criterion. The psychometric performance of the Spanish daily smoker banks is largely comparable to that of the English versions. The Spanish PROMIS smoking item banks are highly similar, but not entirely equivalent, to the original English versions. The parameters from these two-group calibrations can be used to generate comparable bank scores across the two language versions. In this study, we developed a Spanish version of the PROMIS smoking toolkit, which was originally designed and developed for English speakers. With the growing Spanish-speaking population, it is important to make the toolkit more accessible by translating the items and calibrating the Spanish version to be comparable with English-language scores. This study

  15. Development of new physical activity and sedentary behavior change self-efficacy questionnaires using item response modeling

    USDA-ARS's Scientific Manuscript database

    Theoretically, increased levels of physical activity self-efficacy (PASE) should lead to increased physical activity, but few studies have reported this effect among youth. This failure may be at least partially attributable to measurement limitations. In this study, Item Response Modeling (IRM) was...

  16. An Evaluation Method for PV Systems by using Limited Data Item

    NASA Astrophysics Data System (ADS)

    Oozeki, Takashi; Izawa, Toshiyasu; Otani, Kenji; Tsuzuku, Ken; Koike, Hisafumi; Kurokawa, Kosuke

    Although photovoltaic (PV) systems are expected to spread throughout Japan, almost none of them are looked after once installed, because PV systems are regarded as maintenance free. In fact, some PV operating problems go unnoticed by system owners because system characteristics, such as the ideal output energy, cannot be fully identified. It is therefore very important to evaluate these characteristics. Doing so requires measuring equipment, and such equipment, especially a pyrheliometer, is usually too expensive for PV system owners to install. Consequently, an evaluation method is needed that can reveal operating performance, such as the performance ratio, from very few kinds of data. The method proposed in this paper evaluates the performance ratio, shading losses, and inverter efficiency losses using only system output data items. The adequacy of the method is shown by comparison with actual data and field survey results. The method is intended as a tool for checking PV system performance.

  17. Evaluation of a faculty development program aimed at increasing residents' active learning in lectures.

    PubMed

    Desselle, Bonnie C; English, Robin; Hescock, George; Hauser, Andrea; Roy, Melissa; Yang, Tong; Chauvin, Sheila W

    2012-12-01

    Active engagement in the learning process is important to enhance learners' knowledge acquisition and retention and the development of their thinking skills. This study evaluated whether a 1-hour faculty development workshop increased the use of active teaching strategies and enhanced residents' active learning and thinking. Faculty teaching in a pediatrics residency participated in a 1-hour workshop (intervention) approximately 1 month before a scheduled lecture. Participants' responses to a preworkshop/postworkshop questionnaire targeted self-efficacy (confidence) for facilitating active learning and thinking and providing feedback about workshop quality. Trained observers assessed each lecture (3-month baseline phase and 3-month intervention phase) using an 8-item scale for use of active learning strategies and a 7-item scale for residents' engagement in active learning. Observers also assessed lecturer-resident interactions and the extent to which residents were asked to justify their answers. Responses to the workshop questionnaire (n = 32/34; 94%) demonstrated effectiveness and increased confidence. Faculty in the intervention phase demonstrated increased use of interactive teaching strategies for 6 items, with 5 reaching statistical significance (P ≤ .01). Residents' active learning behaviors in lectures were higher in the intervention arm for all 7 items, with 5 reaching statistical significance. Faculty in the intervention group demonstrated increased use of higher-order questioning (P = .02) and solicited justifications for answers (P = .01). A 1-hour faculty development program increased faculty use of active learning strategies and residents' engagement in active learning during resident core curriculum lectures.

  18. Differential Item Functioning in Primary Healthcare Evaluation Instruments by French/English Version, Educational Level and Urban/Rural Location

    PubMed Central

    Haggerty, Jeannie L.; Bouharaoui, Fatima; Santor, Darcy A.

    2011-01-01

    Evaluating the extent to which groups or subgroups of individuals differ with respect to primary healthcare experience depends on first ruling out the possibility of bias. Objective: To determine whether item or subscale performance differs systematically between French/English, high/low education subgroups and urban/rural residency. Method: A sample of 645 adult users balanced by French/English language (in Quebec and Nova Scotia, respectively), high/low education and urban/rural residency responded to six validated instruments: the Primary Care Assessment Survey (PCAS); the Primary Care Assessment Tool – Short Form (PCAT-S); the Components of Primary Care Index (CPCI); the first version of the EUROPEP (EUROPEP-I); the Interpersonal Processes of Care Survey, version II (IPC-II); and part of the Veterans Affairs National Outpatient Customer Satisfaction Survey (VANOCSS). We normalized subscale scores to a 0-to-10 scale and tested for between-group differences using ANOVA tests. We used a parametric item response model to test for differences between subgroups in item discriminability and item difficulty. We re-examined group differences after removing items with differential item functioning. Results: Experience of care was assessed more positively in the English-speaking (Nova Scotia) than in the French-speaking (Quebec) respondents. We found differential English/French item functioning in 48% of the 153 items: discriminability in 20% and differential difficulty in 28%. English items were more discriminating generally than the French. Removing problematic items did not change the differences in French/English assessments. Differential item functioning by high/low education status affected 27% of items, with items being generally more discriminating in high-education groups. Between-group comparisons were unchanged. In contrast, only 9% of items showed differential item functioning by geography, affecting principally the accessibility attribute. Removing
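    The normalization of subscale scores to a 0-to-10 scale mentioned in the Method can be sketched as a linear rescaling. This assumes a simple min-max transformation from each instrument's response range; the abstract does not give the exact formula, and the 1-to-5 range below is a hypothetical example.

```python
def normalize_0_to_10(raw, scale_min, scale_max):
    """Linearly rescale a raw subscale score from [scale_min, scale_max] to [0, 10]."""
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    return 10.0 * (raw - scale_min) / (scale_max - scale_min)

# Hypothetical subscale mean on a 1-to-5 Likert response scale.
score = normalize_0_to_10(4.0, scale_min=1, scale_max=5)
print(score)
```

    Putting every subscale on the same 0-to-10 metric lets scores from instruments with different response formats be compared directly.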

  19. Agriculture Library of Test Items.

    ERIC Educational Resources Information Center

    Sutherland, Duncan, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection is reviewed for content validity and reliability. The test…

  20. A Review of Classical Methods of Item Analysis.

    ERIC Educational Resources Information Center

    French, Christine L.

    Item analysis is a very important consideration in the test development process. It is a statistical procedure to analyze test items that combines methods used to evaluate the important characteristics of test items, such as difficulty, discrimination, and distractibility of the items in a test. This paper reviews some of the classical methods for…
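
    To make the statistics named in this abstract concrete, the sketch below computes two classical indices for dichotomously scored (0/1) items: difficulty as the proportion of correct responses, and discrimination as the corrected item-total (point-biserial) correlation. The function names and the toy score matrix are hypothetical illustrations, not material from the paper, whose exact methods are not given in the truncated abstract.

```python
from statistics import mean, pstdev

def item_difficulty(responses, item):
    """Proportion of examinees who answered the item correctly (0/1 scoring)."""
    return mean(row[item] for row in responses)

def item_discrimination(responses, item):
    """Pearson correlation between the item score and the rest-of-test score.

    The item itself is excluded from the total ("corrected" item-total
    correlation) so that it does not inflate its own discrimination index.
    """
    x = [row[item] for row in responses]
    y = [sum(row) - row[item] for row in responses]
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when either score has no variance
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)

# Hypothetical toy data: 4 examinees (rows) by 3 items (columns).
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
```

    Here `item_difficulty(scores, 0)` is the share of examinees who got the first item right; a higher value means an easier item under this convention.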

  1. Evaluation of the Fecal Incontinence Quality of Life Scale (FIQL) using item response theory reveals limitations and suggests revisions.

    PubMed

    Peterson, Alexander C; Sutherland, Jason M; Liu, Guiping; Crump, R Trafford; Karimuddin, Ahmer A

    2018-06-01

    The Fecal Incontinence Quality of Life Scale (FIQL) is a commonly used patient-reported outcome measure for fecal incontinence, often used in clinical trials, yet has not been validated in English since its initial development. This study uses modern methods to thoroughly evaluate the psychometric characteristics of the FIQL and its potential for differential functioning by gender. This study analyzed prospectively collected patient-reported outcome data from a sample of patients prior to colorectal surgery. Patients were recruited from 14 general and colorectal surgeons in Vancouver Coastal Health hospitals in Vancouver, Canada. Confirmatory factor analysis was used to assess construct validity. Item response theory was used to evaluate test reliability, describe item-level characteristics, identify local item dependence, and test for differential functioning by gender. 236 patients were included for analysis, with mean age 58 and approximately half female. Factor analysis failed to identify the lifestyle, coping, depression, and embarrassment domains, suggesting lack of construct validity. Items demonstrated low difficulty, indicating that the test has the highest reliability among individuals who have low quality of life. Five items are suggested for removal or replacement. Differential test functioning was minimal. This study has identified specific improvements that can be made to each domain of the Fecal Incontinence Quality of Life Scale and to the instrument overall. Formatting, scoring, and instructions may be simplified, and items with higher difficulty developed. The lifestyle domain can be used as is. The embarrassment domain should be significantly revised before use.

  2. Science Library of Test Items. Volume Four: Practical Testing Guide.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test items collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, the guide gives a wide range of questions and activities for the manipulation of scientific equipment to allow assessment of students' practical laboratory skills. Instructions are given to make norm-referenced or…

  3. The PROMIS Physical Function item bank was calibrated to a standardized metric and shown to improve measurement efficiency.

    PubMed

    Rose, Matthias; Bjorner, Jakob B; Gandek, Barbara; Bruce, Bonnie; Fries, James F; Ware, John E

    2014-05-01

    To document the development and psychometric evaluation of the Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function (PF) item bank and static instruments. The items were evaluated using qualitative and quantitative methods. A total of 16,065 adults answered item subsets (n>2,200/item) on the Internet, with oversampling of the chronically ill. Classical test and item response theory methods were used to evaluate 149 PROMIS PF items plus 10 Short Form-36 and 20 Health Assessment Questionnaire-Disability Index items. A graded response model was used to estimate item parameters, which were normed to a mean of 50 (standard deviation [SD]=10) in a US general population sample. The final bank consists of 124 PROMIS items covering upper, central, and lower extremity functions and instrumental activities of daily living. In simulations, a 10-item computerized adaptive test (CAT) eliminated floor and decreased ceiling effects, achieving higher measurement precision than any comparable length static tool across four SDs of the measurement range. Improved psychometric properties were transferred to the CAT's superior ability to identify differences between age and disease groups. The item bank provides a common metric and can improve the measurement of PF by facilitating the standardization of patient-reported outcome measures and implementation of CATs for more efficient PF assessments over a larger range. Copyright © 2014. Published by Elsevier Inc.
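
    The abstract reports item parameters normed to a mean of 50 and an SD of 10 in a reference sample. A minimal sketch of that kind of linear rescaling follows, assuming a latent estimate `theta` and a known reference mean and SD; the function name and defaults are hypothetical, and the actual PROMIS scoring procedure is not described in the abstract.

```python
def to_t_score(theta, ref_mean=0.0, ref_sd=1.0):
    """Rescale a latent estimate so the reference sample has mean 50, SD 10."""
    return 50.0 + 10.0 * (theta - ref_mean) / ref_sd
```

    Under these assumptions, an estimate one reference SD above the reference mean maps to 60.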

  4. Item Writer Judgments of Item Difficulty versus Actual Item Difficulty: A Case Study

    ERIC Educational Resources Information Center

    Sydorenko, Tetyana

    2011-01-01

    This study investigates how accurate one item writer can be on item difficulty estimates and whether factors affecting item writer judgments correspond to predictors of actual item difficulty. The items were based on conversational dialogs (presented as videos online) that focus on pragmatic functions. Thirty-five 2nd-, 3rd-, and 4th-year learners…

  5. Science Library of Test Items. Volume Two.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    The second volume of test items in the Science Library of Test Items is intended as a resource to assist teachers in implementing and evaluating science courses in the first 4 years of Australian secondary school. The items were selected from questions submitted to the School Certificate Development Unit by teachers in New South Wales. Only the…

  6. Vegetable parenting practices scale: Item response modeling analyses

    USDA-ARS's Scientific Manuscript database

    Our objective was to evaluate the psychometric properties of a vegetable parenting practices scale using multidimensional polytomous item response modeling which enables assessing item fit to latent variables and the distributional characteristics of the items in comparison to the respondents. We al...

  7. Measuring ability to assess claims about treatment effects: a latent trait analysis of items from the 'Claim Evaluation Tools' database using Rasch modelling.

    PubMed

    Austvoll-Dahlgren, Astrid; Guttersrud, Øystein; Nsangi, Allen; Semakula, Daniel; Oxman, Andrew D

    2017-05-25

    The Claim Evaluation Tools database contains multiple-choice items for measuring people's ability to apply the key concepts they need to know to be able to assess treatment claims. We assessed items from the database using Rasch analysis to develop an outcome measure to be used in two randomised trials in Uganda. Rasch analysis is a form of psychometric testing relying on Item Response Theory. It is a dynamic way of developing outcome measures that are valid and reliable. To assess the validity, reliability and responsiveness of 88 items addressing 22 key concepts using Rasch analysis. We administered four sets of multiple-choice items in English to 1114 people in Uganda and Norway, of whom 685 were children and 429 were adults (including 171 health professionals). We scored all items dichotomously. We explored summary and individual fit statistics using the RUMM2030 analysis package. We used SPSS to perform distractor analysis. Most items conformed well to the Rasch model, but some items needed revision. Overall, the four item sets had satisfactory reliability. We did not identify significant response dependence between any pairs of items and, overall, the magnitude of multidimensionality in the data was acceptable. The items had a high level of difficulty. Most of the items conformed well to the Rasch model's expectations. Following revision of some items, we concluded that most of the items were suitable for use in an outcome measure for evaluating the ability of children or adults to assess treatment claims. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
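
    For readers unfamiliar with the model named in this abstract, the dichotomous Rasch model gives the probability of a correct response as a logistic function of the difference between person ability and item difficulty. The sketch below is a generic illustration with hypothetical names; it does not reproduce the RUMM2030 analysis the authors describe.

```python
import math

def rasch_probability(theta, b):
    """P(correct) under the dichotomous Rasch model.

    theta: person ability (logits); b: item difficulty (logits).
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))
```

    When ability equals difficulty the probability is 0.5, and a harder item (larger `b`) lowers the probability for the same person.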

  8. Science Library of Test Items. Volume Twenty. A Collection of Multiple Choice Test Items Relating Mainly to Physics, 1.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  9. Science Library of Test Items. Volume Twenty-Two. A Collection of Multiple Choice Test Items Relating Mainly to Skills.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  10. An Item Gains and Losses Analysis of False Memories Suggests Critical Items Receive More Item-Specific Processing than List Items

    ERIC Educational Resources Information Center

    Burns, Daniel J.; Martens, Nicholas J.; Bertoni, Alicia A.; Sweeney, Emily J.; Lividini, Michelle D.

    2006-01-01

    In a repeated testing paradigm, list items receiving item-specific processing are more likely to be recovered across successive tests (item gains), whereas items receiving relational processing are likely to be forgotten progressively less on successive tests. Moreover, analysis of cumulative-recall curves has shown that item-specific processing…

  11. Investigating the Impact of Item Parameter Drift for Item Response Theory Models with Mixture Distributions.

    PubMed

    Park, Yoon Soo; Lee, Young-Sun; Xing, Kuan

    2016-01-01

    This study investigates the impact of item parameter drift (IPD) on parameter and ability estimation when the underlying measurement model fits a mixture distribution, thereby violating the item invariance property of unidimensional item response theory (IRT) models. An empirical study was conducted to demonstrate the occurrence of both IPD and an underlying mixture distribution using real-world data. Twenty-one trended anchor items from the 1999, 2003, and 2007 administrations of Trends in International Mathematics and Science Study (TIMSS) were analyzed using unidimensional and mixture IRT models. TIMSS treats trended anchor items as invariant over testing administrations and uses pre-calibrated item parameters based on unidimensional IRT. However, empirical results showed evidence of two latent subgroups with IPD. Results also showed changes in the distribution of examinee ability between latent classes over the three administrations. A simulation study was conducted to examine the impact of IPD on the estimation of ability and item parameters, when data have underlying mixture distributions. Simulations used data generated from a mixture IRT model and estimated using unidimensional IRT. Results showed that data reflecting IPD using mixture IRT model led to IPD in the unidimensional IRT model. Changes in the distribution of examinee ability also affected item parameters. Moreover, drift with respect to item discrimination and distribution of examinee ability affected estimates of examinee ability. These findings demonstrate the need to caution and evaluate IPD using a mixture IRT framework to understand its effects on item parameters and examinee ability.

  12. Investigating the Impact of Item Parameter Drift for Item Response Theory Models with Mixture Distributions

    PubMed Central

    Park, Yoon Soo; Lee, Young-Sun; Xing, Kuan

    2016-01-01

    This study investigates the impact of item parameter drift (IPD) on parameter and ability estimation when the underlying measurement model fits a mixture distribution, thereby violating the item invariance property of unidimensional item response theory (IRT) models. An empirical study was conducted to demonstrate the occurrence of both IPD and an underlying mixture distribution using real-world data. Twenty-one trended anchor items from the 1999, 2003, and 2007 administrations of Trends in International Mathematics and Science Study (TIMSS) were analyzed using unidimensional and mixture IRT models. TIMSS treats trended anchor items as invariant over testing administrations and uses pre-calibrated item parameters based on unidimensional IRT. However, empirical results showed evidence of two latent subgroups with IPD. Results also showed changes in the distribution of examinee ability between latent classes over the three administrations. A simulation study was conducted to examine the impact of IPD on the estimation of ability and item parameters, when data have underlying mixture distributions. Simulations used data generated from a mixture IRT model and estimated using unidimensional IRT. Results showed that data reflecting IPD using mixture IRT model led to IPD in the unidimensional IRT model. Changes in the distribution of examinee ability also affected item parameters. Moreover, drift with respect to item discrimination and distribution of examinee ability affected estimates of examinee ability. These findings demonstrate the need to caution and evaluate IPD using a mixture IRT framework to understand its effects on item parameters and examinee ability. PMID:26941699

  13. How item banks and their application can influence measurement practice in rehabilitation medicine: a PROMIS fatigue item bank example.

    PubMed

    Lai, Jin-Shei; Cella, David; Choi, Seung; Junghaenel, Doerte U; Christodoulou, Christopher; Gershon, Richard; Stone, Arthur

    2011-10-01

    To illustrate how measurement practices can be advanced by using as an example the fatigue item bank (FIB) and its applications (short forms and computerized adaptive testing [CAT]) that were developed through the National Institutes of Health Patient Reported Outcomes Measurement Information System (PROMIS) Cooperative Group. Psychometric analysis of data collected by an Internet survey company using item response theory-related techniques. A U.S. general population representative sample collected through the Internet. Respondents used for dimensionality evaluation of the PROMIS FIB (N=603) and item calibrations (N=14,931). Not applicable. Fatigue items (112) developed by the PROMIS fatigue domain working group, 13-item Functional Assessment of Chronic Illness Therapy-Fatigue, and 4-item Medical Outcomes Study 36-Item Short Form Health Survey Vitality scale. The PROMIS FIB version 1, which consists of 95 items, showed acceptable psychometric properties. CAT showed consistently better precision than short forms. However, all 3 short forms showed good precision for most participants in that more than 95% of the sample could be measured precisely with reliability greater than 0.9. Measurement practice can be advanced by using a psychometrically sound measurement tool and its applications. This example shows that CAT and short forms derived from the PROMIS FIB can reliably estimate fatigue reported by the U.S. general population. Evaluation in clinical populations is warranted before the item bank can be used for clinical trials. Copyright © 2011 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.

  14. Item Difficulty Modeling of Paragraph Comprehension Items

    ERIC Educational Resources Information Center

    Gorin, Joanna S.; Embretson, Susan E.

    2006-01-01

    Recent assessment research joining cognitive psychology and psychometric theory has introduced a new technology, item generation. In algorithmic item generation, items are systematically created based on specific combinations of features that underlie the processing required to correctly solve a problem. Reading comprehension items have been more…

  15. Measuring ability to assess claims about treatment effects: a latent trait analysis of items from the ‘Claim Evaluation Tools’ database using Rasch modelling

    PubMed Central

    Austvoll-Dahlgren, Astrid; Guttersrud, Øystein; Nsangi, Allen; Semakula, Daniel; Oxman, Andrew D

    2017-01-01

    Background: The Claim Evaluation Tools database contains multiple-choice items for measuring people’s ability to apply the key concepts they need to know to be able to assess treatment claims. We assessed items from the database using Rasch analysis to develop an outcome measure to be used in two randomised trials in Uganda. Rasch analysis is a form of psychometric testing relying on Item Response Theory. It is a dynamic way of developing outcome measures that are valid and reliable. Objectives: To assess the validity, reliability and responsiveness of 88 items addressing 22 key concepts using Rasch analysis. Participants: We administered four sets of multiple-choice items in English to 1114 people in Uganda and Norway, of whom 685 were children and 429 were adults (including 171 health professionals). We scored all items dichotomously. We explored summary and individual fit statistics using the RUMM2030 analysis package. We used SPSS to perform distractor analysis. Results: Most items conformed well to the Rasch model, but some items needed revision. Overall, the four item sets had satisfactory reliability. We did not identify significant response dependence between any pairs of items and, overall, the magnitude of multidimensionality in the data was acceptable. The items had a high level of difficulty. Conclusion: Most of the items conformed well to the Rasch model’s expectations. Following revision of some items, we concluded that most of the items were suitable for use in an outcome measure for evaluating the ability of children or adults to assess treatment claims. PMID:28550019

  16. Promoting Cold-Start Items in Recommender Systems

    PubMed Central

    Liu, Jin-Hu; Zhou, Tao; Zhang, Zi-Ke; Yang, Zimo; Liu, Chuang; Li, Wei-Min

    2014-01-01

    As one of the major challenges, the cold-start problem plagues nearly all recommender systems. In particular, new items tend to be overlooked, impeding the development of new products online. Given limited resources, it is extremely important to use knowledge of recommender systems to design efficient marketing strategies for new items. In this paper, we convert this tricky issue into a clear mathematical problem based on a bipartite network representation. Under the most widely used algorithm in real e-commerce recommender systems, the so-called item-based collaborative filtering, we show that simply pushing new items to active users is not a good strategy. Interestingly, experiments on real recommender systems indicate that connecting new items with some less active users statistically yields better performance; namely, these new items have a greater chance of appearing in other users' recommendation lists. Further analysis suggests that the disassortative nature of recommender systems contributes to this observation. In short, an in-depth understanding of recommender systems could pave the way for owners to popularize their cold-start products at low cost. PMID:25479013
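
    The abstract refers to item-based collaborative filtering on a user-item bipartite network. A minimal sketch of one common form follows: item-item cosine similarity computed from the sets of users who collected each item, with a candidate item scored by summing its similarity to the items the target user already has. The similarity definition, function names, and toy data are assumptions for illustration; the abstract does not specify the paper's exact formulation.

```python
from collections import defaultdict
from math import sqrt

def item_users(user_items):
    """Invert a user -> items mapping into item -> users (the bipartite edges)."""
    users_of = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            users_of[item].add(user)
    return dict(users_of)

def cosine_similarity(users_of, i, j):
    """Cosine similarity of two items based on their shared users."""
    ui, uj = users_of.get(i, set()), users_of.get(j, set())
    if not ui or not uj:
        return 0.0
    return len(ui & uj) / sqrt(len(ui) * len(uj))

def recommend(user_items, user, top_n=3):
    """Rank items the user lacks by summed similarity to items the user has."""
    users_of = item_users(user_items)
    owned = user_items[user]
    scores = {
        j: sum(cosine_similarity(users_of, i, j) for i in owned)
        for j in users_of
        if j not in owned
    }
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda j: (-scores[j], j))[:top_n]

# Hypothetical toy network: three users and four items.
toy = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"a", "d"},
}
```

    In this toy network, item `c` shares a user with both of `u1`'s items while `d` shares a user with only one of them, so `c` ranks ahead of `d` for `u1`. A brand-new item with no users would score zero for everyone, which is the cold-start situation the abstract describes.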

  17. Promoting cold-start items in recommender systems.

    PubMed

    Liu, Jin-Hu; Zhou, Tao; Zhang, Zi-Ke; Yang, Zimo; Liu, Chuang; Li, Wei-Min

    2014-01-01

    As one of the major challenges, the cold-start problem plagues nearly all recommender systems. In particular, new items tend to be overlooked, impeding the development of new products online. Given limited resources, it is extremely important to use knowledge of recommender systems to design efficient marketing strategies for new items. In this paper, we convert this tricky issue into a clear mathematical problem based on a bipartite network representation. Under the most widely used algorithm in real e-commerce recommender systems, the so-called item-based collaborative filtering, we show that simply pushing new items to active users is not a good strategy. Interestingly, experiments on real recommender systems indicate that connecting new items with some less active users statistically yields better performance; namely, these new items have a greater chance of appearing in other users' recommendation lists. Further analysis suggests that the disassortative nature of recommender systems contributes to this observation. In short, an in-depth understanding of recommender systems could pave the way for owners to popularize their cold-start products at low cost.

  18. Evaluating increased effort for item disposal to improve recycling at a university.

    PubMed

    Fritz, Jennifer N; Dupuis, Danielle L; Wu, Wai-Ling; Neal, Ashley E; Rettig, Lisa A; Lastrapes, Renée E

    2017-10-01

    An evaluation of increased response effort to dispose of items was conducted to improve recycling at a university. Signs prompting individuals to recycle and notifying them of the location of trash and recycling receptacles were posted in each phase. During the intervention, trashcans were removed from the classrooms, and one large trashcan was available in the hallway next to the recycling receptacles. Results showed that correct recycling increased, and trash left in classrooms increased initially during the second intervention phase before returning to baseline levels. © 2017 Society for the Experimental Analysis of Behavior.

  19. Remembering verbally-presented items as pictures: Brain activity underlying visual mental images in schizophrenia patients with visual hallucinations.

    PubMed

    Stephan-Otto, Christian; Siddi, Sara; Senior, Carl; Cuevas-Esteban, Jorge; Cambra-Martí, Maria Rosa; Ochoa, Susana; Brébion, Gildas

    2017-09-01

    Previous research suggests that visual hallucinations in schizophrenia consist of mental images mistaken for percepts due to failure of the reality-monitoring processes. However, the neural substrates that underpin such dysfunction are currently unknown. We conducted a brain imaging study to investigate the role of visual mental imagery in visual hallucinations. Twenty-three patients with schizophrenia and 26 healthy participants were administered a reality-monitoring task whilst undergoing an fMRI protocol. At the encoding phase, a mixture of pictures of common items and labels designating common items were presented. On the memory test, participants were requested to remember whether a picture of the item had been presented or merely its label. Visual hallucination scores were associated with a liberal response bias reflecting propensity to erroneously remember pictures of the items that had in fact been presented as words. At encoding, patients with visual hallucinations differentially activated the right fusiform gyrus when processing the words they later remembered as pictures, which suggests the formation of visual mental images. On the memory test, the whole patient group activated the anterior cingulate and medial superior frontal gyrus when falsely remembering pictures. However, no differential activation was observed in patients with visual hallucinations, whereas in the healthy sample, the production of visual mental images at encoding led to greater activation of a fronto-parietal decisional network on the memory test. Visual hallucinations are associated with enhanced visual imagery and possibly with a failure of the reality-monitoring processes that enable discrimination between imagined and perceived events. Copyright © 2017 Elsevier Ltd. All rights reserved.

  20. Item Structural Properties as Predictors of Item Difficulty and Item Association.

    ERIC Educational Resources Information Center

    Solano-Flores, Guillermo

    1993-01-01

    Studied the ability of logical test design (LTD) to predict student performance in reading Roman numerals for 211 sixth graders in Mexico City tested on Roman numeral items varying on LTD-related and non-LTD-related variables. The LTD-related variable item iterativity was found to be the best predictor of item difficulty. (SLD)

  1. Science Library of Test Items. Volume Twenty-One. A Collection of Multiple Choice Test Items Relating Mainly to Physics, 2.

    ERIC Educational Resources Information Center

    New South Wales Dept. of Education, Sydney (Australia).

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items are made available to teachers for the construction of unit tests or term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The test items meet syllabus…

  2. Item generation and design testing of a questionnaire to assess degenerative joint disease-associated pain in cats.

    PubMed

    Zamprogno, Helia; Hansen, Bernie D; Bondell, Howard D; Sumrell, Andrea Thomson; Simpson, Wendy; Robertson, Ian D; Brown, James; Pease, Anthony P; Roe, Simon C; Hardie, Elizabeth M; Wheeler, Simon J; Lascelles, B Duncan X

    2010-12-01

    To determine the items (question topics) for a subjective instrument to assess degenerative joint disease (DJD)-associated chronic pain in cats and determine the instrument design most appropriate for use by cat owners. 100 randomly selected client-owned cats from 6 months to 20 years old. Cats were evaluated to determine degree of radiographic DJD and signs of pain throughout the skeletal system. Two groups were identified: high DJD pain and low DJD pain. Owner-answered questions about activity and signs of pain were compared between the 2 groups to define items relating to chronic DJD pain. Interviews with 45 cat owners were performed to generate items. Fifty-three cat owners who had not been involved in any other part of the study, 19 veterinarians, and 2 statisticians assessed 6 preliminary instrument designs. 22 cats were selected for each group; 19 important items were identified, resulting in 12 potential items for the instrument; and 3 additional items were identified from owner interviews. Owners and veterinarians selected a 5-point descriptive instrument design over 11-point or visual analogue scale formats. Behaviors relating to activity were substantially different between healthy cats and cats with signs of DJD-associated pain. Fifteen items were identified as being potentially useful, and the preferred instrument design was identified. This information could be used to construct an owner-based questionnaire to assess feline DJD-associated pain. Once validated, such a questionnaire would assist in evaluating potential analgesic treatments for these patients.

  3. Development of the PROMIS nicotine dependence item banks.

    PubMed

    Shadel, William G; Edelen, Maria Orlando; Tucker, Joan S; Stucky, Brian D; Hansen, Mark; Cai, Li

    2014-09-01

    Nicotine dependence is a core construct important for understanding cigarette smoking and smoking cessation behavior. This article describes analyses conducted to develop and evaluate item banks for assessing nicotine dependence among daily and nondaily smokers. Using data from a sample of daily (N = 4,201) and nondaily (N =1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning analyses (according to gender, age, and race/ethnicity) to arrive at a unidimensional set of nicotine dependence items for daily and nondaily smokers. We also evaluated performance of short forms (SFs) and computer adaptive tests (CATs) to efficiently assess dependence. A total of 32 items were included in the Nicotine Dependence item banks; 22 items are common across daily and nondaily smokers, 5 are unique to daily smokers, and 5 are unique to nondaily smokers. For both daily and nondaily smokers, the Nicotine Dependence item banks are strongly unidimensional, highly reliable (reliability = 0.97 and 0.97, respectively), and perform similarly across gender, age, and race/ethnicity groups. SFs common to daily and nondaily smokers consist of 8 and 4 items (reliability = 0.91 and 0.81, respectively). Results from simulated CATs showed that dependence can be assessed with very good precision for most respondents using fewer than 6 items adaptively selected from the item banks. Nicotine dependence on cigarettes can be assessed on the basis of these item banks via one of the SFs, by using CATs, or through a tailored set of items selected for a specific research purpose. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  4. Evaluating the content of the communication items in the CAHPS(®) clinician and group survey and supplemental items with what high-performing physicians say they do.

    PubMed

    Quigley, Denise D; Martino, Steven C; Brown, Julie A; Hays, Ron D

    2013-01-01

    A doctor's ability to communicate effectively is key to establishing and maintaining positive doctor-patient relationships. The Consumer Assessment of Healthcare Providers and Systems (CAHPS(®)) Clinician and Group Survey is the standard for collecting and reporting information about patients' experiences of care in the USA. To evaluate how well CAHPS(®) Clinician and Group 2.0 core and supplemental survey items (CG-CAHPS) with a 12-month reference capture doctor-patient communication. Eleven of the 40 highest-rated physicians on the CG-CAHPS survey treating patients in a Midwest commercial health plan. Data were obtained via semi-structured interviews. Specific behaviors, practices, and opinions about doctor communication were coded and compared to the CG-CAHPS items. CG-CAHPS fully captures six of the nine behaviors most commonly mentioned by high-performing physicians: employing office staff with good people skills; involving office staff in communication with patients; spending enough time with patients; listening carefully; providing clear, simple explanations; and devising an action plan with each patient. Three physician behaviors identified as key were not captured in CG-CAHPS items: use of nonverbal communication; greeting patients and introducing oneself; and tracking personal information about patients. CG-CAHPS survey items capture many of the most commonly mentioned doctor-patient communication behaviors and practices identified by high-performing physicians. Nonverbal communication, greeting patients, and tracking personal information about patients were identified as key aspects of doctor-patient communication, but are not captured by the current CG-CAHPS. We recommend further research to assess patients' perceptions of specific verbal and nonverbal behaviors (such as leaning forward in a chair, casually asking about other family members), followed by the development of new items (if needed) that aim to capture what these specific behaviors

  5. Item Response Models for Examinee-Selected Items

    ERIC Educational Resources Information Center

    Wang, Wen-Chung; Jin, Kuan-Yu; Qiu, Xue-Lan; Wang, Lei

    2012-01-01

    In some tests, examinees are required to choose a fixed number of items from a set of given items to answer. This practice creates a challenge to standard item response models, because more capable examinees may have an advantage by making wiser choices. In this study, we developed a new class of item response models to account for the choice…

  6. Attitudes and evaluative practices: category vs. item and subjective vs. objective constructions in everyday food assessments.

    PubMed

    Wiggins, Sally; Potter, Jonathan

    2003-12-01

    In social psychology, evaluative expressions have traditionally been understood in terms of their relationship to, and as the expression of, underlying 'attitudes'. In contrast, discursive approaches have started to study evaluative expressions as part of varied social practices, considering what such expressions are doing rather than their relationship to attitudinal objects or other putative mental entities. In this study the latter approach will be used to examine the construction of food and drink evaluations in conversation. The data are taken from a corpus of family mealtimes recorded over a period of months. The aim of this study is to highlight two distinctions that are typically obscured in traditional attitude work ('subjective' vs. 'objective' expressions, category vs. item evaluations). A set of extracts is examined to document the presence of these distinctions in talk that evaluates food and the way they are used and rhetorically developed to perform particular activities (accepting/refusing food, complimenting the food provider, persuading someone to eat). The analysis suggests that researchers (a) should be aware of the potential significance of these distinctions; (b) should be cautious when treating evaluative terms as broadly equivalent and (c) should be cautious when blurring categories and instances. This analysis raises the broader question of how far evaluative practices may be specific to particular domains, and what this specificity might consist in. It is concluded that research in this area could benefit from starting to focus on the role of evaluations in practices and charting their association with specific topics and objects.

  7. Overview of Classical Test Theory and Item Response Theory for Quantitative Assessment of Items in Developing Patient-Reported Outcome Measures

    PubMed Central

    Cappelleri, Joseph C.; Lundy, J. Jason; Hays, Ron D.

    2014-01-01

    Introduction The U.S. Food and Drug Administration’s patient-reported outcome (PRO) guidance document defines content validity as “the extent to which the instrument measures the concept of interest” (FDA, 2009, p. 12). “Construct validity is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity” (Strauss & Smith, 2009, p. 7). Hence both qualitative and quantitative information are essential in evaluating the validity of measures. Methods We review classical test theory and item response theory approaches to evaluating PRO measures including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized “difficulty” (severity) order of items is represented by observed responses. Conclusion Classical test theory and item response theory can be useful in providing a quantitative assessment of items and scales during the content validity phase of patient-reported outcome measures. Depending on the particular type of measure and the specific circumstances, either one or both approaches should be considered to help maximize the content validity of PRO measures. PMID:24811753

  8. Do Self Concept Tests Test Self Concept? An Evaluation of the Validity of Items on the Piers Harris and Coopersmith Measures.

    ERIC Educational Resources Information Center

    Lynch, Mervin D.; Chaves, John

    Items from Piers-Harris and Coopersmith self-concept tests were evaluated against independent measures on three self-constructs, idealized, empathic, and worth. Construct measurements were obtained with the semantic differential and D statistic. Ratings were obtained from 381 children, grades 4-6. For each test, item ratings and construct measures…

  9. Classical Item Analysis Using Latent Variable Modeling: A Note on a Direct Evaluation Procedure

    ERIC Educational Resources Information Center

    Raykov, Tenko; Marcoulides, George A.

    2011-01-01

    A directly applicable latent variable modeling procedure for classical item analysis is outlined. The method allows one to point and interval estimate item difficulty, item correlations, and item-total correlations for composites consisting of categorical items. The approach is readily employed in empirical research and as a by-product permits…
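
    The classical quantities named in this abstract (item difficulty and item-total correlation) can be illustrated with a minimal sketch. This is the plain descriptive computation for dichotomous items, not the latent variable modeling procedure the authors outline, and it gives point estimates only (no interval estimates). The function names and the response matrix are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def classical_item_stats(responses):
    """Return (difficulty, corrected item-total correlation) per item.

    responses: list of rows, one row per examinee, entries 0 or 1.
    Difficulty is the proportion of examinees answering correctly.
    The item-total correlation is "corrected": the item itself is
    removed from the total before correlating.
    """
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    stats = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        difficulty = sum(item) / len(item)
        rest = [t - u for t, u in zip(totals, item)]
        stats.append((difficulty, pearson(item, rest)))
    return stats


# Hypothetical responses: 4 examinees, 3 items.
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
for j, (p, r) in enumerate(classical_item_stats(data), start=1):
    print(f"item {j}: difficulty={p:.2f}, item-rest r={r:.2f}")
```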

  10. Order information and free recall: evaluating the item-order hypothesis.

    PubMed

    Mulligan, Neil W; Lozito, Jeffrey P

    2007-05-01

    The item-order hypothesis proposes that order information plays an important role in recall from long-term memory, and it is commonly used to account for the moderating effects of experimental design in memory research. Recent research (Engelkamp, Jahn, & Seiler, 2003; McDaniel, DeLosh, & Merritt, 2000) raises questions about the assumptions underlying the item-order hypothesis. Four experiments tested these assumptions by examining the relationship between free recall and order memory for lists of varying length (8, 16, or 24 unrelated words or pictures). Some groups were given standard free-recall instructions, other groups were explicitly instructed to use order information in free recall, and other groups were given free-recall tests intermixed with tests of order memory (order reconstruction). The results for short lists were consistent with the assumptions of the item-order account. For intermediate-length lists, explicit order instructions and intermixed order tests made recall more reliant on order information, but under standard conditions, order information played little role in recall. For long lists, there was little evidence that order information contributed to recall. In sum, the assumptions of the item-order account held for short lists, received mixed support with intermediate lists, and received no support for longer lists.

  11. Evaluation of diagnostic criteria for panic attack using item response theory: findings from the National Comorbidity Survey in USA.

    PubMed

    Ietsugu, Tetsuji; Sukigara, Masune; Furukawa, Toshiaki A

    2007-12-01

    The dichotomous diagnostic systems such as the Diagnostic and Statistical Manual of Mental Disorders (DSM) and International Classification of Diseases (ICD) lose much important information concerning what each symptom can offer. This study explored the characteristics and performances of DSM-IV and ICD-10 diagnostic criteria items for panic attack using modern item response theory (IRT). The National Comorbidity Survey used the Composite International Diagnostic Interview to assess 14 DSM-IV and ICD-10 panic attack diagnostic criteria items in the general population in the USA. The dimensionality and measurement properties of these items were evaluated using dichotomous factor analysis and the two-parameter IRT model. A total of 1213 respondents reported at least one subsyndromal or syndromal panic attack in their lifetime. Factor analysis indicated that all items constitute a unidimensional construct. The two-parameter IRT model produced meaningful and interpretable results. Among items with high discrimination parameters, the difficulty parameter for "palpitation" was relatively low, while those for "choking," "fear of dying" and "paresthesia" were relatively high. Several items including "dry mouth" and "fear of losing control" had low discrimination parameters. The item characteristics of diagnostic criteria among help-seeking clinical populations may be different from those that we observed in the general population and deserve further examination. "Paresthesia," "choking" and "fear of dying" can be thought to be good indicators of severe panic attacks, while "palpitation" can discriminate well between cases and non-cases at low level of panic attack severity. Items such as "dry mouth" would contribute less to the discrimination.
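
    The roles of the two parameters discussed above can be seen in a short sketch of the standard two-parameter logistic item response function. The parameter values below are invented for illustration and are not the estimates reported in the study; the item labels only echo the pattern the abstract describes.

```python
import math


def icc_2pl(theta, a, b):
    """Two-parameter logistic model: probability of endorsing an item.

    theta: latent severity of the respondent
    a:     discrimination (slope of the curve at theta == b)
    b:     difficulty (the theta at which the probability is 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical (a, b) pairs, NOT values from the study.
items = {
    "low difficulty, high discrimination": (2.0, -1.0),
    "high difficulty, high discrimination": (2.0, 1.5),
    "low discrimination": (0.5, 0.0),
}

for label, (a, b) in items.items():
    probs = [round(icc_2pl(t, a, b), 2) for t in (-2, 0, 2)]
    print(f"{label}: P(endorse) at theta=-2, 0, 2 -> {probs}")
```

    A low-difficulty item separates respondents at the mild end of the severity scale, a high-difficulty item is endorsed mainly at the severe end, and a low-discrimination item changes little across the whole range.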

  12. Enhanced Automatic Question Creator--EAQC: Concept, Development and Evaluation of an Automatic Test Item Creation Tool to Foster Modern e-Education

    ERIC Educational Resources Information Center

    Gutl, Christian; Lankmayr, Klaus; Weinhofer, Joachim; Hofler, Margit

    2011-01-01

    Research in automated creation of test items for assessment purposes became increasingly important during the recent years. Due to automatic question creation it is possible to support personalized and self-directed learning activities by preparing appropriate and individualized test items quite easily with relatively little effort or even fully…

  13. [Social anxiety and self-esteem: Hungarian validation of the "Brief Fear of Negative Evaluation Scale - Straightforward Items"].

    PubMed

    Perczel-Forintos, Dóra; Kresznerits, Szilvia

    2017-06-01

    Although social anxiety disorder (SAD) is the third most frequent emotional disorder with 13-15% prevalence rate, it often remains unrecognized. Social phobia is associated with low self-esteem, high self-criticism and fear of negative evaluation by others. It shows high comorbidity with depression, alcoholism, drug addiction and eating disorders. To adapt the widely used "Fear of Negative Evaluation" (FNE) social phobia questionnaire. Anxiety and mood disorder patients (n = 255) completed the Fear of Negative Evaluation Scale (30, 12 and 8 item-versions) as well as social cognition, anxiety and self-esteem questionnaires. All three versions of the FNE have strong internal validity (α>0.83) and moderate significant correlation with low self-esteem, negative social cognitions and anxiety. The short 8-item BFNE-S has the strongest discriminative value in differentiating patients with social phobia and with other emotional disorders. The Hungarian version of the BFNE-S is an effective tool for the quick recognition of social phobia. Orv Hetil. 2017; 158(22): 843-850.
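
    Assuming the α reported above is Cronbach's alpha (the abstract does not name the coefficient), the statistic can be sketched as follows. The function name and the score matrix are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondent rows of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    where k is the number of items and variances are sample variances.
    """
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores: 4 respondents, 3 items.
example = [[3, 4, 3], [2, 2, 3], [5, 4, 4], [1, 2, 1]]
print(round(cronbach_alpha(example), 3))
```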

  14. Exploratory Item Classification Via Spectral Graph Clustering

    PubMed Central

    Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang

    2017-01-01

    Large-scale assessments are supported by a large item pool. An important task in test development is to assign items into scales that measure different characteristics of individuals, and a popular approach is cluster analysis of items. Classical methods in cluster analysis, such as the hierarchical clustering, K-means method, and latent-class analysis, often induce a high computational overhead and have difficulty handling missing data, especially in the presence of high-dimensional responses. In this article, the authors propose a spectral clustering algorithm for exploratory item cluster analysis. The method is computationally efficient, effective for data with missing or incomplete responses, easy to implement, and often outperforms traditional clustering algorithms in the context of high dimensionality. The spectral clustering algorithm is based on graph theory, a branch of mathematics that studies the properties of graphs. The algorithm first constructs a graph of items, characterizing the similarity structure among items. It then extracts item clusters based on the graphical structure, grouping similar items together. The proposed method is evaluated through simulations and an application to the revised Eysenck Personality Questionnaire. PMID:29033476
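
    The two steps described above (build an item similarity graph, then extract clusters from its structure) can be sketched with a generic two-way spectral partition. This is not the authors' algorithm: it assumes a precomputed symmetric similarity matrix, splits items into only two groups by the sign of the Fiedler vector of the graph Laplacian, and uses a simple power iteration to stay dependency-free. All names and the similarity values are hypothetical.

```python
import math


def fiedler_partition(similarity, iterations=500):
    """Split items into two clusters from a symmetric similarity matrix.

    Uses the graph Laplacian L = D - W and approximates its Fiedler
    vector (eigenvector of the second-smallest eigenvalue) by power
    iteration on c*I - L, projecting out the constant vector each step.
    Items are labelled 0 or 1 by the sign of their Fiedler entry.
    """
    n = len(similarity)
    # Node degree, ignoring any self-similarity on the diagonal.
    degree = [sum(row) - row[i] for i, row in enumerate(similarity)]
    c = 2.0 * max(degree)  # upper bound on the largest eigenvalue of L

    def apply_shifted(v):
        out = []
        for i in range(n):
            neighbours = sum(similarity[i][j] * v[j] for j in range(n) if j != i)
            laplacian_v = degree[i] * v[i] - neighbours
            out.append(c * v[i] - laplacian_v)
        return out

    v = [float(i) for i in range(n)]  # deterministic starting vector
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant component
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        v = apply_shifted(v)
    mean = sum(v) / n
    v = [x - mean for x in v]
    return [0 if x < 0 else 1 for x in v]


# Hypothetical similarities: items 0-2 and items 3-5 form two groups.
W = [[0.9 if (i < 3) == (j < 3) else 0.1 for j in range(6)] for i in range(6)]
print(fiedler_partition(W))
```

    More than two clusters would typically require several eigenvectors followed by a clustering step on them, which this sketch omits.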

  15. Psychometric validation of the Persian nine-item Internet Gaming Disorder Scale - Short Form: Does gender and hours spent online gaming affect the interpretations of item descriptions?

    PubMed

    Wu, Tzu-Yi; Lin, Chung-Ying; Årestedt, Kristofer; Griffiths, Mark D; Broström, Anders; Pakpour, Amir H

    2017-06-01

    Background and aims The nine-item Internet Gaming Disorder Scale - Short Form (IGDS-SF9) is brief and effective to evaluate Internet Gaming Disorder (IGD) severity. Although its scores show promising psychometric properties, less is known about whether different groups of gamers interpret the items similarly. This study aimed to verify the construct validity of the Persian IGDS-SF9 and examine the scores in relation to gender and hours spent online gaming among 2,363 Iranian adolescents. Methods Confirmatory factor analysis (CFA) and Rasch analysis were used to examine the construct validity of the IGDS-SF9. The effects of gender and time spent online gaming per week were investigated by multigroup CFA and Rasch differential item functioning (DIF). Results The unidimensionality of the IGDS-SF9 was supported in both CFA and Rasch. However, Item 4 (fail to control or cease gaming activities) displayed DIF (DIF contrast = 0.55) slightly over the recommended cutoff in Rasch but was invariant in multigroup CFA across gender. Items 4 (DIF contrast = -0.67) and 9 (jeopardize or lose an important thing because of gaming activity; DIF contrast = 0.61) displayed DIF in Rasch and were non-invariant in multigroup CFA across time spent online gaming. Conclusions Given the Persian IGDS-SF9 was unidimensional, it is concluded that the instrument can be used to assess IGD severity. However, users of the instrument are cautioned concerning the comparisons of the sum scores of the IGDS-SF9 across gender and across adolescents spending different amounts of time online gaming.
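
    For orientation, the Rasch quantities behind the DIF values above can be written out as follows. This is a simplified sketch in dichotomous form (the IGDS-SF9 items are polytomous, so the study would use a rating-scale extension), and it assumes that "DIF contrast" means the difference between group-specific item difficulty estimates in logits; the abstract does not define the term. The symbols are introduced here for illustration.

```latex
% Dichotomous Rasch model: person n with trait level \theta_n, item i with difficulty b_i
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}

% Assumed definition of the DIF contrast for item i between groups A and B
\mathrm{DIF\ contrast}_i = \hat{b}_i^{(A)} - \hat{b}_i^{(B)}
```

    Under this reading, a contrast of 0.55 means the item is about half a logit harder to endorse in one group than in the other at the same trait level, which the abstract describes as slightly over the recommended cutoff.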

  16. Item response theory detects differential item functioning between healthy and ill children in QoL measures

    PubMed Central

    Langer, Michelle M.; Hill, Cheryl D.; Thissen, David; Burwinkle, Tasha M.; Varni, James W.; DeWalt, Darren A.

    2008-01-01

    Objective To demonstrate the value of item response theory (IRT) and differential item functioning (DIF) methods in examining a health-related quality of life (HRQOL) measure in children and adolescents. Study Design and Setting This illustration uses data from 5,429 children using the four subscales of the PedsQL™ 4.0 Generic Core Scales. The IRT model-based likelihood ratio test was used to detect and evaluate DIF between healthy children and children with a chronic condition. Results DIF was detected for a majority of items but cancelled out at the total test score level due to opposing directions of DIF. Post-hoc analysis indicated that this pattern of results may be due to multidimensionality. We discuss issues in detecting and handling DIF. Conclusion This paper describes how to perform DIF analyses in validating a questionnaire to ensure that scores have equivalent meaning across subgroups. It offers insight into ways information gained through the analysis can be used to evaluate an existing scale. PMID:18226750

  17. Qualitative Evaluation of Pediatric Pain Behavior, Quality, and Intensity Item Candidates and the PROMIS Pain Domain Framework in Children With Chronic Pain.

    PubMed

    Jacobson, C Jeffrey; Kashikar-Zuck, Susmita; Farrell, Jennifer; Barnett, Kimberly; Goldschneider, Ken; Dampier, Carlton; Cunningham, Natoshia; Crosby, Lori; DeWitt, Esi Morgan

    2015-12-01

    As initial steps in a broader effort to develop and test pediatric pain behavior and pain quality item banks for the Patient-Reported Outcomes Measurement Information System (PROMIS), we used qualitative interview and item review methods to 1) evaluate the overall conceptual scope and content validity of the PROMIS pain domain framework among children with chronic/recurrent pain conditions, and 2) develop item candidates for further psychometric testing. To elicit the experiential and conceptual scope of pain outcomes across a variety of pediatric recurrent/chronic pain conditions, we conducted 32 semi-structured individual and 2 focus-group interviews with children and adolescents (8-17 years), and 32 individual and 2 focus-group interviews with parents of children with pain. Interviews with pain experts (10) explored the operational limits of pain measurement in children. For item bank development, we identified existing items from measures in the literature, grouped them by concept, removed redundancies, and modified the remaining items to match PROMIS formatting. New items were written as needed and cognitive debriefing was completed with the children and their parents, resulting in 98 pain behavior (47 self, 51 proxy), 54 quality, and 4 intensity items for further testing. Qualitative content analyses suggest that reportable pain outcomes that matter to children with pain are captured within and consistent with the pain domain framework in PROMIS. PROMIS pediatric pain behavior, quality, and intensity items were developed based on a theoretical framework of pain that was evaluated by multiple stakeholders in the measurement of pediatric pain, including researchers, clinicians, and children with pain and their parents, and the appropriateness of the framework was verified. Copyright © 2015 American Pain Society. Published by Elsevier Inc. All rights reserved.

  18. Assessment of visiting activities for young children using the UNAWE Evaluation Guide

    NASA Astrophysics Data System (ADS)

    Tomita, Akihiko

    2015-08-01

    When the target is young children and the activity type is play, the assessment of the activity is not easy. The table of domains of active learning shown in the EU Universe Awareness Programme Evaluation Guide is useful for the assessment; the Guide shows four domains (motivation, scientific skills, universe knowledge, and intercultural attitudes) and many objective items in each domain. The Guide can serve as a basic format, and the items can be modified to fit each activity. Taking my activity as an example, I will present an assessment using the Guide. The activity I will present is "Uchu no O-hanashi," a visiting activity that includes a slide show, storytelling, and enjoying pictures on large sheets for children at nursery, kindergarten, preschool and other sites. To obtain the data, I recorded the children's voices. The analysis method is qualitative. I picked out "motivation" and "scientific skills" words from the recordings of when the children muttered and asked each other about what they felt, what they found, and what they got excited about. Among the items in the "scientific skills" domain, looking carefully, asking, exchanging opinions, interpreting or trying to interpret, and trying appeared frequently. Other skills such as devising and confirming did not appear frequently, but they would sometimes appear later at home or at school after the activity. I also picked out the children's words that showed them acquiring a scientific viewpoint and attitude through the activity. One example is "It seems that stars float in the sky and do not move. Do they really set like the Sun, our nearest star? I never saw stars set!" A boy was trying to make a new framework for his understanding. This kind of thinking will enrich his or her future "universe knowledge" and "intercultural attitudes."

  19. Examination of the PROMIS upper extremity item bank.

    PubMed

    Hung, Man; Voss, Maren W; Bounsanga, Jerry; Crum, Anthony B; Tyser, Andrew R

    Clinical measurement. The psychometric properties of the PROMIS v1.2 UE item bank were tested on various samples prior to its release, but have not been fully evaluated among the orthopaedic population. This study assesses the performance of the UE item bank within the UE orthopaedic patient population. The UE item bank was administered to 1197 adult patients presenting to a tertiary orthopaedic clinic specializing in hand and UE conditions and was examined using traditional statistics and Rasch analysis. The UE item bank fits a unidimensional model (outfit MNSQ range from 0.64 to 1.70) and has adequate reliabilities (person = 0.84; item = 0.82) and local independence (item residual correlations range from -0.37 to 0.34). Only one item exhibits gender differential item functioning. Most items target low levels of function. The UE item bank is a useful clinical assessment tool. Additional items covering higher functions are needed to enhance validity. Supplemental testing is recommended for patients at higher levels of function until more high function UE items are developed. 2c. Copyright © 2016 Hanley & Belfus. Published by Elsevier Inc. All rights reserved.

  20. A Psychometric Evaluation of the Core Bereavement Items

    ERIC Educational Resources Information Center

    Holland, Jason M.; Nam, Ilsung; Neimeyer, Robert A.

    2013-01-01

    Despite being a routinely administered assessment of grieving, few studies have empirically examined the psychometric properties of the Core Bereavement Items (CBI). The present study investigated the factor structure, internal reliability, and concurrent validity of the CBI in a large, diverse sample of bereaved young adults (N = 1,366).…

  1. Item Difficulty in the Evaluation of Computer-Based Instruction: An Example from Neuroanatomy

    ERIC Educational Resources Information Center

    Chariker, Julia H.; Naaz, Farah; Pani, John R.

    2012-01-01

    This article reports large item effects in a study of computer-based learning of neuroanatomy. Outcome measures of the efficiency of learning, transfer of learning, and generalization of knowledge diverged by a wide margin across test items, with certain sets of items emerging as particularly difficult to master. In addition, the outcomes of…

  2. Examining Differential Item Functions of Different Item Ordered Test Forms According to Item Difficulty Levels

    ERIC Educational Resources Information Center

    Çokluk, Ömay; Gül, Emrah; Dogan-Gül, Çilem

    2016-01-01

    The study aims to examine whether differential item function is displayed in three different test forms that have item orders of random and sequential versions (easy-to-hard and hard-to-easy), based on Classical Test Theory (CTT) and Item Response Theory (IRT) methods and bearing item difficulty levels in mind. In the correlational research, the…

  3. Development of an item bank for computerized adaptive test (CAT) measurement of pain.

    PubMed

    Petersen, Morten Aa; Aaronson, Neil K; Chie, Wei-Chu; Conroy, Thierry; Costantini, Anna; Hammerlid, Eva; Hjermstad, Marianne J; Kaasa, Stein; Loge, Jon H; Velikova, Galina; Young, Teresa; Groenvold, Mogens

    2016-01-01

    Patient-reported outcomes should ideally be adapted to the individual patient while maintaining comparability of scores across patients. This is achievable using computerized adaptive testing (CAT). The aim here was to develop an item bank for CAT measurement of the pain domain as measured by the EORTC QLQ-C30 questionnaire. The development process consisted of four steps: (1) literature search, (2) formulation of new items and expert evaluations, (3) pretesting and (4) field-testing and psychometric analyses for the final selection of items. In step 1, we identified 337 pain items from the literature. Twenty-nine new items fitting the QLQ-C30 item style were formulated in step 2 that were reduced to 26 items by expert evaluations. Based on interviews with 31 patients from Denmark, France and the UK, the list was further reduced to 21 items in step 3. In step 4, responses were obtained from 1103 cancer patients from five countries. Psychometric evaluations showed that 16 items could be retained in a unidimensional item bank. Evaluations indicated that use of the CAT measure may reduce sample size requirements by 15-25% compared to using the QLQ-C30 pain scale. We have established an item bank of 16 items suitable for CAT measurement of pain. While being backward compatible with the QLQ-C30, the new item bank will significantly improve measurement precision of pain. We recommend initiating CAT measurement by screening for pain using the two original QLQ-C30 pain items. The EORTC pain CAT is currently available for "experimental" purposes.

  4. Item response theory analysis of the mechanics baseline test

    NASA Astrophysics Data System (ADS)

    Cardamone, Caroline N.; Abbott, Jonathan E.; Rayyan, Saif; Seaton, Daniel T.; Pawl, Andrew; Pritchard, David E.

    2012-02-01

    Item response theory is useful in both the development and evaluation of assessments and in computing standardized measures of student performance. In item response theory, individual parameters (difficulty, discrimination) for each item or question are fit by item response models. These parameters provide a means for evaluating a test and offer a better measure of student skill than a raw test score, because each skill calculation considers not only the number of questions answered correctly, but the individual properties of all questions answered. Here, we present the results from an analysis of the Mechanics Baseline Test given at MIT during 2005-2010. Using the item parameters, we identify questions on the Mechanics Baseline Test that are not effective in discriminating between MIT students of different abilities. We show that a limited subset of the highest quality questions on the Mechanics Baseline Test returns accurate measures of student skill. We compare student skills as determined by item response theory to the more traditional measurement of the raw score and show that a comparable measure of learning gain can be computed.
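
    The claim that a skill estimate depends on which questions were answered, and not only on how many, can be illustrated with a small sketch. It assumes a two-parameter logistic model and a coarse grid-search maximum-likelihood estimate; the abstract names difficulty and discrimination parameters but not the model or estimation method the authors used. The item parameters and response patterns are hypothetical.

```python
import math


def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response pattern under a 2PL model.

    items: list of (discrimination a, difficulty b) pairs.
    """
    ll = 0.0
    for u, (a, b) in zip(responses, items):
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        ll += math.log(p) if u == 1 else math.log(1.0 - p)
    return ll


def estimate_skill(responses, items):
    """Grid-search maximum-likelihood skill estimate on [-4, 4]."""
    grid = [g / 100.0 for g in range(-400, 401)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))


# Hypothetical (a, b) parameters: an easy weak item, a medium item,
# and a hard, highly discriminating item.
items = [(0.5, -1.0), (1.0, 0.0), (2.0, 1.0)]

student_a = [1, 1, 0]  # raw score 2, missed the hard item
student_b = [0, 1, 1]  # raw score 2, answered the hard item

print(estimate_skill(student_a, items), estimate_skill(student_b, items))
```

    Both students have the same raw score, but the student who answered the hard, highly discriminating item receives the higher skill estimate.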

  5. Better assessment of physical function: item improvement is neglected but essential.

    PubMed

    Bruce, Bonnie; Fries, James F; Ambrosini, Debbie; Lingala, Bharathi; Gandek, Barbara; Rose, Matthias; Ware, John E

    2009-01-01

    Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classic test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important versus complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options. IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models

  6. Better assessment of physical function: item improvement is neglected but essential

    PubMed Central

    2009-01-01

    Introduction Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. Methods The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classic test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. Results We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important versus complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options. IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two

  7. Mathematics Library of Test Items. Volume One.

    ERIC Educational Resources Information Center

    Fraser, Graham, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from previous tests are made available to teachers for the construction of pretests or posttests, reference tests for inter-class comparisons and general assignments. The collection was reviewed for content…

  8. The Role of Item Models in Automatic Item Generation

    ERIC Educational Resources Information Center

    Gierl, Mark J.; Lai, Hollis

    2012-01-01

    Automatic item generation represents a relatively new but rapidly evolving research area where cognitive and psychometric theories are used to produce tests that include items generated using computer technology. Automatic item generation requires two steps. First, test development specialists create item models, which are comparable to templates…

  9. Assessing item fit for unidimensional item response theory models using residuals from estimated item response functions.

    PubMed

    Haberman, Shelby J; Sinharay, Sandip; Chon, Kyong Hee

    2013-07-01

    Residual analysis (e.g. Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large sample distribution of the residual is proved to be standardized normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing the item fit for unidimensional IRT models.

  10. Grouping in decomposition method for multi-item capacitated lot-sizing problem with immediate lost sales and joint and item-dependent setup cost

    NASA Astrophysics Data System (ADS)

    Narenji, M.; Fatemi Ghomi, S. M. T.; Nooraie, S. V. R.

    2011-03-01

    This article examines a dynamic and discrete multi-item capacitated lot-sizing problem in a completely deterministic production or procurement environment with limited production/procurement capacity where lost sales (the loss of customer demand) are permitted. There is no inventory space capacity and the production activity incurs a fixed charge linear cost function. Similarly, the inventory holding cost and the cost of lost demand are both associated with a linear no-fixed charge function. For the sake of simplicity, a unit of each item is assumed to consume one unit of production/procurement capacity. We analyse a different version of setup costs incurred by a production or procurement activity in a given period of the planning horizon. In this version, called the joint and item-dependent setup cost, an additional item-dependent setup cost is incurred separately for each produced or ordered item on top of the joint setup cost.

  11. Geography Library of Test Items. Volume Four.

    ERIC Educational Resources Information Center

    Kouimanos, John, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  12. Geography Library of Test Items. Volume Three.

    ERIC Educational Resources Information Center

    Kouimanos, John, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  13. Commerce Library of Test Items. Volume One.

    ERIC Educational Resources Information Center

    Meeve, Brian, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  14. Geography Library of Test Items. Volume Five.

    ERIC Educational Resources Information Center

    Kouimanos, John, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  15. Commerce Library of Test Items. Volume Two.

    ERIC Educational Resources Information Center

    Meeve, Brian, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  16. Geography Library of Test Items. Volume Six.

    ERIC Educational Resources Information Center

    Kouimanos, John, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  17. Geography: Library of Test Items. Volume II.

    ERIC Educational Resources Information Center

    Kouimanos, John, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  18. Geography Library of Test Items. Volume One.

    ERIC Educational Resources Information Center

    Kouimanos, John, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  19. Item Information and Discrimination Functions for Trinary PCM Items.

    ERIC Educational Resources Information Center

    Akkermans, Wies; Muraki, Eiji

    1997-01-01

    For trinary partial credit items, the shape of the item information and item discrimination functions is examined in relation to the item parameters. Conditions under which these functions are unimodal and bimodal are discussed, and the locations and values of maxima are derived. Practical relevance of the results is discussed. (SLD)
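    The relationship between item parameters and the shape of the information function can be explored numerically. The sketch below is not taken from the article: it assumes the standard partial credit model with unit discrimination and two step parameters (the names `deltas`, `pcm_category_probs` and `pcm_item_information` are illustrative), and it uses the fact that, under that assumption, item information equals the conditional variance of the item score.

```python
import math


def pcm_category_probs(theta, deltas):
    """Category probabilities for a partial credit item.

    ASSUMPTION: standard PCM with unit discrimination; `deltas` holds the
    step parameters (two of them for a trinary item scored 0, 1, 2).
    """
    # Category k has exponent sum_{j<=k} (theta - delta_j); category 0 has 0.
    exponents = [0.0]
    for delta in deltas:
        exponents.append(exponents[-1] + (theta - delta))
    top = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def pcm_item_information(theta, deltas):
    """Item information as the variance of the item score given theta."""
    probs = pcm_category_probs(theta, deltas)
    expected = sum(k * p for k, p in enumerate(probs))
    expected_sq = sum(k * k * p for k, p in enumerate(probs))
    return expected_sq - expected ** 2


if __name__ == "__main__":
    # Two hypothetical items: close steps versus widely separated steps.
    for deltas in [(-0.5, 0.5), (-3.0, 3.0)]:
        print("steps:", deltas)
        for theta in (-3.0, -1.5, 0.0, 1.5, 3.0):
            info = pcm_item_information(theta, deltas)
            print(f"  theta={theta:+.1f}  information={info:.4f}")
```

    Evaluating the function on a grid of theta values for different step parameters shows how the spacing of the steps changes the shape of the curve.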

  20. ‘Forget me (not)?’ – Remembering Forget-Items Versus Un-Cued Items in Directed Forgetting

    PubMed Central

    Zwissler, Bastian; Schindler, Sebastian; Fischer, Helena; Plewnia, Christian; Kissler, Johanna M.

    2015-01-01

    Humans need to be able to selectively control their memories. This capability is often investigated in directed forgetting (DF) paradigms. In item-method DF, individual items are presented and each is followed by either a forget- or remember-instruction. On a surprise test of all items, memory is then worse for to-be-forgotten items (TBF) compared to to-be-remembered items (TBR). This is thought to result mainly from selective rehearsal of TBR, although inhibitory mechanisms also appear to be recruited by this paradigm. Here, we investigate whether the mnemonic consequences of a forget instruction differ from the ones of incidental encoding, where items are presented without a specific memory instruction. Four experiments were conducted where un-cued items (UI) were interspersed and recognition performance was compared between TBR, TBF, and UI stimuli. Accuracy was encouraged via a performance-dependent monetary bonus. Experiments varied the number of items and their presentation speed and used either letter-cues or symbolic cues. Across all experiments, including perceptually fully counterbalanced variants, memory accuracy for TBF was reduced compared to TBR, but better than for UI. Moreover, participants made consistently fewer false alarms and used a very conservative response criterion when responding to TBF stimuli. Thus, the F-cue results in active processing and reduces false alarm rate, but this does not impair recognition memory beyond an un-cued baseline condition, where only incidental encoding occurs. Theoretical implications of these findings are discussed. PMID:26635657

  1. Validation of 5-item and 2-item questionnaires in Chinese version of Dizziness Handicap Inventory for screening objective benign paroxysmal positional vertigo.

    PubMed

    Chen, Wei; Shu, Liang; Wang, Qian; Pan, Hui; Wu, Jing; Fang, Jie; Sun, Xu-Hong; Zhai, Yu; Dong, You-Rong; Liu, Jian-Ren

    2016-08-01

    Studies validating the Dizziness Handicap Inventory (DHI) sub-scale (5-item and 2-item) and total scores as candidate screening instruments for benign paroxysmal positional vertigo (BPPV) are rare in China. From May 2014 to December 2014, 108 patients (55 with and 53 without BPPV) complaining of episodic vertigo in the past week were enrolled from a vertigo outpatient clinic for DHI evaluation, and demographic and other clinical data were collected. Objective BPPV was subsequently determined by positional evoking maneuvers recorded with optical Frenzel glasses. Cronbach's coefficient α was used to evaluate the reliability of the psychometric scales. The validity of the DHI total, 5-item and 2-item questionnaires for BPPV screening was assessed by receiver operating characteristic (ROC) curves. The DHI 5-item questionnaire had good internal consistency (Cronbach's coefficient α = 0.72). The area under the curve of the total DHI, 5-item and 2-item scores for discriminating patients with BPPV from those without was 0.678 (95% CI 0.578-0.778), 0.873 (95% CI 0.807-0.940) and 0.895 (95% CI 0.836-0.953), respectively. With a cutoff value of 12, the 5-item questionnaire showed 74.5% sensitivity and 88.7% specificity in separating patients with BPPV from those without. With a cutoff value of 6, the 2-item questionnaire showed 78.2% sensitivity and 88.7% specificity. The present study indicated that both the 5-item and 2-item questionnaires in the Chinese version of the DHI may be more valid than the DHI total score for screening objective BPPV and merit further application in clinical practice in China.
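    The reported sensitivity and specificity figures can be related back to the group sizes with a short calculation. In the sketch below, the group sizes (55 with BPPV, 53 without) come from the abstract, but the true-positive and true-negative counts are assumptions inferred from the reported percentages; the abstract does not state them. The function name `screening_metrics` is illustrative.

```python
def screening_metrics(true_pos, false_neg, true_neg, false_pos):
    """Sensitivity, specificity and Youden's J from a 2x2 screening table."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden_j": sensitivity + specificity - 1.0,
    }


# Group sizes as reported in the abstract.
N_BPPV = 55
N_NO_BPPV = 53

# ASSUMPTION: cell counts inferred from the reported percentages,
# not taken from the article itself.
five_item = screening_metrics(
    true_pos=41, false_neg=N_BPPV - 41,
    true_neg=47, false_pos=N_NO_BPPV - 47,
)
two_item = screening_metrics(
    true_pos=43, false_neg=N_BPPV - 43,
    true_neg=47, false_pos=N_NO_BPPV - 47,
)

if __name__ == "__main__":
    for label, result in (("5-item", five_item), ("2-item", two_item)):
        print(
            f"{label}: sensitivity={100 * result['sensitivity']:.1f}%, "
            f"specificity={100 * result['specificity']:.1f}%, "
            f"Youden J={result['youden_j']:.3f}"
        )
```

    Under these inferred counts the percentages round to the values in the abstract, which is a consistency check only, not a reconstruction of the study data.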

  2. CTTITEM: SAS macro and SPSS syntax for classical item analysis.

    PubMed

    Lei, Pui-Wa; Wu, Qiong

    2007-08-01

    This article describes the functions of a SAS macro and an SPSS syntax that produce common statistics for conventional item analysis including Cronbach's alpha, item difficulty index (p-value or item mean), and item discrimination indices (D-index, point biserial and biserial correlations for dichotomous items and item-total correlation for polytomous items). These programs represent an improvement over the existing SAS and SPSS item analysis routines in terms of completeness and user-friendliness. To promote routine evaluations of item qualities in instrument development of any scale, the programs are available at no charge for interested users. The program codes along with a brief user's manual that contains instructions and examples are downloadable from suen.ed.psu.edu/-pwlei/plei.htm.
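    The statistics named in this abstract are simple to compute by hand for small data sets. The following is a minimal Python sketch of three of them (Cronbach's alpha, item difficulty as the item mean, and a corrected item-total correlation); it is not the CTTITEM macro or syntax, and the function names and sample data are hypothetical. It assumes a complete response matrix with one row per respondent, one column per item, and non-constant items.

```python
from statistics import mean, variance


def cronbach_alpha(scores):
    """Cronbach's alpha from a respondents-by-items score matrix."""
    n_items = len(scores[0])
    item_variances = [variance(column) for column in zip(*scores)]
    total_variance = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


def item_difficulty(scores):
    """Item mean; for 0/1 items this is the proportion answering correctly."""
    return [mean(column) for column in zip(*scores)]


def pearson(x, y):
    """Pearson correlation; assumes neither variable is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def corrected_item_total(scores):
    """Correlation of each item with the sum of the remaining items.

    For dichotomous items this is a point-biserial correlation.
    """
    results = []
    for index, column in enumerate(zip(*scores)):
        rest = [sum(row) - row[index] for row in scores]
        results.append(pearson(column, rest))
    return results


if __name__ == "__main__":
    # Hypothetical responses: 5 respondents, 4 dichotomous items.
    responses = [
        [1, 1, 1, 1],
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    print("alpha:", round(cronbach_alpha(responses), 3))
    print("difficulty:", item_difficulty(responses))
    print("item-rest r:", [round(r, 3) for r in corrected_item_total(responses)])
```

    The item-rest correlation excludes the item from its own total, which avoids inflating the discrimination estimate on short scales.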

  3. Effects of Ignoring Item Interaction on Item Parameter Estimation and Detection of Interacting Items

    ERIC Educational Resources Information Center

    Chen, Cheng-Te; Wang, Wen-Chung

    2007-01-01

    This study explores the effects of ignoring item interaction on item parameter estimation and the efficiency of using the local dependence index Q[subscript 3] and the SAS NLMIXED procedure to detect item interaction under the three-parameter logistic model and the generalized partial credit model. Through simulations, it was found that ignoring…

  4. Restricted Interests and Teacher Presentation of Items

    ERIC Educational Resources Information Center

    Stocco, Corey S.; Thompson, Rachel H.; Rodriguez, Nicole M.

    2011-01-01

    Restricted and repetitive behavior (RRB) is more pervasive, prevalent, frequent, and severe in individuals with autism spectrum disorders (ASDs) than in their typical peers. One subtype of RRB is restricted interests in items or activities, which is evident in the manner in which individuals engage with items (e.g., repetitious wheel spinning),…

  5. Development and evaluation of CAHPS survey items assessing how well healthcare providers address health literacy.

    PubMed

    Weidmer, Beverly A; Brach, Cindy; Hays, Ron D

    2012-09-01

    The complexity of health information often exceeds patients' skills to understand and use it. To develop survey items assessing how well healthcare providers communicate health information. Domains and items for the Consumer Assessment of Healthcare Providers and Systems (CAHPS) Item Set for Addressing Health Literacy were identified through an environmental scan and input from stakeholders. The draft item set was translated into Spanish and pretested in both English and Spanish. The revised item set was field tested with a randomly selected sample of adult patients from 2 sites using mail and telephonic data collection. Item-scale correlations, confirmatory factor analysis, and internal consistency reliability estimates were estimated to assess how well the survey items performed and identify composite measures. Finally, we regressed the CAHPS global rating of the provider item on the CAHPS core communication composite and the new health literacy composites. A total of 601 completed surveys were obtained (52% response rate). Two composite measures were identified: (1) Communication to Improve Health Literacy (16 items); and (2) How Well Providers Communicate About Medicines (6 items). These 2 composites were significantly uniquely associated with the global rating of the provider (communication to improve health literacy: P<0.001, b=0.28; and communication about medicines composite: P=0.02, b=0.04). The 2 composites and the CAHPS core communication composite accounted for 51% of the variance in the global rating of the provider. A 5-item subset of the Communication to Improve Health Literacy composite accounted for 90% of the variance of the original 16-item composite. This study provides support for reliability and validity of the CAHPS Item Set for Addressing Health Literacy. These items can serve to assess whether healthcare providers have communicated effectively with their patients and as a tool for quality improvement.

  6. Effects of age on negative subsequent memory effects associated with the encoding of item and item-context information.

    PubMed

    Mattson, Julia T; Wang, Tracy H; de Chastelaine, Marianne; Rugg, Michael D

    2014-12-01

    It has consistently been reported that "negative" subsequent memory effects--lower study activity for later remembered than later forgotten items--are attenuated in older individuals. The present functional magnetic resonance imaging study investigated whether these findings extend to subsequent memory effects associated with successful encoding of item-context information. Older (n = 25) and young (n = 17) subjects were scanned while making 1 of 2 encoding judgments on a series of pictures. Memory was assessed for the study item and, for items judged old, the item's encoding task. Both memory judgments were made using confidence ratings, permitting item and source memory strength to be unconfounded and source confidence to be equated across age groups. Replicating prior findings, negative item effects in regions of the default mode network in young subjects were reversed in older subjects. Negative source effects, however, were invariant with respect to age and, in both age groups, the magnitude of the effects correlated with source memory performance. It is concluded that negative item effects do not reflect processes necessary for the successful encoding of item-context associations in older subjects. Negative source effects, in contrast, appear to reflect the engagement of processes that are equally important for successful episodic encoding in older and younger individuals. © The Author 2013. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  7. Few items in the thyroid-related quality of life instrument ThyPRO exhibited differential item functioning.

    PubMed

    Watt, Torquil; Groenvold, Mogens; Hegedüs, Laszlo; Bonnema, Steen Joop; Rasmussen, Åse Krogh; Feldt-Rasmussen, Ulla; Bjorner, Jakob Bue

    2014-02-01

    To evaluate the extent of differential item functioning (DIF) within the thyroid-specific quality of life patient-reported outcome measure, ThyPRO, according to sex, age, education and thyroid diagnosis. A total of 838 patients with benign thyroid diseases completed the ThyPRO questionnaire (84 five-point items, 13 scales). Uniform and nonuniform DIF were investigated using ordinal logistic regression, testing for both statistical significance and magnitude (∆R(2) > 0.02). Scale level was estimated by the sum score, after purification. Twenty instances of DIF in 17 of the 84 items were found. Eight according to diagnosis, where the goiter scale was the one most affected, possibly due to differing perceptions in patients with auto-immune thyroid diseases compared to patients with simple goiter. Eight DIFs according to age were found, of which 5 were in positively worded items, which younger patients were more likely to endorse; one according to gender: women were more likely to report crying, and three according to educational level. The vast majority of DIF had only minor influence on the scale scores (0.1-2.3 points on the 0-100 scales), but two DIF corresponded to a difference of 4.6 and 9.8, respectively. Ordinal logistic regression identified DIF in 17 of 84 items. The potential impact of this on the present scales was low, but items displaying DIF could be avoided when developing abbreviated scales, where the potential impact of DIF (due to fewer items) will be larger.

  8. A confirmative clinimetric analysis of the 36-item Family Assessment Device.

    PubMed

    Timmerby, Nina; Cosci, Fiammetta; Watson, Maggie; Csillag, Claudio; Schmitt, Florence; Steck, Barbara; Bech, Per; Thastum, Mikael

    2018-02-07

    The Family Assessment Device (FAD) is a 60-item questionnaire widely used to evaluate self-reported family functioning. However, the factor structure as well as the number of items has been questioned. A shorter and more user-friendly version of the original FAD-scale, the 36-item FAD, has therefore previously been proposed, based on findings in a nonclinical population of adults. We aimed in this study to evaluate the brief 36-item version of the FAD in a clinical population. Data from a European multinational study, examining factors associated with levels of family functioning in adult cancer patients' families, were used. Both healthy and ill parents completed the 60-item version FAD. The psychometric analyses conducted were Principal Component Analysis and Mokken-analysis. A total of 564 participants were included. Based on the psychometric analysis we confirmed that the 36-item version of the FAD has robust psychometric properties and can be used in clinical populations. The present analysis confirmed that the 36-item version of the FAD (18 items assessing 'well-being' and 18 items assessing 'dysfunctional' family function) is a brief scale where the summed total score is a valid measure of the dimensions of family functioning. This shorter version of the FAD is, in accordance with the concept of 'measurement-based care', an easy to use scale that could be considered when the aim is to evaluate self-reported family functioning.

  9. Dealing with Omitted and Not-Reached Items in Competence Tests: Evaluating Approaches Accounting for Missing Responses in Item Response Theory Models

    ERIC Educational Resources Information Center

    Pohl, Steffi; Gräfe, Linda; Rose, Norman

    2014-01-01

    Data from competence tests usually show a number of missing responses on test items due to both omitted and not-reached items. Different approaches for dealing with missing responses exist, and there are no clear guidelines on which of those to use. While classical approaches rely on an ignorable missing data mechanism, the most recently developed…

  10. Psychometric validation of the Persian nine-item Internet Gaming Disorder Scale – Short Form: Does gender and hours spent online gaming affect the interpretations of item descriptions?

    PubMed Central

    Wu, Tzu-Yi; Lin, Chung-Ying; Årestedt, Kristofer; Griffiths, Mark D.; Broström, Anders; Pakpour, Amir H.

    2017-01-01

    Background and aims: The nine-item Internet Gaming Disorder Scale – Short Form (IGDS-SF9) is brief and effective to evaluate Internet Gaming Disorder (IGD) severity. Although its scores show promising psychometric properties, less is known about whether different groups of gamers interpret the items similarly. This study aimed to verify the construct validity of the Persian IGDS-SF9 and examine the scores in relation to gender and hours spent online gaming among 2,363 Iranian adolescents. Methods: Confirmatory factor analysis (CFA) and Rasch analysis were used to examine the construct validity of the IGDS-SF9. The effects of gender and time spent online gaming per week were investigated by multigroup CFA and Rasch differential item functioning (DIF). Results: The unidimensionality of the IGDS-SF9 was supported in both CFA and Rasch. However, Item 4 (fail to control or cease gaming activities) displayed DIF (DIF contrast = 0.55) slightly over the recommended cutoff in Rasch but was invariant in multigroup CFA across gender. Items 4 (DIF contrast = −0.67) and 9 (jeopardize or lose an important thing because of gaming activity; DIF contrast = 0.61) displayed DIF in Rasch and were non-invariant in multigroup CFA across time spent online gaming. Conclusions: Given the Persian IGDS-SF9 was unidimensional, it is concluded that the instrument can be used to assess IGD severity. However, users of the instrument are cautioned concerning the comparisons of the sum scores of the IGDS-SF9 across gender and across adolescents spending different amounts of time online gaming. PMID:28571474

  11. ASCAL: A Microcomputer Program for Estimating Logistic IRT Item Parameters.

    ERIC Educational Resources Information Center

    Vale, C. David; Gialluca, Kathleen A.

    ASCAL is a microcomputer-based program for calibrating items according to the three-parameter logistic model of item response theory. It uses a modified multivariate Newton-Raphson procedure for estimating item parameters. This study evaluated this procedure using Monte Carlo Simulation Techniques. The current version of ASCAL was then compared to…

  12. Calorie changes in chain restaurant menu items: implications for obesity and evaluations of menu labeling.

    PubMed

    Bleich, Sara N; Wolfson, Julia A; Jarlenski, Marian P

    2015-01-01

    Supply-side reductions to the calories in chain restaurants are a possible benefit of upcoming menu labeling requirements. To describe trends in calories available in large U.S. restaurants. Data were obtained from the MenuStat project, a census of menu items in 66 of the 100 largest U.S. restaurant chains, for 2012 and 2013 (N=19,417 items). Generalized linear models were used to calculate (1) the mean change in calories from 2012 to 2013, among items on the menu in both years; and (2) the difference in mean calories, comparing newly introduced items to those on the menu in 2012 only (overall and between core versus non-core items). Data were analyzed in 2014. Mean calories among items on menus in both 2012 and 2013 did not change. Large restaurant chains in the U.S. have recently had overall declines in calories in newly introduced menu items (-56 calories, 12% decline). These declines were concentrated mainly in new main course items (-67 calories, 10% decline). New beverage (-26 calories, 8% decline) and children's (-46 calories, 20% decline) items also had fewer mean calories. Among chain restaurants with a specific focus (e.g., burgers), average calories in new menu items not core to the business declined more than calories in core menu items. Large chain restaurants significantly reduced the number of calories in newly introduced menu items. Supply-side changes to the calories in chain restaurants may have a significant impact on obesity prevention. Copyright © 2015 American Journal of Preventive Medicine. Published by Elsevier Inc. All rights reserved.

  13. Calorie Changes in Chain Restaurant Menu Items: Implications for Obesity and Evaluations of Menu Labeling

    PubMed Central

    Bleich, Sara N.; Wolfson, Julia A.; Jarlenski, Marian P.

    2014-01-01

    Background: Supply-side reductions to the calories in chain restaurants are a possible benefit of upcoming menu labeling requirements. Purpose: To describe trends in calories available in large U.S. restaurants. Methods: Data were obtained from the MenuStat project, a census of menu items in 66 of the 100 largest U.S. restaurant chains, for 2012 and 2013 (N=19,417 items). Generalized linear models were used to calculate: (1) the mean change in calories from 2012 to 2013, among items on the menu in both years; and (2) the difference in mean calories, comparing newly introduced items to those on the menu in 2012 only (overall and between core versus non-core items). Data were analyzed in 2014. Results: Mean calories among items on menus in both 2012 and 2013 did not change. Large restaurant chains in the U.S. have recently had overall declines in calories in newly introduced menu items (−56 calories, 12% decline). These declines were concentrated mainly in new main course items (−67 calories, 10% decline). New beverage (−26 calories, 8% decline) and children’s (−46 calories, 20% decline) items also had fewer mean calories. Among chain restaurants with a specific focus (e.g., burgers), average calories in new menu items not core to the business declined more than calories in core menu items. Conclusions: Large chain restaurants significantly reduced the number of calories in newly introduced menu items. Supply-side changes to the calories in chain restaurants may have a significant impact on obesity prevention. PMID:25306397

  14. Evaluating Differential Item Functioning in the English General Practice Patient Survey: Comparison of South Asian and White British Subgroups.

    PubMed

    Setodji, Claude M; Elliott, Marc N; Abel, Gary; Burt, Jenni; Roland, Martin; Campbell, John

    2015-09-01

    To evaluate two 5-item patient experience scales from the English General Practice (GP) Patient Survey for evidence of differential item functioning (DIF) given prior evidence of substantially worse reported health care experiences for South Asian compared with white British respondents. A national survey of English patients' primary care experiences. We used classic test and item response theory analysis to examine the possibility of DIF by patient ethnicity (South Asian, white British) after controlling for age, sex, health status, and quality of life in the English GP Patient Survey conducted in 2011/2012. Data were available for 873,051 respondents (818,219 white British/54,832 South Asian from 7795 English practices) who answered items relating to experiences of GP or nurses' care. Internal consistency reliability was high and similar for South Asian and white British patients. White British patients reported better average experiences than South Asians, but there was no evidence of DIF or different item response curves for white British and South Asian respondents, even in sensitivity analyses using matched samples. All communication items in the English GP Patient Survey showed similar South Asian versus white British differences, with no evidence of DIF. In contrast, differences due to scale use or expectations are typically variable rather than constant across scales. While other possibilities remain, these findings increase the likelihood that the observed negative responses of South Asian patients to this national survey reflect true differences in their experiences of care.

  15. Development of a PROMIS item bank to measure pain interference.

    PubMed

    Amtmann, Dagmar; Cook, Karon F; Jensen, Mark P; Chen, Wen-Hung; Choi, Seung; Revicki, Dennis; Cella, David; Rothrock, Nan; Keefe, Francis; Callahan, Leigh; Lai, Jin-Shei

    2010-07-01

    This paper describes the psychometric properties of the PROMIS-pain interference (PROMIS-PI) bank. An initial candidate item pool (n=644) was developed and evaluated based on the review of existing instruments, interviews with patients, and consultation with pain experts. From this pool, a candidate item bank of 56 items was selected and responses to the items were collected from large community and clinical samples. A total of 14,848 participants responded to all or a subset of candidate items. The responses were calibrated using an item response theory (IRT) model. A final 41-item bank was evaluated with respect to IRT assumptions, model fit, differential item function (DIF), precision, and construct and concurrent validity. Items of the revised bank had good fit to the IRT model (CFI and NNFI/TLI ranged from 0.974 to 0.997), and the data were strongly unidimensional (e.g., ratio of first and second eigenvalue=35). Nine items exhibited statistically significant DIF. However, adjusting for DIF had little practical impact on score estimates and the items were retained without modifying scoring. Scores provided substantial information across levels of pain; for scores in the T-score range 50-80, the reliability was equivalent to 0.96-0.99. Patterns of correlations with other health outcomes supported the construct validity of the item bank. The scores discriminated among persons with different numbers of chronic conditions, disabling conditions, levels of self-reported health, and pain intensity (p<0.0001). The results indicated that the PROMIS-PI items constitute a psychometrically sound bank. Computerized adaptive testing and short forms are available. Copyright 2010 International Association for the Study of Pain. All rights reserved.

  16. [Development and evaluation of an educational program for promotion of healthy nutrition and physical activity by health volunteers].

    PubMed

    Yamaguchi, Yukio; Kai, Yuko; Kumamoto, Hiroko

    2009-12-01

    The purpose of the present trial was to develop and evaluate an educational program for promotion of healthy nutrition and physical activity by health volunteers. The educational program consisted of the following four phases: preliminary self-learning by mail (3 weeks), basic learning (3 sessions of 3 hours), practice of planned activities (2 months), and a report session (1 session of 3 hours). Beginner volunteers (n=18, mean age 63.3 ± 6.4) were recruited from two volunteer health organizations in Kurume city. They then participated in a program that taught basic health knowledge regarding nutrition and physical activity, how to plan effective support activities, and methods for self-evaluation. In the preliminary self-learning phase, an assessment sheet, health information, and homework (goal setting, etc.) were delivered to the volunteers by mail. In the basic learning phase, volunteers attended a 3-day seminar on essential principles for behavioral change and assessment methods for volunteer activity. In addition, effective support activities were planned through group discussion. After a 2-month practice of support activities, each group reported and discussed the results of their activity in a 3-hour report session. Main outcome measures were health knowledge (15 items, 0-1 points), self-efficacy for lifestyle support (5 items, 0-100%), and evaluation of the educational program (9 items, 1-5 points). All measures were self-administered. The rate of correct answers for health knowledge increased significantly between the start of the preliminary self-learning phase and the start of the basic learning phase (54.8% to 67.1%, P < 0.05), and between the start and end of the basic learning phase (67.1% to 87.6%, P < 0.05). Self-efficacy for lifestyle support was significantly higher after the report session than before the preliminary self-learning phase (35.1% to 53.1%, P < 0.05). In the two-month practice, all groups received feedback through questionnaires completed by participants who…

  17. A psychometric evaluation of the four-item version of the Control Attitudes Scale for patients with cardiac disease and their partners.

    PubMed

    Årestedt, Kristofer; Ågren, Susanna; Flemme, Inger; Moser, Debra K; Strömberg, Anna

    2015-08-01

    The four-item Control Attitudes Scale (CAS) was developed to measure control perceived by patients with cardiac disease and their family members, but extensive psychometric evaluation has not been performed. The aim was to translate, culturally adapt and psychometrically evaluate the CAS in a Swedish sample of implantable cardioverter defibrillator (ICD) recipients, heart failure (HF) patients and their partners. A sample (n=391) of ICD recipients, HF patients and partners was used. Descriptive statistics, item-total and inter-item correlations, exploratory factor analysis, ordinal regression modelling and Cronbach's alpha were used to validate the CAS. The findings from the factor analyses revealed that the CAS is a multidimensional scale including two factors, Control and Helplessness. The internal consistency was satisfactory for all scales (α=0.74-0.85), except the family version total scale (α=0.62). No differential item functioning was detected, which implies that the CAS can be used to make invariant comparisons between groups of different age and sex. The psychometric properties, together with the simple and short format of the CAS, make it a useful tool for measuring perceived control among patients with cardiac diseases and their family members. When using the CAS, subscale scores should be preferred. © The European Society of Cardiology 2014.

  18. Inconsistency in the items included in tools used in general health research and physical therapy to evaluate the methodological quality of randomized controlled trials: a descriptive analysis

    PubMed Central

    2013-01-01

    Background: Assessing the risk of bias of randomized controlled trials (RCTs) is crucial to understand how biases affect treatment effect estimates. A number of tools have been developed to evaluate risk of bias of RCTs; however, it is unknown how these tools compare to each other in the items included. The main objective of this study was to describe which individual items are included in RCT quality tools used in general health and physical therapy (PT) research, and how these items compare to those of the Cochrane Risk of Bias (RoB) tool. Methods: We used comprehensive literature searches and a systematic approach to identify tools that evaluated the methodological quality or risk of bias of RCTs in general health and PT research. We extracted individual items from all quality tools. We calculated the frequency of quality items used across tools and compared them to those in the RoB tool. Comparisons were made between general health and PT quality tools using Chi-squared tests. Results: In addition to the RoB tool, 26 quality tools were identified, with 19 being used in general health and seven in PT research. The total number of quality items included in general health research tools was 130, compared with 48 items across PT tools and seven items in the RoB tool. The most frequently included items in general health research tools (14/19, 74%) were inclusion and exclusion criteria, and appropriate statistical analysis. In contrast, the most frequent items included in PT tools (86%, 6/7) were: baseline comparability, blinding of investigator/assessor, and use of intention-to-treat analysis. Key items of the RoB tool (sequence generation and allocation concealment) were included in 71% (5/7) of PT tools, and 63% (12/19) and 37% (7/19) of general health research tools, respectively. Conclusions: There is extensive item variation across tools that evaluate the risk of bias of RCTs in health research. Results call for an in-depth analysis of items that should be used to

  19. Inconsistency in the items included in tools used in general health research and physical therapy to evaluate the methodological quality of randomized controlled trials: a descriptive analysis.

    PubMed

    Armijo-Olivo, Susan; Fuentes, Jorge; Ospina, Maria; Saltaji, Humam; Hartling, Lisa

    2013-09-17

    Assessing the risk of bias of randomized controlled trials (RCTs) is crucial to understand how biases affect treatment effect estimates. A number of tools have been developed to evaluate risk of bias of RCTs; however, it is unknown how these tools compare to each other in the items included. The main objective of this study was to describe which individual items are included in RCT quality tools used in general health and physical therapy (PT) research, and how these items compare to those of the Cochrane Risk of Bias (RoB) tool. We used comprehensive literature searches and a systematic approach to identify tools that evaluated the methodological quality or risk of bias of RCTs in general health and PT research. We extracted individual items from all quality tools. We calculated the frequency of quality items used across tools and compared them to those in the RoB tool. Comparisons were made between general health and PT quality tools using Chi-squared tests. In addition to the RoB tool, 26 quality tools were identified, with 19 being used in general health and seven in PT research. The total number of quality items included in general health research tools was 130, compared with 48 items across PT tools and seven items in the RoB tool. The most frequently included items in general health research tools (14/19, 74%) were inclusion and exclusion criteria, and appropriate statistical analysis. In contrast, the most frequent items included in PT tools (86%, 6/7) were: baseline comparability, blinding of investigator/assessor, and use of intention-to-treat analysis. Key items of the RoB tool (sequence generation and allocation concealment) were included in 71% (5/7) of PT tools, and 63% (12/19) and 37% (7/19) of general health research tools, respectively. There is extensive item variation across tools that evaluate the risk of bias of RCTs in health research. Results call for an in-depth analysis of items that should be used to assess risk of bias of RCTs. Further

  20. Development of the PROMIS health expectancies of smoking item banks.

    PubMed

    Edelen, Maria Orlando; Tucker, Joan S; Shadel, William G; Stucky, Brian D; Cerully, Jennifer; Li, Zhen; Hansen, Mark; Cai, Li

    2014-09-01

    Smokers' health-related outcome expectancies are associated with a number of important constructs in smoking research, yet there are no measures currently available that focus exclusively on this domain. This paper describes the development and evaluation of item banks for assessing the health expectancies of smoking. Using data from a sample of daily (N = 4,201) and nondaily (N = 1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning analyses (according to gender, age, and race/ethnicity) to arrive at a unidimensional set of health expectancies items for daily and nondaily smokers. We also evaluated the performance of short forms (SFs) and computer adaptive tests (CATs) to efficiently assess health expectancies. A total of 24 items were included in the Health Expectancies item banks; 13 items are common across daily and nondaily smokers, 6 are unique to daily, and 5 are unique to nondaily. For both daily and nondaily smokers, the Health Expectancies item banks are unidimensional, reliable (reliability = 0.95 and 0.96, respectively), and perform similarly across gender, age, and race/ethnicity groups. A SF common to daily and nondaily smokers consists of 6 items (reliability = 0.87). Results from simulated CATs showed that health expectancies can be assessed with good precision with an average of 5-6 items adaptively selected from the item banks. Health expectancies of smoking can be assessed on the basis of these item banks via SFs, CATs, or through a tailored set of items selected for a specific research purpose. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  1. Development of the PROMIS coping expectancies of smoking item banks.

    PubMed

    Shadel, William G; Edelen, Maria Orlando; Tucker, Joan S; Stucky, Brian D; Hansen, Mark; Cai, Li

    2014-09-01

    Smoking is a coping strategy for many smokers who then have difficulty finding new ways to cope with negative affect when they quit. This paper describes analyses conducted to develop and evaluate item banks for assessing the coping expectancies of smoking for daily and nondaily smokers. Using data from a large sample of daily (N = 4,201) and nondaily (N = 1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning (DIF) analyses (according to gender, age, and ethnicity) to arrive at a unidimensional set of items for daily and nondaily smokers. We also evaluated performance of short forms (SFs) and computer adaptive tests (CATs) for assessing coping expectancies of smoking. For both daily and nondaily smokers, the unidimensional Coping Expectancies item banks (21 items) are relatively DIF free and are highly reliable (0.96 and 0.97, respectively). A common 4-item SF for daily and nondaily smokers also showed good reliability (0.85). Adaptive tests required an average of 4.3 and 3.7 items for simulated daily and nondaily respondents, respectively, and achieved reliabilities of 0.91 for both when the maximum test length was 10 items. This research provides a new set of items that can be used to reliably assess coping expectancies of smoking, through a SF, CAT, or a tailored set selected for a specific research purpose. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  2. Evaluating the Wald Test for Item-Level Comparison of Saturated and Reduced Models in Cognitive Diagnosis

    ERIC Educational Resources Information Center

    de la Torre, Jimmy; Lee, Young-Sun

    2013-01-01

    This article used the Wald test to evaluate the item-level fit of a saturated cognitive diagnosis model (CDM) relative to the fits of the reduced models it subsumes. A simulation study was carried out to examine the Type I error and power of the Wald test in the context of the G-DINA model. Results show that when the sample size is small and a…

  3. Identifying Differential Item Functioning in Multi-Stage Computer Adaptive Testing

    ERIC Educational Resources Information Center

    Gierl, Mark J.; Lai, Hollis; Li, Johnson

    2013-01-01

    The purpose of this study is to evaluate the performance of CATSIB (Computer Adaptive Testing-Simultaneous Item Bias Test) for detecting differential item functioning (DIF) when items in the matching and studied subtest are administered adaptively in the context of a realistic multi-stage adaptive test (MST). MST was simulated using a 4-item…

  4. 41 CFR 101-30.302 - Types of items excluded from cataloging.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... Catalog System except when an agency determines that Federal item identification data will be of value in...-FEDERAL CATALOG SYSTEM 30.3-Cataloging Items of Supply § 101-30.302 Types of items excluded from...) Items procured in foreign markets for use in overseas activities of Federal agencies. (e) Printed forms...

  5. 41 CFR 101-30.302 - Types of items excluded from cataloging.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... Catalog System except when an agency determines that Federal item identification data will be of value in...-FEDERAL CATALOG SYSTEM 30.3-Cataloging Items of Supply § 101-30.302 Types of items excluded from...) Items procured in foreign markets for use in overseas activities of Federal agencies. (e) Printed forms...

  6. 41 CFR 101-30.302 - Types of items excluded from cataloging.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... Catalog System except when an agency determines that Federal item identification data will be of value in...-FEDERAL CATALOG SYSTEM 30.3-Cataloging Items of Supply § 101-30.302 Types of items excluded from...) Items procured in foreign markets for use in overseas activities of Federal agencies. (e) Printed forms...

  7. 41 CFR 101-30.302 - Types of items excluded from cataloging.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... Catalog System except when an agency determines that Federal item identification data will be of value in...-FEDERAL CATALOG SYSTEM 30.3-Cataloging Items of Supply § 101-30.302 Types of items excluded from...) Items procured in foreign markets for use in overseas activities of Federal agencies. (e) Printed forms...

  8. 41 CFR 101-30.302 - Types of items excluded from cataloging.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... Catalog System except when an agency determines that Federal item identification data will be of value in...-FEDERAL CATALOG SYSTEM 30.3-Cataloging Items of Supply § 101-30.302 Types of items excluded from...) Items procured in foreign markets for use in overseas activities of Federal agencies. (e) Printed forms...

  9. Assessment of item-writing flaws in multiple-choice questions.

    PubMed

    Nedeau-Cayo, Rosemarie; Laughlin, Deborah; Rus, Linda; Hall, John

    2013-01-01

    This study evaluated the quality of multiple-choice questions used in a hospital's e-learning system. Constructing well-written questions is fraught with difficulty, and item-writing flaws are common. Study results revealed that most items contained flaws and were written at the knowledge/comprehension level. Few items had linked objectives, and no association was found between the presence of objectives and flaws. Recommendations include education for writing test questions.

  10. 48 CFR 252.209-7010 - Critical Safety Items.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... personal injury or loss of life; or (iii) An uncommanded engine shutdown that jeopardizes safety. Design... personal injury or loss of life. (b) Identification of critical safety items. One or more of the items... control activity: (Insert additional lines as necessary) (c) Heightened quality assurance surveillance...

  11. 48 CFR 252.209-7010 - Critical Safety Items.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... personal injury or loss of life; or (iii) An uncommanded engine shutdown that jeopardizes safety. Design... personal injury or loss of life. (b) Identification of critical safety items. One or more of the items... control activity: (Insert additional lines as necessary) (c) Heightened quality assurance surveillance...

  12. 48 CFR 252.209-7010 - Critical Safety Items.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... personal injury or loss of life; or (iii) An uncommanded engine shutdown that jeopardizes safety. Design... personal injury or loss of life. (b) Identification of critical safety items. One or more of the items... control activity: (Insert additional lines as necessary) (c) Heightened quality assurance surveillance...

  13. 48 CFR 252.209-7010 - Critical Safety Items.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... personal injury or loss of life; or (iii) An uncommanded engine shutdown that jeopardizes safety. Design... personal injury or loss of life. (b) Identification of critical safety items. One or more of the items... control activity: (Insert additional lines as necessary) (c) Heightened quality assurance surveillance...

  14. Vegetable parenting practices scale. Item response modeling analyses

    PubMed Central

    Chen, Tzu-An; O’Connor, Teresia; Hughes, Sheryl; Beltran, Alicia; Baranowski, Janice; Diep, Cassandra; Baranowski, Tom

    2015-01-01

    Objective: To evaluate the psychometric properties of a vegetable parenting practices scale using multidimensional polytomous item response modeling, which enables assessing item fit to latent variables and the distributional characteristics of the items in comparison to the respondents. We also tested for differences in the ways items function (called differential item functioning) across child’s gender, ethnicity, age, and household income groups. Method: Parents of 3–5 year old children completed a self-reported vegetable parenting practices scale online. Vegetable parenting practices consisted of 14 effective vegetable parenting practices and 12 ineffective vegetable parenting practices items, each with three subscales (responsiveness, structure, and control). Multidimensional polytomous item response modeling was conducted separately on effective vegetable parenting practices and ineffective vegetable parenting practices. Results: One effective vegetable parenting practice item did not fit the model well in the full sample or across demographic groups, and another was a misfit in differential item functioning analyses across child’s gender. Significant differential item functioning was detected across children’s age and ethnicity groups, and more among effective vegetable parenting practices than ineffective vegetable parenting practices items. Wright maps showed items only covered parts of the latent trait distribution. The harder- and easier-to-respond ends of the construct were not covered by items for effective vegetable parenting practices and ineffective vegetable parenting practices, respectively. Conclusions: Several effective vegetable parenting practices and ineffective vegetable parenting practices scale items functioned differently on the basis of child’s demographic characteristics; therefore, researchers should use these vegetable parenting practices scales with caution. Item response modeling should be incorporated in analyses of parenting

  15. Effects of Delay and Number of Related List Items on Implicit Activation for DRM Critical Items in a Speeded Naming Task

    ERIC Educational Resources Information Center

    Meade, Michelle L.; Hutchison, Keith A.; Rand, Kristina M.

    2010-01-01

    Two experiments examined decay and additivity of semantic priming produced by DRM false memory lists on a naming task. Subjects were presented with study lists containing 14 DRM items that were either all 14 related, the first 7 related, the second 7 related, or all 14 unrelated to the non-presented critical item. Priming was measured on a naming…

  16. Development of a simple 12-item theory-based instrument to assess the impact of continuing professional development on clinical behavioral intentions.

    PubMed

    Légaré, France; Borduas, Francine; Freitas, Adriana; Jacques, André; Godin, Gaston; Luconi, Francesca; Grimshaw, Jeremy

    2014-01-01

    Decision-makers in organizations providing continuing professional development (CPD) have identified the need for routine assessment of its impact on practice. We sought to develop a theory-based instrument for evaluating the impact of CPD activities on health professionals' clinical behavioral intentions. Our multipronged study had four phases. 1) We systematically reviewed the literature for instruments that used socio-cognitive theories to assess healthcare professionals' clinically-oriented behavioral intentions and/or behaviors; we extracted items relating to the theoretical constructs of an integrated model of healthcare professionals' behaviors and removed duplicates. 2) A committee of researchers and CPD decision-makers selected a pool of items relevant to CPD. 3) An international group of experts (n = 70) reached consensus on the most relevant items using electronic Delphi surveys. 4) We created a preliminary instrument with the items found most relevant and assessed its factorial validity, internal consistency and reliability (weighted kappa) over a two-week period among 138 physicians attending a CPD activity. Out of 72 potentially relevant instruments, 47 were analyzed. Of the 1218 items extracted from these, 16% were discarded as improperly phrased and 70% discarded as duplicates. Mapping the remaining items onto the constructs of the integrated model of healthcare professionals' behaviors yielded a minimum of 18 and a maximum of 275 items per construct. The partnership committee retained 61 items covering all seven constructs. Two iterations of the Delphi process produced consensus on a provisional 40-item questionnaire. Exploratory factorial analysis following test-retest resulted in a 12-item questionnaire. Cronbach's coefficients for the constructs varied from 0.77 to 0.85. A 12-item theory-based instrument for assessing the impact of CPD activities on health professionals' clinical behavioral intentions showed adequate validity and reliability

  17. Qualitative Evaluation of Pediatric Pain-Behavior, -Quality and -Intensity Item Candidates and the PROMIS Pain Domain Framework in Children with Chronic Pain

    PubMed Central

    Jacobson, C. Jeffrey; Kashikar-Zuck, Susmita; Farrell, Jennifer; Barnett, Kimberly; Goldschneider, Ken; Dampier, Carlton; Cunningham, Natoshia; Crosby, Lori; DeWitt, Esi Morgan

    2015-01-01

    As initial steps in a broader effort to develop and test pediatric Pain Behavior and Pain Quality item banks for the Patient Reported Outcomes Measurement Information System (PROMIS®), we employed qualitative interview and item review methods to 1) evaluate the overall conceptual scope and content validity of the PROMIS pain domain framework among children with chronic/recurrent pain conditions, and 2) develop item candidates for further psychometric testing. To elicit the experiential and conceptual scope of pain outcomes across a variety of pediatric recurrent/chronic pain conditions, we conducted semi-structured individual (32) and focus-group interviews (2) with children and adolescents (8–17 years), and parents of children with pain (individual (32) and focus group (2)). Interviews with pain experts (10) explored the operational limits of pain measurement in children. For item bank development, we identified existing items from measures in the literature, grouped them by concept, removed redundancies, and modified remaining items to match PROMIS formatting. New items were written as needed and cognitive debriefing was completed with children and their parents, resulting in 98 Pain Behavior (47 self, 51 proxy), 54 Quality and 4 Intensity items for further testing. Qualitative content analyses suggest that reportable pain outcomes that matter to children with pain are captured within and consistent with the pain domain framework in PROMIS. PMID: 26335990

  18. Development of a Postacute Hospital Item Bank for the New Pediatric Evaluation of Disability Inventory-Computer Adaptive Test

    ERIC Educational Resources Information Center

    Dumas, Helene M.

    2010-01-01

    The PEDI-CAT is a new computer adaptive test (CAT) version of the Pediatric Evaluation of Disability Inventory (PEDI). Additional PEDI-CAT items specific to postacute pediatric hospital care were recently developed using expert reviews and cognitive interviewing techniques. Expert reviews established face and construct validity, providing positive…

  19. An evaluation of computerized adaptive testing for general psychological distress: combining GHQ-12 and Affectometer-2 in an item bank for public mental health research.

    PubMed

    Stochl, Jan; Böhnke, Jan R; Pickett, Kate E; Croudace, Tim J

    2016-05-20

    Recent developments in psychometric modeling and technology allow pooling well-validated items from existing instruments into larger item banks and their deployment through methods of computerized adaptive testing (CAT). Use of item response theory-based bifactor methods and integrative data analysis overcomes barriers in cross-instrument comparison. This paper presents the joint calibration of an item bank for researchers keen to investigate population variations in general psychological distress (GPD). Multidimensional item response theory was used on existing health survey data from the Scottish Health Education Population Survey (n = 766) to calibrate an item bank consisting of pooled items from the short common mental disorder screen (GHQ-12) and the Affectometer-2 (a measure of "general happiness"). Computer simulation was used to evaluate usefulness and efficacy of its adaptive administration. A bifactor model capturing variation across a continuum of population distress (while controlling for artefacts due to item wording) was supported. The numbers of items for different required reliabilities in adaptive administration demonstrated promising efficacy of the proposed item bank. Psychometric modeling of the common dimension captured by more than one instrument offers the potential of adaptive testing for GPD using individually sequenced combinations of existing survey items. The potential for linking other item sets with alternative candidate measures of positive mental health is discussed since an optimal item bank may require even more items than these.

  20. Influence of inter-item symmetry in visual search.

    PubMed

    Roggeveen, Alexa B; Kingstone, Alan; Enns, James T

    2004-01-01

    Does visual search involve a serial inspection of individual items (Feature Integration Theory) or are items grouped and segregated prior to their consideration as a possible target (Attentional Engagement Theory)? For search items defined by motion and shape there is strong support for prior grouping (Kingstone and Bischof, 1999). The present study tested for grouping based on inter-item shape symmetry. Results showed that target-distractor symmetry strongly influenced search whereas distractor-distractor symmetry influenced search more weakly. This indicates that static shapes are evaluated for similarity to one another prior to their explicit identification as 'target' or 'distractor'. Possible reasons for the unequal contributions of target-distractor and distractor-distractor relations are discussed.

  1. Calibration of the Dutch-Flemish PROMIS Pain Behavior item bank in patients with chronic pain.

    PubMed

    Crins, M H P; Roorda, L D; Smits, N; de Vet, H C W; Westhovens, R; Cella, D; Cook, K F; Revicki, D; van Leeuwen, J; Boers, M; Dekker, J; Terwee, C B

    2016-02-01

    The aims of the current study were to calibrate the item parameters of the Dutch-Flemish PROMIS Pain Behavior item bank using a sample of Dutch patients with chronic pain and to evaluate cross-cultural validity between the Dutch-Flemish and the US PROMIS Pain Behavior item banks. Furthermore, reliability and construct validity of the Dutch-Flemish PROMIS Pain Behavior item bank were evaluated. The 39 items in the bank were completed by 1042 Dutch patients with chronic pain. To evaluate unidimensionality, a one-factor confirmatory factor analysis (CFA) was performed. A graded response model (GRM) was used to calibrate the items. To evaluate cross-cultural validity, differential item functioning (DIF) for language (Dutch vs. English) was evaluated. Reliability of the item bank was also examined and construct validity was studied using several legacy instruments, e.g. the Roland Morris Disability Questionnaire. CFA supported the unidimensionality of the Dutch-Flemish PROMIS Pain Behavior item bank (CFI = 0.960, TLI = 0.958); the data also fit the GRM and demonstrated good coverage across the pain behavior construct (threshold parameters range: -3.42 to 3.54). Analysis showed good cross-cultural validity (only six DIF items), reliability (Cronbach's α = 0.95) and construct validity (all correlations ≥0.53). The Dutch-Flemish PROMIS Pain Behavior item bank was found to have good cross-cultural validity, reliability and construct validity. The development of the Dutch-Flemish PROMIS Pain Behavior item bank will serve as the basis for Dutch-Flemish PROMIS short forms and computer adaptive testing (CAT). © 2015 European Pain Federation - EFIC®

  2. An NCME Instructional Module on Item-Fit Statistics for Item Response Theory Models

    ERIC Educational Resources Information Center

    Ames, Allison J.; Penfield, Randall D.

    2015-01-01

    Drawing valid inferences from item response theory (IRT) models is contingent upon a good fit of the data to the model. Violations of model-data fit have numerous consequences, limiting the usefulness and applicability of the model. This instructional module provides an overview of methods used for evaluating the fit of IRT models. Upon completing…

  3. Home Science Library of Test Items. Volume One.

    ERIC Educational Resources Information Center

    Smith, Jan, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection is reviewed for content validity and reliability. The test…

  4. Development of the PROMIS negative psychosocial expectancies of smoking item banks.

    PubMed

    Stucky, Brian D; Edelen, Maria Orlando; Tucker, Joan S; Shadel, William G; Cerully, Jennifer; Kuhfeld, Megan; Hansen, Mark; Cai, Li

    2014-09-01

    Negative psychosocial expectancies of smoking include aspects of social disapproval and disappointment in oneself. This paper describes analyses conducted to develop and evaluate item banks for assessing psychosocial expectancies among daily and nondaily smokers. Using data from a sample of daily (N = 4,201) and nondaily (N = 1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning analyses (according to gender, age, and race/ethnicity) to arrive at a unidimensional set of psychosocial expectancies items for daily and nondaily smokers. We also evaluated performance of short forms (SFs) and computer adaptive tests (CATs) to efficiently assess psychosocial expectancies. A total of 21 items were included in the Psychosocial Expectancies item banks: 14 items are common across daily and nondaily smokers, 6 are unique to daily, and 1 is unique to nondaily. For both daily and nondaily smokers, the Psychosocial Expectancies item banks are strongly unidimensional, highly reliable (reliability = 0.95 and 0.93, respectively), and perform similarly across gender, age, and race/ethnicity groups. A SF common to daily and nondaily smokers consists of 6 items (reliability = 0.85). Results from simulated CATs showed that, on average, fewer than 8 items are needed to assess psychosocial expectancies with adequate precision when using the item banks. Psychosocial expectancies of smoking can be assessed on the basis of these item banks via the SF, by using CAT, or through a tailored set of items selected for a specific research purpose. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  5. Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating

    ERIC Educational Resources Information Center

    He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei

    2013-01-01

    Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…

  6. Use of Jackknifing to Evaluate Effects of Anchor Item Selection on Equating with the Nonequivalent Groups with Anchor Test (NEAT) Design. Research Report. ETS RR-15-10

    ERIC Educational Resources Information Center

    Lu, Ru; Haberman, Shelby; Guo, Hongwen; Liu, Jinghua

    2015-01-01

    In this study, we apply jackknifing to anchor items to evaluate the impact of anchor selection on equating stability. In an ideal world, the choice of anchor items should have little impact on equating results. When this ideal does not correspond to reality, selection of anchor items can strongly influence equating results. This influence does not…

  7. IRT Item Parameter Scaling for Developing New Item Pools

    ERIC Educational Resources Information Center

    Kang, Hyeon-Ah; Lu, Ying; Chang, Hua-Hua

    2017-01-01

    Increasing use of item pools in large-scale educational assessments calls for an appropriate scaling procedure to achieve a common metric among field-tested items. The present study examines scaling procedures for developing a new item pool under a spiraled block linking design. The three scaling procedures are considered: (a) concurrent…

  8. Selecting Lower Priced Items.

    ERIC Educational Resources Information Center

    Kleinert, Harold L.; And Others

    1988-01-01

    A program used to teach moderately to severely mentally handicapped students to select the lower priced items in actual shopping activities is described. Through a five-phase process, students are taught to compare prices themselves as well as take into consideration variations in the sizes of containers and varying product weights. (VW)

  9. Psychometric evaluation of Persian Nomophobia Questionnaire: Differential item functioning and measurement invariance across gender.

    PubMed

    Lin, Chung-Ying; Griffiths, Mark D; Pakpour, Amir H

    2018-03-01

    Background and aims: Research examining problematic mobile phone use has increased markedly over the past 5 years and has been related to "no mobile phone phobia" (so-called nomophobia). The 20-item Nomophobia Questionnaire (NMP-Q) is the only instrument that assesses nomophobia with an underlying theoretical structure and robust psychometric testing. This study aimed to confirm the construct validity of the Persian NMP-Q using Rasch and confirmatory factor analysis (CFA) models. Methods: After ensuring the linguistic validity, Rasch models were used to examine the unidimensionality of each Persian NMP-Q factor among 3,216 Iranian adolescents and CFAs were used to confirm its four-factor structure. Differential item functioning (DIF) and multigroup CFA were used to examine whether males and females interpreted the NMP-Q similarly, including item content and NMP-Q structure. Results: Each factor was unidimensional according to the Rasch findings, and the four-factor structure was supported by CFA. Two items did not quite fit the Rasch models (Item 14: "I would be nervous because I could not know if someone had tried to get a hold of me;" Item 9: "If I could not check my smartphone for a while, I would feel a desire to check it"). No DIF items were found across gender and measurement invariance was supported in multigroup CFA across gender. Conclusions: Due to the satisfactory psychometric properties, it is concluded that the Persian NMP-Q can be used to assess nomophobia among adolescents. Moreover, NMP-Q users may compare its scores between genders in the knowledge that there are no score differences contributed by different understandings of NMP-Q items.

  10. Gender-Based Differential Item Performance in Mathematics Achievement Items.

    ERIC Educational Resources Information Center

    Doolittle, Allen E.; Cleary, T. Anne

    1987-01-01

    Eight randomly equivalent samples of high school seniors were each given a unique form of the ACT Assessment Mathematics Usage Test (ACTM). Signed measures of differential item performance (DIP) were obtained for each item in the eight ACTM forms. DIP estimates were analyzed and a significant item category effect was found. (Author/LMO)

  11. Differential Item Functioning: Its Consequences. Research Report. ETS RR-10-01

    ERIC Educational Resources Information Center

    Lee, Yi-Hsuan; Zhang, Jinming

    2010-01-01

    This report examines the consequences of differential item functioning (DIF) using simulated data. Its impact on total score, item response theory (IRT) ability estimate, and test reliability was evaluated in various testing scenarios created by manipulating the following four factors: test length, percentage of DIF items per form, sample sizes of…

  12. Real and Artificial Differential Item Functioning in Polytomous Items

    ERIC Educational Resources Information Center

    Andrich, David; Hagquist, Curt

    2015-01-01

    Differential item functioning (DIF) for an item between two groups is present if, for the same person location on a variable, persons from different groups have different expected values for their responses. Applying only to dichotomously scored items in the popular Mantel-Haenszel (MH) method for detecting DIF in which persons are classified by…
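
    As a rough illustration of the Mantel-Haenszel (MH) approach named in this record, the sketch below computes the MH common odds ratio for one dichotomous item across score strata. The function name, the tuple layout (reference correct, reference incorrect, focal correct, focal incorrect) and the counts are assumptions made for illustration; they are not taken from the article.

```python
def mh_common_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio over matched score strata.

    Each stratum is a tuple (a, b, c, d) of counts (hypothetical layout):
      a = reference group, correct    b = reference group, incorrect
      c = focal group, correct        d = focal group, incorrect
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Made-up counts: both groups perform identically within each stratum.
no_dif = [(10, 10, 10, 10), (20, 5, 20, 5)]
# Made-up counts: the reference group does better at the same score level.
some_dif = [(30, 10, 20, 20)]

ratio_no_dif = mh_common_odds_ratio(no_dif)
ratio_some_dif = mh_common_odds_ratio(some_dif)
print(ratio_no_dif, ratio_some_dif)
```

    A ratio near 1 suggests no DIF for the item under this simplified setup; values far from 1 suggest the item favours one group among persons matched on total score.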

  13. Neural correlates of differential retrieval orientation: Sustained and item-related components.

    PubMed

    Woodruff, C Chad; Uncapher, Melina R; Rugg, Michael D

    2006-01-01

    Retrieval orientation refers to a cognitive state that biases processing of retrieval cues in service of a specific goal. The present study used a mixed fMRI design to investigate whether adoption of different retrieval orientations - as indexed by differences in the activity elicited by retrieval cues corresponding to unstudied items - is associated with differences in the state-related activity sustained across a block of test trials sharing a common retrieval goal. Subjects studied mixed lists comprising visually presented words and pictures. They then undertook a series of short test blocks in which all test items were visually presented words. The blocks varied according to whether the test items were used to cue retrieval of studied words or studied pictures. In several regions, neural activity elicited by correctly classified new items differed according to whether words or pictures were the targeted material. The loci of these effects suggest that one factor driving differential cue processing is modulation of the degree of overlap between cue and targeted memory representations. In addition to these item-related effects, neural activity sustained throughout the test blocks also differed according to the nature of the targeted material. These findings indicate that the adoption of different retrieval orientations is associated with distinct neural states. The loci of these sustained effects were distinct from those where new item activity varied, suggesting that the effects may play a role in biasing retrieval cue processing in favor of the current retrieval goal.

  14. Practical Guide to Conducting an Item Response Theory Analysis

    ERIC Educational Resources Information Center

    Toland, Michael D.

    2014-01-01

    Item response theory (IRT) is a psychometric technique used in the development, evaluation, improvement, and scoring of multi-item scales. This pedagogical article provides the necessary information needed to understand how to conduct, interpret, and report results from two commonly used ordered polytomous IRT models (Samejima's graded…

  15. Component Identification and Item Difficulty of Raven's Matrices Items.

    ERIC Educational Resources Information Center

    Green, Kathy E.; Kluever, Raymond C.

    Item components that might contribute to the difficulty of items on the Raven Colored Progressive Matrices (CPM) and the Standard Progressive Matrices (SPM) were studied. Subjects providing responses to CPM items were 269 children aged 2 years 9 months to 11 years 8 months, most of whom were referred for testing as potentially gifted. A second…

  16. Statistically Comparing the Performance of Multiple Automated Raters across Multiple Items

    ERIC Educational Resources Information Center

    Kieftenbeld, Vincent; Boyer, Michelle

    2017-01-01

    Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…

  17. The physical activity scale for individuals with physical disabilities: development and evaluation.

    PubMed

    Washburn, Richard A; Zhu, Weimo; McAuley, Edward; Frogley, Michael; Figoni, Stephen F

    2002-02-01

    To evaluate the construct validity of a new 13-item physical activity survey designed to assess physical activity in individuals with physical disabilities. Mail survey requesting information on physical activity, basic demographic characteristics, self-rated health, and self-rated physical activity. In February 2000, surveys were sent to 1176 individuals who had used rehabilitative services at a major midwestern university between 1950 and 1999. Two hundred twenty-seven men and 145 women with disabilities responded to the mail survey (80%, spinal cord or other locomotor injuries; 13%, visual and auditory injuries; 7%, other; 92%, white; mean age ± standard deviation, 49.8 ± 12.9 y; mean length of disability, 36.9 ± 14.9 y). Not applicable. Physical activity was assessed with the Physical Activity Scale for Individuals with Physical Disabilities (PASIPD). The PASIPD requests the number of days a week and hours daily (categories) of participation in recreational, household, and occupational activities over the past 7 days. Total scores were calculated as the average hours daily times a metabolic equivalent value and summed over items. Pearson correlations between each survey item and the total PASIPD score were all statistically significant (P < .05) and ≥ .20 (range, .20-.67). Factor analysis with principal component extraction and varimax orthogonal rotations revealed 5 latent factors (eigenvalues ≥ 1, factor loadings ≥ .40): home repair and lawn and garden, housework, vigorous sport and recreation, light sport and recreation, and occupation and transportation. These 5 factors accounted for 63% of the total variance. Cronbach alpha coefficients ranged from .37 to .65, indicating low-to-moderate internal consistency within factors. Those who reported being "active/highly active" had higher total and subcategory scores compared with those "not active at all." Those in "excellent" health had higher total, vigorous sport and recreation, and occupation and…
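
    The scoring rule stated in this record (average hours daily times a metabolic equivalent value, summed over items) can be sketched as follows. The item names, hours and MET weights are invented for illustration and are not the published PASIPD values.

```python
def pasipd_style_total(items):
    """Sum of (average hours per day x MET value) over all items.

    `items` maps an item name to a (hours_per_day, met_value) pair.
    """
    return sum(hours * met for hours, met in items.values())


# Hypothetical respondent; values are NOT the instrument's real weights.
example_items = {
    "light_sport": (0.5, 3.0),
    "housework": (1.0, 2.5),
    "occupation": (2.0, 2.0),
}

total_score = pasipd_style_total(example_items)
print(total_score)  # 0.5*3.0 + 1.0*2.5 + 2.0*2.0 = 8.0
```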

  18. Psychological distress in cancer survivors: the further development of an item bank.

    PubMed

    Smith, Adam B; Armes, Jo; Richardson, Alison; Stark, Dan P

    2013-02-01

    Assessment of psychological distress by patient report is necessary to meet patients' needs throughout the cancer journey. We have previously developed an item bank to assess psychological distress but not evaluated it for cancer survivors. Our first aim in this study was to test whether we could extend our item bank to include cancer survivors. The second aim was to examine whether the item bank could assess positive affect as a single construct alongside negative psychological symptoms. Responses from 1315 cancer survivors to the Hospital Anxiety and Depression Scale (HADS) and the Positive and Negative Affect Scale (PANAS) were considered for inclusion in a pre-existing item bank created from a heterogeneous sample of 4914 cancer patients. Differential item functioning (DIF) was used to assess whether HADS responses drawn from the two samples were equivalent. Common-item equating was used to anchor the shared (HADS) items, whilst the PANAS items were added. Item fit was evaluated at each stage, and misfitting items were removed. Unidimensionality was assessed with a principal components factor analysis. The DIF analysis did not reveal any differences between the HADS item locations from the two samples. Three misfitting PANAS items were removed, resulting in a final unidimensional bank of 80 items with good internal reliability (α = 0.85). The new item bank is valid for use across the cancer journey, including cancer survivors, and modestly improves the assessment of all levels of psychological distress and positive psychological function. Copyright © 2011 John Wiley & Sons, Ltd.

  19. Stratified and Maximum Information Item Selection Procedures in Computer Adaptive Testing

    ERIC Educational Resources Information Center

    Deng, Hui; Ansley, Timothy; Chang, Hua-Hua

    2010-01-01

    In this study we evaluated and compared three item selection procedures: the maximum Fisher information procedure (F), the a-stratified multistage computer adaptive testing (CAT) (STR), and a refined stratification procedure that allows more items to be selected from the high a strata and fewer items from the low a strata (USTR), along with…

  20. Analysis test of understanding of vectors with the three-parameter logistic model of item response theory and item response curves technique

    NASA Astrophysics Data System (ADS)

    Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

    2016-12-01

    This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discriminatory, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the PARSCALE program. The TUV ability is an ability parameter, here estimated assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed from item response curves (IRC) that represent simplified IRT. Data were gathered on 2392 science and engineering freshmen, from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information, and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test's distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by the classical analysis methods of tests. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.
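
    For readers unfamiliar with the three-parameter logistic (3PL) model mentioned in this record, a minimal sketch of its item response function follows. The parameter values are illustrative assumptions, not estimates from the study, and the optional scaling constant (often written D = 1.7) is omitted.

```python
import math


def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: person ability       a: item discrimination
    b: item difficulty          c: lower asymptote (guessing)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: a five-option question, so guessing is set near 1/5.
at_difficulty = p_correct_3pl(theta=0.0, a=1.2, b=0.0, c=0.2)
low_ability = p_correct_3pl(theta=-4.0, a=1.2, b=0.0, c=0.2)
print(at_difficulty, low_ability)
```

    When ability equals the item difficulty, the probability is halfway between the guessing level and 1; for very low ability it approaches the guessing level.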

  1. Technical Evaluation for the Determination of CGI Designation for Safety Class Items Incorporated in Hose-in-Hose Transfer Line Assemblies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    BUCHANAN, J.R.

    2000-05-16

    The purpose of this technical evaluation is to determine whether the secondary hoses are to be categorized as Commercial Grade Items (CGI) or Engineered Equipment. This determination will identify whether or not use of the CGI Dedication process is appropriate.

  2. Distributed Item Review: Administrator User Guide. Technical Report #1603

    ERIC Educational Resources Information Center

    Irvin, P. Shawn

    2016-01-01

    The Distributed Item Review (DIR) is a secure and flexible, web-based system designed to present test items to expert reviewers across a broad geographic area for evaluation of important dimensions of quality (e.g., alignment with standards, bias, sensitivity, and student accessibility). The DIR is comprised of essential features that allow system…

  3. Core Items for a Standardized Resource Use Measure: Expert Delphi Consensus Survey.

    PubMed

    Thorn, Joanna C; Brookes, Sara T; Ridyard, Colin; Riley, Ruth; Hughes, Dyfrig A; Wordsworth, Sarah; Noble, Sian M; Thornton, Gail; Hollingworth, William

    2018-06-01

    Resource use measurement by patient recall is characterized by inconsistent methods and a lack of validation. A validated standardized resource use measure could increase data quality, improve comparability between studies, and reduce research burden. To identify a minimum set of core resource use items that should be included in a standardized adult instrument for UK health economic evaluation from a provider perspective. Health economists with experience of UK-based economic evaluations were recruited to participate in an electronic Delphi survey. Respondents were asked to rate 60 resource use items (e.g., medication names) on a scale of 1 to 9 according to the importance of the item in a generic context. Items considered less important according to predefined consensus criteria were dropped and a second survey was developed. In the second round, respondents received the median score and their own score from round 1 for each item alongside summarized comments and were asked to rerate items. A final project team meeting was held to determine the recommended core set. Forty-five participants completed round 1. Twenty-six items were considered less important and were dropped, 34 items were retained for the second round, and no new items were added. Forty-two respondents (93.3%) completed round 2, and greater consensus was observed. After the final meeting, 10 core items were selected, with further items identified as suitable for "bolt-on" questionnaire modules. The consensus on 10 items considered important in a generic context suggests that a standardized instrument for core resource use items is feasible. Copyright © 2018. Published by Elsevier Inc.

  4. [Investigation of the process of personal hygiene items biodegradation by cellulose-fermenting microorganisms].

    PubMed

    Il'in, V K; Starkov, L V; Kostrov, S V; Belikodvorskaia, G A; Chuvil'skaia, N A; Mukhamedieva, L N; Mikos, K N

    2004-01-01

    Cellulose-containing wastes are among the heaviest and largest components of the solid domestic waste that accumulates during spaceflight. For the most part these are disposable personal hygiene items, used in large quantities because no shower is available. These wastes contain human body products that are very dangerous from the sanitary-epidemiological standpoint. The purpose was to explore the potential of anaerobic microbial biodegradation of cellulose-containing hygiene items, with transformation of dry mass into liquid and biogas. Specific objectives included test cultivation of active strains of reference cultures of cellulose-fermenting anaerobic thermophilic bacteria on hygiene items as the only source of carbon, evaluation of the ways and need of pretreating gauze pads to stimulate biodegradation, and chemical analysis of the resulting biogas. The investigation concluded that gauze pads are susceptible to biodegradation by anaerobic bacteria, producing a low-toxicity gas fraction. Therefore, the proposed technology can be considered a candidate for integration into the space crew life support system.

  5. Calibration and Validation of the Dutch-Flemish PROMIS Pain Interference Item Bank in Patients with Chronic Pain.

    PubMed

    Crins, Martine H P; Roorda, Leo D; Smits, Niels; de Vet, Henrica C W; Westhovens, Rene; Cella, David; Cook, Karon F; Revicki, Dennis; van Leeuwen, Jaap; Boers, Maarten; Dekker, Joost; Terwee, Caroline B

    2015-01-01

    The Dutch-Flemish PROMIS Group translated the adult PROMIS Pain Interference item bank into Dutch-Flemish. The aims of the current study were to calibrate the parameters of these items using an item response theory (IRT) model, to evaluate the cross-cultural validity of the Dutch-Flemish translations compared to the original English items, and to evaluate their reliability and construct validity. The 40 items in the bank were completed by 1085 Dutch chronic pain patients. Before calibrating the items, IRT model assumptions were evaluated using confirmatory factor analysis (CFA). Items were calibrated using the graded response model (GRM), an IRT model appropriate for items with more than two response options. To evaluate cross-cultural validity, differential item functioning (DIF) for language (Dutch vs. English) was examined. Reliability was evaluated based on standard errors and Cronbach's alpha. To evaluate construct validity correlations with scores on legacy instruments (e.g., the Disabilities of the Arm, Shoulder and Hand Questionnaire) were calculated. Unidimensionality of the Dutch-Flemish PROMIS Pain Interference item bank was supported by CFA tests of model fit (CFI = 0.986, TLI = 0.986). Furthermore, the data fit the GRM and showed good coverage across the pain interference continuum (threshold-parameters range: -3.04 to 3.44). The Dutch-Flemish PROMIS Pain Interference item bank has good cross-cultural validity (only two out of 40 items showing DIF), good reliability (Cronbach's alpha = 0.98), and good construct validity (Pearson correlations between 0.62 and 0.75). A computer adaptive test (CAT) and Dutch-Flemish PROMIS short forms of the Dutch-Flemish PROMIS Pain Interference item bank can now be developed.
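
    Cronbach's alpha, the reliability statistic reported in this record, can be computed from a respondents-by-items score table as sketched below. The tiny data set is made up for illustration; it is unrelated to the PROMIS data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])                                    # number of items
    item_vars = [variance(col) for col in zip(*rows)]   # variance of each item
    total_var = variance([sum(r) for r in rows])        # variance of sum scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 4 respondents x 3 items.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [5, 4, 4],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

    Sample variances are used for both the items and the sum score; using population variances throughout would give the same alpha because the scaling factor cancels.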

  6. Calibration and Validation of the Dutch-Flemish PROMIS Pain Interference Item Bank in Patients with Chronic Pain

    PubMed Central

    Crins, Martine H. P.; Roorda, Leo D.; Smits, Niels; de Vet, Henrica C. W.; Westhovens, Rene; Cella, David; Cook, Karon F.; Revicki, Dennis; van Leeuwen, Jaap; Boers, Maarten; Dekker, Joost; Terwee, Caroline B.

    2015-01-01

    The Dutch-Flemish PROMIS Group translated the adult PROMIS Pain Interference item bank into Dutch-Flemish. The aims of the current study were to calibrate the parameters of these items using an item response theory (IRT) model, to evaluate the cross-cultural validity of the Dutch-Flemish translations compared to the original English items, and to evaluate their reliability and construct validity. The 40 items in the bank were completed by 1085 Dutch chronic pain patients. Before calibrating the items, IRT model assumptions were evaluated using confirmatory factor analysis (CFA). Items were calibrated using the graded response model (GRM), an IRT model appropriate for items with more than two response options. To evaluate cross-cultural validity, differential item functioning (DIF) for language (Dutch vs. English) was examined. Reliability was evaluated based on standard errors and Cronbach’s alpha. To evaluate construct validity correlations with scores on legacy instruments (e.g., the Disabilities of the Arm, Shoulder and Hand Questionnaire) were calculated. Unidimensionality of the Dutch-Flemish PROMIS Pain Interference item bank was supported by CFA tests of model fit (CFI = 0.986, TLI = 0.986). Furthermore, the data fit the GRM and showed good coverage across the pain interference continuum (threshold-parameters range: -3.04 to 3.44). The Dutch-Flemish PROMIS Pain Interference item bank has good cross-cultural validity (only two out of 40 items showing DIF), good reliability (Cronbach’s alpha = 0.98), and good construct validity (Pearson correlations between 0.62 and 0.75). A computer adaptive test (CAT) and Dutch-Flemish PROMIS short forms of the Dutch-Flemish PROMIS Pain Interference item bank can now be developed. PMID:26214178

  7. Screening Test Items for Differential Item Functioning

    ERIC Educational Resources Information Center

    Longford, Nicholas T.

    2014-01-01

    A method for medical screening is adapted to differential item functioning (DIF). Its essential elements are explicit declarations of the level of DIF that is acceptable and of the loss function that quantifies the consequences of the two kinds of inappropriate classification of an item. Instead of a single level and a single function, sets of…

  8. Weighted Maximum-a-Posteriori Estimation in Tests Composed of Dichotomous and Polytomous Items

    ERIC Educational Resources Information Center

    Sun, Shan-Shan; Tao, Jian; Chang, Hua-Hua; Shi, Ning-Zhong

    2012-01-01

    For mixed-type tests composed of dichotomous and polytomous items, polytomous items often yield more information than dichotomous items. To reflect the difference between the two types of items and to improve the precision of ability estimation, an adaptive weighted maximum-a-posteriori (WMAP) estimation is proposed. To evaluate the performance of…

  9. Development of a Simple 12-Item Theory-Based Instrument to Assess the Impact of Continuing Professional Development on Clinical Behavioral Intentions

    PubMed Central

    Légaré, France; Borduas, Francine; Freitas, Adriana; Jacques, André; Godin, Gaston; Luconi, Francesca; Grimshaw, Jeremy

    2014-01-01

    Background: Decision-makers in organizations providing continuing professional development (CPD) have identified the need for routine assessment of its impact on practice. We sought to develop a theory-based instrument for evaluating the impact of CPD activities on health professionals' clinical behavioral intentions. Methods and Findings: Our multipronged study had four phases. 1) We systematically reviewed the literature for instruments that used socio-cognitive theories to assess healthcare professionals' clinically-oriented behavioral intentions and/or behaviors; we extracted items relating to the theoretical constructs of an integrated model of healthcare professionals' behaviors and removed duplicates. 2) A committee of researchers and CPD decision-makers selected a pool of items relevant to CPD. 3) An international group of experts (n = 70) reached consensus on the most relevant items using electronic Delphi surveys. 4) We created a preliminary instrument with the items found most relevant and assessed its factorial validity, internal consistency and reliability (weighted kappa) over a two-week period among 138 physicians attending a CPD activity. Out of 72 potentially relevant instruments, 47 were analyzed. Of the 1218 items extracted from these, 16% were discarded as improperly phrased and 70% discarded as duplicates. Mapping the remaining items onto the constructs of the integrated model of healthcare professionals' behaviors yielded a minimum of 18 and a maximum of 275 items per construct. The partnership committee retained 61 items covering all seven constructs. Two iterations of the Delphi process produced consensus on a provisional 40-item questionnaire. Exploratory factorial analysis following test-retest resulted in a 12-item questionnaire. Cronbach's coefficients for the constructs varied from 0.77 to 0.85. Conclusion: A 12-item theory-based instrument for assessing the impact of CPD activities on health professionals' clinical behavioral

  10. Variation in the Readability of Items Within Surveys

    PubMed Central

    Calderón, José L.; Morales, Leo S.; Liu, Honghu; Hays, Ron D.

    2006-01-01

    The objective of this study was to estimate the variation in the readability of survey items within 2 widely used health-related quality-of-life surveys: the National Eye Institute Visual Functioning Questionnaire–25 (VFQ-25) and the Short Form Health Survey, version 2 (SF-36v2). Flesch-Kincaid and Flesch Reading Ease formulas were used to estimate readability. Individual survey item scores and descriptive statistics for each survey were calculated. Variation of individual item scores from the mean survey score was graphically depicted for each survey. The mean reading grade level and reading ease estimates for the VFQ-25 and SF-36v2 were 7.8 (fairly easy) and 6.4 (easy), respectively. Both surveys had notable variation in item readability; individual item readability scores ranged from 3.7 to 12.0 (very easy to difficult) for the VFQ-25 and 2.2 to 12.0 (very easy to difficult) for the SF-36v2. Because survey respondents may not comprehend items with readability scores that exceed their reading ability, estimating the readability of each survey item is an important component of evaluating survey readability. Standards for measuring the readability of surveys are needed. PMID:16401705
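
    The two readability formulas named in this record can be sketched as below. The functions take word, sentence and syllable counts as inputs, since syllable counting itself is heuristic; the counts in the example are invented and do not come from either survey.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical passage: 100 words, 5 sentences, 150 syllables.
ease = flesch_reading_ease(100, 5, 150)
grade = flesch_kincaid_grade(100, 5, 150)
print(round(ease, 3), round(grade, 2))
```

    Applying both functions to each survey item separately, rather than to the survey as a whole, is what exposes the item-to-item variation the record describes.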

  11. Distributed patterns of activity in sensory cortex reflect the precision of multiple items maintained in visual short-term memory.

    PubMed

    Emrich, Stephen M; Riggall, Adam C; Larocque, Joshua J; Postle, Bradley R

    2013-04-10

    Traditionally, load sensitivity of sustained, elevated activity has been taken as an index of storage for a limited number of items in visual short-term memory (VSTM). Recently, studies have demonstrated that the contents of a single item held in VSTM can be decoded from early visual cortex, despite the fact that these areas do not exhibit elevated, sustained activity. It is unknown, however, whether the patterns of neural activity decoded from sensory cortex change as a function of load, as one would expect from a region storing multiple representations. Here, we use multivoxel pattern analysis to examine the neural representations of VSTM in humans across multiple memory loads. In an important extension of previous findings, our results demonstrate that the contents of VSTM can be decoded from areas that exhibit a transient response to visual stimuli, but not from regions that exhibit elevated, sustained load-sensitive delay-period activity. Moreover, the neural information present in these transiently activated areas decreases significantly with increasing load, indicating load sensitivity of the patterns of activity that support VSTM maintenance. Importantly, the decrease in classification performance as a function of load is correlated with within-subject changes in mnemonic resolution. These findings indicate that distributed patterns of neural activity in putatively sensory visual cortex support the representation and precision of information in VSTM.

  12. Languages Library of Test Items. Volume Two: German, Latin.

    ERIC Educational Resources Information Center

    Campbell, Thomas; And Others

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  13. Languages Library of Test Items. Volume One: French, Indonesian.

    ERIC Educational Resources Information Center

    Campbell, Thomas; And Others

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  14. Textiles and Design Library of Test Items. Volume I.

    ERIC Educational Resources Information Center

    Smith, Jan, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection is reviewed for content validity and reliability. The test…

  15. Overview of classical test theory and item response theory for the quantitative assessment of items in developing patient-reported outcomes measures.

    PubMed

    Cappelleri, Joseph C; Jason Lundy, J; Hays, Ron D

    2014-05-01

    The US Food and Drug Administration's guidance for industry document on patient-reported outcomes (PRO) defines content validity as "the extent to which the instrument measures the concept of interest" (FDA, 2009, p. 12). According to Strauss and Smith (2009), construct validity "is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity" (p. 7). Hence, both qualitative and quantitative information are essential in evaluating the validity of measures. We review classical test theory and item response theory (IRT) approaches to evaluating PRO measures, including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized "difficulty" (severity) order of items is represented by observed responses. If a researcher has few qualitative data and wants to get preliminary information about the content validity of the instrument, then descriptive assessments using classical test theory should be the first step. As the sample size grows during subsequent stages of instrument development, confidence in the numerical estimates from Rasch and other IRT models (as well as those of classical test theory) would also grow. Classical test theory and IRT can be useful in providing a quantitative assessment of items and scales during the content-validity phase of PRO-measure development. Depending on the particular type of measure and the specific circumstances, the classical test theory and/or the IRT should be considered to help maximize the content validity of PRO measures. Copyright © 2014 Elsevier HS Journals, Inc. All rights reserved.

  16. A Monte Carlo Study Investigating the Influence of Item Discrimination, Category Intersection Parameters, and Differential Item Functioning Patterns on the Detection of Differential Item Functioning in Polytomous Items

    ERIC Educational Resources Information Center

    Thurman, Carol

    2009-01-01

    The increased use of polytomous item formats has led assessment developers to pay greater attention to the detection of differential item functioning (DIF) in these items. DIF occurs when an item performs differently for two contrasting groups of respondents (e.g., males versus females) after controlling for differences in the abilities of the…

  17. The Impact of Non-attempted and Dually-Attempted Items on Person Abilities Using Item Response Theory

    PubMed Central

    Sideridis, Georgios D.; Tsaousis, Ioannis; Al Harbi, Khaleel

    2016-01-01

    The purpose of the present study was to relate response strategy to person ability estimates. Two behavioral strategies were examined: (a) the strategy to skip items in order to save time on timed tests, and (b) the strategy to select two responses on an item, with the hope that one of them may be considered correct. Participants were 4,422 individuals who were administered a standardized achievement measure related to math, biology, chemistry, and physics. In the present evaluation, only the physics subscale was employed. Two analyses were conducted: (a) a person-based one to identify differences between groups and potential correlates of those differences, and (b) a measure-based analysis in order to identify the parts of the measure that were responsible for potential group differentiation. For (a), person abilities were estimated with the 2-PL model and later with the 3-PL and 4-PL models in order to estimate upper and lower asymptotes of person abilities. For (b), differential item functioning, differential test functioning, and differential distractor functioning were investigated. Results indicated that there were significant differences between groups with completers having the highest ability compared to both non-attempters and dual responders. There were no significant differences between non-attempters and dual responders. The present findings have implications for response strategy efficacy and measure evaluation, revision, and construction. PMID:27790174
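
    As a point of reference for the 2-PL, 3-PL, and 4-PL models mentioned above, the following is a minimal sketch of the standard four-parameter logistic response function. The parameter names and example values are assumptions for illustration; the abstract does not give the authors' exact specification.

```python
import math


def irf_4pl(theta, a, b, c=0.0, d=1.0):
    """Probability of a correct response under the 4-PL model.

    theta: person ability
    a: discrimination, b: difficulty (location)
    c: lower asymptote, d: upper asymptote
    With c=0 and d=1 this reduces to the 2-PL model;
    with d=1 only, it reduces to the 3-PL model.
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: discrimination 1.2, difficulty 0.0.
for theta in (-2.0, 0.0, 2.0):
    p2 = irf_4pl(theta, a=1.2, b=0.0)
    p4 = irf_4pl(theta, a=1.2, b=0.0, c=0.2, d=0.9)
    print(f"theta={theta:+.1f}  2-PL={p2:.3f}  4-PL={p4:.3f}")
```

    At theta equal to b, the 2-PL probability is 0.5 and the 4-PL probability is the midpoint between the two asymptotes.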

  18. The Impact of Non-attempted and Dually-Attempted Items on Person Abilities Using Item Response Theory.

    PubMed

    Sideridis, Georgios D; Tsaousis, Ioannis; Al Harbi, Khaleel

    2016-01-01

    The purpose of the present study was to relate response strategy to person ability estimates. Two behavioral strategies were examined: (a) the strategy to skip items in order to save time on timed tests, and (b) the strategy to select two responses on an item, with the hope that one of them may be considered correct. Participants were 4,422 individuals who were administered a standardized achievement measure related to math, biology, chemistry, and physics. In the present evaluation, only the physics subscale was employed. Two analyses were conducted: (a) a person-based one to identify differences between groups and potential correlates of those differences, and (b) a measure-based analysis in order to identify the parts of the measure that were responsible for potential group differentiation. For (a), person abilities were estimated with the 2-PL model and later with the 3-PL and 4-PL models in order to estimate upper and lower asymptotes of person abilities. For (b), differential item functioning, differential test functioning, and differential distractor functioning were investigated. Results indicated that there were significant differences between groups with completers having the highest ability compared to both non-attempters and dual responders. There were no significant differences between non-attempters and dual responders. The present findings have implications for response strategy efficacy and measure evaluation, revision, and construction.

  19. The COPD-SIB: a newly developed disease-specific item bank to measure health-related quality of life in patients with chronic obstructive pulmonary disease.

    PubMed

    Paap, Muirne C S; Lenferink, Lonneke I M; Herzog, Nadine; Kroeze, Karel A; van der Palen, Job

    2016-06-27

    Health-related quality of life (HRQoL) is widely used as an outcome measure in the evaluation of treatment interventions in patients with chronic obstructive pulmonary disease (COPD). In order to address challenges associated with existing fixed-length measures (e.g., too long to be used routinely, too short to ensure both content validity and reliability), a COPD-specific item bank (COPD-SIB) was developed. Items were selected based on literature review and interviews with Dutch COPD patients, with a strong focus on both content validity and item comprehension. The psychometric quality of the item bank was evaluated using Mokken Scale Analysis and parametric Item Response Theory, using data of 666 COPD patients. The final item bank contains 46 items that form a strong scale, tapping into eight important themes that were identified based on literature review and patient interviews: Coping with disease/symptoms, adaptability; Autonomy; Anxiety about the course/end-state of the disease, hopelessness; Positive psychological functioning; Situations triggering or enhancing breathing problems; Symptoms; Activity; Impact. The 46-item COPD-SIB has good psychometric properties and content validity. Items are available in Dutch and English. The COPD-SIB can be used as a stand-alone instrument, or to inform computerised adaptive testing.

  20. Item response theory analysis of the Pain Self-Efficacy Questionnaire.

    PubMed

    Costa, Daniel S J; Asghari, Ali; Nicholas, Michael K

    2017-01-01

    The Pain Self-Efficacy Questionnaire (PSEQ) is a 10-item instrument designed to assess the extent to which a person in pain believes s/he is able to accomplish various activities despite their pain. There is strong evidence for the validity and reliability of both the full-length PSEQ and a 2-item version. The purpose of this study is to further examine the properties of the PSEQ using an item response theory (IRT) approach. We used the two-parameter graded response model to examine the category probability curves, and location and discrimination parameters of the 10 PSEQ items. In item response theory, responses to a set of items are assumed to be probabilistically determined by a latent (unobserved) variable. In the graded-response model specifically, item response threshold (the value of the latent variable for which adjacent response categories are equally likely) and discrimination parameters are estimated for each item. Participants were 1511 mixed, chronic pain patients attending for initial assessment at a tertiary pain management centre. All items except item 7 ('I can cope with my pain without medication') performed well in IRT analysis, and the category probability curves suggested that participants used the 7-point response scale consistently. Items 6 ('I can still do many of the things I enjoy doing, such as hobbies or leisure activity, despite pain'), 8 ('I can still accomplish most of my goals in life, despite the pain') and 9 ('I can live a normal lifestyle, despite the pain') captured higher levels of the latent variable with greater precision. The results from this IRT analysis add to the body of evidence based on classical test theory illustrating the strong psychometric properties of the PSEQ. Despite the relatively poor performance of Item 7, its clinical utility warrants its retention in the questionnaire. The strong psychometric properties of the PSEQ support its use as an effective tool for assessing self-efficacy in people with pain.
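
    The category probability curves discussed above can be illustrated with a short sketch of the graded response model in Samejima's usual formulation, in which each threshold is the latent value where the probability of responding in that category or higher is 0.5. The discrimination and threshold values below are hypothetical; they are not the PSEQ estimates, which the abstract does not report.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the graded response model.

    thresholds: increasing list of K-1 threshold values for K ordered
    categories. Returns a list of K probabilities that sum to 1.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in increasing order")
    # Cumulative probabilities P(X >= k), padded with 1 and 0 at the ends.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulatives.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical item on a 7-point scale: six thresholds, discrimination 1.5.
example_thresholds = [-2.0, -1.2, -0.4, 0.3, 1.0, 1.8]
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=example_thresholds)
print([round(p, 3) for p in probs])
```

    Evaluating the function over a grid of theta values and plotting each category's probability gives the category probability curves for the item.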

  1. The Disgust Scale: Item Analysis, Factor Structure, and Suggestions for Refinement

    ERIC Educational Resources Information Center

    Olatunji, Bunmi O.; Williams, Nathan L.; Tolin, David F.; Abramowitz, Jonathan S.; Sawchuk, Craig N.; Lohr, Jeffrey M.; Elwood, Lisa S.

    2007-01-01

    In the 4 studies presented (N = 1,939), a converging set of analyses was conducted to evaluate the item adequacy, factor structure, reliability, and validity of the Disgust Scale (DS; J. Haidt, C. McCauley, & P. Rozin, 1994). The results suggest that 7 items (i.e., Items 2, 7, 8, 21, 23, 24, and 25) should be considered for removal from the DS.…

  2. A Comparison Study of Item Exposure Control Strategies in MCAT

    ERIC Educational Resources Information Center

    Mao, Xiuzhen; Ozdemir, Burhanettin; Wang, Yating; Xiu, Tao

    2016-01-01

    Four item selection indexes with and without exposure control are evaluated and compared in multidimensional computerized adaptive testing (CAT). The four item selection indices are D-optimality, Posterior expectation Kullback-Leibler information (KLP), the minimized error variance of the linear combination score with equal weight (V1), and the…

  3. Development of the 7-Item Binge-Eating Disorder Screener (BEDS-7)

    PubMed Central

    Deal, Linda S.; DiBenedetti, Dana B.; Nelson, Lauren; Fehnel, Sheri E.; Brown, T. Michelle

    2016-01-01

    Objective: Develop a brief, patient-reported screening tool designed to identify individuals with probable binge-eating disorder (BED) for further evaluation or referral to specialists. Methods: Items were developed on the basis of the DSM-5 diagnostic criteria, existing tools, and input from 3 clinical experts (January 2014). Items were then refined in cognitive debriefing interviews with participants self-reporting BED characteristics (March 2014) and piloted in a multisite, cross-sectional, prospective, noninterventional study consisting of a semistructured diagnostic interview (to diagnose BED) and administration of the pilot Binge-Eating Disorder Screener (BEDS), Binge Eating Scale (BES), and RAND 36-Item Short-Form Health Survey (RAND-36) (June 2014–July 2014). The sensitivity and specificity of classification algorithms (formed from the pilot BEDS item-level responses) in predicting BED diagnosis were evaluated. The final algorithm was selected to minimize false negatives and false positives, while utilizing the fewest number of BEDS items. Results: Starting with the initial BEDS item pool (20 items), the 13-item pilot BEDS resulted from the cognitive debriefing interviews (n = 13). Of the 97 participants in the noninterventional study, 16 were diagnosed with BED (10/62 female, 16%; 6/35 male, 17%). Seven BEDS items (BEDS-7) yielded 100% sensitivity and 38.7% specificity. Participants correctly identified (true positives) had poorer BES scores and RAND-36 scores than participants identified as true negatives. Conclusions: Implementation of the brief, patient-reported BEDS-7 in real-world clinical practice is expected to promote better understanding of BED characteristics and help physicians identify patients who may have BED. PMID:27486542
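
    Sensitivity and specificity, the two criteria used above to compare classification algorithms, follow directly from a 2x2 table of screener result against diagnosis. The sketch below is illustrative: the true-positive and false-negative counts are consistent with the reported 16 diagnosed participants and 100% sensitivity, but the true-negative and false-positive counts are hypothetical, since the abstract does not report the full table.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): share of diagnosed cases the screener flags
    specificity = TN / (TN + FP): share of non-cases the screener clears
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one non-case")
    return tp / (tp + fn), tn / (tn + fp)


# tp/fn match the abstract (16 diagnosed, none missed);
# tn/fp are HYPOTHETICAL counts chosen only to illustrate the calculation.
sens, spec = sensitivity_specificity(tp=16, fn=0, tn=30, fp=50)
print(f"sensitivity={sens:.1%}  specificity={spec:.1%}")
```

    A screener tuned this way favors catching every probable case at the cost of many false positives, which suits a first-stage tool followed by diagnostic evaluation.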

  4. Development of the 7-Item Binge-Eating Disorder Screener (BEDS-7).

    PubMed

    Herman, Barry K; Deal, Linda S; DiBenedetti, Dana B; Nelson, Lauren; Fehnel, Sheri E; Brown, T Michelle

    2016-01-01

    Develop a brief, patient-reported screening tool designed to identify individuals with probable binge-eating disorder (BED) for further evaluation or referral to specialists. Items were developed on the basis of the DSM-5 diagnostic criteria, existing tools, and input from 3 clinical experts (January 2014). Items were then refined in cognitive debriefing interviews with participants self-reporting BED characteristics (March 2014) and piloted in a multisite, cross-sectional, prospective, noninterventional study consisting of a semistructured diagnostic interview (to diagnose BED) and administration of the pilot Binge-Eating Disorder Screener (BEDS), Binge Eating Scale (BES), and RAND 36-Item Short-Form Health Survey (RAND-36) (June 2014-July 2014). The sensitivity and specificity of classification algorithms (formed from the pilot BEDS item-level responses) in predicting BED diagnosis were evaluated. The final algorithm was selected to minimize false negatives and false positives, while utilizing the fewest number of BEDS items. Starting with the initial BEDS item pool (20 items), the 13-item pilot BEDS resulted from the cognitive debriefing interviews (n = 13). Of the 97 participants in the noninterventional study, 16 were diagnosed with BED (10/62 female, 16%; 6/35 male, 17%). Seven BEDS items (BEDS-7) yielded 100% sensitivity and 38.7% specificity. Participants correctly identified (true positives) had poorer BES scores and RAND-36 scores than participants identified as true negatives. Implementation of the brief, patient-reported BEDS-7 in real-world clinical practice is expected to promote better understanding of BED characteristics and help physicians identify patients who may have BED.

  5. Refining the Pediatric Evaluation of Disability Inventory-Patient-Reported Outcome (PEDI-PRO) item candidates: interpretation of a self-reported outcome measure of functional performance by young people with neurodevelopmental disabilities.

    PubMed

    Kramer, Jessica M; Schwartz, Ariel

    2017-10-01

    This study examined the item interpretability and rating scale use of the Pediatric Evaluation of Disability Inventory-Patient-Reported Outcome (PEDI-PRO) by young people with developmental disabilities. The PEDI-PRO assesses the functional performance of discrete functional tasks in the context of everyday life situations. A two-phase cognitive interview design was implemented with a convenience sample of 37 young people (mean age 19y, SD 2y 5mo; 13 males and 24 females; 68% with intellectual disability) with developmental disabilities. In phase I, 182 item candidates were each reviewed by an average of four young people. In phase II, 103 items were carried forward or revised and each reviewed by an average of seven additional young people. Two raters coded responses for intended item interpretation and performance quality; codes were analysed using descriptive statistics. Qualitative analysis explored young people's self-evaluation process. Items were interpreted as intended by most young people (mean 86%). Young people can use PEDI-PRO response categories appropriately to describe their performance: 94% of positive performance descriptions coincided with a positive response category choice; 73% of negative descriptions coincided with a negative response category choice. Young people interpreted items in a literal manner, and their self-evaluation incorporated the use of supports that facilitate functional performance. The PEDI-PRO's measurement framework appears to support the self-evaluation of functional performance of young people with developmental disabilities. © 2017 Mac Keith Press.

  6. A New Item Selection Procedure for Mixed Item Type in Computerized Classification Testing.

    ERIC Educational Resources Information Center

    Lau, C. Allen; Wang, Tianyou

    This paper proposes a new Information-Time index as the basis for item selection in computerized classification testing (CCT) and investigates how this new item selection algorithm can help improve test efficiency for item pools with mixed item types. It also investigates how practical constraints such as item exposure rate control, test…

  7. Relationship between Item Responses of Negative Affect Items and the Distribution of the Sum of the Item Scores in the General Population

    PubMed Central

    Kawasaki, Yohei; Ide, Kazuki; Akutagawa, Maiko; Yamada, Hiroshi; Furukawa, Toshiaki A.; Ono, Yutaka

    2016-01-01

    Background: Several studies have shown that total depressive symptom scores in the general population approximate an exponential pattern, except for the lower end of the distribution. The Center for Epidemiologic Studies Depression Scale (CES-D) consists of 20 items, each of which may take on four scores: “rarely,” “some,” “occasionally,” and “most of the time.” Recently, we reported that the item responses for 16 negative affect items commonly exhibit exponential patterns, except for the level of “rarely,” leading us to hypothesize that the item responses at the level of “rarely” may be related to the non-exponential pattern typical of the lower end of the distribution. To verify this hypothesis, we investigated how the item responses contribute to the distribution of the sum of the item scores. Methods: Data collected from 21,040 subjects who had completed the CES-D questionnaire as part of a Japanese national survey were analyzed. To assess the item responses of negative affect items, we used a parameter r, which denotes the ratio of “rarely” to “some” in each item response. The distributions of the sum of negative affect items in various combinations were analyzed using log-normal scales and curve fitting. Results: The sum of the item scores approximated an exponential pattern regardless of the combination of items, whereas, at the lower end of the distributions, there was a clear divergence between the actual data and the predicted exponential pattern. At the lower end of the distributions, the sum of the item scores with high values of r exhibited higher scores compared to those predicted from the exponential pattern, whereas the sum of the item scores with low values of r exhibited lower scores compared to those predicted. Conclusions: The distributional pattern of the sum of the item scores could be predicted from the item responses of such items. PMID:27806132

  8. The sensory timecourses associated with conscious visual item memory and source memory.

    PubMed

    Thakral, Preston P; Slotnick, Scott D

    2015-09-01

    Previous event-related potential (ERP) findings have suggested that during visual item and source memory, nonconscious and conscious sensory (occipital-temporal) activity onsets may be restricted to early (0-800 ms) and late (800-1600 ms) temporal epochs, respectively. In an ERP experiment, we tested this hypothesis by separately assessing whether the onset of conscious sensory activity was restricted to the late epoch during source (location) memory and item (shape) memory. We found that conscious sensory activity had a late (>800 ms) onset during source memory and an early (<200 ms) onset during item memory. In a follow-up fMRI experiment, conscious sensory activity was localized to BA17, BA18, and BA19. Of primary importance, the distinct source memory and item memory ERP onsets contradict the hypothesis that there is a fixed temporal boundary separating nonconscious and conscious processing during all forms of visual conscious retrieval. Copyright © 2015 Elsevier B.V. All rights reserved.

  9. Item Response Theory Modeling of the Philadelphia Naming Test.

    PubMed

    Fergadiotis, Gerasimos; Kellough, Stacey; Hula, William D

    2015-06-01

    In this study, we investigated the fit of the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) to an item-response-theory measurement model, estimated the precision of the resulting scores and item parameters, and provided a theoretical rationale for the interpretation of PNT overall scores by relating explanatory variables to item difficulty. This article describes the statistical model underlying the computer adaptive PNT presented in a companion article (Hula, Kellough, & Fergadiotis, 2015). Using archival data, we evaluated the fit of the PNT to 1- and 2-parameter logistic models and examined the precision of the resulting parameter estimates. We regressed the item difficulty estimates on three predictor variables: word length, age of acquisition, and contextual diversity. The 2-parameter logistic model demonstrated marginally better fit, but the fit of the 1-parameter logistic model was adequate. Precision was excellent for both person ability and item difficulty estimates. Word length, age of acquisition, and contextual diversity all independently contributed to variance in item difficulty. Item-response-theory methods can be productively used to analyze and quantify anomia severity in aphasia. Regression of item difficulty on lexical variables supported the validity of the PNT and interpretation of anomia severity scores in the context of current word-finding models.

  10. Psychosocial consequences of cancer cachexia: the development of an item bank.

    PubMed

    Häne, Hanspeter; Oberholzer, Rolf; Walker, Jochen; Hopkinson, Jane B; de Wolf-Linder, Susanne; Strasser, Florian

    2013-12-01

    Cancer cachexia syndrome (CCS) is often accompanied by psychosocial consequences (PSC). To alleviate PSC, a systematic assessment method is required. Currently, few assessment tools are available (e.g., Functional Assessment of Anorexia/Cachexia Therapy). There is no systematic assessment tool that captures the PSC of CCS. To develop a pilot item bank to assess the PSC of CCS. A total of 132 questions, generated from patient answers in a previous study, were reduced to 121 items by content analysis and evaluation by multidisciplinary experts (doctor, nutritionists, and nurses). In our two-step, cross-sectional study, patients, judged by staff to have PSC of CCS, were included, and the questions were randomly allocated to the patients. Questions were evaluated for understandability and triggering emotions, and patients were asked to provide a response using a four-point Likert scale. Subsequently, problematic questions were revised, reformulated, and retested. A total of 20 patients with a variety of tumor types participated. Of the 121 questions, 31 had to be reformulated after Step 1 and were retested in Step 2, after which seven were again evaluated as not being perfectly comprehensible. In Step 1, 22 questions were found to trigger emotions, but no item required remodeling. Item performance using the Likert scale revealed no consistent floor or ceiling effects. Our final pilot question bank comprised 117 questions. The final item bank contains questions that are understood and accepted by the patients. This item bank now needs to be developed into a measurement tool that groups items into domains and can be used in future research studies. Copyright © 2013 U.S. Cancer Pain Relief Committee. Published by Elsevier Inc. All rights reserved.

  11. The Consequences of Ignoring Item Parameter Drift in Longitudinal Item Response Models

    ERIC Educational Resources Information Center

    Lee, Wooyeol; Cho, Sun-Joo

    2017-01-01

    Utilizing a longitudinal item response model, this study investigated the effect of item parameter drift (IPD) on item parameters and person scores via a Monte Carlo study. Item parameter recovery was investigated for various IPD patterns in terms of bias and root mean-square error (RMSE), and percentage of time the 95% confidence interval covered…

  12. Item-Level Psychometrics of the Glasgow Outcome Scale: Extended Structured Interviews.

    PubMed

    Hong, Ickpyo; Li, Chih-Ying; Velozo, Craig A

    2016-04-01

    The Glasgow Outcome Scale-Extended (GOSE) structured interview captures critical components of activities and participation, including home, shopping, work, leisure, and family/friend relationships. Eighty-nine community-dwelling adults with mild-moderate traumatic brain injury (TBI) were recruited (average = 2.7 years post injury). Nine of the 19 items were used for the psychometric analysis. Factor analysis and item-level psychometrics were investigated using the Rasch partial-credit model. Although the principal components analysis of residuals suggests that a single measurement factor dominates the measure, the instrument did not meet the factor analysis criteria. Five items met the rating scale criteria. Eight items fit the Rasch model. The instrument demonstrated low person reliability (0.63), low person strata (2.07), and a slight ceiling effect. The GOSE demonstrated limitations in precisely measuring activities/participation for individuals after TBI. Future studies should examine the impact of the low precision of the GOSE on effect size. © The Author(s) 2016.
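
    The person reliability and person strata figures reported above are linked by the separation index commonly used in Rasch measurement: G = sqrt(R / (1 - R)) and strata = (4G + 1) / 3. The sketch below applies these conventional formulas to the reported reliability; that the authors computed strata this way is an assumption, although the result agrees with the published value.

```python
import math


def separation_index(reliability):
    """Separation G = sqrt(R / (1 - R)) for a reliability R in [0, 1)."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    return math.sqrt(reliability / (1.0 - reliability))


def strata(reliability):
    """Number of statistically distinct levels: (4G + 1) / 3."""
    return (4.0 * separation_index(reliability) + 1.0) / 3.0


# Person reliability reported in the abstract.
print(round(strata(0.63), 2))
```

    A strata value near 2 means the instrument separates respondents into roughly two distinguishable levels, which is consistent with the authors' conclusion of low precision.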

  13. Using the Nominal Response Model to Evaluate Response Category Discrimination in the PROMIS Emotional Distress Item Pools

    ERIC Educational Resources Information Center

    Preston, Kathleen; Reise, Steven; Cai, Li; Hays, Ron D.

    2011-01-01

    The authors used a nominal response item response theory model to estimate category boundary discrimination (CBD) parameters for items drawn from the Emotional Distress item pools (Depression, Anxiety, and Anger) developed in the Patient-Reported Outcomes Measurement Information Systems (PROMIS) project. For polytomous items with ordered response…

  14. Automatic Item Generation: A More Efficient Process for Developing Mathematics Achievement Items?

    ERIC Educational Resources Information Center

    Embretson, Susan E.; Kingston, Neal M.

    2018-01-01

    The continual supply of new items is crucial to maintaining quality for many tests. Automatic item generation (AIG) has the potential to rapidly increase the number of items that are available. However, the efficiency of AIG will be mitigated if the generated items must be submitted to traditional, time-consuming review processes. In two studies,…

  15. Solving the measurement invariance anchor item problem in item response theory.

    PubMed

    Meade, Adam W; Wright, Natalie A

    2012-09-01

    The efficacy of tests of differential item functioning (measurement invariance) has been well established. It is clear that when properly implemented, these tests can successfully identify differentially functioning (DF) items when they exist. However, an assumption of these analyses is that the metric for different groups is linked using anchor items that are invariant. In practice, however, it is impossible to be certain which items are DF and which are invariant. This problem of anchor items, or referent indicators, has long plagued invariance research, and a multitude of suggested approaches have been put forth. Unfortunately, the relative efficacy of these approaches has not been tested. This study compares 11 variations on 5 qualitatively different approaches from recent literature for selecting optimal anchor items. A large-scale simulation study indicates that for nearly all conditions, an easily implemented 2-stage procedure recently put forth by Lopez Rivas, Stark, and Chernyshenko (2009) provided optimal power while maintaining nominal Type I error. With this approach, appropriate anchor items can be easily and quickly located, resulting in more efficacious invariance tests. Recommendations for invariance testing are illustrated using a pedagogical example of employee responses to an organizational culture measure.

  16. Using a Model of Analysts' Judgments to Augment an Item Calibration Process

    ERIC Educational Resources Information Center

    Hauser, Carl; Thum, Yeow Meng; He, Wei; Ma, Lingling

    2015-01-01

    When conducting item reviews, analysts evaluate an array of statistical and graphical information to assess the fit of a field test (FT) item to an item response theory model. The process can be tedious, particularly when the number of human reviews (HR) to be completed is large. Furthermore, such a process leads to decisions that are susceptible…

  17. The role of attention in item-item binding in visual working memory.

    PubMed

    Peterson, Dwight J; Naveh-Benjamin, Moshe

    2017-09-01

    An important yet unresolved question regarding visual working memory (VWM) relates to whether or not binding processes within VWM require additional attentional resources compared with processing solely the individual components comprising these bindings. Previous findings indicate that binding of surface features (e.g., colored shapes) within VWM is not demanding of resources beyond what is required for single features. However, it is possible that other types of binding, such as the binding of complex, distinct items (e.g., faces and scenes), in VWM may require additional resources. In 3 experiments, we examined VWM item-item binding performance under no load, articulatory suppression, and backward counting using a modified change detection task. Binding performance declined to a greater extent than single-item performance under higher compared with lower levels of concurrent load. The findings from each of these experiments indicate that processing item-item bindings within VWM requires a greater amount of attentional resources compared with single items. These findings also highlight an important distinction between the role of attention in item-item binding within VWM and previous studies of long-term memory (LTM) where declines in single-item and binding test performance are similar under divided attention. The current findings provide novel evidence that the specific type of binding is an important determining factor regarding whether or not VWM binding processes require attention. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  18. Validity and Reliability of the 8-Item Work Limitations Questionnaire.

    PubMed

    Walker, Timothy J; Tullar, Jessica M; Diamond, Pamela M; Kohl, Harold W; Amick, Benjamin C

    2017-12-01

    Purpose: To evaluate factorial validity, scale reliability, test-retest reliability, convergent validity, and discriminant validity of the 8-item Work Limitations Questionnaire (WLQ) among employees from a public university system. Methods: A secondary analysis using de-identified data from employees who completed an annual Health Assessment between the years 2009-2015 tested research aims. Confirmatory factor analysis (CFA) (n = 10,165) tested the latent structure of the 8-item WLQ. Scale reliability was determined using a CFA-based approach while test-retest reliability was determined using the intraclass correlation coefficient. Convergent/discriminant validity was tested by evaluating relations between the 8-item WLQ with health/performance variables for convergent validity (health-related work performance, number of chronic conditions, and general health) and demographic variables for discriminant validity (gender and institution type). Results: A 1-factor model with three correlated residuals demonstrated excellent model fit (CFI = 0.99, TLI = 0.99, RMSEA = 0.03, and SRMR = 0.01). The scale reliability was acceptable (0.69, 95% CI 0.68-0.70) and the test-retest reliability was very good (ICC = 0.78). Low-to-moderate associations were observed between the 8-item WLQ and the health/performance variables while weak associations were observed between the demographic variables. Conclusions: The 8-item WLQ demonstrated sufficient reliability and validity among employees from a public university system. Results suggest the 8-item WLQ is a usable alternative for studies when the more comprehensive 25-item WLQ is not available.

  19. Item Purification in Differential Item Functioning Using Generalized Linear Mixed Models

    ERIC Educational Resources Information Center

    Liu, Qian

    2011-01-01

    For this dissertation, four item purification procedures were implemented onto the generalized linear mixed model for differential item functioning (DIF) analysis, and the performance of these item purification procedures was investigated through a series of simulations. Among the four procedures, forward and generalized linear mixed model (GLMM)…

  20. Identifying items to assess methodological quality in physical therapy trials: a factor analysis.

    PubMed

    Armijo-Olivo, Susan; Cummings, Greta G; Fuentes, Jorge; Saltaji, Humam; Ha, Christine; Chisholm, Annabritt; Pasichnyk, Dion; Rogers, Todd

    2014-09-01

    Numerous tools and individual items have been proposed to assess the methodological quality of randomized controlled trials (RCTs). The frequency of use of these items varies according to health area, which suggests a lack of agreement regarding their relevance to trial quality or risk of bias. The objectives of this study were: (1) to identify the underlying component structure of items and (2) to determine relevant items to evaluate the quality and risk of bias of trials in physical therapy by using an exploratory factor analysis (EFA). A methodological research design was used, and an EFA was performed. Randomized controlled trials used for this study were randomly selected from searches of the Cochrane Database of Systematic Reviews. Two reviewers used 45 items gathered from 7 different quality tools to assess the methodological quality of the RCTs. An exploratory factor analysis was conducted using the principal axis factoring (PAF) method followed by varimax rotation. Principal axis factoring identified 34 items loaded on 9 common factors: (1) selection bias; (2) performance and detection bias; (3) eligibility, intervention details, and description of outcome measures; (4) psychometric properties of the main outcome; (5) contamination and adherence to treatment; (6) attrition bias; (7) data analysis; (8) sample size; and (9) control and placebo adequacy. Because of the exploratory nature of the results, a confirmatory factor analysis is needed to validate this model. To the authors' knowledge, this is the first factor analysis to explore the underlying component items used to evaluate the methodological quality or risk of bias of RCTs in physical therapy. The items and factors represent a starting point for evaluating the methodological quality and risk of bias in physical therapy trials. Empirical evidence of the association among these items with treatment effects and a confirmatory factor analysis of these results are needed to validate these items.

  1. Bayes Factor Covariance Testing in Item Response Models.

    PubMed

    Fox, Jean-Paul; Mulder, Joris; Sinharay, Sandip

    2017-12-01

    Two marginal one-parameter item response theory models are introduced, by integrating out the latent variable or random item parameter. It is shown that both marginal response models are multivariate (probit) models with a compound symmetry covariance structure. Several common hypotheses concerning the underlying covariance structure are evaluated using (fractional) Bayes factor tests. The support for a unidimensional factor (i.e., assumption of local independence) and differential item functioning are evaluated by testing the covariance components. The posterior distribution of common covariance components is obtained in closed form by transforming latent responses with an orthogonal (Helmert) matrix. This posterior distribution is defined as a shifted-inverse-gamma, thereby introducing a default prior and a balanced prior distribution. Based on that, an MCMC algorithm is described to estimate all model parameters and to compute (fractional) Bayes factor tests. Simulation studies are used to show that the (fractional) Bayes factor tests have good properties for testing the underlying covariance structure of binary response data. The method is illustrated with two real data studies.

  2. Developing and Evaluating a Machine-Scorable, Constrained Constructed-Response Item.

    ERIC Educational Resources Information Center

    Braun, Henry I.; And Others

    The use of constructed response items in large scale standardized testing has been hampered by the costs and difficulties associated with obtaining reliable scores. The advent of expert systems may signal the eventual removal of this impediment. This study investigated the accuracy with which expert systems could score a new, non-multiple choice…

  3. Item analysis of three Spanish naming tests: a cross-cultural investigation.

    PubMed

    Marquez de la Plata, Carlos; Arango-Lasprilla, Juan Carlos; Alegret, Montse; Moreno, Alexander; Tárraga, Luis; Lara, Mar; Hewlitt, Margaret; Hynan, Linda; Cullum, C Munro

    2009-01-01

    Neuropsychological evaluations conducted in the United States and abroad commonly include the use of tests translated from English to Spanish. The use of translated naming tests for evaluating predominately Spanish-speakers has recently been challenged on the grounds that translating test items may compromise a test's construct validity. The Texas Spanish Naming Test (TNT) has been developed in Spanish specifically for use with Spanish-speakers; however, it is unlikely patients from diverse Spanish-speaking geographical regions will perform uniformly on a naming test. The present study evaluated and compared the internal consistency and patterns of item-difficulty and -discrimination for the TNT and two commonly used translated naming tests in three countries (i.e., United States, Colombia, Spain). Two hundred fifty two subjects (136 demented, 116 nondemented) across three countries were administered the TNT, Modified Boston Naming Test-Spanish, and the naming subtest from the CERAD. The TNT demonstrated superior internal consistency to its counterparts, a superior item difficulty pattern than the CERAD naming test, and a superior item discrimination pattern than the MBNT-S across countries. Overall, all three Spanish naming tests differentiated nondemented and moderately demented individuals, but the results suggest the items of the TNT are most appropriate to use with Spanish-speakers. Preliminary normative data for the three tests examined in each country are provided.

  4. ITEM ANALYSIS OF THREE SPANISH NAMING TESTS: A CROSS-CULTURAL INVESTIGATION

    PubMed Central

    de la Plata, Carlos Marquez; Arango-Lasprilla, Juan Carlos; Alegret, Montse; Moreno, Alexander; Tárraga, Luis; Lara, Mar; Hewlitt, Margaret; Hynan, Linda; Cullum, C. Munro

    2009-01-01

    Neuropsychological evaluations conducted in the United States and abroad commonly include the use of tests translated from English to Spanish. The use of translated naming tests for evaluating predominately Spanish-speakers has recently been challenged on the grounds that translating test items may compromise a test’s construct validity. The Texas Spanish Naming Test (TNT) has been developed in Spanish specifically for use with Spanish-speakers; however, it is unlikely patients from diverse Spanish-speaking geographical regions will perform uniformly on a naming test. The present study evaluated and compared the internal consistency and patterns of item-difficulty and -discrimination for the TNT and two commonly used translated naming tests in three countries (i.e., United States, Colombia, Spain). Two hundred fifty two subjects (136 demented, 116 nondemented) across three countries were administered the TNT, Modified Boston Naming Test-Spanish, and the naming subtest from the CERAD. The TNT demonstrated superior internal consistency to its counterparts, a superior item difficulty pattern than the CERAD naming test, and a superior item discrimination pattern than the MBNT-S across countries. Overall, all three Spanish naming tests differentiated nondemented and moderately demented individuals, but the results suggest the items of the TNT are most appropriate to use with Spanish-speakers. Preliminary normative data for the three tests examined in each country are provided. PMID:19208960

  5. Lower-fat menu items in restaurants satisfy customers.

    PubMed

    Fitzpatrick, M P; Chapman, G E; Barr, S I

    1997-05-01

    To evaluate a restaurant-based nutrition program by measuring customer satisfaction with lower-fat menu items and assessing patrons' reactions to the program. Questionnaires to assess satisfaction with menu items were administered to patrons in eight of the nine restaurants that volunteered to participate in the nutrition program. One patron from each participating restaurant was randomly selected for a semistructured interview about nutrition programming in restaurants. Persons dining in eight participating restaurants over a 1-week period (n = 686). Independent samples t tests were used to compare respondents' satisfaction with lower-fat and regular menu items. Two-way analysis of variance tests were completed using overall satisfaction as the dependent variable and menu-item classification (ie, lower fat or regular) and one of eight other menu item and respondent characteristics as independent variables. Qualitative methods were used to analyze interview transcripts. Of 1,127 menu items rated for satisfaction, 205 were lower fat, 878 were regular, and 44 were of unknown classification. Customers were significantly more satisfied with lower-fat than with regular menu items (P < .001). Overall satisfaction did not vary by any of the other independent variables. Interview results indicate the importance of restaurant dining as an indulgent experience. High satisfaction with lower-fat menu items suggests that customers will support restaurants providing such choices. Dietitians can use these findings to encourage restaurateurs to include lower-fat choices on their menus, and to assure clients that their expectations of being indulged are not incompatible with these choices.

  6. Oxytocin Increases the Perceived Value of Both Self- and Other-Owned Items and Alters Medial Prefrontal Cortex Activity in an Endowment Task.

    PubMed

    Zhao, Weihua; Geng, Yayuan; Luo, Lizhu; Zhao, Zhiying; Ma, Xiaole; Xu, Lei; Yao, Shuxia; Kendrick, Keith M

    2017-01-01

    The neuropeptide oxytocin (OXT) can influence self-processing and may help motivate us to value the attributes of others in a more self-like manner by reducing medial prefrontal cortex (mPFC) responses. We do not know however whether this OXT effect extends to possessions. We tend to place a higher monetary value on specific objects that belong to us compared to others, known as the "endowment effect". In two double-blind, between-subject placebo (PLC) controlled experiments in subjects from a collectivist culture, we investigated the influence of intranasal OXT on the endowment effect, with the second study incorporating functional magnetic resonance imaging (fMRI). In the task, subjects decided whether to buy or sell their own or others' (mother/father/classmate/stranger) possessions at various prices. Both experiments demonstrated an endowment effect in the self-owned condition which extended to close others (mother/father) and OXT increased this for self and all other-owned items. This OXT effect was associated with reduced activity in the ventral mPFC (vmPFC) in the self-owned condition but increased in the mother-condition. For the classmate- and stranger-owned conditions OXT increased activity in the dorsal mPFC (dmPFC). Changes in vmPFC activation were associated with the size of the endowment effect for self- and mother-owned items. Functional connectivity between the dmPFC and ventral striatum (VStr) was reduced by OXT in self- and mother-owned conditions and between vmPFC and precuneus in the self-condition. Overall our results show that OXT enhances the endowment effect for both self- and other-owned items in Chinese subjects. This effect is associated with reduced mPFC activation in the self-condition but enhanced activation in all other-conditions and involves differential actions on both dorsal and ventral regions as well as functional connectivity with brain reward and other self-processing regions. Overall our findings suggest that OXT increases

  7. Repeated retrieval practice and item difficulty: does criterion learning eliminate item difficulty effects?

    PubMed

    Vaughn, Kalif E; Rawson, Katherine A; Pyc, Mary A

    2013-12-01

    A wealth of previous research has established that retrieval practice promotes memory, particularly when retrieval is successful. Although successful retrieval promotes memory, it remains unclear whether successful retrieval promotes memory equally well for items of varying difficulty. Will easy items still outperform difficult items on a final test if all items have been correctly recalled equal numbers of times during practice? In two experiments, normatively difficult and easy Lithuanian-English word pairs were learned via test-restudy practice until each item had been correctly recalled a preassigned number of times (from 1 to 11 correct recalls). Despite equating the numbers of successful recalls during practice, performance on a delayed final cued-recall test was lower for difficult than for easy items. Experiment 2 was designed to diagnose whether the disadvantage for difficult items was due to deficits in cue memory, target memory, and/or associative memory. The results revealed a disadvantage for the difficult versus the easy items only on the associative recognition test, with no differences on cue recognition, and even an advantage on target recognition. Although successful retrieval enhanced memory for both difficult and easy items, equating retrieval success during practice did not eliminate normative item difficulty differences.

  8. Evaluating and Refining the Construct of Sexual Quality With Item Response Theory: Development of the Quality of Sex Inventory.

    PubMed

    Shaw, Amanda M; Rogge, Ronald D

    2016-02-01

    This study took a critical look at the construct of sexual quality. The 65 items of four well-validated self-report measures of sexual satisfaction (the Index of Sexual Satisfaction [ISS], Hudson, Harrison, & Crosscup, 1981; the Global Measure of Sexual Satisfaction [GMSEX], Lawrance & Byers, 1995; the Pinney Sexual Satisfaction Inventory [PSSI], Pinney, Gerrard, & Denney, 1987; the Young Sexual Satisfaction Scale [YSSS], Young, Denny, Luquis, & Young, 1998) and an additional 74 potential sexual quality items were given to 3060 online participants. Using Item Response Theory (IRT), we demonstrated that the ISS, YSSS, and PSSI scales provided suboptimal levels of precision in assessing sexual quality, particularly given the length of those scales. Exploratory factor analyses, IRT, differential item functioning analyses, and longitudinal responsiveness analyses were used to develop and evaluate the Quality of Sex Inventory. Results suggested that, in comparison to existing scales, the QSI (1) offers investigators and clinicians more theoretically focused scales, (2) distinguishes sexual satisfaction from sexual dissatisfaction, and (3) offers greater precision and power for detecting differences with (4) comparably high levels of responsiveness for detecting change over time despite being notably shorter than most of the existing scales. The QSI-satisfaction subscales demonstrated strong convergent validity with other measures of sexual satisfaction and excellent construct validity with anchor scales from the nomological net surrounding that construct, suggesting that they continue to assess the same theoretical construct as prior scales. Implications for research are discussed.

  9. An Empirical Investigation of Methods for Assessing Item Fit for Mixed Format Tests

    ERIC Educational Resources Information Center

    Chon, Kyong Hee; Lee, Won-Chan; Ansley, Timothy N.

    2013-01-01

    Empirical information regarding performance of model-fit procedures has been a persistent need in measurement practice. Statistical procedures for evaluating item fit were applied to real test examples that consist of both dichotomously and polytomously scored items. The item fit statistics used in this study included the PARSCALE's G²,…

  10. Combining item response theory with multiple imputation to equate health assessment questionnaires.

    PubMed

    Gu, Chenyang; Gutman, Roee

    2017-09-01

    The assessment of patients' functional status across the continuum of care requires a common patient assessment tool. However, assessment tools that are used in various health care settings differ and cannot be easily contrasted. For example, the Functional Independence Measure (FIM) is used to evaluate the functional status of patients who stay in inpatient rehabilitation facilities, the Minimum Data Set (MDS) is collected for all patients who stay in skilled nursing facilities, and the Outcome and Assessment Information Set (OASIS) is collected if they choose home health care provided by home health agencies. All three instruments or questionnaires include functional status items, but the specific items, rating scales, and instructions for scoring different activities vary between the different settings. We consider equating different health assessment questionnaires as a missing data problem, and propose a variant of predictive mean matching method that relies on Item Response Theory (IRT) models to impute unmeasured item responses. Using real data sets, we simulated missing measurements and compared our proposed approach to existing methods for missing data imputation. We show that, for all of the estimands considered, and in most of the experimental conditions that were examined, the proposed approach provides valid inferences, and generally has better coverages, relatively smaller biases, and shorter interval estimates. The proposed method is further illustrated using a real data set. © 2016, The International Biometric Society.

  11. Item Response Modeling: An Evaluation of the Children's Fruit and Vegetable Self-Efficacy Questionnaire

    ERIC Educational Resources Information Center

    Watson, Kathy; Baranowski, Tom; Thompson, Debbe

    2006-01-01

    Perceived self-efficacy (SE) for eating fruit and vegetables (FV) is a key variable mediating FV change in interventions. This study applies item response modeling (IRM) to a fruit, juice and vegetable self-efficacy questionnaire (FVSEQ) previously validated with classical test theory (CTT) procedures. The 24-item (five-point Likert scale) FVSEQ…

  12. A Psychometric Evaluation of the DSM-IV Criteria for Antisocial Personality Disorder: Dimensionality, Local Reliability, and Differential Item Functioning Across Gender.

    PubMed

    Paap, Muirne C S; Braeken, Johan; Pedersen, Geir; Urnes, Øyvind; Karterud, Sigmund; Wilberg, Theresa; Hummelen, Benjamin

    2017-12-01

    This study aims at evaluating the psychometric properties of the antisocial personality disorder (ASPD) criteria in a large sample of patients, most of whom had one or more personality disorders (PD). PD diagnoses were assessed by experienced clinicians using the Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders, 4th edition, Axis II PDs. Analyses were performed within an item response theory framework. Results of the analyses indicated that ASPD is a unidimensional construct that can be measured reliably at the upper range of the latent trait scale. Differential item functioning across gender was restricted to two criteria and had little impact on the latent ASPD trait level. Patients fulfilling both the adult ASPD criteria and the conduct disorder criteria had similar latent trait distributions as patients fulfilling only the adult ASPD criteria. Overall, the ASPD items fit the purpose of a diagnostic instrument well, that is, distinguishing patients with moderate from those with high antisocial personality scores.

  13. Interactions Between Item Content And Group Membership on Achievement Test Items.

    ERIC Educational Resources Information Center

    Linn, Robert L.; Harnisch, Delwyn L.

    The purpose of this investigation was to examine the interaction of item content and group membership on achievement test items. Estimates of the parameters of the three parameter logistic model were obtained on the 46 item math test for the sample of eighth grade students (N = 2055) participating in the Illinois Inventory of Educational Progress,…

  14. Item difficulty and item validity for the Children's Group Embedded Figures Test.

    PubMed

    Rusch, R R; Trigg, C L; Brogan, R; Petriquin, S

    1994-02-01

    The validity and reliability of the Children's Group Embedded Figures Test was reported for students in Grade 2 by Cromack and Stone in 1980; however, a search of the literature indicates no evidence for internal consistency or item analysis. Hence the purpose of this study was to examine the item difficulty and item validity of the test with children in Grades 1 and 2. Confusion in the literature over development and use of this test was seemingly resolved through analysis of these descriptions and through an interview with the test developer. One early-appearing item was unreasonably difficult. Two or three other items were quite difficult and made little contribution to the total score. Caution is recommended, however, in any reordering or elimination of items based on these findings, given the limited number of subjects (n = 84).

  15. Validation of the Neighborhood Environment Walkability Scale (NEWS) items using geographic information systems.

    PubMed

    Adams, Marc A; Ryan, Sherry; Kerr, Jacqueline; Sallis, James F; Patrick, Kevin; Frank, Lawrence D; Norman, Gregory J

    2009-01-01

    Concurrent validity of Neighborhood Environment Walkability Scale (NEWS) items was evaluated with objective measures of the built environment using geographic information systems (GIS). A sample of 878 parents of children 10 to 16 years old (mean age 43.5 years, SD = 6.8, 34.8% non-White, 63.8% overweight) completed NEWS and the International Physical Activity Questionnaire. GIS was used to develop 1-mile street network buffers around participants' residences. GIS measures of the built environment within participants' buffers included percent of commercial and institutional land uses; number of schools and colleges, recreational facilities, parks, transit stops, and trees; land topography; and traffic congestion. Except for trees and traffic, concordance between the NEWS and GIS measures was significant, with weak to moderate effect sizes (r = -0.09 to -0.36, all P ≤ 0.01). After participants were stratified by physical activity level, stronger concordance was observed among active participants for some measures. A sensitivity analysis of self-reported distance to 15 neighborhood destinations found a 20-minute (compared with 10- or 30-minute) walking threshold generally had the strongest correlations with GIS measures. These findings provide evidence of the concurrent validity of self-reported built environment items with objective measures. Physically active adults may be more knowledgeable about their neighborhood characteristics.

  16. Item Dependency in an Objective Structured Clinical Examination

    ERIC Educational Resources Information Center

    Iramaneerat, Cherdsak; Myford, Carol M.; Yudkowsky, Rachel

    2006-01-01

    An Objective Structured Clinical Examination (OSCE) is an assessment approach employed in medical education, in which residents rotate through multiple stations of standardized clinical tasks to evaluate their clinical competence. Because items used to evaluate residents' performance in each OSCE station are linked to the same task and are rated…

  17. Examination of the Brief Fear of Negative Evaluation Scale-Version 2 and the Brief Fear of Negative Evaluation Scale-Straightforward Items Factor Structure in a Sample of U.S. College Students

    ERIC Educational Resources Information Center

    Liu, Liu; Lowe, Patricia A.

    2016-01-01

    The current study examined the factor structure of the Brief Fear of Negative Evaluation-Straightforward Items (BFNE-S) and the Brief Fear of Negative Evaluation-Version 2 (BFNE-II) among 151 college students from the United States. Results indicated that the BFNE-S and the BFNE-II scores demonstrated excellent internal consistency reliability.…

  18. A Study of the Homogeneity of Items Produced From Item Forms Across Different Taxonomic Levels.

    ERIC Educational Resources Information Center

    Weber, Margaret B.; Argo, Jana K.

    This study determined whether item forms (rules for constructing items related to a domain or set of tasks) would enable naive item writers to generate multiple-choice items at three taxonomic levels--knowledge, comprehension, and application. Students wrote 120 multiple-choice items from 20 item forms, corresponding to educational objectives…

  19. Differential Item Functioning Analysis Using Rasch Item Information Functions

    ERIC Educational Resources Information Center

    Wyse, Adam E.; Mapuranga, Raymond

    2009-01-01

    Differential item functioning (DIF) analysis is a statistical technique used for ensuring the equity and fairness of educational assessments. This study formulates a new DIF analysis method using the information similarity index (ISI). ISI compares item information functions when data fits the Rasch model. Through simulations and an international…
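
    As a rough illustration of the quantity this record refers to, the sketch below computes the standard Rasch item information function, I(θ) = P(θ)(1 − P(θ)), for one item calibrated separately in two groups. It does not implement the ISI itself, whose formula is not given in the truncated abstract; the function names and the two difficulty values are hypothetical.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def rasch_item_information(theta: float, difficulty: float) -> float:
    """Rasch item information at ability theta: P * (1 - P)."""
    p = rasch_probability(theta, difficulty)
    return p * (1.0 - p)


if __name__ == "__main__":
    # Hypothetical difficulties for the same item in two groups (not from the study).
    reference_difficulty = 0.0
    focal_difficulty = 0.5
    for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
        info_ref = rasch_item_information(theta, reference_difficulty)
        info_focal = rasch_item_information(theta, focal_difficulty)
        print(f"theta={theta:+.1f}  reference={info_ref:.3f}  focal={info_focal:.3f}")
```

    Under the Rasch model the information curve peaks at 0.25 where ability equals item difficulty, so a difficulty shift between groups moves the curve along the ability axis without changing its shape.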

  20. Evaluation of Floors and Item Gradients for Reading and Math Tests for Young Children

    ERIC Educational Resources Information Center

    Bradley-Johnson, Sharon; Durmusoglu, Gokce

    2005-01-01

    Ignoring the adequacy of floors and item gradients for tests used with young children can have serious consequences. Thus, because of the importance of early intervention for reading and math problems, we used the criteria suggested by Bracken for adequate floors and item gradients, and reviewed 15 reading tests and 12 math tests for ages 4-0…

  1. Assessing the Utility of Item Response Theory Models: Differential Item Functioning.

    ERIC Educational Resources Information Center

    Scheuneman, Janice Dowd

    The current status of item response theory (IRT) is discussed. Several IRT methods exist for assessing whether an item is biased. Focus is on methods proposed by L. M. Rudner (1975), F. M. Lord (1977), D. Thissen et al. (1988) and R. L. Linn and D. Harnisch (1981). Rudner suggested a measure of the area lying between the two item characteristic…

  2. Introduction to Multilevel Item Response Theory Analysis: Descriptive and Explanatory Models

    ERIC Educational Resources Information Center

    Sulis, Isabella; Toland, Michael D.

    2017-01-01

    Item response theory (IRT) models are the main psychometric approach for the development, evaluation, and refinement of multi-item instruments and scaling of latent traits, whereas multilevel models are the primary statistical method when considering the dependence between person responses when primary units (e.g., students) are nested within…

  3. Impact of Eliminating Anchor Items Flagged from Statistical Criteria on Test Score Classifications in Common Item Equating

    ERIC Educational Resources Information Center

    Karkee, Thakur; Choi, Seung

    2005-01-01

    Proper maintenance of a scale established in the baseline year would assure the accurate estimation of growth in subsequent years. Scale maintenance is especially important when the state performance standards must be preserved for future administrations. To ensure proper maintenance of a scale, the selection of anchor items and evaluation of…

  4. Differential item functioning magnitude and impact measures from item response theory models.

    PubMed

    Kleinman, Marjorie; Teresi, Jeanne A

    2016-01-01

    Measures of magnitude and impact of differential item functioning (DIF) at the item and scale level, respectively, are presented and reviewed in this paper. Most measures are based on item response theory models. Magnitude refers to item level effect sizes, whereas impact refers to differences between groups at the scale score level. Reviewed are magnitude measures based on group differences in the expected item scores and impact measures based on differences in the expected scale scores. The similarities among these indices are demonstrated. Various software packages are described that provide magnitude and impact measures, and new software is presented that computes all of the available statistics conveniently in one program with explanations of their relationships to one another.

  5. Preliminary development of an ultrabrief two-item bedside test for delirium.

    PubMed

    Fick, Donna M; Inouye, Sharon K; Guess, Jamey; Ngo, Long H; Jones, Richard N; Saczynski, Jane S; Marcantonio, Edward R

    2015-10-01

    Delirium is common, morbid, and costly, yet is greatly under-recognized among hospitalized older adults. To identify the best single and pair of mental status test items that predict the presence of delirium. Diagnostic test evaluation study that enrolled medicine inpatients aged 75 years or older at an academic medical center. Patients underwent a clinical reference standard assessment involving a patient interview, medical record review, and interviews with family members and nurses to determine the presence or absence of Diagnostic and Statistical Manual of Mental Disorders, 4th Edition defined delirium. Participants also underwent the three-dimensional Confusion Assessment Method (3D-CAM), a brief, validated assessment for delirium. Individual items and pairs of items from the 3D-CAM were evaluated to determine sensitivity and specificity relative to the reference standard delirium diagnosis. Of the 201 participants (mean age 84 years, 62% female), 42 (21%) had delirium based on the clinical reference standard. The single item with the best test characteristics was "months of the year backwards" with a sensitivity of 83% (95% confidence interval [CI]: 69%-93%) and specificity of 69% (95% CI: 61%-76%). The best 2-item screen was the combination of "months of the year backwards" and "what is the day of the week?" with a sensitivity of 93% (95% CI: 81%-99%) and specificity of 64% (95% CI: 56%-70%). We identified a single item with >80% and pair of items with >90% sensitivity for delirium. If validated prospectively, these items will serve as an initial innovative screening step for delirium identification in hospitalized older adults. © 2015 Society of Hospital Medicine.
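
    For readers less familiar with the two accuracy measures reported here, the sketch below shows how sensitivity and specificity follow from a 2x2 table of screen result against reference standard. The abstract reports only percentages; the cell counts below are illustrative values chosen to match 201 participants with 42 delirium cases, and are not taken from the study.

```python
def sensitivity_specificity(true_pos: int, false_neg: int,
                            true_neg: int, false_pos: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) from the four cells of a 2x2 table."""
    sensitivity = true_pos / (true_pos + false_neg)   # detected cases / all cases
    specificity = true_neg / (true_neg + false_pos)   # correct negatives / all non-cases
    return sensitivity, specificity


# Illustrative counts only (assumed, not reported in the abstract):
# 42 participants with delirium, 159 without.
sens, spec = sensitivity_specificity(true_pos=35, false_neg=7,
                                     true_neg=110, false_pos=49)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

    Adding a second item to a screen, as the authors did, typically raises sensitivity at some cost in specificity, which is the pattern the reported figures show.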

  6. Consequences of Ignoring Guessing when Estimating the Latent Density in Item Response Theory

    ERIC Educational Resources Information Center

    Woods, Carol M.

    2008-01-01

    In Ramsay-curve item response theory (RC-IRT), the latent variable distribution is estimated simultaneously with the item parameters. In extant Monte Carlo evaluations of RC-IRT, the item response function (IRF) used to fit the data is the same one used to generate the data. The present simulation study examines RC-IRT when the IRF is imperfectly…

  7. The Single-Item Math Anxiety Scale: An Alternative Way of Measuring Mathematical Anxiety

    ERIC Educational Resources Information Center

    Núñez-Peña, M. Isabel; Guilera, Georgina; Suárez-Pellicioni, Macarena

    2014-01-01

    This study examined whether the Single-Item Math Anxiety Scale (SIMA), based on the item suggested by Ashcraft, provided valid and reliable scores of mathematical anxiety. A large sample of university students (n = 279) was administered the SIMA and the 25-item Shortened Math Anxiety Rating Scale (sMARS) to evaluate the relation between the scores…

  8. Evaluation properties of the French version of the OUT-PATSAT35 satisfaction with care questionnaire according to classical and item response theory analyses.

    PubMed

    Panouillères, M; Anota, A; Nguyen, T V; Brédart, A; Bosset, J F; Monnier, A; Mercier, M; Hardouin, J B

    2014-09-01

    The present study investigates the properties of the French version of the OUT-PATSAT35 questionnaire, which evaluates the outpatients' satisfaction with care in oncology using classical analysis (CTT) and item response theory (IRT). This cross-sectional multicenter study includes 692 patients who completed the questionnaire at the end of their ambulatory treatment. CTT analyses tested the main psychometric properties (convergent and divergent validity, and internal consistency). IRT analyses were conducted separately for each OUT-PATSAT35 domain (the doctors, the nurses or the radiation therapists and the services/organization) by models from the Rasch family. We examined the fit of the data to the model expectations and tested whether the model assumptions of unidimensionality, monotonicity and local independence were respected. A total of 605 (87.4%) respondents were analyzed with a mean age of 64 years (range 29-88). Internal consistency for all scales separately and for the three main domains was good (Cronbach's α 0.74-0.98). IRT analyses were performed with the partial credit model. No disordered thresholds of polytomous items were found. Each domain showed high reliability but fitted poorly to the Rasch models. Three items in particular, the item about "promptness" in the doctors' domain and the items about "accessibility" and "environment" in the services/organization domain, presented the highest default of fit. A correct fit of the Rasch model can be obtained by dropping these items. Most of the local dependence concerned items about "information provided" in each domain. A major deviation of unidimensionality was found in the nurses' domain. CTT showed good psychometric properties of the OUT-PATSAT35. However, the Rasch analysis revealed some misfitting and redundant items. Taking the above problems into consideration, it could be interesting to refine the questionnaire in a future study.
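
    The internal-consistency statistic quoted above, Cronbach's α, can be computed from a respondents-by-items score matrix as α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The sketch below is a minimal stdlib version of that textbook formula; the response data are hypothetical and unrelated to the OUT-PATSAT35 sample.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for rows of respondents and columns of item scores."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(n_items)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical five respondents answering three items on a 1-5 scale.
hypothetical_responses = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 3],
]
print(f"alpha = {cronbach_alpha(hypothetical_responses):.2f}")
```

    A high α indicates only that items covary strongly; as the abstract notes, a scale can show good internal consistency and still fit a Rasch model poorly.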

  9. Preequating with Empirical Item Characteristic Curves: An Observed-Score Preequating Method

    ERIC Educational Resources Information Center

    Zu, Jiyun; Puhan, Gautam

    2014-01-01

    Preequating is in demand because it reduces score reporting time. In this article, we evaluated an observed-score preequating method: the empirical item characteristic curve (EICC) method, which makes preequating without item response theory (IRT) possible. EICC preequating results were compared with a criterion equating and with IRT true-score…

  10. Item Analyses of Memory Differences

    PubMed Central

    Salthouse, Timothy A.

    2017-01-01

    Objective: Although performance on memory and other cognitive tests is usually assessed with a score aggregated across multiple items, potentially valuable information is also available at the level of individual items. Method: The current study illustrates how analyses of variance with item as one of the factors, and memorability analyses in which item accuracy in one group is plotted as a function of item accuracy in another group, can provide a more detailed characterization of the nature of group differences in memory. Data are reported for two memory tasks, word recall and story memory, across age, ability, repetition, delay, and longitudinal contrasts. Results: The item-level analyses revealed evidence for largely uniform differences across items in the age, ability, and longitudinal contrasts, but differential patterns across items in the repetition contrast, and unsystematic item relations in the delay contrast. Conclusion: Analyses at the level of individual items have the potential to indicate the manner by which group differences in the aggregate test score are achieved. PMID:27618285
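
    The memorability analysis described above pairs each item's accuracy in one group with its accuracy in another. A minimal sketch, assuming 0/1 item scores; the group names and response data are hypothetical and not taken from the study:

```python
def item_accuracy(responses):
    """Proportion correct per item.

    responses: list of participants, each a list of 0/1 item scores.
    """
    n = len(responses)
    return [sum(column) / n for column in zip(*responses)]


# Hypothetical 0/1 recall data: 4 participants x 4 items per group.
young = [[1, 1, 0, 1], [1, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 0]]
older = [[1, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0]]

acc_young = item_accuracy(young)
acc_older = item_accuracy(older)

# One (x, y) point per item: x = accuracy in one group, y = accuracy in the other.
memorability_points = list(zip(acc_young, acc_older))
```

    Plotting `memorability_points` gives the kind of item-by-item scatter the abstract refers to; a roughly constant vertical offset would correspond to a uniform group difference across items.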

  11. Processes and Metrics to Evaluate Faculty Practice Activities at US Schools of Pharmacy.

    PubMed

    Haines, Stuart T; Sicat, Brigitte L; Haines, Seena L; MacLaughlin, Eric J; Van Amburgh, Jenny A

    2016-05-25

    Objective. To determine what processes and metrics are employed to measure and evaluate pharmacy practice faculty members at colleges and schools of pharmacy in the United States. Methods. A 23-item web-based questionnaire was distributed to pharmacy practice department chairs at schools of pharmacy fully accredited by the Accreditation Council for Pharmacy Education (ACPE) (n=114). Results. Ninety-three pharmacy practice chairs or designees from 92 institutions responded. Seventy-six percent reported that more than 60% of the department's faculty members were engaged in practice-related activities at least eight hours per week. Fewer than half (47%) had written policies and procedures for conducting practice evaluations. Institutions commonly collected data regarding committee service at practice sites, community service events, educational programs, and number of hours engaged in practice-related activities; however, only 24% used a tool to longitudinally collect practice-related data. Publicly funded institutions were more likely than private schools to have written procedures. Conclusion. Data collection tools and best practice recommendations for conducting faculty practice evaluations are needed.

  12. Sleep can reduce the testing effect: it enhances recall of restudied items but can leave recall of retrieved items unaffected.

    PubMed

    Bäuml, Karl-Heinz T; Holterman, Christoph; Abel, Magdalena

    2014-11-01

    The testing effect refers to the finding that retrieval practice in comparison to restudy of previously encoded contents can improve memory performance and reduce time-dependent forgetting. Naturally, long retention intervals include both wake and sleep delay, which can influence memory contents differently. In fact, sleep immediately after encoding can induce a mnemonic benefit, stabilizing and strengthening the encoded contents. We investigated in a series of 5 experiments whether sleep influences the testing effect. After initial study of categorized item material (Experiments 1, 2, and 4A), paired associates (Experiment 3), or educational text material (Experiment 4B), subjects were asked to restudy encoded contents or engage in active retrieval practice. A final recall test was conducted after a 12-hr delay that included diurnal wakefulness or nocturnal sleep. The results consistently showed typical testing effects after the wake delay. However, these testing effects were reduced or even eliminated after sleep, because sleep benefited recall of restudied items but left recall of retrieved items unaffected. The findings are consistent with the bifurcation model of the testing effect (Kornell, Bjork, & Garcia, 2011), according to which the distribution of memory strengths across items is shifted differentially by retrieving and restudying, with retrieval strengthening items to a much higher degree than restudy does. On the basis of this model, most of the retrieved items already fall above recall threshold in the absence of sleep, so additional sleep-induced strengthening may not improve recall of retrieved items any further. PsycINFO Database Record (c) 2014 APA, all rights reserved.

  13. Exploring Alternative Conceptions from Newtonian Dynamics and Simple DC Circuits: Links between Item Difficulty and Item Confidence

    ERIC Educational Resources Information Center

    Planinic, Maja; Boone, William J.; Krsnik, Rudolf; Beilfuss, Meredith L.

    2006-01-01

    Croatian 1st-year and 3rd-year high-school students (N = 170) completed a conceptual physics test. Students were evaluated with regard to two physics topics: Newtonian dynamics and simple DC circuits. Students answered test items and also indicated their confidence in each answer. Rasch analysis facilitated the calculation of three linear…

  14. Designing P-Optimal Item Pools in Computerized Adaptive Tests with Polytomous Items

    ERIC Educational Resources Information Center

    Zhou, Xuechun

    2012-01-01

    Current CAT applications consist of predominantly dichotomous items, and CATs with polytomously scored items are limited. To ascertain the best approach to polytomous CAT, a significant amount of research has been conducted on item selection, ability estimation, and impact of termination rules based on polytomous IRT models. Few studies…

  15. The Usefulness of Differential Item Functioning Methodology in Longitudinal Intervention Studies

    USDA-ARS's Scientific Manuscript database

    Perceived self-efficacy (SE) for engaging in physical activity (PA) is a key variable mediating PA change in interventions. The purpose of this study is to demonstrate the usefulness of item response modeling-based (IRM) differential item functioning (DIF) in the investigation of group differences ...

  16. Identifying Items to Assess Methodological Quality in Physical Therapy Trials: A Factor Analysis

    PubMed Central

    Cummings, Greta G.; Fuentes, Jorge; Saltaji, Humam; Ha, Christine; Chisholm, Annabritt; Pasichnyk, Dion; Rogers, Todd

    2014-01-01

    Background: Numerous tools and individual items have been proposed to assess the methodological quality of randomized controlled trials (RCTs). The frequency of use of these items varies according to health area, which suggests a lack of agreement regarding their relevance to trial quality or risk of bias. Objective: The objectives of this study were: (1) to identify the underlying component structure of items and (2) to determine relevant items to evaluate the quality and risk of bias of trials in physical therapy by using an exploratory factor analysis (EFA). Design: A methodological research design was used, and an EFA was performed. Methods: Randomized controlled trials used for this study were randomly selected from searches of the Cochrane Database of Systematic Reviews. Two reviewers used 45 items gathered from 7 different quality tools to assess the methodological quality of the RCTs. An exploratory factor analysis was conducted using the principal axis factoring (PAF) method followed by varimax rotation. Results: Principal axis factoring identified 34 items loaded on 9 common factors: (1) selection bias; (2) performance and detection bias; (3) eligibility, intervention details, and description of outcome measures; (4) psychometric properties of the main outcome; (5) contamination and adherence to treatment; (6) attrition bias; (7) data analysis; (8) sample size; and (9) control and placebo adequacy. Limitation: Because of the exploratory nature of the results, a confirmatory factor analysis is needed to validate this model. Conclusions: To the authors' knowledge, this is the first factor analysis to explore the underlying component items used to evaluate the methodological quality or risk of bias of RCTs in physical therapy. The items and factors represent a starting point for evaluating the methodological quality and risk of bias in physical therapy trials. Empirical evidence of the association among these items with treatment effects and a confirmatory factor

  17. Mediate gamma radiation effects on some packaged food items

    NASA Astrophysics Data System (ADS)

    Inamura, Patricia Y.; Uehara, Vanessa B.; Teixeira, Christian A. H. M.; del Mastro, Nelida L.

    2012-08-01

    For most prepackaged foods a 10 kGy radiation dose is considered the maximum dose needed; however, the commercially available and practically accepted packaging materials must be suitable for such application. This work describes the application of ionizing radiation on several packaged food items, using 5 dehydrated food items, 5 ready-to-eat meals and 5 ready-to-eat food items irradiated in a 60Co gamma source with a 3 kGy dose. The quality evaluation of the irradiated samples was performed 2 and 8 months after irradiation. Microbiological analysis (bacteria, fungus and yeast load) was performed. Sensory characteristics were also established for appearance, aroma, texture and flavor attributes. From these data, the acceptability of all irradiated items was obtained. All ready-to-eat food items assayed, such as manioc flour, some pâtés and blocks of raw brown sugar, and most of the ready-to-eat meals, such as sausages and chicken with legumes, were considered acceptable for microbial and sensory characteristics. On the other hand, the dehydrated food items chosen for this study, such as dehydrated bacon potatoes or pea soups, were not accepted by the sensory analysis. A careful dose choice and special irradiation conditions must be used in order to achieve the sensory acceptability needed for the commercialization of specific irradiated food items.

  18. Explaining Crossing DIF in Polytomous Items Using Differential Step Functioning Effects

    ERIC Educational Resources Information Center

    Penfield, Randall D.

    2010-01-01

    Crossing, or intersecting, differential item functioning (DIF) is a form of nonuniform DIF that exists when the sign of the between-group difference in expected item performance changes across the latent trait continuum. The presence of crossing DIF presents a problem for many statistics developed for evaluating DIF because positive and negative…

  19. Geography, Years 7-10, Library of Test Items. Volume Eight. Junior Secondary Items To Be Used With 1976 to 1980 H.S.C. Geography Exam. Broadsheets.

    ERIC Educational Resources Information Center

    Kouimanos, John, Ed.

    As one in a series of test item collections developed by the Assessment and Evaluation Unit of the Directorate of Studies, items of value from past tests are made available to teachers for the construction of unit tests, term examinations or as a basis for class discussion. Each collection was reviewed for content validity and reliability. The…

  20. Thirty Years of Nonparametric Item Response Theory.

    ERIC Educational Resources Information Center

    Molenaar, Ivo W.

    2001-01-01

    Discusses relationships between a mathematical measurement model and its real-world applications. Makes a distinction between large-scale data matrices commonly found in educational measurement and smaller matrices found in attitude and personality measurement. Also evaluates nonparametric methods for estimating item response functions and…

  1. Evolution of a Test Item

    ERIC Educational Resources Information Center

    Spaan, Mary

    2007-01-01

    This article follows the development of test items (see "Language Assessment Quarterly", Volume 3 Issue 1, pp. 71-79 for the article "Test and Item Specifications Development"), beginning with a review of test and item specifications, then proceeding to writing and editing of items, pretesting and analysis, and finally selection of an item for a…

  2. Item Banking. ERIC/AE Digest.

    ERIC Educational Resources Information Center

    Rudner, Lawrence

    This digest discusses the advantages and disadvantages of using item banks, and it provides useful information for those who are considering implementing an item banking project in their school districts. The primary advantage of item banking is in test development. Using an item response theory method, such as the Rasch model, items from multiple…

  3. Item Response Theory and Health Outcomes Measurement in the 21st Century

    PubMed Central

    Hays, Ron D.; Morales, Leo S.; Reise, Steve P.

    2006-01-01

    Item response theory (IRT) has a number of potential advantages over classical test theory in assessing self-reported health outcomes. IRT models yield invariant item and latent trait estimates (within a linear transformation), standard errors conditional on trait level, and trait estimates anchored to item content. IRT also facilitates evaluation of differential item functioning, inclusion of items with different response formats in the same scale, and assessment of person fit and is ideally suited for implementing computer adaptive testing. Finally, IRT methods can be helpful in developing better health outcome measures and in assessing change over time. These issues are reviewed, along with a discussion of some of the methodological and practical challenges in applying IRT methods. PMID:10982088

  4. 41 CFR 101-28.306-6 - Sensitive items.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... Regulations System FEDERAL PROPERTY MANAGEMENT REGULATIONS SUPPLY AND PROCUREMENT 28-STORAGE AND DISTRIBUTION... accountable item of personal property. Each customer activity shall take all appropriate measures necessary to... Government use. ...

  5. Effects of spacing of item repetitions in continuous recognition memory: does item retrieval difficulty promote item retention in older adults?

    PubMed

    Kılıç, Aslı; Hoyer, William J; Howard, Marc W

    2013-01-01

    BACKGROUND/STUDY CONTEXT: Older adults exhibit an age-related deficit in item memory as a function of the length of the retention interval, but older adults and young adults usually show roughly equivalent benefits due to the spacing of item repetitions in continuous memory tasks. The current experiment investigates the seemingly paradoxical effects of retention interval and spacing in young and older adults using a continuous recognition memory procedure. Fifty young adults and 52 older adults gave memory confidence ratings to words that were presented once (P1), twice (P2), or three times (P3), and the effects of the lag length and retention interval were assessed at P2 and at P3, respectively. Response times at P2 were disproportionately longer for older adults than for younger adults as a function of the number of items occurring between P1 and P2, suggestive of age-related loss in item memory. Ratings of confidence in memory responses revealed that older adults remembered fewer items at P2 with a high degree of certainty. Confidence ratings given at P3 suggested that young and older adults derived equivalent benefits from the spacing between P1 and P2. Findings of this study support theoretical accounts that suggest that recursive reminding and/or item retrieval difficulty promote item retention in older adults.

  6. Oxytocin Increases the Perceived Value of Both Self- and Other-Owned Items and Alters Medial Prefrontal Cortex Activity in an Endowment Task

    PubMed Central

    Zhao, Weihua; Geng, Yayuan; Luo, Lizhu; Zhao, Zhiying; Ma, Xiaole; Xu, Lei; Yao, Shuxia; Kendrick, Keith M.

    2017-01-01

    The neuropeptide oxytocin (OXT) can influence self-processing and may help motivate us to value the attributes of others in a more self-like manner by reducing medial prefrontal cortex (mPFC) responses. We do not know however whether this OXT effect extends to possessions. We tend to place a higher monetary value on specific objects that belong to us compared to others, known as the “endowment effect”. In two double-blind, between-subject placebo (PLC) controlled experiments in subjects from a collectivist culture, we investigated the influence of intranasal OXT on the endowment effect, with the second study incorporating functional magnetic resonance imaging (fMRI). In the task, subjects decided whether to buy or sell their own or others’ (mother/father/classmate/stranger) possessions at various prices. Both experiments demonstrated an endowment effect in the self-owned condition which extended to close others (mother/father) and OXT increased this for self and all other-owned items. This OXT effect was associated with reduced activity in the ventral mPFC (vmPFC) in the self-owned condition but increased in the mother-condition. For the classmate- and stranger-owned conditions OXT increased activity in the dorsal mPFC (dmPFC). Changes in vmPFC activation were associated with the size of the endowment effect for self- and mother-owned items. Functional connectivity between the dmPFC and ventral striatum (VStr) was reduced by OXT in self- and mother-owned conditions and between vmPFC and precuneus in the self-condition. Overall our results show that OXT enhances the endowment effect for both self- and other-owned items in Chinese subjects. This effect is associated with reduced mPFC activation in the self-condition but enhanced activation in all other-conditions and involves differential actions on both dorsal and ventral regions as well as functional connectivity with brain reward and other self-processing regions. Overall our findings suggest that OXT

  7. An Examination of Two Procedures for Identifying Consequential Item Parameter Drift

    ERIC Educational Resources Information Center

    Wells, Craig S.; Hambleton, Ronald K.; Kirkpatrick, Robert; Meng, Yu

    2014-01-01

    The purpose of the present study was to develop and evaluate two procedures flagging consequential item parameter drift (IPD) in an operational testing program. The first procedure was based on flagging items that exhibit a meaningful magnitude of IPD using a critical value that was defined to represent barely tolerable IPD. The second procedure…

  8. Process dissociation between contextual retrieval and item recognition.

    PubMed

    Weis, Susanne; Specht, Karsten; Klaver, Peter; Tendolkar, Indira; Willmes, Klaus; Ruhlmann, Jürgen; Elger, Christian E; Fernández, Guillén

    2004-12-22

    We employed a source memory task in an event related fMRI study to dissociate MTL processes associated with either contextual retrieval or item recognition. To introduce context during study, stimuli (photographs of buildings and natural landscapes) were transformed into one of four single-color-scales: red, blue, yellow, or green. In the subsequent old/new recognition memory test, all stimuli were presented as gray scale photographs, and old-responses were followed by a four-alternative source judgment referring to the color in which the stimulus was presented during study. Our results suggest a clear-cut process dissociation within the human MTL. While an activity increase accompanies successful retrieval of contextual information, an activity decrease provides a familiarity signal that is sufficient for successful item recognition.

  9. Modeling the World Health Organization Disability Assessment Schedule II using non-parametric item response models.

    PubMed

    Galindo-Garre, Francisca; Hidalgo, María Dolores; Guilera, Georgina; Pino, Oscar; Rojo, J Emilio; Gómez-Benito, Juana

    2015-03-01

    The World Health Organization Disability Assessment Schedule II (WHO-DAS II) is a multidimensional instrument developed for measuring disability. It comprises six domains (understanding and communicating, getting around, self-care, getting along with others, life activities and participation in society). The main purpose of this paper is the evaluation of the psychometric properties for each domain of the WHO-DAS II with parametric and non-parametric Item Response Theory (IRT) models. A secondary objective is to assess whether the WHO-DAS II items within each domain form a hierarchy of invariantly ordered severity indicators of disability. A sample of 352 patients with a schizophrenia spectrum disorder is used in this study. The 36-item WHO-DAS II was administered during the consultation. Partial Credit and Mokken scale models are used to study the psychometric properties of the questionnaire. The psychometric properties of the WHO-DAS II scale are satisfactory for all the domains. However, we identify a few items that do not discriminate satisfactorily between different levels of disability and cannot be invariantly ordered in the scale. In conclusion, the WHO-DAS II can be used to assess overall disability in patients with schizophrenia, but some domains are too general to assess functionality in these patients because they contain items that are not applicable to this pathology. Copyright © 2014 John Wiley & Sons, Ltd.

  10. Developing and investigating the use of single-item measures in organizational research.

    PubMed

    Fisher, Gwenith G; Matthews, Russell A; Gibbons, Alyssa Mitchell

    2016-01-01

    The validity of organizational research relies on strong research methods, which include effective measurement of psychological constructs. The general consensus is that multiple-item measures have better psychometric properties than single-item measures. However, due to practical constraints (e.g., survey length, respondent burden) there are situations in which certain single items may be useful for capturing information about constructs that might otherwise go unmeasured. We evaluated 37 items, including 18 newly developed items as well as 19 single items selected from existing multiple-item scales based on psychometric characteristics, to assess 18 constructs frequently measured in organizational and occupational health psychology research. We examined evidence of reliability; convergent, discriminant, and content validity assessments; and test-retest reliabilities at 1- and 3-month time lags for single-item measures using a multistage and multisource validation strategy across 3 studies, including data from N = 17 occupational health subject matter experts and N = 1,634 survey respondents across 2 samples. Items selected from existing scales generally demonstrated better internal consistency reliability and convergent validity, whereas these particular new items generally had higher levels of content validity. We offer recommendations regarding when use of single items may be more or less appropriate, as well as 11 items that seem acceptable, 14 items with mixed results that might be used with caution, and 12 items we do not recommend using as single-item measures. Although multiple-item measures are preferable from a psychometric standpoint, in some circumstances single-item measures can provide useful information. (c) 2016 APA, all rights reserved.

  11. A Procedure to Detect Item Bias Present Simultaneously in Several Items

    DTIC Science & Technology

    1991-04-25

    exhibit a coherent and major biasing influence at the test level. In particular, this can be true even if each individual item displays only a minor...response functions (IRFs) without the use of item parameter estimation algorithms when the sample size is too small for their use. Thissen, Steinberg...convention). A random sample of examinees is drawn from each group, and a test of N items is administered to them. Typically it is suspected that a

  12. Using Explanatory Item Response Models to Evaluate Complex Scientific Tasks Designed for the Next Generation Science Standards

    NASA Astrophysics Data System (ADS)

    Chiu, Tina

    This dissertation includes three studies that analyze a new set of assessment tasks developed by the Learning Progressions in Middle School Science (LPS) Project. These assessment tasks were designed to measure science content knowledge on the structure of matter domain and scientific argumentation, while following the goals from the Next Generation Science Standards (NGSS). The three studies focus on the evidence available for the success of this design and its implementation, generally labelled as "validity" evidence. I use explanatory item response models (EIRMs) as the overarching framework to investigate these assessment tasks. These models can be useful when gathering validity evidence for assessments as they can help explain student learning and group differences. In the first study, I explore the dimensionality of the LPS assessment by comparing the fit of unidimensional, between-item multidimensional, and Rasch testlet models to see which is most appropriate for this data. By applying multidimensional item response models, multiple relationships can be investigated, and in turn, allow for a more substantive look into the assessment tasks. The second study focuses on person predictors through latent regression and differential item functioning (DIF) models. Latent regression models show the influence of certain person characteristics on item responses, while DIF models test whether one group is differentially affected by specific assessment items, after conditioning on latent ability. Finally, the last study applies the linear logistic test model (LLTM) to investigate whether item features can help explain differences in item difficulties.
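
    The linear logistic test model (LLTM) mentioned in the last study expresses each item's difficulty as a weighted sum of its features. A minimal sketch of that decomposition, assuming a binary item-by-feature matrix; the matrix `q_matrix`, the weights `eta`, and the function name are hypothetical and not taken from the dissertation:

```python
def lltm_difficulties(q_matrix, eta):
    """Item difficulties implied by the LLTM: b_i = sum_k q_ik * eta_k.

    q_matrix: one row per item, one column per item feature (weights or 0/1 flags).
    eta: one difficulty contribution per feature.
    """
    return [sum(q * e for q, e in zip(row, eta)) for row in q_matrix]


# Hypothetical design: 3 items described by 2 features.
q_matrix = [
    [1, 0],  # item 1 involves feature A only
    [1, 1],  # item 2 involves features A and B
    [0, 1],  # item 3 involves feature B only
]
eta = [0.5, 1.0]  # assumed feature effects, in logits

difficulties = lltm_difficulties(q_matrix, eta)
```

    In an actual analysis the feature effects would be estimated from response data rather than fixed in advance; the sketch only shows how estimated effects map back to item difficulties.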

  13. Developing an Interpretation of Item Parameters for Personality Items: Content Correlates of Parameter Estimates.

    ERIC Educational Resources Information Center

    Zickar, Michael J.; Ury, Karen L.

    2002-01-01

    Attempted to relate content features of personality items to item parameter estimates from the partial credit model of E. Muraki (1990) by administering the Adjective Checklist (L. Goldberg, 1992) to 329 undergraduates. As predicted, the discrimination parameter was related to the item subtlety ratings of personality items but the level of word…

  14. Expertise sensitive item selection.

    PubMed

    Chow, P; Russell, H; Traub, R E

    2000-12-01

    In this paper we describe and illustrate a procedure for selecting items from a large pool for a certification test. The proposed procedure, which is intended to improve the alignment of the certification test with on-the-job performance, is based on an expertise sensitive index. This index for an item is the difference between the item's p values for experts and novices. An example is provided of the application of the index for selecting items to be used in certifying bakers.
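
    The expertise sensitive index described above is the difference between an item's p value (proportion correct) for experts and for novices. A minimal sketch with hypothetical data; ranking items by the index and keeping the top n is an assumed selection rule for illustration, since the abstract does not specify one:

```python
def p_values(responses):
    """Proportion correct per item from a list of 0/1 response vectors."""
    n = len(responses)
    return [sum(column) / n for column in zip(*responses)]


def expertise_index(expert_responses, novice_responses):
    """Per-item index: p value for experts minus p value for novices."""
    p_expert = p_values(expert_responses)
    p_novice = p_values(novice_responses)
    return [pe - pn for pe, pn in zip(p_expert, p_novice)]


def select_items(index, n):
    """Assumed rule: keep the n items with the largest index values."""
    ranked = sorted(range(len(index)), key=lambda i: index[i], reverse=True)
    return ranked[:n]


# Hypothetical 0/1 responses: 4 examinees x 3 items per group.
experts = [[1, 1, 1], [1, 1, 1], [1, 0, 1], [1, 1, 0]]
novices = [[1, 0, 0], [1, 0, 1], [1, 0, 1], [0, 1, 0]]

index = expertise_index(experts, novices)
chosen = select_items(index, 1)
```

    Here the second item (position 1) separates experts from novices most strongly, so it is the one retained.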

  15. Rasch analysis of the patient-rated wrist evaluation questionnaire.

    PubMed

    Esakki, Saravanan; MacDermid, Joy C; Vincent, Joshua I; Packham, Tara L; Walton, David; Grewal, Ruby

    2018-01-01

    The Patient-Rated Wrist Evaluation (PRWE) was developed as a wrist joint specific measure of pain and disability and evidence of sound validity has been accumulated through classical psychometric methods. Rasch analysis (RA) has been endorsed as a newer method for analyzing the clinical measurement properties of self-report outcome measures. The purpose of this study was to evaluate the PRWE using Rasch modeling. We employed the Rasch model to assess overall fit, response scaling, individual item fit, differential item functioning (DIF), local dependency, unidimensionality and person separation index (PSI). A convenience sample of 382 patients with distal radius fracture was recruited from the hand and upper limb clinic at a large academic healthcare organization in London, Ontario, Canada; 6-month post-injury PRWE scores were used. RA was conducted on the 3 subscales (pain, specific activities, and usual activities) of the PRWE separately. The pain subscale adequately fit the Rasch model when item 4 "Pain - When it is at its worst" was deleted to eliminate non-uniform DIF by age group, and item 5 "How often do you have pain" was rescored by collapsing into 8 intervals to eliminate disordered thresholds. Uniform DIF for "Use my affected hand to push up from the chair" (by work status) and "Use bathroom tissue with my affected hand" (by injured hand) was addressed by splitting the items for analysis. After background rescoring of 2 items in the pain subscale, 2 items in specific activities and 3 items in usual activities, all three subscales of the PRWE were well targeted and had high reliability (PSI = 0.86). These changes provided a unidimensional, interval-level scaled measure. Like a previous analysis of the Patient-Rated Wrist and Hand Evaluation, this study found the PRWE could be fit to the Rasch model with rescoring of multiple items. However, the modifications required to achieve fit were not the same across studies; our fit statistics also suggested one

  16. Teaching children with autism spectrum disorders to mand for the removal of stimuli that prevent access to preferred items.

    PubMed

    Shillingsburg, M Alice; Powell, Nicole M; Bowen, Crystal N

    2013-01-01

    Mand training is often a primary focus in early language instruction and typically includes mands that are positively reinforced. However, mands maintained by negative reinforcement are also important skills to teach. These include mands to escape aversive demands or unwanted items. Another type of negatively reinforced mand important to teach involves the removal of a stimulus that prevents access to a preferred activity. We taught 5 participants diagnosed with autism spectrum disorders to mand for the removal of a stimulus in order to access a preferred item that had been blocked. An evaluation was conducted to determine if participants responded differentially when the establishing operations for the preferred item were present versus absent. All participants learned to mand for the removal of the stimulus exclusively under conditions when the establishing operation was present.

  17. Instructional Topics in Educational Measurement (ITEMS) Module: Using Automated Processes to Generate Test Items

    ERIC Educational Resources Information Center

    Gierl, Mark J.; Lai, Hollis

    2013-01-01

    Changes to the design and development of our educational assessments are resulting in the unprecedented demand for a large and continuous supply of content-specific test items. One way to address this growing demand is with automatic item generation (AIG). AIG is the process of using item models to generate test items with the aid of computer…

  18. 76 FR 52138 - Defense Federal Acquisition Regulation Supplement; Identification of Critical Safety Items (DFARS...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-08-19

    ...; or (iii) An uncommanded engine shutdown that jeopardizes safety. Design control activity. (i) With... aviation critical safety item is to be used; and (ii) With respect to a ship critical safety item, means...-AG92 Defense Federal Acquisition Regulation Supplement; Identification of Critical Safety Items (DFARS...

  19. Application of Item Response Theory to Tests of Substance-related Associative Memory

    PubMed Central

    Shono, Yusuke; Grenard, Jerry L.; Ames, Susan L.; Stacy, Alan W.

    2015-01-01

    A substance-related word association test (WAT) is one of the commonly used indirect tests of substance-related implicit associative memory and has been shown to predict substance use. This study applied an item response theory (IRT) modeling approach to evaluate psychometric properties of the alcohol- and marijuana-related WATs and their items among 775 ethnically diverse at-risk adolescents. After examining the IRT assumptions, item fit, and differential item functioning (DIF) across gender and age groups, the original 18 WAT items were reduced to 14 and 15 items in the alcohol- and marijuana-related WAT, respectively. Thereafter, unidimensional one- and two-parameter logistic models (1PL and 2PL models) were fitted to the revised WAT items. The results demonstrated that both alcohol- and marijuana-related WATs have good psychometric properties. These results were discussed in light of the framework of a unified concept of construct validity (Messick, 1975, 1989, 1995). PMID:25134051
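
    The 1PL and 2PL models named above share one item response function and differ only in whether the discrimination parameter is free to vary by item. A minimal sketch of the standard logistic form; the parameter values are hypothetical and are not estimates from the study:

```python
import math


def irf_2pl(theta, a, b):
    """2PL item response function: P(endorse | theta) = 1 / (1 + exp(-a * (theta - b))).

    theta: person trait level, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def irf_1pl(theta, b, a=1.0):
    """1PL: the same curve with one common discrimination shared by all items."""
    return irf_2pl(theta, a, b)


# Hypothetical items: same difficulty, different discrimination.
p_at_difficulty = irf_2pl(theta=0.0, a=1.5, b=0.0)  # trait level equals difficulty
p_steep = irf_2pl(theta=1.0, a=2.0, b=0.0)
p_shallow = irf_1pl(theta=1.0, b=0.0)
```

    When the trait level equals the item difficulty the probability is 0.5 regardless of discrimination; above that point, the more discriminating item yields the higher probability.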

  20. Use of item response curves of the Force and Motion Conceptual Evaluation to compare Japanese and American students' views on force and motion

    NASA Astrophysics Data System (ADS)

    Ishimoto, Michi; Davenport, Glen; Wittmann, Michael C.

    2017-12-01

    Student views of force and motion reflect the personal experiences and physics education of the student. With a different language, culture, and educational system, we expect that Japanese students' views on force and motion might be different from those of American students. The Force and Motion Conceptual Evaluation (FMCE) is an instrument used to probe student views on force and motion. It was designed using research on American students, and, as such, the items might function differently for Japanese students. Preliminary results from a translated version indicated that Japanese students had misconceptions similar to those of American students. In this study, we used item response curves (IRCs) to make more detailed item-by-item comparisons. IRCs show the functioning of individual items across all levels of performance by plotting the proportion of each response as a function of the total score. Most of the IRCs showed very similar patterns on both correct and incorrect responses; however, a few of the plots indicate differences between the populations. The similar patterns indicate that students tend to interact with FMCE items similarly, despite differences in culture, language, and education. We speculate about the possible causes for the differences in some of the IRCs. This report is intended to show how IRCs can be used as a part of the validation process when making comparisons across languages and nationalities. Differences in IRCs can help to pinpoint artifacts of translation, contextual effects because of differences in culture, and perhaps intrinsic differences in student understanding of Newtonian motion.
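
    The abstract describes an IRC as the proportion of each response plotted against total score. A minimal sketch of that tabulation for one item follows; the function name and the input layout (one chosen option and one total score per student) are hypothetical.

```python
from collections import Counter, defaultdict

def item_response_curve(responses, total_scores):
    """Proportion of each response option at each total score, for one item.

    responses: option chosen by each student on this item (e.g. "A", "B").
    total_scores: each student's total test score, in the same order.
    Returns {total_score: {option: proportion}}.
    """
    counts = defaultdict(Counter)
    for choice, score in zip(responses, total_scores):
        counts[score][choice] += 1
    curve = {}
    for score, option_counts in counts.items():
        n = sum(option_counts.values())
        curve[score] = {opt: k / n for opt, k in option_counts.items()}
    return curve
```

    Computing the same table separately for two populations and overlaying the results is one way to make the item-by-item comparison the abstract refers to.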

  1. An Evaluation of Item Response Theory Classification Accuracy and Consistency Indices

    ERIC Educational Resources Information Center

    Wyse, Adam E.; Hao, Shiqi

    2012-01-01

    This article introduces two new classification consistency indices that can be used when item response theory (IRT) models have been applied. The new indices are shown to be related to Rudner's classification accuracy index and Guo's classification accuracy index. The Rudner- and Guo-based classification accuracy and consistency indices are…

  2. Secondary Psychometric Examination of the Dimensional Obsessive-Compulsive Scale: Classical Testing, Item Response Theory, and Differential Item Functioning.

    PubMed

    Thibodeau, Michel A; Leonard, Rachel C; Abramowitz, Jonathan S; Riemann, Bradley C

    2015-12-01

    The Dimensional Obsessive-Compulsive Scale (DOCS) is a promising measure of obsessive-compulsive disorder (OCD) symptoms but has received minimal psychometric attention. We evaluated the utility and reliability of DOCS scores. The study included 832 students and 300 patients with OCD. Confirmatory factor analysis supported the originally proposed four-factor structure. DOCS total and subscale scores exhibited good to excellent internal consistency in both samples (α = .82 to α = .96). Patient DOCS total scores reduced substantially during treatment (t = 16.01, d = 1.02). DOCS total scores discriminated between students and patients (sensitivity = 0.76, 1 - specificity = 0.23). The measure did not exhibit gender-based differential item functioning as tested by Mantel-Haenszel chi-square tests. Expected response options for each item were plotted as a function of item response theory and demonstrated that DOCS scores incrementally discriminate OCD symptoms ranging from low to extremely high severity. Incremental differences in DOCS scores appear to represent unbiased and reliable differences in true OCD symptom severity. © The Author(s) 2014.

  3. Putting Humpty together and pulling him apart: accessing and unbinding the hippocampal item-context engram.

    PubMed

    Sadeh, Talya; Maril, Anat; Bitan, Tali; Goshen-Gottstein, Yonatan

    2012-03-01

    A remarkable act of memory entails binding different forms of information. We focus on the timeless question of how the bound engram is accessed such that its component features-item and context-are extracted. To shed light on this question, we investigate the dynamics between brain structures that together mediate the binding and extraction of item and context. Converging evidence has implicated the Parahippocampal cortex (PHc) in contextual processing, the Perirhinal cortex (PRc) in item processing, and the hippocampus in item-context binding. Effective connectivity analysis was conducted on fMRI data gathered during retrieval on tests that differ with regard to the to-be-extracted information. Results revealed that recall is initiated by context-related PHc activity, followed by hippocampal item-context engram activation, and completed with retrieval of the study-item by the PRc. The reverse path was found for recognition. We thus provide novel evidence for dissociative patterns of item-context unbinding during retrieval. Copyright © 2011 Elsevier Inc. All rights reserved.

  4. Evaluation of Linking Methods for Placing Three-Parameter Logistic Item Parameter Estimates onto a One-Parameter Scale

    ERIC Educational Resources Information Center

    Karkee, Thakur B.; Wright, Karen R.

    2004-01-01

    Different item response theory (IRT) models may be employed for item calibration. Change of testing vendors, for example, may result in the adoption of a different model than that previously used with a testing program. To provide scale continuity and preserve cut score integrity, item parameter estimates from the new model must be linked to the…

  5. Item Banking Enables Stand-Alone Measurement of Driving Ability.

    PubMed

    Khadka, Jyoti; Fenwick, Eva K; Lamoureux, Ecosse L; Pesudovs, Konrad

    2016-12-01

    To explore whether large item sets, as used in item banking, enable important latent traits, such as driving, to form stand-alone measures. The 88-item activity limitation (AL) domain of the glaucoma module of the Eye-tem Bank was interviewer-administered to patients with glaucoma. Rasch analysis was used to calibrate all items in AL domain on the same interval-level scale and test its psychometric properties. Based on Rasch dimensionality metrics, the AL scale was separated into subscales. These subscales underwent separate Rasch analyses to test whether they could form stand-alone measures. Independence of these measures was tested with Bland and Altman (B&A) Limit of Agreement (LOA). The AL scale was completed by 293 patients (median age, 71 years). It demonstrated excellent precision (3.12). However, Rasch analysis dimensionality metrics indicated that the domain arguably had other dimensions which were driving, luminance, and reading. Once separated, the remaining AL items, driving and luminance subscales, were unidimensional and had excellent precision of 4.25, 2.94, and 2.22, respectively. The reading subscale showed poor precision (1.66), so it was not examined further. The luminance subscale demonstrated excellent agreement (mean bias, 0.2 logit; 95% LOA, -2.2 to 3.3 logit); however, the driving subscale demonstrated poor agreement (mean bias, 1.1 logit; 95% LOA, -4.8 to 7.0 logit) with the AL scale. These findings indicate that driving items in the AL domain of the glaucoma module were perceived and responded to differently from the other AL items, but the reading and luminance items were not. Therefore, item banking enables stand-alone measurement of driving ability in glaucoma.
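
    The Bland and Altman limits of agreement mentioned above are conventionally computed as the mean difference plus or minus 1.96 standard deviations of the differences. The sketch below assumes paired person measures from two scales; the function name is hypothetical and the study's exact procedure may differ.

```python
import statistics

def bland_altman(x, y):
    """Mean bias and 95% limits of agreement between paired measurements.

    x, y: equal-length sequences of paired measures (e.g. in logits).
    Returns (bias, lower_limit, upper_limit).
    """
    diffs = [a - b for a, b in zip(x, y)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```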

  6. The special role of item-context associations in the direct-access region of working memory.

    PubMed

    Campoy, Guillermo

    2017-09-01

    The three-embedded-component model of working memory (WM) distinguishes three representational states corresponding to three WM regions: activated long-term memory, direct-access region (DAR), and focus of attention. Recent neuroimaging research has revealed that access to the DAR is associated with enhanced hippocampal activity. Because the hippocampus mediates the encoding and retrieval of item-context associations, it has been suggested that this hippocampal activation is a consequence of the fact that item-context associations are particularly strong and accessible in the DAR. This study provides behavioral evidence for this view using an item-recognition task to assess the effect of non-intentional encoding and maintenance of item-location associations across WM regions. Five pictures of human faces were sequentially presented in different screen locations followed by a recognition probe. Visual cues immediately preceding the probe indicated the location thereof. When probe stimuli appeared in the same location that they had been presented within the memory set, the presentation of the cue was expected to elicit the activation of the corresponding WM representation through the just-established item-location association, resulting in faster recognition. Results showed this same-location effect, but only for items that, according to their serial position within the memory set, were held in the DAR.

  7. Computerized Adaptive Test (CAT) Applications and Item Response Theory Models for Polytomous Items

    ERIC Educational Resources Information Center

    Aybek, Eren Can; Demirtasli, R. Nukhet

    2017-01-01

    This article aims to provide a theoretical framework for computerized adaptive tests (CAT) and item response theory models for polytomous items. Besides that, it aims to introduce the simulation and live CAT software to the related researchers. Computerized adaptive test algorithm, assumptions of item response theory models, nominal response…

  8. A Psychometric Evaluation of the Learning Styles Questionnaire: 40-Item Version

    ERIC Educational Resources Information Center

    Klein, Britt; McCall, Louise; Austin, David; Piterman, Leon

    2007-01-01

    Sixty-six English-speaking postgraduate distance-education medical students completed the Learning Styles Questionnaire (LSQ: 40-item version). This was completed while attending a residential workshop at the beginning of the semester, and 44 of these students completed the same LSQ questionnaire 5 months later at the completion of the semester.…

  9. Data sharing report characterization of the surveillance and maintenance project miscellaneous process inventory waste items Oak Ridge National Laboratory, Oak Ridge, TN

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Weaver, Phyllis C.

    2013-12-12

    The U.S. Department of Energy (DOE) Oak Ridge Office of Environmental Management (EM-OR) requested Oak Ridge Associated Universities (ORAU), working under the Oak Ridge Institute for Science and Education (ORISE) contract, to provide technical and independent waste management planning support under the American Recovery and Reinvestment Act (ARRA). Specifically, DOE EM-OR requested ORAU to plan and implement a sampling and analysis campaign to target certain items associated with URS|CH2M Oak Ridge, LLC (UCOR) surveillance and maintenance (S&M) process inventory waste. Eight populations of historical and reoccurring S&M waste at the Oak Ridge National Laboratory (ORNL) have been identified in the Waste Handling Plan for Surveillance and Maintenance Activities at the Oak Ridge National Laboratory, DOE/OR/01-2565&D2 (WHP) (DOE 2012) for evaluation and processing for final disposal. This waste was generated during processing, surveillance, and maintenance activities associated with the facilities identified in the process knowledge (PK) provided in Appendix A. A list of items for sampling and analysis was generated from a subset of materials identified in the WHP populations (POPs) 4, 5, 6, 7, and 8, plus a small number of items not explicitly addressed by the WHP. Specifically, UCOR S&M project personnel identified 62 miscellaneous waste items that would require some level of evaluation to identify the appropriate pathway for disposal. These items are highly diverse, relative to origin; composition; physical description; contamination level; data requirements; and the presumed treatment, storage, and disposal facility (TSDF). Because of this diversity, ORAU developed a structured approach to address item-specific data requirements necessary for acceptance in a presumed TSDF that includes the Environmental Management Waste Management Facility (EMWMF), using the approved Waste Lot (WL) 108.1 profile, the Y-12 Sanitary Landfill (SLF) if appropriate; Energy

  10. A Comparison of Latent Growth Models for Constructs Measured by Multiple Items

    ERIC Educational Resources Information Center

    Leite, Walter L.

    2007-01-01

    Univariate latent growth modeling (LGM) of composites of multiple items (e.g., item means or sums) has been frequently used to analyze the growth of latent constructs. This study evaluated whether LGM of composites yields unbiased parameter estimates, standard errors, chi-square statistics, and adequate fit indexes. Furthermore, LGM was compared…

  11. Evaluation of the Multiple Sclerosis Walking Scale-12 (MSWS-12) in a Dutch sample: Application of item response theory.

    PubMed

    Mokkink, Lidwine Brigitta; Galindo-Garre, Francisca; Uitdehaag, Bernard Mj

    2016-12-01

    The Multiple Sclerosis Walking Scale-12 (MSWS-12) measures walking ability from the patients' perspective. We examined the quality of the MSWS-12 using an item response theory model, the graded response model (GRM). A total of 625 unique Dutch multiple sclerosis (MS) patients were included. After testing for unidimensionality, monotonicity, and absence of local dependence, a GRM was fit and item characteristics were assessed. Differential item functioning (DIF) for the variables gender, age, duration of MS, type of MS and severity of MS, reliability, total test information, and standard error of the trait level (θ) were investigated. Confirmatory factor analysis showed a unidimensional structure of the 12 items of the scale, explaining 88% of the variance. Item 2 did not fit into the GRM model. Reliability was 0.93. Items 8 and 9 (of the 11 and 12 item version respectively) showed DIF on the variable severity, based on the Expanded Disability Status Scale (EDSS). However, the EDSS is strongly related to the content of both items. Our results confirm the good quality of the MSWS-12. The trait level (θ) scores and item parameters of both the 12- and 11-item versions were highly comparable, although we do not suggest changing the content of the MSWS-12. © The Author(s), 2016.
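
    For readers unfamiliar with the graded response model used here, the sketch below computes category probabilities for one polytomous item as differences of adjacent cumulative logistic curves. The function name and the example parameters are illustrative assumptions; they are not estimates from the MSWS-12 data.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the graded response model.

    theta: trait level; a: item discrimination;
    thresholds: strictly increasing threshold parameters b_1 < ... < b_{K-1}.
    Returns a list of K probabilities, one per ordered response category.
    """
    # Cumulative probabilities P(X >= k), padded with 1 (k = 0) and 0 (k = K).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]

# Hypothetical three-category item with thresholds at -1 and 1.
probs = grm_category_probs(theta=0.0, a=1.0, thresholds=[-1.0, 1.0])
```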

  12. Selecting Items for Criterion-Referenced Tests.

    ERIC Educational Resources Information Center

    Mellenbergh, Gideon J.; van der Linden, Wim J.

    1982-01-01

    Three item selection methods for criterion-referenced tests are examined: the classical theory of item difficulty and item-test correlation; the latent trait theory of item characteristic curves; and a decision-theoretic approach for optimal item selection. Item contribution to the standardized expected utility of mastery testing is discussed. (CM)
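
    The first of the three methods, classical item difficulty and item-test correlation, can be sketched as follows for dichotomously scored items. The function names and the 0/1 score-matrix layout are assumptions for illustration; the correlation is undefined for an item (or total) with zero variance.

```python
import math

def item_difficulty(item_scores):
    """Classical difficulty: proportion of examinees answering correctly."""
    return sum(item_scores) / len(item_scores)

def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def item_analysis(matrix):
    """matrix[i][j] is examinee i's 0/1 score on item j.

    Returns one (difficulty, item-test correlation) pair per item.
    The total score here includes the item itself (uncorrected correlation).
    """
    totals = [sum(row) for row in matrix]
    results = []
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        results.append((item_difficulty(column), pearson(column, totals)))
    return results
```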

  13. Geriatric Anxiety Scale: item response theory analysis, differential item functioning, and creation of a ten-item short form (GAS-10).

    PubMed

    Mueller, Anne E; Segal, Daniel L; Gavett, Brandon; Marty, Meghan A; Yochim, Brian; June, Andrea; Coolidge, Frederick L

    2015-07-01

    The Geriatric Anxiety Scale (GAS; Segal, D. L., June, A., Payne, M., Coolidge, F. L. and Yochim, B. (2010). Journal of Anxiety Disorders, 24, 709-714. doi:10.1016/j.janxdis.2010.05.002) is a self-report measure of anxiety that was designed to address unique issues associated with anxiety assessment in older adults. This study is the first to use item response theory (IRT) to examine the psychometric properties of a measure of anxiety in older adults. A large sample of older adults (n = 581; mean age = 72.32 years, SD = 7.64 years, range = 60 to 96 years; 64% women; 88% European American) completed the GAS. IRT properties were examined. The presence of differential item functioning (DIF) or measurement bias by age and sex was assessed, and a ten-item short form of the GAS (called the GAS-10) was created. All GAS items had discrimination parameters of 1.07 or greater. Items from the somatic subscale tended to have lower discrimination parameters than items on the cognitive or affective subscales. Two items were flagged for DIF, but the impact of the DIF was negligible. Women scored significantly higher than men on the GAS and its subscales. Participants in the young-old group (60 to 79 years old) scored significantly higher on the cognitive subscale than participants in the old-old group (80 years old and older). Results from the IRT analyses indicated that the GAS and GAS-10 have strong psychometric properties among older adults. We conclude by discussing implications and future research directions.

  14. Asymptotic Standard Errors for Item Response Theory True Score Equating of Polytomous Items

    ERIC Educational Resources Information Center

    Cher Wong, Cheow

    2015-01-01

    Building on previous works by Lord and Ogasawara for dichotomous items, this article proposes an approach to derive the asymptotic standard errors of item response theory true score equating involving polytomous items, for equivalent and nonequivalent groups of examinees. This analytical approach could be used in place of empirical methods like…

  15. Psychometric Evaluation of Chinese-Language 44-Item and 10-Item Big Five Personality Inventories, Including Correlations with Chronotype, Mindfulness and Mind Wandering

    PubMed Central

    Carciofo, Richard; Yang, Jiaoyan; Song, Nan; Du, Feng; Zhang, Kan

    2016-01-01

    The 44-item and 10-item Big Five Inventory (BFI) personality scales are widely used, but there is a lack of psychometric data for Chinese versions. Eight surveys (total N = 2,496, aged 18–82), assessed a Chinese-language BFI-44 and/or an independently translated Chinese-language BFI-10. Most BFI-44 items loaded strongly or predominantly on the expected dimension, and values of Cronbach's alpha ranged .698-.807. Test-retest coefficients ranged .694-.770 (BFI-44), and .515-.873 (BFI-10). The BFI-44 and BFI-10 showed good convergent and discriminant correlations, and expected associations with gender (females higher for agreeableness and neuroticism), and age (older age associated with more conscientiousness and agreeableness, and also less neuroticism and openness). Additionally, predicted correlations were found with chronotype (morningness positive with conscientiousness), mindfulness (negative with neuroticism, positive with conscientiousness), and mind wandering/daydreaming frequency (negative with conscientiousness, positive with neuroticism). Exploratory analysis found that the Self-discipline facet of conscientiousness positively correlated with morningness and mindfulness, and negatively correlated with mind wandering/daydreaming frequency. Furthermore, Self-discipline was found to be a mediator in the relationships between chronotype and mindfulness, and chronotype and mind wandering/daydreaming frequency. Overall, the results support the utility of the BFI-44 and BFI-10 for Chinese-language big five personality research. PMID:26918618

  16. Psychometric Evaluation of Chinese-Language 44-Item and 10-Item Big Five Personality Inventories, Including Correlations with Chronotype, Mindfulness and Mind Wandering.

    PubMed

    Carciofo, Richard; Yang, Jiaoyan; Song, Nan; Du, Feng; Zhang, Kan

    2016-01-01

    The 44-item and 10-item Big Five Inventory (BFI) personality scales are widely used, but there is a lack of psychometric data for Chinese versions. Eight surveys (total N = 2,496, aged 18-82), assessed a Chinese-language BFI-44 and/or an independently translated Chinese-language BFI-10. Most BFI-44 items loaded strongly or predominantly on the expected dimension, and values of Cronbach's alpha ranged .698-.807. Test-retest coefficients ranged .694-.770 (BFI-44), and .515-.873 (BFI-10). The BFI-44 and BFI-10 showed good convergent and discriminant correlations, and expected associations with gender (females higher for agreeableness and neuroticism), and age (older age associated with more conscientiousness and agreeableness, and also less neuroticism and openness). Additionally, predicted correlations were found with chronotype (morningness positive with conscientiousness), mindfulness (negative with neuroticism, positive with conscientiousness), and mind wandering/daydreaming frequency (negative with conscientiousness, positive with neuroticism). Exploratory analysis found that the Self-discipline facet of conscientiousness positively correlated with morningness and mindfulness, and negatively correlated with mind wandering/daydreaming frequency. Furthermore, Self-discipline was found to be a mediator in the relationships between chronotype and mindfulness, and chronotype and mind wandering/daydreaming frequency. Overall, the results support the utility of the BFI-44 and BFI-10 for Chinese-language big five personality research.

  17. The Dutch-Flemish PROMIS Physical Function item bank exhibited strong psychometric properties in patients with chronic pain.

    PubMed

    Crins, Martine H P; Terwee, Caroline B; Klausch, Thomas; Smits, Niels; de Vet, Henrica C W; Westhovens, Rene; Cella, David; Cook, Karon F; Revicki, Dennis A; van Leeuwen, Jaap; Boers, Maarten; Dekker, Joost; Roorda, Leo D

    2017-07-01

    The objective of this study was to assess the psychometric properties of the Dutch-Flemish Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank in Dutch patients with chronic pain. A bank of 121 items was administered to 1,247 Dutch patients with chronic pain. Unidimensionality was assessed by fitting a one-factor confirmatory factor analysis and evaluating resulting fit statistics. Items were calibrated with the graded response model and its fit was evaluated. Cross-cultural validity was assessed by testing items for differential item functioning (DIF) based on language (Dutch vs. English). Construct validity was evaluated by calculating correlations between scores on the Dutch-Flemish PROMIS Physical Function measure and scores on generic and disease-specific measures. Results supported the Dutch-Flemish PROMIS Physical Function item bank's unidimensionality (Comparative Fit Index = 0.976, Tucker Lewis Index = 0.976) and model fit. Item thresholds targeted a wide range of physical function construct (threshold-parameters range: -4.2 to 5.6). Cross-cultural validity was good as only four items showed DIF for language and their impact on item scores was minimal. Physical Function scores were strongly associated with scores on all other measures (all correlations ≤ -0.60 as expected). The Dutch-Flemish PROMIS Physical Function item bank exhibited good psychometric properties. Development of a computer adaptive test based on the large bank is warranted. Copyright © 2017 Elsevier Inc. All rights reserved.

  18. MIMIC Methods for Assessing Differential Item Functioning in Polytomous Items

    ERIC Educational Resources Information Center

    Wang, Wen-Chung; Shih, Ching-Lin

    2010-01-01

    Three multiple indicators-multiple causes (MIMIC) methods, namely, the standard MIMIC method (M-ST), the MIMIC method with scale purification (M-SP), and the MIMIC method with a pure anchor (M-PA), were developed to assess differential item functioning (DIF) in polytomous items. In a series of simulations, it appeared that all three methods…

  19. Developing Item Response Theory-Based Short Forms to Measure the Social Impact of Burn Injuries.

    PubMed

    Marino, Molly E; Dore, Emily C; Ni, Pengsheng; Ryan, Colleen M; Schneider, Jeffrey C; Acton, Amy; Jette, Alan M; Kazis, Lewis E

    2018-03-01

    Objective To develop self-reported short forms for the Life Impact Burn Recovery Evaluation (LIBRE) Profile. Design Short forms based on the item parameters of discrimination and average difficulty. Setting A support network for burn survivors, peer support networks, social media, and mailings. Participants Burn survivors (N=601) older than 18 years. Interventions Not applicable. Main Outcome Measures The LIBRE Profile. Results Ten-item short forms were developed to cover the 6 LIBRE Profile scales: Relationships with Family & Friends, Social Interactions, Social Activities, Work & Employment, Romantic Relationships, and Sexual Relationships. Ceiling effects were ≤15% for all scales; floor effects were <1% for all scales. The marginal reliability of the short forms ranged from .85 to .89. Conclusions The LIBRE Profile-Short Forms demonstrated credible psychometric properties. The short form version provides a viable alternative to administering the LIBRE Profile when resources do not allow computer or Internet access. The full item bank, computerized adaptive test, and short forms are all scored along the same metric, and therefore scores are comparable regardless of the mode of administration. Copyright © 2017 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.

  20. Random Item IRT Models

    ERIC Educational Resources Information Center

    De Boeck, Paul

    2008-01-01

    It is common practice in IRT to consider items as fixed and persons as random. Both continuous and categorical person parameters are most often random variables, whereas for items only continuous parameters are used and they are commonly of the fixed type, although exceptions occur. It is shown in the present article that random item parameters…

  1. Improving Measurement Efficiency of the Inner EAR Scale with Item Response Theory.

    PubMed

    Jessen, Annika; Ho, Andrew D; Corrales, C Eduardo; Yueh, Bevan; Shin, Jennifer J

    2018-02-01

    Objectives (1) To assess the 11-item Inner Effectiveness of Auditory Rehabilitation (Inner EAR) instrument with item response theory (IRT). (2) To determine whether the underlying latent ability could also be accurately represented by a subset of the items for use in high-volume clinical scenarios. (3) To determine whether the Inner EAR instrument correlates with pure tone thresholds and word recognition scores. Design IRT evaluation of prospective cohort data. Setting Tertiary care academic ambulatory otolaryngology clinic. Subjects and Methods Modern psychometric methods, including factor analysis and IRT, were used to assess unidimensionality and item properties. Regression methods were used to assess prediction of word recognition and pure tone audiometry scores. Results The Inner EAR scale is unidimensional, and items varied in their location and information. Information parameter estimates ranged from 1.63 to 4.52, with higher values indicating more useful items. The IRT model provided a basis for identifying 2 sets of items with relatively lower information parameters. Item information functions demonstrated which items added insubstantial value over and above other items and were removed in stages, creating an 8- and a 3-item Inner EAR scale for more efficient assessment. The 8-item version accurately reflected the underlying construct. All versions correlated moderately with word recognition scores and pure tone averages. Conclusion The 11-, 8-, and 3-item versions of the Inner EAR scale have strong psychometric properties, and there is correlational validity evidence for the observed scores. Modern psychometric methods can help streamline care delivery by maximizing relevant information per item administered.

  2. Correlates of a Single-Item Indicator Versus a Multi-Item Scale of Outness About Same-Sex Attraction

    PubMed Central

    Noor, Syed W.; Galos, Dylan L.; Simon Rosser, B. R.

    2017-01-01

    In this study, we investigated if a single-item indicator measured the degree to which people were open about their same-sex attraction (“out”) as accurately as a multi-item scale. For the multi-item scale, we used the Outness Inventory, which includes three subscales: family, world, and religion. We examined correlations between the single- and multi-item measures; between the single-item indicator and the subscales of the multi-item scale; and between the measures and internalized homonegativity, social attitudes towards homosexuality, and depressive symptoms. In addition, we calculated Tjur’s R² as a measure of predictive power of the single-item indicator, multi-item scale, and subscales of the multi-item scale in predicting two health-related outcomes: depressive symptoms and condomless anal sex with multiple partners. There was a strong correlation between the single- and multi-item measures (r = 0.73). Furthermore, there were strong correlations between the single-item indicator and each subscale of the multi-item scale: family (r = 0.70), world (r = 0.77), and religion (r = 0.50). In addition, the correlations between the single-item indicator and internalized homonegativity (r = −0.63), social attitudes towards homosexuality (r = −0.38), and depression (r = −0.14) were higher than those between the multi-item scale and internalized homonegativity (r = −0.55), social attitudes towards homosexuality (r = −0.21), and depression (r = −0.13). Contrary to the premise that multi-item measures are superior to single-item measures, our collective findings indicate that the single-item indicator of outness performs better than the multi-item scale of outness. PMID:26292840
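
    Tjur's coefficient of discrimination, used above as the measure of predictive power, is the difference between the mean predicted probability among cases with the outcome and the mean among cases without it. A minimal sketch, assuming binary outcomes coded 0/1 and predicted probabilities from an already fitted model (the function name is hypothetical):

```python
def tjur_r2(y_true, p_hat):
    """Tjur's R-squared for a binary outcome.

    y_true: observed outcomes coded 0 or 1.
    p_hat: model-predicted probabilities of outcome 1, in the same order.
    Both outcome groups must be non-empty.
    """
    p_events = [p for y, p in zip(y_true, p_hat) if y == 1]
    p_nonevents = [p for y, p in zip(y_true, p_hat) if y == 0]
    return sum(p_events) / len(p_events) - sum(p_nonevents) / len(p_nonevents)
```

    A model that separates the two groups perfectly yields 1; a model that assigns the same mean probability to both groups yields 0.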

  3. Development and Validation of the PROMIS Pediatric Sleep Disturbance and Sleep-Related Impairment Item Banks.

    PubMed

    Forrest, Christopher B; Meltzer, Lisa J; Marcus, Carole L; de la Motte, Anna; Kratchman, Amy; Buysse, Daniel J; Pilkonis, Paul A; Becker, Brandon D; Bevans, Katherine B

    2018-03-13

    To develop and evaluate the measurement properties of child-report and parent-proxy versions of the PROMIS® Pediatric Sleep Disturbance and Sleep-Related Impairment item banks. A national sample of 1,104 children (8-17 years-old) and 1,477 parents of children 5-17 years-old was recruited from an internet panel to evaluate the psychometric properties of 43 sleep health items. A convenience sample of children and parents recruited from a pediatric sleep clinic was obtained to provide evidence of the measures' validity; polysomnography data were collected from a subgroup of these children. Factor analyses suggested two dimensions: sleep disturbance and daytime sleep-related impairment. The final item banks included 15 items for Sleep Disturbance and 13 for Sleep-Related Impairment. Items were calibrated using the graded response model from item response theory. Of the 28 items, 16 are included in the parallel PROMIS adult sleep health measures. Reliability of the measures exceeded 0.90. Validity was supported by correlations with existing measures of pediatric sleep health and higher sleep disturbance and sleep-related impairment scores for children with sleep problems and those with chronic and neurodevelopmental disorders. The sleep health measures were not correlated with results from polysomnography. The PROMIS Pediatric Sleep Disturbance and Sleep-Related Impairment item banks provide subjective assessments of a child's difficulties falling and staying asleep as well as daytime sleepiness and its impact on functioning. They may prove useful in the future for clinical research and practice. Future research should evaluate their responsiveness to clinical change in diverse patient populations.

  4. Opposing effects of negative emotion on amygdalar and hippocampal memory for items and associations

    PubMed Central

    Horner, Aidan J.; Hørlyck, Lone D.; Burgess, Neil

    2016-01-01

    Although negative emotion can strengthen memory of an event it can also result in memory disturbances, as in post-traumatic stress disorder (PTSD). We examined the effects of negative item content on amygdalar and hippocampal function in memory for the items themselves and for the associations between them. During fMRI, we examined encoding and retrieval of paired associates made up of all four combinations of neutral and negative images. At test, participants were cued with an image and, if recognised, had to retrieve the associated (target) image. The presence of negative images increased item memory but reduced associative memory. At encoding, subsequent item recognition correlated with amygdala activity, while subsequent associative memory correlated with hippocampal activity. Hippocampal activity was reduced by the presence of negative images, during encoding and correct associative retrieval. In contrast, amygdala activity increased for correctly retrieved negative images, even when cued by a neutral image. Our findings support a dual representation account, whereby negative emotion up-regulates the amygdala to strengthen item memory but down-regulates the hippocampus to weaken associative representations. These results have implications for the development and treatment of clinical disorders in which diminished associations between emotional stimuli and their context contribute to negative symptoms, as in PTSD. PMID:26969864

  5. Item-Writing Guidelines for Physics

    ERIC Educational Resources Information Center

    Regan, Tom

    2015-01-01

    A teacher learning how to write test questions (test items) will almost certainly encounter item-writing guidelines--lists of item-writing do's and don'ts. Item-writing guidelines usually are presented as applicable across all assessment settings. Table I shows some guidelines that I believe to be generally applicable and two will be briefly…

  6. Item analysis of examinations in the Faculty of Medicine of Tunis.

    PubMed

    Hermi, Amene; Achour, Wafa

    2016-04-01

    Introduction: Item analysis is the process of collecting, summarizing and using information from students' responses to assess the quality of test items. This study used this approach to evaluate the quality of items and examinations given in the Faculty of Medicine of Tunis (FMT). Methods: This study concerned the examinations of 2012-2013 (principal session). It analyzed 3138 items from 66 examinations, of which 46 were multidisciplinary (187 disciplines). A total of 2515 students took the examinations. The "AnItem.xls" file was used for the analysis, which focused on difficulty, discrimination and internal consistency. Results: Mean difficulty for all examinations was optimal (mean difficulty index: 0.59). The majority of items (89.17%) were either easy or of acceptable difficulty. Mean discrimination for all examinations was moderate (mean item discrimination coefficient: 0.28), with poor discrimination in 23.62% of items. Maximal discrimination occurred in disciplines with a difficulty index between 0.4 and 0.6. "Ideal" items represented 27.02%. Mean internal consistency for all examinations was acceptable (Cronbach's alpha: 0.79). Disciplines with unacceptable internal consistency (68.45%) each contained at most 33 items and showed a positive correlation between their alpha and their number of questions. Distributions were mostly platykurtic (72.73%) and negatively skewed (89.39%). The first year of studies had the best parameters. Conclusion: Our examinations had acceptable internal consistency and a good level of difficulty and discrimination. They tended to be easy and discriminated mainly among students of medium level. Item analysis is useful as a guide for item writers to improve the overall quality of questions in the future.
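
    The three quantities this record reports (difficulty index, item discrimination and Cronbach's alpha) can be sketched for dichotomous 0/1 item scores as follows. This is a minimal illustration, not the computation performed by "AnItem.xls": the abstract does not say which discrimination coefficient was used, so the corrected item-total (point-biserial) correlation here is an assumption, and all function names are hypothetical.

```python
from statistics import mean, pstdev, pvariance

def difficulty_index(item_scores):
    """Proportion of examinees who answered the item correctly (0/1 scores)."""
    return mean(item_scores)

def discrimination(item_scores, total_scores):
    """Correlation between an item and the rest score (total minus the item).

    Assumption: a corrected item-total correlation stands in for the
    study's unspecified discrimination coefficient.
    """
    rest = [t - i for i, t in zip(item_scores, total_scores)]
    sx, sy = pstdev(item_scores), pstdev(rest)
    if sx == 0 or sy == 0:
        return 0.0  # no variance, so the correlation is undefined; report 0
    mx, my = mean(item_scores), mean(rest)
    cov = mean([(x - mx) * (y - my) for x, y in zip(item_scores, rest)])
    return cov / (sx * sy)

def cronbach_alpha(responses):
    """Cronbach's alpha; responses is a list of examinee rows of item scores."""
    k = len(responses[0])
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

    Population variances are used throughout; because alpha depends only on a ratio of variances, using sample variances consistently would give the same value.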

  7. Item Response Data Analysis Using Stata Item Response Theory Package

    ERIC Educational Resources Information Center

    Yang, Ji Seung; Zheng, Xiaying

    2018-01-01

    The purpose of this article is to introduce and review the capability and performance of the Stata item response theory (IRT) package that is available from Stata v.14, 2015. Using a simulated data set and a publicly available item response data set extracted from Programme of International Student Assessment, we review the IRT package from…

  8. Item Selection and Pre-equating with Empirical Item Characteristic Curves.

    ERIC Educational Resources Information Center

    Livingston, Samuel A.

    An empirical item characteristic curve shows the probability of a correct response as a function of the student's total test score. These curves can be estimated from large-scale pretest data. They enable test developers to select items that discriminate well in the score region where decisions are made. A similar set of curves can be used to…
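
    The definition in this record (probability of a correct response as a function of total test score) translates directly into a tabulation over pretest data. The sketch below is illustrative only; the function name and the toy data in the usage note are assumptions, and no smoothing is applied, although operational curves built from large pretest samples would typically be smoothed.

```python
from collections import defaultdict

def empirical_icc(item_scores, total_scores):
    """Proportion correct on one item at each observed total test score.

    item_scores  -- 0/1 score on the item for each examinee
    total_scores -- total test score for each examinee (same order)
    Returns a dict mapping total score -> proportion correct, sorted by score.
    """
    counts = defaultdict(lambda: [0, 0])  # total score -> [n correct, n examinees]
    for item, total in zip(item_scores, total_scores):
        counts[total][0] += item
        counts[total][1] += 1
    return {score: correct / n for score, (correct, n) in sorted(counts.items())}
```

    For example, empirical_icc([0, 1, 1, 1, 0, 1], [1, 1, 2, 2, 0, 3]) gives a curve that rises from 0.0 at a total score of 0 to 1.0 at scores of 2 and 3; an item that discriminates well near a cut score shows a steep rise in that score region.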

  9. Reconciling findings of emotion-induced memory enhancement and impairment of preceding items

    PubMed Central

    Knight, Marisa; Mather, Mara

    2009-01-01

    A large body of work reveals that people remember emotionally arousing information better than neutral information. However, previous research reveals contradictory effects of emotional events on memory for neutral events that precede or follow them: in some studies emotionally arousing items impair memory for immediately preceding or following items and in others arousing items enhance memory for preceding items. By demonstrating both emotion-induced enhancement and impairment, Experiments 1 and 2 clarified the conditions under which these effects are likely to occur. The results suggest that emotion-induced enhancement is most likely to occur for neutral items that: (1) precede (and so are poised to predict the onset of) emotionally arousing items, (2) have high attentional weights at encoding, and (3) are tested after a delay period of a week rather than within the same experiment session. In contrast, emotion-induced impairment is most likely to occur for neutral items near the onset of emotional arousal that are overshadowed by highly activated competing items during encoding. PMID:20001121

  10. Numerical Differentiation Methods for Computing Error Covariance Matrices in Item Response Theory Modeling: An Evaluation and a New Proposal

    ERIC Educational Resources Information Center

    Tian, Wei; Cai, Li; Thissen, David; Xin, Tao

    2013-01-01

    In item response theory (IRT) modeling, the item parameter error covariance matrix plays a critical role in statistical inference procedures. When item parameters are estimated using the EM algorithm, the parameter error covariance matrix is not an automatic by-product of item calibration. Cai proposed the use of Supplemented EM algorithm for…

  11. Differential item functioning of the Geriatric Depression Scale in an Asian population.

    PubMed

    Broekman, B F P; Nyunt, S Z; Niti, M; Jin, A Z; Ko, S M; Kumar, R; Fones, C S L; Ng, T P

    2008-06-01

    The Geriatric Depression Scale (GDS) is widely used for screening and assessment of major depressive disorder (MDD). Screening scales are often culture-specific and should be evaluated for item response bias (synonymously differential item functioning, DIF) before use in clinical practice and research in a different population. In this study, we examined DIF associated with age, gender, ethnicity and chronic illness in a heterogeneous Asian population in Singapore. The GDS-15 and Structured Clinical Interview for DSM-IV diagnosis of MDD were independently administered by interviewers on 4253 non-institutionalized community living elderly subjects aged 60 years and above who were users of social service agencies. Multiple Indicator Multiple Cause latent variable modelling was used to identify DIF. We found evidence of significant DIF associated with age, gender, ethnicity and chronic illness for 8 items: dropped many activities and interests, afraid something bad is going to happen, prefer staying home to going out, more problems with memory than most, think it is (not) wonderful to be alive, feel pretty worthless, feel (not) full of energy, feel that situation is hopeless. Limitations include the smaller number of minority Indian and Malay subjects and the self-report of chronic medical illnesses. In a heterogeneous mix of respondents in Singapore, eight items of the GDS-15 showed DIF for age, gender, ethnicity and chronic illness. The awareness and identification of DIF in the GDS-15 provides a rational basis for its use in diverse population groups and guiding the derivation of abbreviated scales.

  12. The Effect of the Position of an Item within a Test on the Item Difficulty Value.

    ERIC Educational Resources Information Center

    Rubin, Lois S.; Mott, David E. W.

    An investigation of the effect on the difficulty value of an item due to position placement within a test was made. Using a 60-item operational test comprised of 5 subtests, 60 items were placed as experimental items on a number of spiralled test forms in three different positions (first, middle, last) within the subtest composed of like items.…

  13. Measurement Equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Pain Interference Short Form Items: Application to Ethnically Diverse Cancer and Palliative Care Populations.

    PubMed

    Teresi, Jeanne A; Ocepek-Welikson, Katja; Cook, Karon F; Kleinman, Marjorie; Ramirez, Mildred; Reid, M Carrington; Siu, Albert

    2016-01-01

    Reducing the response burden of standardized pain measures is desirable, particularly for individuals who are frail or live with chronic illness, e.g., those suffering from cancer and those in palliative care. The Patient Reported Outcome Measurement Information System® (PROMIS®) project addressed this issue with the provision of computerized adaptive tests (CAT) and short form measures that can be used clinically and in research. Although there has been substantial evaluation of PROMIS item banks, little is known about the performance of PROMIS short forms, particularly in ethnically diverse groups. Reviewed in this article are findings related to the differential item functioning (DIF) and reliability of the PROMIS pain interference short forms across diverse sociodemographic groups. DIF hypotheses were generated for the PROMIS short form pain interference items. Initial analyses tested item response theory (IRT) model assumptions of unidimensionality and local independence. Dimensionality was evaluated using factor analytic methods; local dependence (LD) was tested using IRT-based LD indices. Wald tests were used to examine group differences in IRT parameters, and to test DIF hypotheses. A second DIF-detection method used in sensitivity analyses was based on ordinal logistic regression with a latent IRT-derived conditioning variable. Magnitude and impact of DIF were investigated, and reliability and item and scale information statistics were estimated. The reliability of the short form item set was excellent. However, there were a few items with high local dependency, which affected the estimation of the final discrimination parameters. As a result, the item, "How much did pain interfere with enjoyment of social activities?" was excluded in the DIF analyses for all subgroup comparisons. No items were hypothesized to show DIF for race and ethnicity; however, five items showed DIF after adjustment for multiple comparisons in both primary and sensitivity

  14. Measurement Equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Pain Interference Short Form Items: Application to Ethnically Diverse Cancer and Palliative Care Populations

    PubMed Central

    Teresi, Jeanne A.; Ocepek-Welikson, Katja; Cook, Karon F.; Kleinman, Marjorie; Ramirez, Mildred; Reid, M. Carrington; Siu, Albert

    2017-01-01

    Reducing the response burden of standardized pain measures is desirable, particularly for individuals who are frail or live with chronic illness, e.g., those suffering from cancer and those in palliative care. The Patient Reported Outcome Measurement Information System® (PROMIS®) project addressed this issue with the provision of computerized adaptive tests (CAT) and short form measures that can be used clinically and in research. Although there has been substantial evaluation of PROMIS item banks, little is known about the performance of PROMIS short forms, particularly in ethnically diverse groups. Reviewed in this article are findings related to the differential item functioning (DIF) and reliability of the PROMIS pain interference short forms across diverse sociodemographic groups. Methods: DIF hypotheses were generated for the PROMIS short form pain interference items. Initial analyses tested item response theory (IRT) model assumptions of unidimensionality and local independence. Dimensionality was evaluated using factor analytic methods; local dependence (LD) was tested using IRT-based LD indices. Wald tests were used to examine group differences in IRT parameters, and to test DIF hypotheses. A second DIF-detection method used in sensitivity analyses was based on ordinal logistic regression with a latent IRT-derived conditioning variable. Magnitude and impact of DIF were investigated, and reliability and item and scale information statistics were estimated. Results: The reliability of the short form item set was excellent. However, there were a few items with high local dependency, which affected the estimation of the final discrimination parameters. As a result, the item, “How much did pain interfere with enjoyment of social activities?” was excluded in the DIF analyses for all subgroup comparisons. No items were hypothesized to show DIF for race and ethnicity; however, five items showed DIF after adjustment for multiple comparisons in both primary and

  15. PROC IRT: A SAS Procedure for Item Response Theory

    PubMed Central

    Matlock Cole, Ki; Paek, Insu

    2017-01-01

    This article reviews the item response theory procedure (PROC IRT) in SAS/STAT 14.1 for conducting item response theory (IRT) analyses of dichotomous and polytomous datasets that are unidimensional or multidimensional. The review provides an overview of available features, including models, estimation procedures, interfacing, input, and output files. A small-scale simulation study evaluates the IRT model parameter recovery of PROC IRT. The IRT procedure in Statistical Analysis Software (SAS) may be useful for researchers who frequently use SAS for analyses, research, and teaching.

  16. Medial temporal lobe contributions to cued retrieval of items and contexts.

    PubMed

    Hannula, Deborah E; Libby, Laura A; Yonelinas, Andrew P; Ranganath, Charan

    2013-10-01

    Several models have proposed that different regions of the medial temporal lobes contribute to different aspects of episodic memory. For instance, according to one view, the perirhinal cortex represents specific items, parahippocampal cortex represents information regarding the context in which these items were encountered, and the hippocampus represents item-context bindings. Here, we used event-related functional magnetic resonance imaging (fMRI) to test a specific prediction of this model, namely, that successful retrieval of items from context cues will elicit perirhinal recruitment and that successful retrieval of contexts from item cues will elicit parahippocampal cortex recruitment. Retrieval of the bound representation in either case was expected to elicit hippocampal engagement. To test these predictions, we had participants study several item-context pairs (i.e., pictures of objects and scenes, respectively), and then had them attempt to recall items from associated context cues and contexts from associated item cues during a scanned retrieval session. Results based on both univariate and multivariate analyses confirmed a role for hippocampus in content-general relational memory retrieval, and a role for parahippocampal cortex in successful retrieval of contexts from item cues. However, we also found that activity differences in perirhinal cortex were correlated with successful cued recall for both items and contexts. These findings provide partial support for the above predictions and are discussed with respect to several models of medial temporal lobe function. Copyright © 2013 Elsevier Ltd. All rights reserved.

  17. Medial Temporal Lobe Contributions to Cued Retrieval of Items and Contexts

    PubMed Central

    Hannula, Deborah E.; Libby, Laura A.; Yonelinas, Andrew P.; Ranganath, Charan

    2013-01-01

    Several models have proposed that different regions of the medial temporal lobes contribute to different aspects of episodic memory. For instance, according to one view, the perirhinal cortex represents specific items, parahippocampal cortex represents information regarding the context in which these items were encountered, and the hippocampus represents item-context bindings. Here, we used event-related functional magnetic resonance imaging (fMRI) to test a specific prediction of this model – namely, that successful retrieval of items from context cues will elicit perirhinal recruitment and that successful retrieval of contexts from item cues will elicit parahippocampal cortex recruitment. Retrieval of the bound representation in either case was expected to elicit hippocampal engagement. To test these predictions, we had participants study several item-context pairs (i.e., pictures of objects and scenes, respectively), and then had them attempt to recall items from associated context cues and contexts from associated item cues during a scanned retrieval session. Results based on both univariate and multivariate analyses confirmed a role for hippocampus in content-general relational memory retrieval, and a role for parahippocampal cortex in successful retrieval of contexts from item cues. However, we also found that activity differences in perirhinal cortex were correlated with successful cued recall for both items and contexts. These findings provide partial support for the above predictions and are discussed with respect to several models of medial temporal lobe function. PMID:23466350

  18. Language-related differential item functioning between English and German PROMIS Depression items is negligible.

    PubMed

    Fischer, H Felix; Wahl, Inka; Nolte, Sandra; Liegl, Gregor; Brähler, Elmar; Löwe, Bernd; Rose, Matthias

    2017-12-01

    To investigate differential item functioning (DIF) of PROMIS Depression items between US and German samples we compared data from the US PROMIS calibration sample (n = 780), a German general population survey (n = 2,500) and a German clinical sample (n = 621). DIF was assessed in an ordinal logistic regression framework, with 0.02 as criterion for R²-change and 0.096 for Raju's non-compensatory DIF. Item parameters were initially fixed to the PROMIS Depression metric; we used plausible values to account for uncertainty in depression estimates. Only four items showed DIF. Accounting for DIF led to negligible effects for the full item bank as well as a post hoc simulated computer-adaptive test (< 0.1 point on the PROMIS metric [mean = 50, standard deviation = 10]), while the effect on the short forms was small (< 1 point). The mean depression severity (43.6) in the German general population sample was considerably lower compared to the US reference value of 50. Overall, we found little evidence for language DIF between US and German samples, which could be addressed by either replacing the DIF items by items not showing DIF or by scoring the short form in German samples with the corrected item parameters reported. Copyright © 2016 John Wiley & Sons, Ltd.

  19. Item Response Theory Applied to Factors Affecting the Patient Journey Towards Hearing Rehabilitation

    PubMed Central

    Chenault, Michelene; Berger, Martijn; Kremer, Bernd; Anteunis, Lucien

    2016-01-01

    To develop a tool for use in hearing screening and to evaluate the patient journey towards hearing rehabilitation, responses to three scales of the hearing aid rehabilitation questionnaire were evaluated with item response theory: aid stigma, pressure, and aid unwanted, which address hearing aid stigma, experienced pressure from others, and perceived hearing aid benefit, respectively. The sample comprised 212 persons aged 55 years or more: 63 hearing aid users, 64 persons with hearing impairment, and 85 persons without hearing impairment, according to guidelines for hearing aid reimbursement in the Netherlands. Bias was investigated relative to hearing aid use and hearing impairment within the differential test functioning framework. Items compromising model fit or demonstrating differential item functioning were dropped. The aid stigma scale was reduced from 6 to 4 items, the pressure scale from 7 to 4, and the aid unwanted scale from 5 to 4. This procedure resulted in bias-free scales ready for screening purposes and application to further understand the help-seeking process of the hearing impaired. PMID:28028428

  20. Measuring organizational effectiveness in information and communication technology companies using item response theory.

    PubMed

    Trierweiller, Andréa Cristina; Peixe, Blênio César Severo; Tezza, Rafael; Pereira, Vera Lúcia Duarte do Valle; Pacheco, Waldemar; Bornia, Antonio Cezar; de Andrade, Dalton Francisco

    2012-01-01

    The aim of this paper is to measure the effectiveness of Information and Communication Technology (ICT) organizations from the point of view of the manager, using Item Response Theory (IRT). There is a need to verify the effectiveness of these organizations, which are normally associated with complex, dynamic, and competitive environments. In the academic literature, there is disagreement surrounding the concept of organizational effectiveness and its measurement. A construct was elaborated based on dimensions of effectiveness to guide the construction of the questionnaire items, which were submitted to specialists for evaluation. The questionnaire proved viable for measuring the organizational effectiveness of ICT companies from a manager's point of view using the Two-Parameter Logistic Model (2PLM) of IRT. This modeling permits us to evaluate the quality and properties of each item and to place items and respondents on a single scale, which is not possible when using other similar tools.
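
    For readers unfamiliar with the two-parameter logistic model named in this record, the sketch below shows its usual textbook form, in which an item has a discrimination a and a location b on the same scale as the respondent's latent trait theta. The function names and the parameter values in the usage note are illustrative assumptions, not estimates from the paper.

```python
import math

def p_2pl(theta, a, b):
    """Probability of endorsing an item under the two-parameter logistic model.

    theta -- respondent's position on the latent scale
    a     -- item discrimination (slope)
    b     -- item location (difficulty), on the same scale as theta
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information contributed by one 2PL item at theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)
```

    Because b and theta share one scale, an item with b = 0 is endorsed with probability 0.5 by a respondent at theta = 0, and the item is most informative for respondents located near b. This shared scale is what allows items and respondents to be placed together.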

  1. The effects of relative food item size on optimal tooth cusp sharpness during brittle food item processing

    PubMed Central

    Berthaume, Michael A.; Dumont, Elizabeth R.; Godfrey, Laurie R.; Grosse, Ian R.

    2014-01-01

    Teeth are often assumed to be optimal for their function, which allows researchers to derive dietary signatures from tooth shape. Most tooth shape analyses normalize for tooth size, potentially masking the relationship between relative food item size and tooth shape. Here, we model how relative food item size may affect optimal tooth cusp radius of curvature (RoC) during the fracture of brittle food items using a parametric finite-element (FE) model of a four-cusped molar. Morphospaces were created for four different food item sizes by altering cusp RoCs to determine whether optimal tooth shape changed as food item size changed. The morphospaces were also used to investigate whether variation in efficiency metrics (i.e. stresses, energy and optimality) changed as food item size changed. We found that optimal tooth shape changed as food item size changed, but that all optimal morphologies were similar, with one dull cusp that promoted high stresses in the food item and three cusps that acted to stabilize the food item. There were also positive relationships between food item size and the coefficients of variation for stresses in food item and optimality, and negative relationships between food item size and the coefficients of variation for stresses in the enamel and strain energy absorbed by the food item. These results suggest that relative food item size may play a role in selecting for optimal tooth shape, and the magnitude of these selective forces may change depending on food item size and which efficiency metric is being selected. PMID:25320068

  2. [Instrument to measure adherence in hypertensive patients: contribution of Item Response Theory].

    PubMed

    Rodrigues, Malvina Thaís Pacheco; Moreira, Thereza Maria Magalhaes; Vasconcelos, Alexandre Meira de; Andrade, Dalton Francisco de; Silva, Daniele Braz da; Barbetta, Pedro Alberto

    2013-06-01

    To analyze, by means of "Item Response Theory", an instrument to measure adherence to treatment for hypertension. Analytical study with 406 hypertensive patients with associated complications seen in primary care in Fortaleza, CE, Northeastern Brazil, in 2011, using "Item Response Theory". The stages were: dimensionality test, calibrating the items, processing data and creating a scale, analyzed using the graded response model. The dimensionality of the instrument was studied by analyzing the polychoric correlation matrix and full-information factor analysis. Multilog software was used to calibrate items and estimate the scores. Items relating to drug therapy are the most directly related to adherence, while those relating to drug-free therapy need to be reworked because they have less psychometric information and low discrimination. The independence of items, the small number of levels in the scale and low explained variance in the adjustment of the models are the main weaknesses of the instrument analyzed. "Item Response Theory" proved to be a relevant analysis technique because it evaluated respondents for adherence to treatment for hypertension, the level of difficulty of the items and their ability to discriminate between individuals with different levels of adherence, which generates a greater amount of information. According to the "Item Response Theory" analysis of its items, the instrument analyzed is limited in measuring adherence to hypertension treatment and needs adjustment. The proper formulation of the items is important in order to accurately measure the desired latent trait.

  3. Development and validation of a ten-item questionnaire with explanatory illustrations to assess upper extremity disorders: favorable effect of illustrations in the item reduction process.

    PubMed

    Kurimoto, Shigeru; Suzuki, Mikako; Yamamoto, Michiro; Okui, Nobuyuki; Imaeda, Toshihiko; Hirata, Hitoshi

    2011-11-01

    The purpose of this study is to develop a short and valid measure for upper extremity disorders and to assess the effect of attached illustrations in item reduction of a self-administered disability questionnaire while retaining psychometric properties. A validated questionnaire used to assess upper extremity disorders, the Hand20, was reduced to ten items using two item-reduction techniques. The psychometric properties of the abbreviated form, the Hand10, were evaluated on an independent sample that was used for the shortening process. Validity, reliability, and responsiveness of the Hand10 were retained in the item reduction process. It was possible that the use of explanatory illustrations attached to the Hand10 helped with its reproducibility. The illustrations for the Hand10 promoted text comprehension and motivation to answer the items. These changes resulted in high acceptability; more than 99.3% of patients, including 98.5% of elderly patients, could complete the Hand10 properly. The illustrations had favorable effects on the item reduction process and made it possible to retain precision of the instrument. The Hand10 is a reliable and valid instrument for individual-level applications with the advantage of being compact and broadly applicable, even in elderly individuals.

  4. The short- and long-term fates of memory items retained outside the focus of attention.

    PubMed

    LaRocque, Joshua J; Eichenbaum, Adam S; Starrett, Michael J; Rose, Nathan S; Emrich, Stephen M; Postle, Bradley R

    2015-04-01

    When a test of working memory (WM) requires the retention of multiple items, a subset of them can be prioritized. Recent studies have shown that, although prioritized (i.e., attended) items are associated with active neural representations, unprioritized (i.e., unattended) memory items can be retained in WM despite the absence of such active representations, and with no decrement in their recognition if they are cued later in the trial. These findings raise two intriguing questions about the nature of the short-term retention of information outside the focus of attention. First, when the focus of attention shifts from items in WM, is there a loss of fidelity for those unattended memory items? Second, could the retention of unattended memory items be accomplished by long-term memory mechanisms? We addressed the first question by comparing the precision of recall of attended versus unattended memory items, and found a significant decrease in precision for unattended memory items, reflecting a degradation in the quality of those representations. We addressed the second question by asking subjects to perform a WM task, followed by a surprise memory test for the items that they had seen in the WM task. Long-term memory for unattended memory items from the WM task was not better than memory for items that had remained selected by the focus of attention in the WM task. These results show that unattended WM representations are degraded in quality and are not preferentially represented in long-term memory, as compared to attended memory items.

  5. The short- and long-term fates of memory items retained outside the focus of attention

    PubMed Central

    Eichenbaum, Adam S.; Starrett, Michael J.; Rose, Nathan S.; Emrich, Stephen M.; Postle, Bradley R.

    2015-01-01

    When a test of working memory (WM) requires the retention of multiple items, a subset of them can be prioritized. Recent studies have shown that, although prioritized (i.e., attended) items are associated with active neural representations, unprioritized (i.e., unattended) memory items can be retained in WM despite the absence of such active representations, and with no decrement in their recognition if they are cued later in the trial. These findings raise two intriguing questions about the nature of the short-term retention of information outside the focus of attention. First, when the focus of attention shifts from items in WM, is there a loss of fidelity for those unattended memory items? Second, could the retention of unattended memory items be accomplished by long-term memory mechanisms? We addressed the first question by comparing the precision of recall of attended versus unattended memory items, and found a significant decrease in precision for unattended memory items, reflecting a degradation in the quality of those representations. We addressed the second question by asking subjects to perform a WM task, followed by a surprise memory test for the items that they had seen in the WM task. Long-term memory for unattended memory items from the WM task was not better than memory for items that had remained selected by the focus of attention in the WM task. These results show that unattended WM representations are degraded in quality and are not preferentially represented in long-term memory, as compared to attended memory items. PMID:25472902

  6. Evaluating HIV Knowledge Questionnaires Among Men Who Have Sex with Men: A Multi-Study Item Response Theory Analysis.

    PubMed

    Janulis, Patrick; Newcomb, Michael E; Sullivan, Patrick; Mustanski, Brian

    2018-01-01

    Knowledge about the transmission, prevention, and treatment of HIV remains a critical element in psychosocial models of HIV risk behavior and is commonly used as an outcome in HIV prevention interventions. However, most HIV knowledge questions have not undergone rigorous psychometric testing such as using item response theory. The current study used data from six studies of men who have sex with men (MSM; n = 3565) to (1) examine the item properties of HIV knowledge questions, (2) test for differential item functioning on commonly studied characteristics (i.e., age, race/ethnicity, and HIV risk behavior), (3) select items with the optimal item characteristics, and (4) leverage this combined dataset to examine the potential moderating effect of age on the relationship between condomless anal sex (CAS) and HIV knowledge. Findings indicated that existing questions tend to poorly differentiate those with higher levels of HIV knowledge, but items were relatively robust across diverse individuals. Furthermore, age moderated the relationship between CAS and HIV knowledge with older MSM having the strongest association. These findings suggest that additional items are required in order to capture a more nuanced understanding of HIV knowledge and that the association between CAS and HIV knowledge may vary by age.

  7. Applying Bayesian Item Selection Approaches to Adaptive Tests Using Polytomous Items

    ERIC Educational Resources Information Center

    Penfield, Randall D.

    2006-01-01

    This study applied the maximum expected information (MEI) and the maximum posterior-weighted information (MPI) approaches of computer adaptive testing item selection to the case of a test using polytomous items following the partial credit model. The MEI and MPI approaches are described. A simulation study compared the efficiency of ability…

  8. Optimal Item Selection with Credentialing Examinations.

    ERIC Educational Resources Information Center

    Hambleton, Ronald K.; And Others

    The study compared two promising item response theory (IRT) item-selection methods, optimal and content-optimal, with two non-IRT item selection methods, random and classical, for use in fixed-length certification exams. The four methods were used to construct 20-item exams from a pool of approximately 250 items taken from a 1985 certification…

  9. Using Automatic Item Generation to Meet the Increasing Item Demands of High-Stakes Educational and Occupational Assessment

    ERIC Educational Resources Information Center

    Arendasy, Martin E.; Sommer, Markus

    2012-01-01

    The use of new test administration technologies such as computerized adaptive testing in high-stakes educational and occupational assessments demands large item pools. Classic item construction processes and previous approaches to automatic item generation faced the problems of a considerable loss of items after the item calibration phase. In this…

  10. Rehearsal of to-be-remembered items is unnecessary to perform directed forgetting within working memory: Support for an active control mechanism.

    PubMed

    Festini, Sara B; Reuter-Lorenz, Patricia A

    2017-01-01

    Directed forgetting tasks instruct people to forget targeted memoranda. In the context of working memory, people attempt to forget representations that are currently held in mind. Here, we evaluated candidate mechanisms of directed forgetting within working memory, by (a) testing the influence of articulatory suppression, a rehearsal-reducing and attention-demanding secondary task, on directed forgetting efficacy, and by (b) assessing the ability of people to perform forgetting in the absence of to-be-remembered competitors to rehearse. In Experiment 1, articulatory suppression interfered with directed forgetting, increasing the proportion of false alarms to to-be-forgotten probes in the working memory phase and decreasing the magnitude of the directed forgetting effect as assessed by an incidental long-term memory recognition test. Experiment 2 replicated the effects of articulatory suppression and tested whether the simultaneous requirement to retain, and presumably rehearse, to-be-remembered items was necessary for successful forgetting. The long-term directed forgetting effect was equivalent whether or not participants had to-be-remembered items to rehearse during the working memory phase. Experiment 3 included an additional comparison condition and confirmed that articulatory suppression interfered with directed forgetting and that participants were as efficient at directed forgetting with and without competitors to remember. In combination, these experiments suggest that directed forgetting in working memory requires an active control process that is limited by articulatory suppression, and that the demand to remember a concurrent memory set is unnecessary for this control process to operate. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  11. An Evaluation of the Brief Symptom Inventory-18 Using Item Response Theory: Which Items Are Most Strongly Related to Psychological Distress?

    ERIC Educational Resources Information Center

    Meijer, Rob R.; de Vries, Rivka M.; van Bruggen, Vincent

    2011-01-01

    The psychometric structure of the Brief Symptom Inventory-18 (BSI-18; Derogatis, 2001) was investigated using Mokken scaling and parametric item response theory. Data of 487 outpatients, 266 students, and 207 prisoners were analyzed. Results of the Mokken analysis indicated that the BSI-18 formed a strong Mokken scale for outpatients and…

  12. A Model-Free Diagnostic for Single-Peakedness of Item Responses Using Ordered Conditional Means.

    PubMed

    Polak, Marike; de Rooij, Mark; Heiser, Willem J

    2012-09-01

    In this article we propose a model-free diagnostic for single-peakedness (unimodality) of item responses. Presuming a unidimensional unfolding scale and a given item ordering, we approximate item response functions of all items based on ordered conditional means (OCM). The proposed OCM methodology is based on Thurstone & Chave's (1929) criterion of irrelevance, which is a graphical, exploratory method for evaluating the "relevance" of dichotomous attitude items. We generalized this criterion to graded response items and quantified the relevance by fitting a unimodal smoother. The resulting goodness-of-fit was used to determine item fit and aggregated scale fit. Based on a simulation procedure, cutoff values were proposed for the measures of item fit. These cutoff values showed high power rates and acceptable Type I error rates. We present 2 applications of the OCM method. First, we apply the OCM method to personality data from the Developmental Profile; second, we analyze attitude data collected by Roberts and Laughlin (1996) concerning opinions of capital punishment.

  13. Detecting Differential Item Discrimination (DID) and the Consequences of Ignoring DID in Multilevel Item Response Models

    ERIC Educational Resources Information Center

    Lee, Woo-yeol; Cho, Sun-Joo

    2017-01-01

    Cross-level invariance in a multilevel item response model can be investigated by testing whether the within-level item discriminations are equal to the between-level item discriminations. Testing the cross-level invariance assumption is important to understand constructs in multilevel data. However, in most multilevel item response model…

  14. An Explanatory Item Response Theory Approach for a Computer-Based Case Simulation Test

    ERIC Educational Resources Information Center

    Kahraman, Nilüfer

    2014-01-01

    Problem: Practitioners working with multiple-choice tests have long utilized Item Response Theory (IRT) models to evaluate the performance of test items for quality assurance. The use of similar applications for performance tests, however, is often encumbered due to the challenges encountered in working with complicated data sets in which local…

  15. Brain activity is related to individual differences in the number of items stored in auditory short-term memory for pitch: evidence from magnetoencephalography.

    PubMed

    Grimault, Stephan; Nolden, Sophie; Lefebvre, Christine; Vachon, François; Hyde, Krista; Peretz, Isabelle; Zatorre, Robert; Robitaille, Nicolas; Jolicoeur, Pierre

    2014-07-01

    We used magnetoencephalography (MEG) to examine brain activity related to the maintenance of non-verbal pitch information in auditory short-term memory (ASTM). We focused on brain activity that increased with the number of items effectively held in memory by the participants during the retention interval of an auditory memory task. We used very simple acoustic materials (i.e., pure tones that varied in pitch) that minimized activation from non-ASTM related systems. MEG revealed neural activity in frontal, temporal, and parietal cortices that increased with a greater number of items effectively held in memory by the participants during the maintenance of pitch representations in ASTM. The present results reinforce the functional role of frontal and temporal cortices in the retention of pitch information in ASTM. This is the first MEG study to provide both fine spatial localization and temporal resolution on the neural mechanisms of non-verbal ASTM for pitch in relation to individual differences in the capacity of ASTM. This research contributes to a comprehensive understanding of the mechanisms mediating the representation and maintenance of basic non-verbal auditory features in the human brain. Copyright © 2014 Elsevier Inc. All rights reserved.

  16. Development of a parent‐reported questionnaire evaluating upper limb activity limitation in children with cerebral palsy

    PubMed Central

    Preston, N.; Levesley, M.; Mon‐Williams, M.; O'Connor, R.J.

    2017-01-01

    Background and purpose: Upper limb activity measures for children with cerebral palsy have a number of limitations, for example, lack of validity and poor responsiveness. To overcome these limitations, we developed the Children's Arm Rehabilitation Measure (ChARM), a parent‐reported questionnaire validated for children with cerebral palsy aged 5–16 years. This paper describes both the development of the ChARM items and response categories and its psychometric testing and further refinement using the Rasch measurement model. Methods: To generate valid items for the ChARM, we collected goals of therapy specifically developed by therapists, children with cerebral palsy, and their parents for improving activity limitation of the upper limb. The activities, which were the focus of these goals, formed the basis for the items. Therapists typically break an activity into natural stages for the purpose of improving activity performance, and these natural orders of achievement formed each item's response options. Items underwent face validity testing with health care professionals, parents of children with cerebral palsy, academics, and lay persons. A Rasch analysis was performed on ChARM questionnaires completed by the parents of 170 children with cerebral palsy from 12 hospital paediatric services. The ChARM was amended, and the procedure repeated on 148 ChARMs (from children's mean age: 10 years and 1 month; range: 4 years and 8 months to 16 years and 11 months; 85 males; Manual Ability Classification System Levels I = 9, II = 26, III = 48, IV = 45, and V = 18). Results: The final 19‐item unidimensional questionnaire displayed fit to the Rasch model (chi‐square p = .18), excellent reliability (person separation index = 0.95, α = 0.95), and no floor or ceiling effects. Items showed no response bias for gender, distribution of impairment, age, or learning disability. Discussion: The ChARM is a psychometrically sound measure of upper limb

  17. The Communicative Participation Item Bank (CPIB): Item bank calibration and development of a disorder-generic short form

    PubMed Central

    Baylor, Carolyn; Yorkston, Kathryn; Eadie, Tanya; Kim, Jiseon; Chung, Hyewon; Amtmann, Dagmar

    2015-01-01

    Purpose: The purpose of this study was to calibrate the items for the Communicative Participation Item Bank (CPIB) using Item Response Theory (IRT). One overriding objective was to examine if the IRT item parameters would be consistent across different diagnostic groups, thereby allowing creation of a disorder-generic instrument. The intended outcomes were the final item bank and a short form ready for clinical and research applications. Methods: Self-report data were collected from 701 individuals representing four diagnoses: multiple sclerosis, Parkinson’s disease, amyotrophic lateral sclerosis and head and neck cancer. Participants completed the CPIB and additional self-report questionnaires. CPIB data were analyzed using the IRT Graded Response Model (GRM). Results: The initial set of 94 candidate CPIB items was reduced to an item bank of 46 items demonstrating unidimensionality, local independence, good item fit, and good measurement precision. Differential item functioning (DIF) analyses detected no meaningful differences across diagnostic groups. A 10-item, disorder-generic short form was generated. Conclusions: The CPIB provides speech-language pathologists with a unidimensional, self-report outcomes measurement instrument dedicated to the construct of communicative participation. This instrument may be useful to clinicians and researchers wanting to implement measures of communicative participation in their work. PMID:23816661

  18. Key Items to Get Right When Conducting a Randomized Controlled Trial in Education

    ERIC Educational Resources Information Center

    Coalition for Evidence-Based Policy, 2005

    2005-01-01

    This is a checklist of key items to get right when conducting a randomized controlled trial to evaluate an educational program or practice ("intervention"). It is intended as a practical resource for researchers and sponsors of research, describing items that are often critical to the success of a randomized controlled trial. A significant…

  19. Executive control processes underlying multi-item working memory

    PubMed Central

    Lara, Antonio H.; Wallis, Jonathan D.

    2014-01-01

    A dominant view of prefrontal cortex (PFC) function is that it stores task-relevant information in working memory. To examine this and determine how it applies when multiple pieces of information must be stored, we trained two macaque monkeys to perform a multi-item color change-detection task and recorded activity of neurons in PFC. Few neurons encoded the color of the items. Instead, the predominant encoding was spatial: a static signal reflecting the item's position and a dynamic signal reflecting the animal's covert attention. These findings challenge the notion that PFC stores task-relevant information. Instead, we suggest that the contribution of PFC is in controlling the allocation of resources to support working memory. In support of this, we found that increased power in the alpha and theta bands of PFC local field potentials, which are thought to reflect long-range communication with other brain areas, was correlated with more precise color representations. PMID:24747574

  20. Item-specific processing reduces false memories.

    PubMed

    McCabe, David P; Presmanes, Alison G; Robertson, Chuck L; Smith, Anderson D

    2004-12-01

    We examined the effect of item-specific and relational encoding instructions on false recognition in two experiments in which the DRM paradigm was used (Deese, 1959; Roediger & McDermott, 1995). Type of encoding (item-specific or relational) was manipulated between subjects in Experiment 1 and within subjects in Experiment 2. Decision-based explanations (e.g., the distinctiveness heuristic) predict reductions in false recognition in between-subjects designs, but not in within-subjects designs, because they are conceptualized as global shifts in decision criteria. Memory-based explanations predict reductions in false recognition in both designs, resulting from enhanced recollection of item-specific details. False recognition was reduced following item-specific encoding instructions in both experiments, favoring a memory-based explanation. These results suggest that providing unique cues for the retrieval of individual studied items results in enhanced discrimination between those studied items and critical lures. Conversely, enhancing the similarity of studied items results in poor discrimination among items within a particular list theme. These results are discussed in terms of the item-specific/relational framework (Hunt & McDaniel, 1993).

  1. An item response theory analysis of the narcissistic personality inventory.

    PubMed

    Ackerman, Robert A; Donnellan, M Brent; Robins, Richard W

    2012-01-01

    This research uses item response theory methods to evaluate the Narcissistic Personality Inventory (NPI; Raskin & Terry, 1988). Analyses using the 2-parameter logistic model were conducted on the total score and the Corry, Merritt, Mrug, and Pamp (2008) and Ackerman et al. (2011) subscales for the NPI. In addition to offering precise information about the psychometric properties of the NPI item pool, these analyses generated insights that can be used to develop new measures of the personality constructs embedded within this frequently used inventory.
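
    For readers unfamiliar with the 2-parameter logistic model mentioned in this record, a minimal sketch of its item response function and item information follows. The function names (p_2pl, item_information) and the parameter values in the usage note are illustrative assumptions, not estimates from the study.

```python
import math


def p_2pl(theta, a, b):
    """Probability of endorsing an item under the 2PL model.

    theta: respondent trait level; a: discrimination; b: difficulty (location).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)
```

    Information peaks where theta equals b and grows with the square of a, which is why highly discriminating items contribute the most measurement precision near their own location.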

  2. Audio Adapted Assessment Data: Does the Addition of Audio to Written Items Modify the Item Calibration?

    ERIC Educational Resources Information Center

    Snyder, James

    2010-01-01

    This dissertation research examined the changes in item RIT calibration that occurred when adding audio to a set of currently calibrated RIT items and then placing these new items as field test items in the modified assessments on the NWEA MAP test platform. The researcher used test results from over 600 students in the Poway School District in…

  3. Hidden Item Variance in Multiple Mini-Interview Scores

    ERIC Educational Resources Information Center

    Zaidi, Nikki L.; Swoboda, Christopher M.; Kelcey, Benjamin M.; Manuel, R. Stephen

    2017-01-01

    The extant literature has largely ignored a potentially significant source of variance in multiple mini-interview (MMI) scores by "hiding" the variance attributable to the sample of attributes used on an evaluation form. This potential source of hidden variance can be defined as rating items, which typically comprise an MMI evaluation…

  4. Uncovering underlying processes of semantic priming by correlating item-level effects.

    PubMed

    Heyman, Tom; Hutchison, Keith A; Storms, Gert

    2016-04-01

    The current study examines the underlying processes of semantic priming using the largest priming database available (i.e., Semantic Priming Project, Hutchison et al. Behavior Research Methods, 45(4), 1099-1114, 2013). Specifically, it compares priming effects in two tasks: lexical decision and pronunciation. Task similarities were assessed at two different stimulus onset asynchronies (SOAs) (i.e., 200 and 1,200 ms) and for both primary and other associates. To evaluate how consistent priming is across these two tasks, item-level priming effects obtained in each task were correlated for each condition separately. The results revealed significant correlations at the short SOA for both primary and other associates. The correlations at the long SOA were significantly smaller and only reached significance when z-transformed response times were used. Furthermore, this pattern remained essentially the same when only asymmetric forward associates (e.g., panda-bear) were considered, suggesting that the cross-task stability at the short SOA was not merely caused by retrospective processes such as semantic matching. Instead, these findings provide evidence for a rapidly operating, item-based, relational characteristic such as spreading activation.

  5. Quality of surgical randomized controlled trials for acute cholecystitis: assessment based on CONSORT and additional check items.

    PubMed

    Shikata, Satoru; Nakayama, Takeo; Yamagishi, Hisakazu

    2008-01-01

    In this study, we conducted a limited survey of reports of surgical randomized controlled trials, using the consolidated standards of reporting trials (CONSORT) statement and additional check items to clarify problems in the evaluation of surgical reports. A total of 13 randomized trials were selected from the two latest review articles on biliary surgery. Each randomized trial was evaluated according to 28 quality measures that comprised items from the CONSORT statement plus additional items. Analysis focused on relationships between the quality of each study and the estimated effect gap ("pooled estimate in meta-analysis" minus "estimated effect of each study"). No definite relationships were found between individual study quality and the estimated effect gap. The following items could have been described but were not provided in almost all the surgical RCT reports: "clearly defined outcomes"; "details of randomization"; "participant flow charts"; "intention-to-treat analysis"; "ancillary analyses"; and "financial conflicts of interest". The item "participation of a trial methodologist in the study" was not found in any of the reports. Although the quality of reporting trials is not always related to a biased estimation of treatment effect, the items used for quality measures must be described to enable readers to evaluate the quality and applicability of the reporting. Further development of an assessment tool is needed for items specific to surgical randomized controlled trials.

  6. A Mixed Effects Randomized Item Response Model

    ERIC Educational Resources Information Center

    Fox, J.-P.; Wyrick, Cheryl

    2008-01-01

    The randomized response technique ensures that individual item responses, denoted as true item responses, are randomized before observing them and so-called randomized item responses are observed. A relationship is specified between randomized item response data and true item response data. True item response data are modeled with a (non)linear…
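
    One common way to relate randomized responses to true responses is a forced-response design, sketched below. This design is an assumption chosen for illustration (the truncated abstract does not say which design the authors use), and the function and parameter names are hypothetical.

```python
def observed_prob(true_prob, p_truthful, p_forced_yes):
    """Probability of observing a positive response.

    With probability p_truthful the respondent answers truthfully; otherwise
    a randomizing device forces a positive response with probability
    p_forced_yes.
    """
    return p_truthful * true_prob + (1.0 - p_truthful) * p_forced_yes


def recover_true_prob(obs_prob, p_truthful, p_forced_yes):
    """Invert the relationship above to recover the true response probability."""
    if p_truthful <= 0.0:
        raise ValueError("p_truthful must be positive to identify the true rate")
    return (obs_prob - (1.0 - p_truthful) * p_forced_yes) / p_truthful
```

    Because the design probabilities are known, the true response probability stays identifiable even though no individual answer reveals the respondent's true status.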

  7. Item Response Theory with Covariates (IRT-C): Assessing Item Recovery and Differential Item Functioning for the Three-Parameter Logistic Model

    ERIC Educational Resources Information Center

    Tay, Louis; Huang, Qiming; Vermunt, Jeroen K.

    2016-01-01

    In large-scale testing, the use of multigroup approaches is limited for assessing differential item functioning (DIF) across multiple variables as DIF is examined for each variable separately. In contrast, the item response theory with covariate (IRT-C) procedure can be used to examine DIF across multiple variables (covariates) simultaneously. To…

  8. 41 CFR 102-36.435 - How do we identify Munitions List Items (MLIs)/Commerce Control List Items (CCLIs) requiring...

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... Munitions List Items (MLIs)/Commerce Control List Items (CCLIs) requiring demilitarization? 102-36.435... Personal Property Whose Disposal Requires Special Handling Munitions List Items/commerce Control List Items (mlis/cclis) § 102-36.435 How do we identify Munitions List Items (MLIs)/Commerce Control List Items...

  9. 41 CFR 102-36.435 - How do we identify Munitions List Items (MLIs)/Commerce Control List Items (CCLIs) requiring...

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... Munitions List Items (MLIs)/Commerce Control List Items (CCLIs) requiring demilitarization? 102-36.435... Personal Property Whose Disposal Requires Special Handling Munitions List Items/commerce Control List Items (mlis/cclis) § 102-36.435 How do we identify Munitions List Items (MLIs)/Commerce Control List Items...

  10. 41 CFR 102-36.435 - How do we identify Munitions List Items (MLIs)/Commerce Control List Items (CCLIs) requiring...

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... Munitions List Items (MLIs)/Commerce Control List Items (CCLIs) requiring demilitarization? 102-36.435... Personal Property Whose Disposal Requires Special Handling Munitions List Items/commerce Control List Items (mlis/cclis) § 102-36.435 How do we identify Munitions List Items (MLIs)/Commerce Control List Items...

  11. 41 CFR 102-36.435 - How do we identify Munitions List Items (MLIs)/Commerce Control List Items (CCLIs) requiring...

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... Munitions List Items (MLIs)/Commerce Control List Items (CCLIs) requiring demilitarization? 102-36.435... Personal Property Whose Disposal Requires Special Handling Munitions List Items/commerce Control List Items (mlis/cclis) § 102-36.435 How do we identify Munitions List Items (MLIs)/Commerce Control List Items...

  12. 41 CFR 102-36.435 - How do we identify Munitions List Items (MLIs)/Commerce Control List Items (CCLIs) requiring...

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... Munitions List Items (MLIs)/Commerce Control List Items (CCLIs) requiring demilitarization? 102-36.435... Personal Property Whose Disposal Requires Special Handling Munitions List Items/commerce Control List Items (mlis/cclis) § 102-36.435 How do we identify Munitions List Items (MLIs)/Commerce Control List Items...

  13. A new item response theory model to adjust data allowing examinee choice

    PubMed Central

    Costa, Marcelo Azevedo; Braga Oliveira, Rivert Paulo

    2018-01-01

    In a typical questionnaire testing situation, examinees are not allowed to choose which items they answer because of a technical issue in obtaining satisfactory statistical estimates of examinee ability and item difficulty. This paper introduces a new item response theory (IRT) model that incorporates information from a novel representation of questionnaire data using network analysis. Three scenarios in which examinees select a subset of items were simulated. In the first scenario, the assumptions required to apply the standard Rasch model are met, thus establishing a reference for parameter accuracy. The second and third scenarios include five increasing levels of violating those assumptions. The results show substantial improvements over the standard model in item parameter recovery. Furthermore, the accuracy was closer to the reference in almost every evaluated scenario. To the best of our knowledge, this is the first proposal to obtain satisfactory IRT statistical estimates in the last two scenarios. PMID:29389996

  14. Developing an item bank and short forms that assess the impact of asthma on quality of life.

    PubMed

    Stucky, Brian D; Edelen, Maria Orlando; Sherbourne, Cathy D; Eberhart, Nicole K; Lara, Marielena

    2014-02-01

    The present work describes the process of developing an item bank and short forms that measure the impact of asthma on quality of life (QoL) that avoids confounding QoL with asthma symptomatology and functional impairment. Using a diverse national sample of adults with asthma (N = 2032) we conducted exploratory and confirmatory factor analyses, and item response theory and differential item functioning analyses to develop a 65-item unidimensional item bank and separate short form assessments. A psychometric evaluation of the RAND Impact of Asthma on QoL item bank (RAND-IAQL) suggests that though the concept of asthma impact on QoL is multi-faceted, it may be measured as a single underlying construct. The performance of the bank was then evaluated with a real-data simulated computer adaptive test. From the RAND-IAQL item bank we then developed two short forms consisting of 4 and 12 items (reliability = 0.86 and 0.93, respectively). A real-data simulated computer adaptive test suggests that as few as 4-5 items from the bank are needed to obtain highly precise scores. Preliminary validity results indicate that the RAND-IAQL measures distinguish between levels of asthma control. To measure the impact of asthma on QoL, users of these items may choose from two highly reliable short forms, computer adaptive test administration, or content-specific subsets of items from the bank tailored to their specific needs. Copyright © 2013 Elsevier Ltd. All rights reserved.

  15. Psychometric properties of the 25-item National Eye Institute Visual Function Questionnaire (NEI VFQ-25), Japanese version.

    PubMed

    Suzukamo, Yoshimi; Oshika, Tetsuro; Yuzawa, Mitsuko; Tokuda, Yoshihiro; Tomidokoro, Atsuo; Oki, Kotaro; Mangione, Carol M; Green, Joseph; Fukuhara, Shunichi

    2005-10-26

    The importance of evaluating the outcomes of health care from the standpoint of the patient is now widely recognized. The purpose of this study is to develop and test a Japanese version of the National Eye Institute Visual Function Questionnaire (NEI VFQ-25). A Japanese version was developed with a previously standardized method. The questionnaire and optional items were completed by 245 patients with cataracts, glaucoma, or age-related macular degeneration, by 110 others before and after cataract surgery, and by a reference group (n = 31). We computed rates of missing data, measured reproducibility and internal consistency reliability, and tested for convergent and discriminant validity, concurrent validity, known-groups validity, factor structure, and responsiveness to change. Based on information from the participants, some items were changed to 2-step items (asking if an activity was done, and if it was done, then asking how difficult it was). The near-vision and distance-vision subscales each had 1 item that was endorsed by very few participants, so these items were replaced with items that were optional in the English version. For example, more than 60% of participants did not drive, so the driving question was excluded. Reliability and validity were adequate for all subscales except driving, ocular pain, color vision, and peripheral vision. With cataract surgery, most scores improved by at least 20 points. With minor modifications from the English version, the Japanese NEI VFQ-25 can give reliable, valid, responsive data on vision-related quality of life, for group-level comparisons or for tracking therapeutic outcomes.

  16. Identification and Development of Items Comprising Organizational Citizenship Behaviors Among Pharmacy Faculty

    PubMed Central

    Semsick, Gretchen R.

    2016-01-01

    Objective. Identify behaviors that can compose a measure of organizational citizenship by pharmacy faculty. Methods. A four-round, modified Delphi procedure using open-ended questions (Round 1) was conducted with 13 panelists from pharmacy academia. The items generated were evaluated and refined for inclusion in subsequent rounds. A consensus was reached after completing four rounds. Results. The panel produced a set of 26 items indicative of extra-role behaviors by faculty colleagues considered to compose a measure of citizenship, which is an expressed manifestation of collegiality. Conclusions. The items generated require testing for validation and reliability in a large sample to create a measure of organizational citizenship. Even prior to doing so, the list of items can serve as a resource for mentorship of junior and senior faculty alike. PMID:28179717

  17. Ramsay-Curve Differential Item Functioning

    ERIC Educational Resources Information Center

    Woods, Carol M.

    2011-01-01

    Differential item functioning (DIF) occurs when an item on a test, questionnaire, or interview has different measurement properties for one group of people versus another, irrespective of true group-mean differences on the constructs being measured. This article is focused on item response theory based likelihood ratio testing for DIF (IRT-LR or…

  18. Item Response Theory analysis of Fagerström Test for Cigarette Dependence.

    PubMed

    Svicher, Andrea; Cosci, Fiammetta; Giannini, Marco; Pistelli, Francesco; Fagerström, Karl

    2018-02-01

    The Fagerström Test for Cigarette Dependence (FTCD) and the Heaviness of Smoking Index (HSI) are the gold standard measures to assess cigarette dependence. However, FTCD reliability and factor structure have been questioned and HSI psychometric properties are in need of further investigation. The present study examined the psychometric properties of the FTCD and the HSI via Item Response Theory. The study was a secondary analysis of data collected in 862 Italian daily smokers. Confirmatory factor analysis was run to evaluate the dimensionality of FTCD. A Graded Response Model was applied to FTCD and HSI to verify the fit to the data. Both item and test functioning were analyzed and item statistics, Test Information Function, and scale reliabilities were calculated. Mokken Scale Analysis was applied to estimate homogeneity and Loevinger's coefficients were calculated. The FTCD showed unidimensionality and homogeneity for most of the items and for the total score. It also showed high sensitivity and good reliability from medium to high levels of cigarette dependence, although problems related to some items (i.e., items 3 and 5) were evident. HSI had good homogeneity, adequate item functioning, and high reliability from medium to high levels of cigarette dependence. Significant Differential Item Functioning was found for items 1, 4, 5 of the FTCD and for both items of HSI. HSI seems highly recommended in clinical settings addressed to heavy smokers while FTCD would be better used in smokers with a level of cigarette dependence ranging between low and high. Copyright © 2017 Elsevier Ltd. All rights reserved.

  19. Psychometric properties of the Triarchic Psychopathy Measure: An item response theory approach.

    PubMed

    Shou, Yiyun; Sellbom, Martin; Xu, Jing

    2018-05-01

    There is cumulative evidence for the cross-cultural validity of the Triarchic Psychopathy Measure (TriPM; Patrick, 2010) among non-Western populations. Recent studies using correlational and regression analyses show promising construct validity of the TriPM in Chinese samples. However, little is known about the efficiency of items in TriPM in assessing the proposed latent traits. The current study evaluated the psychometric properties of the Chinese TriPM at the item level using item response theory analyses. It also examined the measurement invariance of the TriPM between the Chinese and the U.S. student samples by applying differential item functioning analyses under the item response theory framework. The results supported the unidimensional nature of the Disinhibition and Meanness scales. Both scales had a greater level of precision in the respective underlying constructs at the positive ends. The two scales, however, had several items that were weakly associated with their respective latent traits in the Chinese student sample. Boldness, on the other hand, was found to be multidimensional, and reflected a more normally distributed range of variation. The examination of measurement bias via differential item functioning analyses revealed that a number of items of the TriPM were not equivalent across the Chinese and the U.S. samples. Some modification and adaptation of items might be considered for improving the precision of the TriPM for Chinese participants. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  20. Item-Level Time Limits Are Not a Panacea

    ERIC Educational Resources Information Center

    Schmitz, Florian; Wilhelm, Oliver

    2015-01-01

    The excellent paper by Goldhammer (this issue) deals with a most relevant and very pervasive problem of ability assessment: the evaluation of performance by considering speed and accuracy of performance. Goldhammer proposes item-level time limits as a possible remedy for individual differences in the speed-accuracy trade-off (SATO) to keep time…

  1. 41 CFR 102-36.430 - May we dispose of excess Munitions List Items (MLIs)/Commerce Control List Items (CCLIs)?

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... Munitions List Items (MLIs)/Commerce Control List Items (CCLIs)? 102-36.430 Section 102-36.430 Public... Disposal Requires Special Handling Munitions List Items/commerce Control List Items (mlis/cclis) § 102-36.430 May we dispose of excess Munitions List Items (MLIs)/Commerce Control List Items (CCLIs)? You may...

  2. 41 CFR 102-36.430 - May we dispose of excess Munitions List Items (MLIs)/Commerce Control List Items (CCLIs)?

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... Munitions List Items (MLIs)/Commerce Control List Items (CCLIs)? 102-36.430 Section 102-36.430 Public... Disposal Requires Special Handling Munitions List Items/commerce Control List Items (mlis/cclis) § 102-36.430 May we dispose of excess Munitions List Items (MLIs)/Commerce Control List Items (CCLIs)? You may...

  3. 41 CFR 102-36.430 - May we dispose of excess Munitions List Items (MLIs)/Commerce Control List Items (CCLIs)?

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... Munitions List Items (MLIs)/Commerce Control List Items (CCLIs)? 102-36.430 Section 102-36.430 Public... Disposal Requires Special Handling Munitions List Items/commerce Control List Items (mlis/cclis) § 102-36.430 May we dispose of excess Munitions List Items (MLIs)/Commerce Control List Items (CCLIs)? You may...

  4. 41 CFR 102-36.430 - May we dispose of excess Munitions List Items (MLIs)/Commerce Control List Items (CCLIs)?

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... Munitions List Items (MLIs)/Commerce Control List Items (CCLIs)? 102-36.430 Section 102-36.430 Public... Disposal Requires Special Handling Munitions List Items/commerce Control List Items (mlis/cclis) § 102-36.430 May we dispose of excess Munitions List Items (MLIs)/Commerce Control List Items (CCLIs)? You may...

  5. 41 CFR 102-36.430 - May we dispose of excess Munitions List Items (MLIs)/Commerce Control List Items (CCLIs)?

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... Munitions List Items (MLIs)/Commerce Control List Items (CCLIs)? 102-36.430 Section 102-36.430 Public... Disposal Requires Special Handling Munitions List Items/commerce Control List Items (mlis/cclis) § 102-36.430 May we dispose of excess Munitions List Items (MLIs)/Commerce Control List Items (CCLIs)? You may...

  6. Effects of Anchor Item Methods on the Detection of Differential Item Functioning within the Family of Rasch Models

    ERIC Educational Resources Information Center

    Wang, Wen-Chung

    2004-01-01

    Scale indeterminacy in analysis of differential item functioning (DIF) within the framework of item response theory can be resolved by imposing 3 anchor item methods: the equal-mean-difficulty method, the all-other anchor item method, and the constant anchor item method. In this article, applicability and limitations of these 3 methods are…

  7. Unidimensional Interpretations for Multidimensional Test Items

    ERIC Educational Resources Information Center

    Kahraman, Nilufer

    2013-01-01

    This article considers potential problems that can arise in estimating a unidimensional item response theory (IRT) model when some test items are multidimensional (i.e., show a complex factorial structure). More specifically, this study examines (1) the consequences of model misfit on IRT item parameter estimates due to unintended minor item-level…

  8. The utility of single-item readiness screeners in middle school.

    PubMed

    Lewis, Crystal G; Herman, Keith C; Huang, Francis L; Stormont, Melissa; Grossman, Caroline; Eddy, Colleen; Reinke, Wendy M

    2017-10-01

    This study examined the benefit of utilizing one-item academic and one-item behavior readiness teacher-rated screeners at the beginning of the school year to predict end-of-school year outcomes for middle school students. The Middle School Academic and Behavior Readiness (M-ABR) screeners were developed to provide an efficient and effective way to assess readiness in students. Participants included 889 students in 62 middle school classrooms in an urban Missouri school district. Concurrent validity with the M-ABR items and other indicators of readiness in the fall were evaluated using Pearson product-moment correlation coefficients, with the academic readiness item having medium to strong correlations with other baseline academic indicators (r=±0.56 to 0.91) and the behavior readiness item having low to strong correlations with baseline behavior items (r=±0.20 to 0.79). Next, the predictive validity of the M-ABR items was analyzed with hierarchical linear regressions using end-of-year outcomes as the dependent variable. The academic and behavior readiness items demonstrated adequate validity for all outcomes with moderate effects (β=±0.31 to 0.73 for academic outcomes and β=±0.24 to 0.59 for behavioral outcomes) after controlling for baseline demographics. Even after controlling for baseline scores, the M-ABR items predicted unique variance in almost all outcome variables. Four conditional probability indices were calculated to obtain an optimal cut score, to determine ready vs. not ready, for both single-item M-ABR scales. The cut point of "fair" yielded the most acceptable values for the indices. The odds ratios (OR) of experiencing negative outcomes given a "fair" or lower readiness rating (2 or below on the M-ABR screeners) at the beginning of the year were significant and strong for all outcomes (OR=2.29 to OR=14.46), except for internalizing problems. These findings suggest promise for using single readiness items to screen for varying negative end

  9. The Assignment of Raters to Items: Controlling for Rater Effects.

    ERIC Educational Resources Information Center

    Sykes, Robert C.; Heidorn, Mark; Lee, Guemin

    A study was conducted to evaluate the effect of different modes (modalities) of assigning raters to test items. The impact on total constructed response (c.r.) score, and subsequently on total test score, of assigning a single versus multiple raters to an examination reading of a student's set of c.r. responses was evaluated for several mixed-item…

  10. Development of the PROMIS positive emotional and sensory expectancies of smoking item banks.

    PubMed

    Tucker, Joan S; Shadel, William G; Edelen, Maria Orlando; Stucky, Brian D; Li, Zhen; Hansen, Mark; Cai, Li

    2014-09-01

    The positive emotional and sensory expectancies of cigarette smoking include improved cognitive abilities, positive affective states, and pleasurable sensorimotor sensations. This paper describes development of Positive Emotional and Sensory Expectancies of Smoking item banks that will serve to standardize the assessment of this construct among daily and nondaily cigarette smokers. Data came from daily (N = 4,201) and nondaily (N = 1,183) smokers who completed an online survey. To identify a unidimensional set of items, we conducted item factor analyses, item response theory analyses, and differential item functioning analyses. Additionally, we evaluated the performance of fixed-item short forms (SFs) and computer adaptive tests (CATs) to efficiently assess the construct. Eighteen items were included in the item banks (15 common across daily and nondaily smokers, 1 unique to daily, 2 unique to nondaily). The item banks are strongly unidimensional, highly reliable (reliability = 0.95 for both), and perform similarly across gender, age, and race/ethnicity groups. A SF common to daily and nondaily smokers consists of 6 items (reliability = 0.86). Results from simulated CATs indicated that, on average, fewer than 8 items are needed to assess the construct with adequate precision using the item banks. These analyses identified a new set of items that can assess the positive emotional and sensory expectancies of smoking in a reliable and standardized manner. Considerable efficiency in assessing this construct can be achieved by using the item bank SF, employing computer adaptive tests, or selecting subsets of items tailored to specific research or clinical purposes. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  11. An Item-Driven Adaptive Design for Calibrating Pretest Items. Research Report. ETS RR-14-38

    ERIC Educational Resources Information Center

    Ali, Usama S.; Chang, Hua-Hua

    2014-01-01

    Adaptive testing is advantageous in that it provides more efficient ability estimates with fewer items than linear testing does. Item-driven adaptive pretesting may also offer similar advantages, and verification of such a hypothesis about item calibration was the main objective of this study. A suitability index (SI) was introduced to adaptively…

  12. Identifying content for the glaucoma-specific item bank to measure quality-of-life parameters.

    PubMed

    Khadka, Jyoti; McAlinden, Colm; Craig, Jamie E; Fenwick, Eva K; Lamoureux, Ecosse L; Pesudovs, Konrad

    2015-01-01

    Patient-reported outcomes (PROs) have become essential clinical trial end points. However, a comprehensive, multidimensional, patient-relevant, and precise glaucoma-specific PRO instrument is not available. Therefore, the purpose of this study was to identify content for a new, glaucoma-specific, quality-of-life (QOL) item bank. Content identification was undertaken in 5 phases: (1) identification of extant items in glaucoma-specific instruments and the qualitative literature; (2) focus groups and interviews with glaucoma patients; (3) item classification and selection; (4) expert review and revision of items; and (5) cognitive interviews with patients. A total of 737 unique items (extant items from PRO instruments, 247; qualitative articles, 14 items; focus groups and semistructured interviews, 476 items) were identified. These items were classified into 10 QOL domains. Four criteria (item redundancy, item inconsistent with domain definition, item content too narrow to have wider applicability, and item clarity) were used to remove and refine the items. After the cognitive interviews, the final minimally representative item set had a total of 342 unique items belonging to 10 domains: activity limitation (88), mobility (20), visual symptoms (19), ocular surface symptoms (22), general symptoms (15), convenience (39), health concerns (45), emotional well-being (49), social issues (23), and economic issues (22). The systematic content identification process identified 10 QOL domains, which were important to patients with glaucoma. The majority of the items were identified from the patient-specific focus groups and semistructured interviews suggesting that the existing PRO instruments do not adequately address QOL issues relevant to individuals with glaucoma.

  13. Demand Characteristics of Multiple-Choice Items.

    ERIC Educational Resources Information Center

    Diamond, James J.; Williams, David V.

    Thirteen graduate students were asked to indicate for each of 24 multiple-choice items whether the item tested "recall of specific information," a "higher order skill," or "don't know." The students were also asked to state their general basis for judging the items. The 24 items had been previously classified according to Bloom's cognitive-skills…

  14. 17 CFR 260.7a-16 - Inclusion of items, differentiation between items and answers, omission of instructions.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 17 Commodity and Securities Exchanges 3 2012-04-01 2012-04-01 false Inclusion of items, differentiation between items and answers, omission of instructions. 260.7a-16 Section 260.7a-16 Commodity and... INDENTURE ACT OF 1939 Formal Requirements § 260.7a-16 Inclusion of items, differentiation between items and...

  15. 17 CFR 260.7a-16 - Inclusion of items, differentiation between items and answers, omission of instructions.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 17 Commodity and Securities Exchanges 4 2014-04-01 2014-04-01 false Inclusion of items, differentiation between items and answers, omission of instructions. 260.7a-16 Section 260.7a-16 Commodity and... INDENTURE ACT OF 1939 Formal Requirements § 260.7a-16 Inclusion of items, differentiation between items and...

  16. 17 CFR 260.7a-16 - Inclusion of items, differentiation between items and answers, omission of instructions.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 17 Commodity and Securities Exchanges 3 2013-04-01 2013-04-01 false Inclusion of items, differentiation between items and answers, omission of instructions. 260.7a-16 Section 260.7a-16 Commodity and... INDENTURE ACT OF 1939 Formal Requirements § 260.7a-16 Inclusion of items, differentiation between items and...

  17. 17 CFR 260.7a-16 - Inclusion of items, differentiation between items and answers, omission of instructions.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 17 Commodity and Securities Exchanges 3 2011-04-01 2011-04-01 false Inclusion of items, differentiation between items and answers, omission of instructions. 260.7a-16 Section 260.7a-16 Commodity and... INDENTURE ACT OF 1939 Formal Requirements § 260.7a-16 Inclusion of items, differentiation between items and...

  18. 17 CFR 260.7a-16 - Inclusion of items, differentiation between items and answers, omission of instructions.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 17 Commodity and Securities Exchanges 3 2010-04-01 2010-04-01 false Inclusion of items, differentiation between items and answers, omission of instructions. 260.7a-16 Section 260.7a-16 Commodity and... INDENTURE ACT OF 1939 Formal Requirements § 260.7a-16 Inclusion of items, differentiation between items and...

  19. Sample Size and Item Parameter Estimation Precision When Utilizing the One-Parameter "Rasch" Model

    ERIC Educational Resources Information Center

    Custer, Michael

    2015-01-01

    This study examines the relationship between sample size and item parameter estimation precision when utilizing the one-parameter model. Item parameter estimates are examined relative to "true" values by evaluating the decline in root mean squared deviation (RMSD) and the number of outliers as sample size increases. This occurs across…

  20. Improving the Quality of Innovative Item Types: Four Tasks for Design and Development

    ERIC Educational Resources Information Center

    Parshall, Cynthia G.; Harmes, J. Christine

    2009-01-01

    Many exam programs have begun to include innovative item types in their operational assessments. While innovative item types appear to have great promise for expanding measurement, there can also be genuine challenges to their successful implementation. In this paper we present a set of four activities that can be beneficially incorporated into…

  1. Relational and item-specific influences on generate-recognize processes in recall.

    PubMed

    Guynn, Melissa J; McDaniel, Mark A; Strosser, Garrett L; Ramirez, Juan M; Castleberry, Erica H; Arnett, Kristen H

    2014-02-01

    The generate-recognize model and the relational-item-specific distinction are two approaches to explaining recall. In this study, we consider the two approaches in concert. Following Jacoby and Hollingshead (Journal of Memory and Language 29:433-454, 1990), we implemented a production task and a recognition task following production (1) to evaluate whether generation and recognition components were evident in cued recall and (2) to gauge the effects of relational and item-specific processing on these components. An encoding task designed to augment item-specific processing (anagram-transposition) produced a benefit on the recognition component (Experiments 1-3) but no significant benefit on the generation component (Experiments 1-3), in the context of a significant benefit to cued recall. By contrast, an encoding task designed to augment relational processing (category-sorting) did produce a benefit on the generation component (Experiment 3). These results converge on the idea that in recall, item-specific processing impacts a recognition component, whereas relational processing impacts a generation component.

  2. Comparing Methods for Item Analysis: The Impact of Different Item-Selection Statistics on Test Difficulty

    ERIC Educational Resources Information Center

    Jones, Andrew T.

    2011-01-01

    Practitioners often depend on item analysis to select items for exam forms and have a variety of options available to them. These include the point-biserial correlation, the agreement statistic, the B index, and the phi coefficient. Although research has demonstrated that these statistics can be useful for item selection, no research as of yet has…

  3. The Effects of Test Length and Sample Size on Item Parameters in Item Response Theory

    ERIC Educational Resources Information Center

    Sahin, Alper; Anil, Duygu

    2017-01-01

    This study investigates the effects of sample size and test length on item-parameter estimation in test development utilizing three unidimensional dichotomous models of item response theory (IRT). For this purpose, a real language test comprised of 50 items was administered to 6,288 students. Data from this test was used to obtain data sets of…

  4. Behavioral decoding of working memory items inside and outside the focus of attention.

    PubMed

    Mallett, Remington; Lewis-Peacock, Jarrod A

    2018-03-31

    How we attend to our thoughts affects how we attend to our environment. Holding information in working memory can automatically bias visual attention toward matching information. By observing attentional biases on reaction times to visual search during a memory delay, it is possible to reconstruct the source of that bias using machine learning techniques and thereby behaviorally decode the content of working memory. Can this be done when more than one item is held in working memory? There is some evidence that multiple items can simultaneously bias attention, but the effects have been inconsistent. One explanation may be that items are stored in different states depending on the current task demands. Recent models propose functionally distinct states of representation for items inside versus outside the focus of attention. Here, we use behavioral decoding to evaluate whether multiple memory items-including temporarily irrelevant items outside the focus of attention-exert biases on visual attention. Only the single item in the focus of attention was decodable. The other item showed a brief attentional bias that dissipated until it returned to the focus of attention. These results support the idea of dynamic, flexible states of working memory across time and priority. © 2018 New York Academy of Sciences.

  5. Approximation Preserving Reductions among Item Pricing Problems

    NASA Astrophysics Data System (ADS)

    Hamane, Ryoso; Itoh, Toshiya; Tomita, Kouhei

    When a store sells items to customers, the store wishes to determine the prices of the items to maximize its profit. Intuitively, if the store sells the items with low (resp. high) prices, the customers buy more (resp. fewer) items, which provides less profit to the store. So it would be hard for the store to decide the prices of items. Assume that the store has a set V of n items and there is a set E of m customers who wish to buy those items, and also assume that each item i ∈ V has the production cost d_i and each customer e_j ∈ E has the valuation v_j on the bundle e_j ⊆ V of items. When the store sells an item i ∈ V at the price r_i, the profit for the item i is p_i = r_i - d_i. The goal of the store is to decide the price of each item to maximize its total profit. We refer to this maximization problem as the item pricing problem. In most of the previous works, the item pricing problem was considered under the assumption that p_i ≥ 0 for each i ∈ V. However, Balcan, et al. [In Proc. of WINE, LNCS 4858, 2007] introduced the notion of “loss-leader,” and showed that the seller can get more total profit in the case that p_i < 0 is allowed than in the case that p_i < 0 is not allowed. In this paper, we derive approximation preserving reductions among several item pricing problems and show that all of them have algorithms with good approximation ratios.
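
    The objective described in this abstract can be illustrated with a minimal Python sketch. It assumes the common "single-minded" purchase rule, which the abstract does not state: a customer buys the whole bundle if and only if the bundle's total price does not exceed the customer's valuation. The function name, item labels, costs, and valuations below are hypothetical and chosen only to show how a negative per-item profit (price minus production cost) on one item can raise total profit.

```python
from typing import Dict, FrozenSet, List, Tuple

Customer = Tuple[FrozenSet[str], float]  # (bundle of items, valuation of that bundle)


def total_profit(prices: Dict[str, float],
                 costs: Dict[str, float],
                 customers: List[Customer]) -> float:
    """Total profit under the ASSUMED single-minded rule:
    a customer buys the bundle iff its total price <= valuation."""
    profit = 0.0
    for bundle, valuation in customers:
        bundle_price = sum(prices[i] for i in bundle)
        if bundle_price <= valuation:
            # Each sold item contributes price minus production cost.
            profit += sum(prices[i] - costs[i] for i in bundle)
    return profit


# Hypothetical instance: two items, two customers.
costs = {"a": 1.0, "b": 1.0}
customers = [
    (frozenset({"a", "b"}), 3.0),  # wants both items, values the bundle at 3
    (frozenset({"b"}), 4.0),       # wants only item b, values it at 4
]

# Every item priced at or above cost (non-negative per-item profit).
no_loss = total_profit({"a": 1.0, "b": 2.0}, costs, customers)

# Item "a" priced below cost (a loss-leader), item "b" priced higher.
loss_leader = total_profit({"a": 0.0, "b": 3.0}, costs, customers)

print(no_loss, loss_leader)
```

    In this toy instance the below-cost price on item a keeps the first customer buying while allowing a higher price on item b, so the second price vector yields more total profit than the first.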

  6. Item-Specific and Generalization Effects on Brain Activation when Learning Chinese Characters

    ERIC Educational Resources Information Center

    Deng, Yuan; Booth, James R.; Chou, Tai-Li; Ding, Guo-Sheng; Peng, Dan-Ling

    2008-01-01

    Neural changes related to learning of the meaning of Chinese characters in English speakers were examined using functional magnetic resonance imaging (fMRI). We examined item specific learning effects for trained characters, but also the generalization of semantic knowledge to novel transfer characters that shared a semantic radical (part of a…

  7. Which Statistic Should Be Used to Detect Item Preknowledge When the Set of Compromised Items Is Known?

    PubMed

    Sinharay, Sandip

    2017-09-01

    Benefiting from item preknowledge is a major type of fraudulent behavior during educational assessments. Belov suggested the posterior shift statistic for detection of item preknowledge and showed its performance to be better on average than that of seven other statistics for detection of item preknowledge for a known set of compromised items. Sinharay suggested a statistic based on the likelihood ratio test for detection of item preknowledge; the advantage of the statistic is that its null distribution is known. Results from simulated and real data and adaptive and nonadaptive tests are used to demonstrate that the Type I error rate and power of the statistic based on the likelihood ratio test are very similar to those of the posterior shift statistic. Thus, the statistic based on the likelihood ratio test appears promising in detecting item preknowledge when the set of compromised items is known.

  8. 48 CFR 52.212-3 - Offeror Representations and Certifications-Commercial Items.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ...— (i) To restrict the free flow of unbiased information in Iran; or (ii) To disrupt, monitor, or... End Products: Line Item No.: Country of Origin: (List as necessary) (3) The Government will evaluate... will evaluate offers in accordance with the policies and procedures of FAR Part 25. (2) Buy American...

  9. 48 CFR 52.212-3 - Offeror Representations and Certifications-Commercial Items.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... that is to be used specifically— (i) To restrict the free flow of unbiased information in Iran; or (ii... End Products: Line Item No.: Country of Origin: (List as necessary) (3) The Government will evaluate... necessary) (iv) The Government will evaluate offers in accordance with the policies and procedures of FAR...

  10. Trash track--active location sensing for evaluating e-waste transportation.

    PubMed

    Offenhuber, Dietmar; Wolf, Malima I; Ratti, Carlo

    2013-02-01

    Waste and recycling systems are complex and far-reaching, but their mechanisms are poorly understood by the public, in some cases government organizations and even the waste management sector itself. The lack of empirical data makes it challenging to assess the environmental impact of trash collection, removal and disposal. This is especially the case for the global movement of electronic wastes. Senseable City Lab's Trash Track project tackles this scarcity of data by following the trajectories of individual objects. The project presents a methodology involving active location sensors that were placed on end-of-life products donated by volunteers in the Seattle, Washington area. These tags sent location messages chronicling their journey, some over the course of a month or more. In this paper, the authors focus on the analysis of traces acquired from 146 items of electronic waste, evaluating the environmental impact, including the travel distances and end-of-life treatments for the products. Combining this information with impact evaluation from the US Environmental Protection Agency's Waste Reduction Model (WARM) allows for the creation of environmental impact profiles for individual pieces of trash.

  11. Factoring handedness data: I. Item analysis.

    PubMed

    Messinger, H B; Messinger, M I

    1995-12-01

    Recently in this journal Peters and Murphy challenged the validity of factor analyses done on bimodal handedness data, suggesting instead that right- and left-handers be studied separately. But bimodality may be avoidable if attention is paid to Oldfield's questionnaire format and instructions for the subjects. Two characteristics appear crucial: a two-column LEFT-RIGHT format for the body of the instrument and what we call Oldfield's Admonition: not to indicate strong preference for a handedness item, such as write, unless "... the preference is so strong that you would never try to use the other hand unless absolutely forced to...". Attaining unimodality of an item distribution would seem to overcome the objections of Peters and Murphy. In a 1984 survey in Boston we used Oldfield's ten-item questionnaire exactly as published. This produced unimodal item distributions. With reflection of the five-point item scale and a logarithmic transformation, we achieved a degree of normalization for the items. Two surveys elsewhere, based on Oldfield's 20-item list but with changes in the questionnaire format and the instructions, yielded markedly different item distributions with peaks at each extreme and sometimes in the middle as well.

  12. Reliability and validity of the 12-item WHODAS 2.0 in patients with Kashin-Beck disease.

    PubMed

    Younus, Mohammad Imran; Wang, Di-Miao; Yu, Fang-Fang; Fang, Hua; Guo, Xiong

    2017-09-01

    The purpose of this study was to check the reliability and validity of the 12-item Chinese version of the World Health Organization Disability Assessment Schedule 2.0 (WHODAS 2.0) for the assessment of disability in patients with Kashin-Beck disease (KBD). We recruited 219 patients with KBD from the high-risk KBD area in the Shaanxi province, using stratified multistage random sampling. We assessed each patient using the Chinese version of the 12-item WHODAS 2.0 and the Western Ontario and McMaster Universities Index of Osteoarthritis (WOMAC). Statistical evaluations of the instruments consisted of Cronbach's alpha, intraclass correlation coefficient (ICC), confirmatory factor analysis (CFA), and Pearson's correlation coefficient. Cronbach's alpha and ICC for the six domains ranged from 0.704 to 0.906 and 0.690 to 0.852, respectively. A six-factor structure fits the data well (CFI = 0.967, TLI = 0.944, RMSEA = 0.08). Regarding convergent validity, the four domains of the 12-item WHODAS 2.0 (getting around, self-care, life activity, and participation) showed moderate-to-strong correlation for all three domains of the WOMAC (0.428 < |r| < 0.804). Regarding divergent validity, the two domains of the 12-item WHODAS 2.0 (understanding and communication, and getting along with people) showed weak correlation for the three domains of WOMAC (0.182 < |r| < 0.295). The Chinese version of 12-item WHODAS 2.0 questionnaire is a reliable and valid instrument when administered to KBD patients.
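
    As a rough illustration of the internal-consistency statistic named in this abstract, the following Python sketch computes Cronbach's alpha from a respondents-by-items score matrix using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the small score matrix are hypothetical; they are not data from the study.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a matrix with one row per respondent
    and one column per item (sample variances throughout)."""
    k = len(scores[0])                      # number of items
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Variance of each item across respondents.
    item_vars: List[float] = [
        variance([row[j] for row in scores]) for j in range(k)
    ]
    # Variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents, 3 items on a 1-5 scale.
example_scores = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 5],
]
alpha = cronbach_alpha(example_scores)
print(round(alpha, 3))
```

    Values closer to 1 indicate that the items vary together; when every item gives identical scores for each respondent, the formula returns exactly 1.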

  13. Female Sexual Function Index Short Version: A MsFLASH Item Response Analysis.

    PubMed

    Carpenter, Janet S; Jones, Salene M W; Studts, Christina R; Heiman, Julia R; Reed, Susan D; Newton, Katherine M; Guthrie, Katherine A; Larson, Joseph C; Cohen, Lee S; Freeman, Ellen W; Jane Lau, R; Learman, Lee A; Shifren, Jan L

    2016-11-01

    The Female Sexual Function Index (FSFI) is a psychometrically sound and popular 19-item self-report measure, but its length may preclude its use in studies with multiple outcome measures, especially when sexual function is not a primary endpoint. Only one attempt has been made to create a shorter scale, resulting in the Italian FSFI-6, later translated into Spanish and Korean without further psychometric analysis. Our study evaluated whether a subset of items on the 19-item English-language FSFI would perform as well as the full-length FSFI in peri- and postmenopausal women. We used baseline data from 898 peri- and postmenopausal women recruited from multiple communities, ages 42-62 years, and enrolled in randomized controlled trials for vasomotor symptom management. Goals were to (1) create a psychometrically sound, shorter version of the FSFI for use in peri- and postmenopausal women as a continuous measure and (2) compare it to the Italian FSFI-6. Results indicated that a 9-item scale provided more information than the FSFI-6 across a spectrum of sexual functioning, was able to capture sample variability, and showed sufficient range without floor or ceiling effects. All but one of the items from the Italian 6-item version were included in the 9-item version. Most omitted FSFI items focused on frequency of events or experiences. When assessment of sexual function is a secondary endpoint and subject burden related to questionnaire length is a priority, the 9-item FSFI may provide important information about sexual function in English-speaking peri- and postmenopausal women.

  14. Diagnostic Utility of Craving in Predicting Nicotine Dependence: Impact of Craving Content and Item Stability

    PubMed Central

    2013-01-01

    Introduction: Craving is useful in the diagnosis of drug dependence, but it is unclear how various items used to assess craving might influence the diagnostic performance of craving measures. This study determined the diagnostic performance of individual items and item subgroups of the 32-item Questionnaire on Smoking Urges (QSU) as a function of item wording, level of craving intensity, and item stability. Methods: Nondaily and daily smokers (n = 222) completed the QSU on 6 separate occasions, and item responses were averaged across the administrations. Nicotine dependence was assessed with the Wisconsin Inventory of Smoking Dependence Motives. The discriminative performance of the QSU items was evaluated with receiver-operating characteristic curves and area under the curve statistics. Results: Although each of the QSU items and selected subgroups of items significantly discriminated dependent from nondependent smokers, certain item subgroups outperformed others. There was no difference in discriminative performance between use of the specific terms urge and crave or between items assessing intention to smoke relative to those assessing desire to smoke, but there were significant differences in the two major factors represented on the QSU and in craving items reflecting more intense relative to less intense craving. Stability of the item scores was strongly related to the discriminative performance of craving. Conclusions: Items indexing stable, high-intensity aspects of craving that reflect the negative reinforcing effects of smoking will likely be most useful for diagnostic purposes. Future directions and implications are discussed. PMID:23817585

  15. Diagnostic utility of craving in predicting nicotine dependence: impact of craving content and item stability.

    PubMed

    Germeroth, Lisa J; Wray, Jennifer M; Gass, Julie C; Tiffany, Stephen T

    2013-12-01

    Craving is useful in the diagnosis of drug dependence, but it is unclear how various items used to assess craving might influence the diagnostic performance of craving measures. This study determined the diagnostic performance of individual items and item subgroups of the 32-item Questionnaire on Smoking Urges (QSU) as a function of item wording, level of craving intensity, and item stability. Nondaily and daily smokers (n = 222) completed the QSU on 6 separate occasions, and item responses were averaged across the administrations. Nicotine dependence was assessed with the Wisconsin Inventory of Smoking Dependence Motives. The discriminative performance of the QSU items was evaluated with receiver-operating characteristic curves and area under the curve statistics. Although each of the QSU items and selected subgroups of items significantly discriminated dependent from nondependent smokers, certain item subgroups outperformed others. There was no difference in discriminative performance between use of the specific terms urge and crave or between items assessing intention to smoke relative to those assessing desire to smoke, but there were significant differences in the two major factors represented on the QSU and in craving items reflecting more intense relative to less intense craving. Stability of the item scores was strongly related to the discriminative performance of craving. Items indexing stable, high-intensity aspects of craving that reflect the negative reinforcing effects of smoking will likely be most useful for diagnostic purposes. Future directions and implications are discussed.
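    The area-under-the-curve statistic mentioned above can be illustrated with a minimal sketch. It relies on the standard equivalence between AUC and the probability that a randomly chosen dependent smoker scores higher on an item than a randomly chosen nondependent smoker (ties counted as one half). The scores below are hypothetical and are not taken from the study; the function name `auc` is likewise an illustrative assumption.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores_pos: item scores for the positive group (e.g., dependent smokers)
    scores_neg: item scores for the negative group (e.g., nondependent smokers)
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0      # positive case ranked above negative case
            elif p == n:
                wins += 0.5      # ties contribute one half
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical averaged item scores, invented for illustration only.
dependent = [5.0, 6.0, 4.0, 7.0]
nondependent = [2.0, 4.0, 3.0, 1.0]
item_auc = auc(dependent, nondependent)
```

    An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups, so items can be ranked by this value.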

  16. Construct validity of the items on the Stroke Specific Quality of Life (SS-QOL) questionnaire that evaluate the participation component of the International Classification of Functioning, Disability and Health.

    PubMed

    Silva, Soraia Micaela; Corrêa, Fernanda Ishida; Pereira, Gabriela Santos; Faria, Christina Danielli Coelho de Morais; Corrêa, João Carlos Ferrari

    2018-01-01

    Analyze the construct validity and internal consistency of the Stroke Specific Quality of Life (SS-QOL) items that address the participation component of the ICF as well as analyze the ceiling and floor effects. One hundred subjects were analyzed: 85 community-dwelling and 15 institutionalized individuals. The analysis of construct validity was performed using classic psychometrics: (1) the comparison of known groups (individuals without restriction to participation vs. those with restriction to participation) using the Mann-Whitney test and (2) convergent validity - correlation between the scores on the SS-QOL items that address participation and the subscale scores of measures used to evaluate the similar constructs and concepts [the Short-Form Health Survey (SF-36), Functional Independence Measure (FIM) and grip strength test]. Spearman's correlation coefficients were calculated for this analysis. Cronbach's α was used for the analysis of internal consistency and both the ceiling and floor effects were analyzed. The level of significance for all analyses was α = 0.05. The a priori hypotheses regarding construct validity were partially demonstrated, as only five of the eight domains exhibited positive moderate to strong correlations (r > 0.40) with measures that address constructs similar to those addressed on the SS-QOL questionnaire. The items demonstrated adequate internal consistency and are capable of differentiating individuals with and without restriction to participation. The ceiling and floor effects were considered adequate for the total SS-QOL score, but beyond acceptable standards for some domains. The 26 items of the SS-QOL questionnaire measure a multidimensional construct and therefore do not only address participation. However, the items demonstrated adequate internal consistency and are capable of differentiating individuals with and without restriction to participation. Implications for rehabilitation The 26 items of the SS

  17. Development of Rasch-based item banks for the assessment of work performance in patients with musculoskeletal diseases.

    PubMed

    Mueller, Evelyn A; Bengel, Juergen; Wirtz, Markus A

    2013-12-01

    This study aimed to develop a self-description assessment instrument to measure work performance in patients with musculoskeletal diseases. In terms of the International Classification of Functioning, Disability and Health (ICF), work performance is defined as the degree of meeting the work demands (activities) at the actual workplace (environment). To account for the fact that work performance depends on the work demands of the job, we strived to develop item banks that allow a flexible use of item subgroups depending on the specific work demands of the patients' jobs. Item development included the collection of work tasks from literature and content validation through expert surveys and patient interviews. The resulting 122 items were answered by 621 patients with musculoskeletal diseases. Exploratory factor analysis to ascertain dimensionality and Rasch analysis (partial credit model) for each of the resulting dimensions were performed. Exploratory factor analysis resulted in four dimensions, and subsequent Rasch analysis led to the following item banks: 'impaired productivity' (15 items), 'impaired cognitive performance' (18), 'impaired coping with stress' (13) and 'impaired physical performance' (low physical workload 20 items, high physical workload 10 items). The item banks exhibited person separation indices (reliability) between 0.89 and 0.96. The assessment of work performance adds the activities component to the more commonly employed participation component of the ICF-model. The four item banks can be adapted to specific jobs where necessary without losing comparability of person measures, as the item banks are based on Rasch analysis.

  18. Evaluation of measurement equivalence of the Family Satisfaction with the End-of-Life Care in an ethnically diverse cohort: Tests of differential item functioning

    PubMed Central

    Teresi, Jeanne A; Ocepek-Welikson, Katja; Ramirez, Mildred; Kleinman, Marjorie; Ornstein, Katherine; Siu, Albert

    2016-01-01

    Background: The Family Satisfaction with End-of-Life Care is an internationally used measure of satisfaction with cancer care. However, the Family Satisfaction with End-of-Life Care has not been studied for equivalence of item endorsement across different socio-demographic groups using differential item functioning. Aims: The aims of this secondary data analysis were (1) to examine potential differential item functioning in the family satisfaction item set with respect to type of caregiver, race, and patient age, gender, and education and (2) to provide parameters and documentation of differential item functioning for an item bank. Design: A mixed qualitative and quantitative analysis was conducted. A priori hypotheses regarding potential group differences in item response were established. Item response theory and Wald tests were used for the analyses of differential item functioning, accompanied by magnitude and impact measures. Results: Very little significant differential item functioning was observed for patient's age and gender. For race, 13 items showed differential item functioning after multiple comparison adjustment, 10 with non-uniform differential item functioning. No items evidenced differential item functioning of high magnitude, and the impact was negligible. For education, 5 items evidenced uniform differential item functioning after adjustment, none of high magnitude. Differential item functioning impact was trivial. One item evidenced differential item functioning for the caregiver relationship variable. Conclusion: Differential item functioning was observed primarily for race and education. No differential item functioning of high magnitude was observed for any item, and the overall impact of differential item functioning was negligible. One item, satisfaction with “the patient's pain relief,” might be singled out for further study, given that this item was both hypothesized and observed to show differential item functioning for race and education

  19. Caries Risk Assessment Item Importance

    PubMed Central

    Chaffee, B.W.; Featherstone, J.D.B.; Gansky, S.A.; Cheng, J.; Zhan, L.

    2016-01-01

    Caries risk assessment (CRA) is widely recommended for dental caries management. Little is known regarding how practitioners use individual CRA items to determine risk and which individual items independently predict clinical outcomes in children younger than 6 y. The objective of this study was to assess the relative importance of pediatric CRA items in dental providers’ decision making regarding patient risk and in association with clinically evident caries, cross-sectionally and longitudinally. CRA information was abstracted retrospectively from electronic patient records of children initially aged 6 to 72 mo at a university pediatric dentistry clinic (n = 3,810 baseline; n = 1,315 with follow-up). The 17-item CRA form included caries risk indicators, caries protective items, and clinical indicators. Conditional random forests classification trees were implemented to identify and assign variable importance to CRA items independently associated with baseline high-risk designation, baseline evident tooth decay, and follow-up evident decay. Thirteen individual CRA items, including all clinical indicators and all but 1 risk indicator, were independently and statistically significantly associated with student/resident providers’ caries risk designation. Provider-assigned baseline risk category was strongly associated with follow-up decay, which increased from low (20.4%) to moderate (30.6%) to high/extreme risk patients (68.7%). Of baseline CRA items, before adjustment, 12 were associated with baseline decay and 7 with decay at follow-up; however, in the conditional random forests models, only the clinical indicators (evident decay, dental plaque, and recent restoration placement) and 1 risk indicator (frequent snacking) were independently and statistically significantly associated with future disease, for which baseline evident decay was the strongest predictor. In this predominantly high-risk population under caries-preventive care, more individual CRA items

  20. Validation of the Spanish versions of the long (26 items) and short (12 items) forms of the Self-Compassion Scale (SCS).

    PubMed

    Garcia-Campayo, Javier; Navarro-Gil, Mayte; Andrés, Eva; Montero-Marin, Jesús; López-Artal, Lorena; Demarzo, Marcelo Marcos Piva

    2014-01-10

    Self-compassion is a key psychological construct for assessing clinical outcomes in mindfulness-based interventions. The aim of this study was to validate the Spanish versions of the long (26 item) and short (12 item) forms of the Self-Compassion Scale (SCS). The translated Spanish versions of both subscales were administered to two independent samples: Sample 1 was comprised of university students (n = 268) who were recruited to validate the long form, and Sample 2 was comprised of Aragon Health Service workers (n = 271) who were recruited to validate the short form. In addition to SCS, the Mindful Attention Awareness Scale (MAAS), the State-Trait Anxiety Inventory-Trait (STAI-T), the Beck Depression Inventory (BDI) and the Perceived Stress Questionnaire (PSQ) were administered. Construct validity, internal consistency, test-retest reliability and convergent validity were tested. The Confirmatory Factor Analysis (CFA) of the long and short forms of the SCS confirmed the original six-factor model in both scales, showing goodness of fit. Cronbach's α for the 26 item SCS was 0.87 (95% CI = 0.85-0.90) and ranged between 0.72 and 0.79 for the 6 subscales. Cronbach's α for the 12-item SCS was 0.85 (95% CI = 0.81-0.88) and ranged between 0.71 and 0.77 for the 6 subscales. The long (26-item) form of the SCS showed a test-retest coefficient of 0.92 (95% CI = 0.89-0.94). The Intraclass Correlation (ICC) for the 6 subscales ranged from 0.84 to 0.93. The short (12-item) form of the SCS showed a test-retest coefficient of 0.89 (95% CI: 0.87-0.93). The ICC for the 6 subscales ranged from 0.79 to 0.91. The long and short forms of the SCS exhibited a significant negative correlation with the BDI, the STAI and the PSQ, and a significant positive correlation with the MAAS. The correlation between the total score of the long and short SCS form was r = 0.92. The Spanish versions of the long (26-item) and short (12-item) forms of the SCS are valid and

  1. Validation of the Spanish versions of the long (26 items) and short (12 items) forms of the Self-Compassion Scale (SCS)

    PubMed Central

    2014-01-01

    (12-item) forms of the SCS are valid and reliable instruments for the evaluation of self-compassion among the general population. These results substantiate the use of this scale in research and clinical practice. PMID:24410742

  2. A Psychometric Theory of Evaluation of Item and Scale Translations: Fidelity across Languages.

    ERIC Educational Resources Information Center

    Hulin, Charles L.

    1987-01-01

    Addresses the problem of the equivalence of linguistically translated items that form measurement scales used to assess psychological traits or constructs in source and target cultures and languages. Outlines assessment procedures that are standardized but that also reflect cultural-specific concepts and values. (PS)

  3. Development of a Questionnaire Assessing School Physical Activity Environment

    ERIC Educational Resources Information Center

    Robertson-Wilson, Jennifer; Levesque, Lucie; Holden, Ronald R.

    2007-01-01

    This study was designed to develop the Questionnaire Assessing School Physical Activity Environment (Q--SPACE) based on student perceptions. Twenty-eight items rated on 4-point Likert scales were administered to 244 middle school students in 9 schools. Exploratory factor analysis was used to evaluate the underlying structure of the items and 2…

  4. Psychometrical assessment and item analysis of the General Health Questionnaire in victims of terrorism.

    PubMed

    Delgado-Gomez, David; Lopez-Castroman, Jorge; de Leon-Martinez, Victoria; Baca-Garcia, Enrique; Cabanas-Arrate, Maria Luisa; Sanchez-Gonzalez, Antonio; Aguado, David

    2013-03-01

    There is a need to assess the psychiatric morbidity that appears as a consequence of terrorist attacks. The General Health Questionnaire (GHQ) has been used to this end, but its psychometric properties have never been evaluated in a population affected by terrorism. A sample of 891 participants included 162 direct victims of terrorist attacks and 729 relatives of the victims. All participants were evaluated using the 28-item version of the GHQ (GHQ-28). We examined the reliability and external validity of scores on the scale using Cronbach's alpha and Pearson correlation with the State-Trait Anxiety Inventory (STAI), respectively. The factor structure of the scale was analyzed with varimax rotation. Samejima's (1969) graded response model was used to explore the item properties. The GHQ-28 scores showed good reliability and item-scale correlations. The factor analysis identified 3 factors: anxious-somatic symptoms, social dysfunction, and depression symptoms. All factors showed good correlation with the STAI. Before rotation, the first, second, and third factor explained 44.0%, 6.4%, and 5.0% of the variance, respectively. Varimax rotation redistributed the percentages of variance accounted for to 28.4%, 13.8%, and 13.2%, respectively. Items with the highest loadings in the first factor measured anxiety symptoms, whereas items with the highest loadings in the third factor measured suicide ideation. Samejima's model found that high scores in suicide-related items were associated with severe depression. The factor structure of the GHQ-28 found in this study underscores the preeminence of anxiety symptoms among victims of terrorism and their relatives. Item response analysis identified the most difficult and significant items for each factor. PsycINFO Database Record (c) 2013 APA, all rights reserved.

  5. Application of Group-Level Item Response Models in the Evaluation of Consumer Reports about Health Plan Quality

    ERIC Educational Resources Information Center

    Reise, Steven P.; Meijer, Rob R.; Ainsworth, Andrew T.; Morales, Leo S.; Hays, Ron D.

    2006-01-01

    Group-level parametric and non-parametric item response theory models were applied to the Consumer Assessment of Healthcare Providers and Systems (CAHPS[R]) 2.0 core items in a sample of 35,572 Medicaid recipients nested within 131 health plans. Results indicated that CAHPS responses are dominated by within health plan variation, and only weakly…

  6. Evaluating the validity of the Work Role Functioning Questionnaire (Canadian French version) using classical test theory and item response theory.

    PubMed

    Hong, Quan Nha; Coutu, Marie-France; Berbiche, Djamal

    2017-01-01

    The Work Role Functioning Questionnaire (WRFQ) was developed to assess workers' perceived ability to perform job demands and is used to monitor presenteeism. Few studies on its validity can be found in the literature. The purpose of this study was to assess the items and factorial composition of the Canadian French version of the WRFQ (WRFQ-CF). Two measurement approaches were used to test the WRFQ-CF: Classical Test Theory (CTT) and non-parametric Item Response Theory (IRT). A total of 352 completed questionnaires were analyzed. A four-factor model and a three-factor model were tested and showed good fit with 14 items (Root Mean Square Error of Approximation (RMSEA) = 0.06, Standardized Root Mean Square Residual (SRMR) = 0.04, Bentler Comparative Fit Index (CFI) = 0.98) and with 17 items (RMSEA = 0.059, SRMR = 0.048, CFI = 0.98), respectively. Using IRT, 13 problematic items were identified, of which 9 were common with CTT. This study tested different models, and fewer problematic items were found in the three-factor model. Using non-parametric IRT and CTT for item purification gave complementary results. IRT is still rarely used and can be an interesting alternative method to enhance the quality of a measurement instrument. More studies are needed on the WRFQ-CF to refine its items and factorial composition.

  7. Item response theory - A first approach

    NASA Astrophysics Data System (ADS)

    Nunes, Sandra; Oliveira, Teresa; Oliveira, Amílcar

    2017-07-01

    Item Response Theory (IRT) has become one of the most popular scoring frameworks for measurement data, frequently used in computerized adaptive testing, cognitively diagnostic assessment and test equating. According to Andrade et al. (2000), IRT can be defined as a set of mathematical models (Item Response Models - IRM) constructed to represent the probability of an individual giving the right answer to an item of a particular test. The number of Item Response Models available for measurement analysis has increased considerably in the last fifteen years due to increasing computer power and to a demand for accuracy and more meaningful inferences grounded in complex data. The developments in modeling with Item Response Theory were related to developments in estimation theory, most notably Bayesian estimation with Markov chain Monte Carlo algorithms (Patz & Junker, 1999). The popularity of Item Response Theory has also prompted numerous overviews in books and journals, and many connections between IRT and other statistical estimation procedures, such as factor analysis and structural equation modeling, have been made repeatedly (van der Linden & Hambleton, 1997). As stated before, Item Response Theory covers a variety of measurement models, ranging from basic one-dimensional models for dichotomously and polytomously scored items and their multidimensional analogues to models that incorporate information about cognitive sub-processes which influence the overall item response process. The aim of this work is to introduce the main concepts associated with one-dimensional models of Item Response Theory, to specify the logistic models with one, two and three parameters, to discuss some properties of these models and to present the main estimation procedures.
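    The one-, two- and three-parameter logistic models named in this abstract can be sketched as follows. This is a minimal illustration using the common textbook parameterization, in which a is item discrimination, b is item difficulty, c is the lower asymptote (pseudo-guessing) and theta is the person's latent trait level. The function names are illustrative assumptions, and the optional scaling constant D = 1.7 used in some texts is omitted; the paper itself may use a different notation.

```python
import math


def irt_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct response under the three-parameter logistic model.

    P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def irt_2pl(theta, a, b):
    """Two-parameter model: the 3PL with no guessing (c = 0)."""
    return irt_3pl(theta, a=a, b=b, c=0.0)


def irt_1pl(theta, b):
    """One-parameter model: the 2PL with discrimination fixed at a = 1."""
    return irt_3pl(theta, a=1.0, b=b, c=0.0)


# Hypothetical item parameters, chosen only to show the shape of the curves.
p_easy = irt_1pl(theta=0.0, b=-1.0)                  # easy item, average person
p_match = irt_1pl(theta=0.0, b=0.0)                  # difficulty matches ability
p_guess = irt_3pl(theta=0.0, a=1.5, b=0.0, c=0.2)    # same item with guessing
```

    When theta equals b the 1PL and 2PL give a probability of 0.5, while the 3PL gives c + (1 - c) / 2, which shows how the guessing parameter raises the floor of the curve.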

  8. Development and psychometric evaluation of a context-based parental self-efficacy instrument for healthy dietary and physical activity behaviors in preschool children.

    PubMed

    Bohman, Benjamin; Rasmussen, Finn; Ghaderi, Ata

    2016-10-20

    Parental self-efficacy (PSE) refers to beliefs of parents to effectively engage in behaviors that result in desired outcomes for their children. There are several instruments of PSE for promoting healthy dietary or physical activity (PA) behaviors in children. These measures typically assess PSE in relation to some quantity or frequency of behavior, for example, number of servings or times per week. However, measuring PSE in relation to contextual circumstances, for example, psychological states and situational demands, may be a more informative approach. The purpose of the present study was to develop and psychometrically evaluate a context-based PSE instrument. Swedish mothers of five-year-old children (n = 698) responded to the Parental Self-Efficacy for Healthy Dietary and Physical Activity Behaviors in Preschoolers Scale (PDAP) and a questionnaire on dietary and PA behaviors in children. Interviews were conducted to explore participant perceptions of the quality of the PDAP items. Psychometric evaluation was conducted using exploratory and confirmatory factor analyses. Spearman correlations between PSE and child behaviors were examined. Twenty-seven interviews were conducted with participants, who perceived the items as highly comprehensible, relevant and acceptable. A four-factor model of a revised 21-item version of the PDAP fitted the data, with different factors of PSE for promoting healthy dietary or PA behaviors in children depending on whether circumstances were facilitating or impeding successful performance. Internal consistency was excellent for total scale (Cronbach's α = .94), and good for factors (α = .84-.88). Correlations were in the expected direction: positive correlations between PSE and healthy behaviors, and negative correlations between PSE and unhealthy behaviors (all r_s ≤ .32). Psychometric evaluation of the PDAP provided preliminary support of construct validity and internal consistency.

  9. Can Less Be More? Comparison of an 8-Item Placement Quality Measure with the 50-Item Dundee Ready Educational Environment Measure (DREEM)

    ERIC Educational Resources Information Center

    Kelly, Martina; Bennett, Deirdre; Muijtjens, Arno; O'Flynn, Siun; Dornan, Tim

    2015-01-01

    Clinical clerks learn more than they are taught and not all they learn can be measured. As a result, curriculum leaders evaluate clinical educational environments. The quantitative Dundee Ready Educational Environment Measure (DREEM) is a "de facto" standard for that purpose. Its 50 items and 5 subscales were developed by consensus. Reasoning that…

  10. Evaluation of a four-item DSM-5 Limited Prosocial Emotions specifier scale within and across settings with Spanish children.

    PubMed

    Seijas, Raquel; Servera, Mateu; García-Banda, Gloria; Barry, Christopher T; Burns, G Leonard

    2018-04-01

    The objective was to evaluate a 4-item measure of the DSM-5 Limited Prosocial Emotions (LPE) specifier (a 4-item measure of prosocial emotions). Mothers, fathers, primary teachers, and ancillary teachers completed measures of prosocial emotions (PE), oppositional defiant disorder (ODD), attention-deficit/hyperactivity disorder (ADHD)-inattention (IN), ADHD-hyperactivity/impulsivity (HI), academic and social impairment on 811 Spanish first-grade children (46% girls). Confirmatory factor and structural regression analyses showed PE symptom scores to have (a) good reliability for the 4 sources (80% to 89% true score variance), (b) invariance of like-symptom loadings and intercepts across the 4 sources, (c) strong convergent and discriminant validity within home and school settings, (d) no convergent validity across settings, and (e) associations with academic and social impairment independent of the ODD dimension (the unique effects of PE also remained significant after controlling for ODD, ADHD-IN, and ADHD-HI for mothers and ancillary teachers). A graded response item response theory analysis indicated that PE scores provided an accurate measure of the PE trait across a wide trait range and especially at low PE trait levels (i.e., scores in the clinical range). Findings also supported the DSM-5 diagnostic criteria of 2 or more LPE symptoms in 2 or more settings (e.g., high levels of the LPE trait were associated with the occurrence of 2 or more symptoms with 4% of the sample showing 2 or more symptoms in both settings). Although additional studies are still required, the PE measure appears useful as a brief measure of the LPE specifier. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  11. An item response theory evaluation of the Young Mania Rating Scale and the Montgomery-Asberg Depression Rating Scale in the Systematic Treatment Enhancement Program for Bipolar Disorder (STEP-BD).

    PubMed

    Prisciandaro, James J; Tolliver, Bryan K

    2016-11-15

    The Young Mania Rating Scale (YMRS) and Montgomery-Asberg Depression Rating Scale (MADRS) are among the most widely used outcome measures for clinical trials of medications for Bipolar Disorder (BD). Nonetheless, very few studies have examined the measurement characteristics of the YMRS and MADRS in individuals with BD using modern psychometric methods. The present study evaluated the YMRS and MADRS in the Systematic Treatment Enhancement Program for BD (STEP-BD) study using Item Response Theory (IRT). Baseline data from 3716 STEP-BD participants were available for the present analysis. The Graded Response Model (GRM) was fit separately to YMRS and MADRS item responses. Differential item functioning (DIF) was examined by regressing a variety of clinically relevant covariates (e.g., sex, substance dependence) on all test items and on the latent symptom severity dimension, within each scale. Both scales: 1) contained several items that provided little or no psychometric information, 2) were inefficient, in that the majority of item response categories did not provide incremental psychometric information, 3) poorly measured participants outside of a narrow band of severity, 4) evidenced DIF for nearly all items, suggesting that item responses were, in part, determined by factors other than symptom severity. Limited to outpatients; DIF analysis only sensitive to certain forms of DIF. The present study provides evidence for significant measurement problems involving the YMRS and MADRS. More work is needed to refine these measures and/or develop suitable alternative measures of BD symptomatology for clinical trials research. Copyright © 2016 Elsevier B.V. All rights reserved.
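    The Graded Response Model referred to above can be illustrated with a minimal sketch of its standard (Samejima) form for one ordered-category item: the probability of responding in category k or higher follows a logistic curve with discrimination a and threshold b_k, and category probabilities are differences of adjacent cumulative curves. The function name and the parameter values are hypothetical and are not estimates from the STEP-BD data.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the graded response model.

    theta:      latent severity of the respondent
    a:          item discrimination
    thresholds: increasing list b_1 < b_2 < ... < b_m for m + 1 ordered categories
    """
    # Cumulative probabilities P(X >= k); P(X >= 0) = 1 and P(X >= m + 1) = 0.
    cumulative = [1.0]
    for b in thresholds:
        cumulative.append(1.0 / (1.0 + math.exp(-a * (theta - b))))
    cumulative.append(0.0)
    # Probability of each category is the drop between adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical four-category item, invented for illustration only.
probs = grm_category_probs(theta=0.0, a=1.0, thresholds=[-1.0, 0.0, 1.0])
```

    Under this form, a response category whose probability is never the highest at any theta contributes little incremental information, which is one way to read the inefficiency described in the abstract.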

  12. Developing an item bank to measure economic quality of life for individuals with disabilities.

    PubMed

    Tulsky, David S; Kisala, Pamela A; Lai, Jin-Shei; Carlozzi, Noelle; Hammel, Joy; Heinemann, Allen W

    2015-04-01

    Objective: To develop and evaluate the psychometric properties of an item set measuring economic quality of life (QOL) for use by individuals with disabilities. Design: Survey. Setting: Community settings. Participants: Individuals with disabilities completed individual interviews (n=64), participated in focus groups (n=172), and completed cognitive interviews (n=15). Inclusion criteria included the following: traumatic brain injury, spinal cord injury, or stroke; age ≥18 years; and ability to read and speak English. We calibrated the items with 305 former rehabilitation inpatients. Interventions: None. Main outcome measure: Economic QOL. Results: Confirmatory factor analysis showed acceptable fit indices (comparative fit index=.939, root mean square error of approximation=.089) for the 37 items. However, 3 items demonstrated local item dependence. Dropping 9 items improved fit and obviated local dependence. Rasch analysis of the remaining 28 items yielded a person reliability of .92, suggesting that these items discriminate about 4 economic QOL levels. Conclusions: We developed a 28-item bank that measures economic aspects of QOL. Preliminary confirmatory factor analysis and Rasch analysis results support the psychometric properties of this new measure. It fills a gap in health-related QOL measurement by describing the economic barriers and facilitators of community participation. Future development will make the item bank available as a computer adaptive test. Copyright © 2015 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.

  13. Benthic marine debris, with an emphasis on fishery-related items, surrounding Kodiak Island, Alaska, 1994-1996

    USGS Publications Warehouse

    Hess, N.A.; Ribic, C.A.; Vining, I.

    1999-01-01

    Composition and abundance of benthic marine debris were investigated during three bottom trawl surveys in inlet and offshore locations surrounding Kodiak Island, Alaska, 1994-1996. Debris items were primarily plastic and metal regardless of trawl location. Plastic bait jars, fishing line, and crab pots were the most common fishery-related debris items and were encountered in large amounts in inlets (20-25 items km⁻²), but were less abundant outside of inlets (4.5-11 items km⁻²). Overall density of debris was also significantly greater in inlets than outside of inlets. Plastic debris densities ranged from 22 to 31.5 items km⁻² in inlets and from 7.8 to 18.8 items km⁻² outside of inlets. Trawls in inlets contained almost as much metal debris as plastic debris. Density of metal debris ranged from 21.2 to 23.7 items km⁻² in inlets, with a maximum of 2.7 items km⁻² outside of inlets. Inlets around the town of Kodiak had the highest densities of fishery-related and total benthic debris. Differences in benthic debris density between inlets and outside of inlets and differences by area may be due to differences in fishing activity and water circulation patterns. At the current reduced levels of fishing activity, however, yearly monitoring of benthic debris appears unnecessary.

  14. Computerized Adaptive Testing with Item Clones. Research Report.

    ERIC Educational Resources Information Center

    Glas, Cees A. W.; van der Linden, Wim J.

    To reduce the cost of item writing and to enhance the flexibility of item presentation, items can be generated by item-cloning techniques. An important consequence of cloning is that it may cause variability on the item parameters. Therefore, a multilevel item response model is presented in which it is assumed that the item parameters of a…

  15. 48 CFR 52.212-3 - Offeror Representations and Certifications-Commercial Items.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... that is to be used specifically— (i) To restrict the free flow of unbiased information in Iran; or (ii... Products: Line Item No.: Country of Origin: (List as necessary) (3) The Government will evaluate offers in... evaluate offers in accordance with the policies and procedures of FAR Part 25. (2) Buy American—Free Trade...

  16. Investigating Separate and Concurrent Approaches for Item Parameter Drift in 3PL Item Response Theory Equating

    ERIC Educational Resources Information Center

    Arce-Ferrer, Alvaro J.; Bulut, Okan

    2017-01-01

    This study examines separate and concurrent approaches to combine the detection of item parameter drift (IPD) and the estimation of scale transformation coefficients in the context of the common item nonequivalent groups design with the three-parameter item response theory equating. The study uses real and synthetic data sets to compare the two…

  17. Development and validation of a 21-item challenges to stopping smoking (CSS-21) scale

    PubMed Central

    Thomas, Dennis; Mackinnon, Andrew J; Bonevski, Billie; Abramson, Michael J; Taylor, Simone; Poole, Susan G; Weeks, Gregory R; Dooley, Michael J; George, Johnson

    2016-01-01

    Objective: Identification of challenges associated with quitting and overcoming them may improve cessation outcomes. This study describes the development and initial validation of a scale for measuring challenges to stopping smoking. Methods: The item pool was generated from empirical and theoretical literature and existing scales, expert opinion and interviews with smokers and ex-smokers. The questionnaire was administered to smokers and recent quitters who participated in a hospital-based smoking cessation trial. Exploratory factor analysis was performed to identify subscales in the questionnaire. Internal consistency, validity and robustness of the subscales were evaluated. Results: Of a total of 182 participants with a mean age of 55 years (SD 12.8), 128 (70.3%) were current smokers and 54 (29.7%) ex-smokers. Factor analysis of the 21-item questionnaire resulted in a 2-factor solution representing items measuring intrinsic (9 items) and extrinsic (12 items) challenges. This structure was stable in various analyses and the 2 factors accounted for 50.7% of the total variance of the polychoric correlations between the items. Internal consistency (Cronbach's α) coefficients for the intrinsic and extrinsic subscales were 0.86 and 0.82, respectively. Compared with ex-smokers, current smokers had a higher mean score (±SD) for intrinsic (24.0±6.4 vs 20.5±7.4, p=0.002) and extrinsic subscales (22.3±7.5 vs 18.6±6.0, p=0.001). Conclusions: Initial evaluation suggests that the 21-item challenges to stopping smoking scale is a valid and reliable instrument that can be used in research and clinical settings to assess challenges to stopping smoking. PMID:27033963
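
    As an illustration of the internal consistency statistic reported above, the following is a minimal Python sketch of Cronbach's α for a respondents-by-items score matrix. The function name cronbach_alpha and the toy scores are hypothetical and are not taken from the study, whose software and raw data are not given in this record.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Assumes at least two items, at least two respondents, and a non-zero
    variance of the total scores (hypothetical helper, for illustration only).
    """
    k = len(scores[0])  # number of items
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data (made up): 4 respondents answering 3 items.
toy_scores = [[3, 4, 3], [2, 2, 1], [4, 4, 4], [1, 2, 2]]
print(round(cronbach_alpha(toy_scores), 3))
```

    Values closer to 1 indicate that the items vary together; a subscale would be passed in as its own matrix of item scores.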

  18. Neural Correlates of Encoding Within- and Across-Domain Inter-Item Associations

    PubMed Central

    Park, Heekyeong; Rugg, Michael D.

    2012-01-01

    The neural correlates of the encoding of associations between pairs of words, pairs of pictures, and word-picture pairs were compared. The aims were to determine first, whether the neural correlates of associative encoding vary according to study material and second, whether encoding of across- versus within-material item pairs is associated with dissociable patterns of hippocampal and perirhinal activity, as predicted by the ‘domain dichotomy’ hypothesis of medial temporal lobe (MTL) function. While undergoing fMRI scanning, subjects (n = 24) were presented with the three classes of study pairs, judging which of the denoted objects fit into the other. Outside of the scanner, subjects then undertook an associative recognition task, discriminating between intact study pairs, rearranged pairs comprising items that had been presented on different study trials, and unstudied item pairs. The neural correlates of successful associative encoding – subsequent associative memory effects – were operationalized as the difference in activity between study pairs correctly judged intact versus pairs incorrectly judged rearranged on the subsequent memory test. Pair type-independent subsequent memory effects were evident in the left inferior frontal gyrus (IFG) and the hippocampus. Picture-picture pairs elicited material-selective effects in regions of fusiform cortex that were also activated to a greater extent on picture trials than word trials, while word-word pairs elicited material-selective subsequent memory effects in left lateral temporal cortex. Contrary to the domain-dichotomy hypothesis, neither hippocampal nor perirhinal subsequent memory effects differed depending on whether they were elicited by within- versus across-material study pairs. It is proposed that the left IFG plays a domain-general role in associative encoding, that associative encoding can also be facilitated by enhanced processing in material-selective cortical regions, and that the hippocampus

  19. Measuring the quality of life in hypertension according to Item Response Theory

    PubMed Central

    Borges, José Wicto Pereira; Moreira, Thereza Maria Magalhães; Schmitt, Jeovani; de Andrade, Dalton Francisco; Barbetta, Pedro Alberto; de Souza, Ana Célia Caetano; Lima, Daniele Braz da Silva; Carvalho, Irialda Saboia

    2017-01-01

    OBJECTIVE: To analyze the Miniquestionário de Qualidade de Vida em Hipertensão Arterial (MINICHAL – Mini-questionnaire of Quality of Life in Hypertension) using the Item Response Theory. METHODS: This is an analytical study conducted with 712 persons with hypertension treated in thirteen primary health care units of Fortaleza, State of Ceará, Brazil, in 2015. The steps of the analysis by the Item Response Theory were: evaluation of dimensionality, estimation of parameters of items, and construction of scale. The study of dimensionality was carried out on the polychoric correlation matrix and confirmatory factor analysis. To estimate the item parameters, we used Samejima's Graded Response Model. The analyses were conducted using the free software R with the psych and mirt packages. RESULTS: The analysis has allowed the visualization of item parameters and their individual contributions in the measurement of the latent trait, generating more information and allowing the construction of a scale with an interpretative model that demonstrates the evolution of the worsening of the quality of life in five levels. Regarding the item parameters, the items related to the somatic state have had a good performance, as they have presented better power to discriminate individuals with worse quality of life. The items related to mental state have been those which contributed with less psychometric data in the MINICHAL. CONCLUSIONS: We conclude that the instrument is suitable for the identification of the worsening of the quality of life in hypertension. The analysis of the MINICHAL using the Item Response Theory has allowed us to identify new aspects of this instrument that have not yet been addressed in previous studies. PMID:28492764

  20. A Novel Teaching Tool Combined With Active-Learning to Teach Antimicrobial Spectrum Activity.

    PubMed

    MacDougall, Conan

    2017-03-25

    Objective. To design instructional methods that would promote long-term retention of knowledge of antimicrobial pharmacology, particularly the spectrum of activity for antimicrobial agents, in pharmacy students. Design. An active-learning approach was used to teach selected sessions in a required antimicrobial pharmacology course. Students were expected to review key concepts from the course reader prior to the in-class sessions. During class, brief concept reviews were followed by active-learning exercises, including a novel schematic method for learning antimicrobial spectrum of activity ("flower diagrams"). Assessment. At the beginning of the next quarter (approximately 10 weeks after the in-class sessions), 360 students (three yearly cohorts) completed a low-stakes multiple-choice examination on the concepts in antimicrobial spectrum of activity. When data for students was pooled across years, the mean number of correct items was 75.3% for the items that tested content delivered with the active-learning method vs 70.4% for items that tested content delivered via traditional lecture (mean difference 4.9%). Instructor ratings on student evaluations of the active-learning approach were high (mean scores 4.5-4.8 on a 5-point scale) and student comments were positive about the active-learning approach and flower diagrams. Conclusion. An active-learning approach led to modestly higher scores in a test of long-term retention of pharmacology knowledge and was well-received by students.

  1. A Novel Teaching Tool Combined With Active-Learning to Teach Antimicrobial Spectrum Activity

    PubMed Central

    2017-01-01

    Objective. To design instructional methods that would promote long-term retention of knowledge of antimicrobial pharmacology, particularly the spectrum of activity for antimicrobial agents, in pharmacy students. Design. An active-learning approach was used to teach selected sessions in a required antimicrobial pharmacology course. Students were expected to review key concepts from the course reader prior to the in-class sessions. During class, brief concept reviews were followed by active-learning exercises, including a novel schematic method for learning antimicrobial spectrum of activity (“flower diagrams”). Assessment. At the beginning of the next quarter (approximately 10 weeks after the in-class sessions), 360 students (three yearly cohorts) completed a low-stakes multiple-choice examination on the concepts in antimicrobial spectrum of activity. When data for students was pooled across years, the mean number of correct items was 75.3% for the items that tested content delivered with the active-learning method vs 70.4% for items that tested content delivered via traditional lecture (mean difference 4.9%). Instructor ratings on student evaluations of the active-learning approach were high (mean scores 4.5-4.8 on a 5-point scale) and student comments were positive about the active-learning approach and flower diagrams. Conclusion. An active-learning approach led to modestly higher scores in a test of long-term retention of pharmacology knowledge and was well-received by students. PMID:28381885

  2. A Rasch Differential Item Functioning Analysis of the Massachusetts Youth Screening Instrument: Identifying Race and Gender Differential Item Functioning among Juvenile Offenders

    ERIC Educational Resources Information Center

    Cauffman, Elizabeth; MacIntosh, Randall

    2006-01-01

    The juvenile justice system needs a tool that can identify and assess mental health problems among youths quickly with validity and reliability. The goal of this article is to evaluate the racial/ethnic and gender differential item functioning (DIF) of the Massachusetts Youth Screening Instrument-Second Version (MAYSI-2) using the Rasch Model.…

  3. 76 FR 60474 - Commercial Item Handbook

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-09-29

    ... DEPARTMENT OF DEFENSE Defense Acquisition Regulations System Commercial Item Handbook AGENCY.... SUMMARY: DoD has updated its Commercial Item Handbook. The purpose of the Handbook is to help acquisition personnel develop sound business strategies for procuring commercial items. DoD is seeking industry input on...

  4. Item Pool Construction Using Mixed Integer Quadratic Programming (MIQP). GMAC® Research Report RR-14-01

    ERIC Educational Resources Information Center

    Han, Kyung T.; Rudner, Lawrence M.

    2014-01-01

    This study uses mixed integer quadratic programming (MIQP) to construct multiple highly equivalent item pools simultaneously, and compares the results from mixed integer programming (MIP). Three different MIP/MIQP models were implemented and evaluated using real CAT item pool data with 23 different content areas and a goal of equal information…

  5. Item response theory and the measurement of motor behavior.

    PubMed

    Safrit, M J; Cohen, A S; Costa, M G

    1989-12-01

    Item response theory (IRT) has been the focus of intense research and development activity in educational and psychological measurement during the past decade. Because this theory can provide more precise information about test items than other theories usually used in measuring motor behavior, the application of IRT in physical education and exercise science merits investigation. In IRT, the difficulty level of each item (e.g., trial or task) can be estimated and placed on the same scale as the ability of the examinee. Using this information, the test developer can determine the ability levels at which the test functions best. Equating the scores of individuals on two or more items or tests can be handled efficiently by applying IRT. The precision of the identification of performance standards in a mastery test context can be enhanced, as can adaptive testing procedures. In this tutorial, several potential benefits of applying IRT to the measurement of motor behavior are described. An example is provided using bowling data and applying the graded-response form of the Rasch IRT model. The data were calibrated and the goodness of fit was examined. This analysis is described in a step-by-step approach. Limitations to using an IRT model with a test consisting of repeated measures are noted.
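
    To make concrete the statement that item difficulty and examinee ability share one scale, here is a minimal Python sketch of the dichotomous Rasch model. This is a simplification: the tutorial itself applies a graded-response form to bowling data, which is not reproduced here. The function name rasch_probability and the example values are hypothetical.

```python
import math


def rasch_probability(theta, b):
    """Probability of success under the dichotomous Rasch model.

    theta: examinee ability, b: item difficulty, both in logits on the
    same scale. Only the difference (theta - b) matters.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Made-up values: one item of difficulty 0.5 and three ability levels.
item_difficulty = 0.5
for ability in (-1.0, 0.5, 2.0):
    p = rasch_probability(ability, item_difficulty)
    print(f"ability={ability:+.1f}  P(success)={p:.3f}")
```

    When ability equals difficulty the success probability is exactly 0.5, which is what allows a test developer to read off the ability levels at which an item is most informative.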

  6. [Validity and reliability of a scale to assess self-efficacy for physical activity in elderly].

    PubMed

    Borges, Rossana Arruda; Rech, Cassiano Ricardo; Meurer, Simone Teresinha; Benedetti, Tânia Rosane Bertoldo

    2015-04-01

    This study aimed to analyze the confirmatory factor validity and reliability of a self-efficacy scale for physical activity in a sample of 118 elderly adults (78% women) from 60 to 90 years of age. Mplus 6.1 was used for the confirmatory factor analysis. Reliability was tested by internal consistency and temporal stability. The original scale consisted of five items with dichotomous answers (yes/no), independently for walking and moderate and vigorous physical activity. The analysis excluded the item related to confidence in performing physical activities when on vacation. Two constructs were identified, called "self-efficacy for walking" and "self-efficacy for moderate and vigorous physical activity", with factor loadings ≥ 0.50. Internal consistency was adequate both for walking (> 0.70) and moderate and vigorous physical activity (> 0.80), and temporal stability was adequate for all the items. In conclusion, the self-efficacy scale for physical activity showed adequate validity, reliability, and internal consistency for evaluating this construct in elderly Brazilians.

  7. Generalized Full-Information Item Bifactor Analysis

    PubMed Central

    Cai, Li; Yang, Ji Seung; Hansen, Mark

    2011-01-01

    Full-information item bifactor analysis is an important statistical method in psychological and educational measurement. Current methods are limited to single group analysis and inflexible in the types of item response models supported. We propose a flexible multiple-group item bifactor analysis framework that supports a variety of multidimensional item response theory models for an arbitrary mixing of dichotomous, ordinal, and nominal items. The extended item bifactor model also enables the estimation of latent variable means and variances when data from more than one group are present. Generalized user-defined parameter restrictions are permitted within or across groups. We derive an efficient full-information maximum marginal likelihood estimator. Our estimation method achieves substantial computational savings by extending Gibbons and Hedeker’s (1992) bifactor dimension reduction method so that the optimization of the marginal log-likelihood only requires two-dimensional integration regardless of the dimensionality of the latent variables. We use simulation studies to demonstrate the flexibility and accuracy of the proposed methods. We apply the model to study cross-country differences, including differential item functioning, using data from a large international education survey on mathematics literacy. PMID:21534682

  8. 7 CFR 2902.5 - Item designation.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ..., USDA will use life cycle cost information only from tests using the BEES analytical method. (c... availability of such items and the economic and technological feasibility of using such items, including life cycle costs. USDA will gather information on individual products within an item and extrapolate that...

  9. Cognitive diagnosis modelling incorporating item response times.

    PubMed

    Zhan, Peida; Jiao, Hong; Liao, Dandan

    2018-05-01

    To provide more refined diagnostic feedback with collateral information in item response times (RTs), this study proposed joint modelling of attributes and response speed using item responses and RTs simultaneously for cognitive diagnosis. For illustration, an extended deterministic input, noisy 'and' gate (DINA) model was proposed for joint modelling of responses and RTs. Model parameter estimation was explored using the Bayesian Markov chain Monte Carlo (MCMC) method. The PISA 2012 computer-based mathematics data were analysed first. These real data estimates were treated as true values in a subsequent simulation study. A follow-up simulation study with ideal testing conditions was conducted as well to further evaluate model parameter recovery. The results indicated that model parameters could be well recovered using the MCMC approach. Further, incorporating RTs into the DINA model would improve attribute and profile correct classification rates and result in more accurate and precise estimation of the model parameters. © 2017 The British Psychological Society.

  10. Criterion-Referenced Test Items for Welding.

    ERIC Educational Resources Information Center

    Davis, Diane, Ed.

    This test item bank on welding contains test questions based upon competencies found in the Missouri Welding Competency Profile. Some test items are keyed for multiple competencies. These criterion-referenced test items are designed to work with the Vocational Instructional Management System. Questions have been statistically sampled and validated…

  11. Generalized Full-Information Item Bifactor Analysis

    ERIC Educational Resources Information Center

    Cai, Li; Yang, Ji Seung; Hansen, Mark

    2011-01-01

    Full-information item bifactor analysis is an important statistical method in psychological and educational measurement. Current methods are limited to single-group analysis and inflexible in the types of item response models supported. We propose a flexible multiple-group item bifactor analysis framework that supports a variety of…

  12. A Review of the Effects on IRT Item Parameter Estimates with a Focus on Misbehaving Common Items in Test Equating

    PubMed Central

    Michaelides, Michalis P.

    2010-01-01

    Many studies have investigated the topic of change or drift in item parameter estimates in the context of item response theory (IRT). Content effects, such as instructional variation and curricular emphasis, as well as context effects, such as the wording, position, or exposure of an item have been found to impact item parameter estimates. The issue becomes more critical when items with estimates exhibiting differential behavior across test administrations are used as common for deriving equating transformations. This paper reviews the types of effects on IRT item parameter estimates and focuses on the impact of misbehaving or aberrant common items on equating transformations. Implications relating to test validity and the judgmental nature of the decision to keep or discard aberrant common items are discussed, with recommendations for future research into more informed and formal ways of dealing with misbehaving common items. PMID:21833230

  13. A Review of the Effects on IRT Item Parameter Estimates with a Focus on Misbehaving Common Items in Test Equating.

    PubMed

    Michaelides, Michalis P

    2010-01-01

    Many studies have investigated the topic of change or drift in item parameter estimates in the context of item response theory (IRT). Content effects, such as instructional variation and curricular emphasis, as well as context effects, such as the wording, position, or exposure of an item have been found to impact item parameter estimates. The issue becomes more critical when items with estimates exhibiting differential behavior across test administrations are used as common for deriving equating transformations. This paper reviews the types of effects on IRT item parameter estimates and focuses on the impact of misbehaving or aberrant common items on equating transformations. Implications relating to test validity and the judgmental nature of the decision to keep or discard aberrant common items are discussed, with recommendations for future research into more informed and formal ways of dealing with misbehaving common items.

  14. An Item Bank to Measure Systems, Services, and Policies: Environmental Factors Affecting People With Disabilities.

    PubMed

    Lai, Jin-Shei; Hammel, Joy; Jerousek, Sara; Goldsmith, Arielle; Miskovic, Ana; Baum, Carolyn; Wong, Alex W; Dashner, Jessica; Heinemann, Allen W

    2016-12-01

    To develop a measure of perceived systems, services, and policies facilitators (see Chapter 5 of the International Classification of Functioning, Disability and Health) for people with neurologic disabilities and to evaluate the effect of perceived systems, services, and policies facilitators on health-related quality of life. Qualitative approaches to develop and refine items. Confirmatory factor analysis including 1-factor confirmatory factor analysis and bifactor analysis to evaluate unidimensionality of items. Rasch analysis to identify misfitting items. Correlational and analysis of variance methods to evaluate construct validity. Community-dwelling individuals participated in telephone interviews or traveled to the academic medical centers where this research took place. Participants (N=571) had a diagnosis of spinal cord injury, stroke, or traumatic brain injury. They were 18 years or older and English speaking. Not applicable. An item bank to evaluate environmental access and support levels of services, systems, and policies for people with disabilities. We identified a general factor defined as "access and support levels of the services, systems, and policies at the level of community living" and 3 local factors defined as "health services," "community living," and "community resources." The systems, services, and policies measure correlated moderately with participation measures: Community Participation Indicators (CPI) - Involvement, CPI - Control over Participation, Quality of Life in Neurological Disorders - Ability to Participate, Quality of Life in Neurological Disorders - Satisfaction with Role Participation, Patient-Reported Outcomes Measurement Information System (PROMIS) Ability to Participate, PROMIS Satisfaction with Role Participation, and PROMIS Isolation. The measure of systems, services, and policies facilitators contains items pertaining to health services, community living, and community resources. Investigators and clinicians can measure

  15. Can Item Analysis of MCQs Accomplish the Need of a Proper Assessment Strategy for Curriculum Improvement in Medical Education?

    ERIC Educational Resources Information Center

    Pawade, Yogesh R.; Diwase, Dipti S.

    2016-01-01

    Item analysis of Multiple Choice Questions (MCQs) is the process of collecting, summarizing and utilizing information from students' responses to evaluate the quality of test items. Difficulty Index (p-value), Discrimination Index (DI) and Distractor Efficiency (DE) are the parameters which help to evaluate the quality of MCQs used in an…
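
    The two simplest parameters named above can be sketched in Python as follows. This is a hedged illustration, not the authors' procedure: it assumes responses are already scored 0/1, uses the common convention of comparing the top and bottom 27% of examinees by total score (a default that this record does not confirm), and omits Distractor Efficiency, which needs the raw option choices. The function name item_analysis and the toy data are hypothetical.

```python
def item_analysis(responses, item, group_fraction=0.27):
    """Return (difficulty index, discrimination index) for one item.

    responses: list of examinees, each a list of 0/1 item scores.
    item: zero-based index of the item to analyze.
    group_fraction: share of examinees in the upper and lower groups
    (0.27 is a common convention, assumed here).
    """
    # Difficulty index: proportion of all examinees answering correctly.
    p_value = sum(row[item] for row in responses) / len(responses)

    # Rank examinees by total score, highest first, and take the extremes.
    ranked = sorted(responses, key=sum, reverse=True)
    n = max(1, round(len(ranked) * group_fraction))
    upper, lower = ranked[:n], ranked[-n:]

    # Discrimination index: difference in correct answers between groups,
    # scaled by group size, so it ranges from -1 to 1.
    correct_upper = sum(row[item] for row in upper)
    correct_lower = sum(row[item] for row in lower)
    discrimination = (correct_upper - correct_lower) / n
    return p_value, discrimination


# Toy data (made up): 4 examinees, 3 items.
toy_responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(item_analysis(toy_responses, item=0))
```

    An item answered correctly far more often by high scorers than by low scorers gets a discrimination index near 1; values near 0 or below flag items that deserve review.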

  16. Advising on Preferred Reporting Items for patient-reported outcome instrument development: the PRIPROID.

    PubMed

    Hou, Zheng-Kun; Liu, Feng-Bin; Fang, Ji-Qian; Li, Xiao-Ying; Li, Li-Juan; Lin, Chu-Hua

    2013-03-01

    The reporting of patient-reported outcome (PRO) instrument development is vital for both researchers and clinicians to determine an instrument's validity; we therefore propose the Preferred Reporting Items for PRO Instrument Development (PRIPROID) to improve the quality of reports. Following the guidance published by the Enhancing the QUAlity and Transparency Of health Research (EQUATOR) Network, we performed 6 steps for item development: identified the need for a guideline, performed a literature review, obtained funding for the guideline initiative, identified participants, conducted a Delphi exercise, and generated a list of PRIPROID items for consideration at the face-to-face meeting. Twenty-three item subheadings under 7 topics were included: title and structured abstract, rationale, objectives, intention, eligibility criteria, conceptual framework, item generation, response options, scoring, times, administrative modes, burden assessment, properties assessment, statistical methods, participants, main results, and additional analysis, summary of evidence, limitations, clinical attentions, and conclusions, item pools or final form, and funding. The PRIPROID covers many elements of PRO research; it helps researchers report their results more accurately and, to a certain degree, evaluate the quality of research methods.

  17. Are Faculty Predictions or Item Taxonomies Useful for Estimating the Outcome of Multiple-Choice Examinations?

    ERIC Educational Resources Information Center

    Kibble, Jonathan D.; Johnson, Teresa

    2011-01-01

    The purpose of this study was to evaluate whether multiple-choice item difficulty could be predicted either by a subjective judgment by the question author or by applying a learning taxonomy to the items. Eight physiology faculty members teaching an upper-level undergraduate human physiology course consented to participate in the study. The…

  18. [Inter-rater concordance of the "Nursing Activities Score" in intensive care].

    PubMed

    Valls-Matarín, Josefa; Salamero-Amorós, Maria; Roldán-Gil, Carmen; Quintana-Riera, Salvador

    2015-01-01

    To evaluate inter-rater concordance in scoring the "Nursing Activities Score". Cross-sectional descriptive study conducted from December 2012 until June 2013 in a general intensive care unit with twelve beds. Three nurse evaluators simultaneously and independently scored nursing workload from the daily patient charts, using the Nursing Activities Score scale, for all admitted patients over 18 years old. Three hundred and thirty-nine records were collected. The intra-class correlation coefficient (ICC) between evaluators was 0.92 (0.89-0.94). Perfect concordance was obtained in 39.1% of the items, 52.2% had high concordance, and 8.7% had lower concordance, the latter corresponding to two of the items with multiple scoring options. Significant differences between two of the evaluators (P=.049) were found. Although the inter-rater concordance was high, more accurate records are needed to reduce the variability of the items with multiple options and to allow more accuracy in the interpretation and measurement of the data regarding nursing workload. Copyright © 2015 Elsevier España, S.L.U. All rights reserved.

  19. Item Response Theory Analyses of the Cambridge Face Memory Test (CFMT)

    PubMed Central

    Cho, Sun-Joo; Wilmer, Jeremy; Herzmann, Grit; McGugin, Rankin; Fiset, Daniel; Van Gulick, Ana E.; Ryan, Katie; Gauthier, Isabel

    2014-01-01

    We evaluated the psychometric properties of the Cambridge face memory test (CFMT; Duchaine & Nakayama, 2006). First, we assessed the dimensionality of the test with a bi-factor exploratory factor analysis (EFA). This EFA analysis revealed a general factor and three specific factors clustered by targets of CFMT. However, the three specific factors appeared to be minor factors that can be ignored. Second, we fit a unidimensional item response model. This item response model showed that the CFMT items could discriminate individuals at different ability levels and covered a wide range of the ability continuum. We found the CFMT to be particularly precise for a wide range of ability levels. Third, we implemented item response theory (IRT) differential item functioning (DIF) analyses for each gender group and two age groups (Age ≤ 20 versus Age > 21). This DIF analysis suggested little evidence of consequential differential functioning on the CFMT for these groups, supporting the use of the test to compare older to younger, or male to female, individuals. Fourth, we tested for a gender difference on the latent facial recognition ability with an explanatory item response model. We found a significant but small gender difference on the latent ability for face recognition, which was higher for women than men by 0.184, at age mean 23.2, controlling for linear and quadratic age effects. Finally, we discuss the practical considerations of the use of total scores versus IRT scale scores in applications of the CFMT. PMID:25642930

  20. 75 FR 63695 - Designation of Biobased Items for Federal Procurement

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-10-18

    ... manufacturing in rural communities; and to enhance the Nation's energy security by substituting biobased... items used in products or systems designed or procured for combat or combat-related missions, which will... least part of its environmental information responsibilities. The BEES tool is designed to evaluate...