ERIC Educational Resources Information Center
Scheuneman, Janice Dowd; Gerritz, Kalle
1990-01-01
Differential item functioning (DIF) methodology for revealing sources of item difficulty and performance characteristics of different groups was explored. A total of 150 Scholastic Aptitude Test items and 132 Graduate Record Examination general test items were analyzed. DIF was evaluated for males and females and Blacks and Whites. (SLD)
ERIC Educational Resources Information Center
Ong, Yoke Mooi; Williams, Julian; Lamprianou, Iasonas
2013-01-01
Researchers interested in exploring substantive group differences are increasingly attending to bundles of items (or testlets): the aim is to understand how gender differences, for instance, are explained by differential performances on different types or bundles of items, hence differential bundle functioning (DBF). Some previous work has…
Examining Differential Math Performance by Gender and Opportunity to Learn
ERIC Educational Resources Information Center
Albano, Anthony D.; Rodriguez, Michael C.
2013-01-01
Although a substantial amount of research has been conducted on differential item functioning in testing, studies have focused on detecting differential item functioning rather than on explaining how or why it may occur. Some recent work has explored sources of differential functioning using explanatory and multilevel item response models. This…
Exploring Crossing Differential Item Functioning by Gender in Mathematics Assessment
ERIC Educational Resources Information Center
Ong, Yoke Mooi; Williams, Julian; Lamprianou, Iasonas
2015-01-01
The purpose of this article is to explore crossing differential item functioning (DIF) in a test drawn from a national examination of mathematics for 11-year-old pupils in England. An empirical dataset was analyzed to explore DIF by gender in a mathematics assessment. A two-step process involving the logistic regression (LR) procedure for…
Improving measurement of injection drug risk behavior using item response theory.
Janulis, Patrick
2014-03-01
Recent research highlights the multiple steps involved in preparing and injecting drugs and the resultant viral threats faced by drug users. This research suggests that more sensitive measurement of injection drug HIV risk behavior is required. In addition, growing evidence suggests there are gender differences in injection risk behavior. However, the potential for differential item functioning between genders has not been explored. This study aimed to explore item response theory as an improved measurement modeling technique that provides empirically justified scaling of injection risk behavior, and to examine potential gender-based differential item functioning. Data were drawn from three studies in the National Institute on Drug Abuse's Criminal Justice Drug Abuse Treatment Studies. A two-parameter item response theory model was used to scale injection risk behavior, and logistic regression was used to examine differential item functioning. Item fit statistics suggest that item response theory can be used to scale injection risk behavior and that these models can provide more sensitive estimates of risk behavior. Additionally, gender-based differential item functioning is present in the current data. Improved measurement of injection risk behavior using item response theory should be encouraged, as these models provide increased congruence between construct measurement and the complexity of injection-related HIV risk. Suggestions are made to further improve injection risk behavior measurement. Furthermore, results suggest that direct comparisons of composite scores between males and females may be misleading and that future work should account for differential item functioning before comparing levels of injection risk behavior.
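For readers unfamiliar with the two-parameter model named in the abstract above, the following is a minimal sketch of its item response function. The parameter values and the names `two_pl_probability`, `item_reference`, and `item_focal` are hypothetical illustrations, not values or code from the study; the shifted difficulty simply shows what differential item functioning would look like if the same item were calibrated separately in two groups.

```python
import math


def two_pl_probability(theta, a, b):
    """Probability of endorsing an item under a two-parameter logistic model.

    theta: latent trait level; a: discrimination; b: difficulty (location).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical parameters for one item calibrated separately in two groups.
# The focal group's higher difficulty is an assumed, illustrative DIF effect.
item_reference = {"a": 1.2, "b": 0.0}
item_focal = {"a": 1.2, "b": 0.5}

for theta in (-1.0, 0.0, 1.0):
    p_ref = two_pl_probability(theta, **item_reference)
    p_foc = two_pl_probability(theta, **item_focal)
    print(f"theta={theta:+.1f}  reference={p_ref:.3f}  focal={p_foc:.3f}")
```

At every trait level the two hypothetical groups have different endorsement probabilities, which is why composite scores may not be directly comparable when such an effect is present.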
ERIC Educational Resources Information Center
Banerjee, Jayanti; Papageorgiou, Spiros
2016-01-01
The research reported in this article investigates differential item functioning (DIF) in a listening comprehension test. The study explores the relationship between test-taker age and the items' language domains across multiple test forms. The data comprise test-taker responses (N = 2,861) to a total of 133 unique items, 46 items of which were…
ERIC Educational Resources Information Center
Keiffer, Elizabeth Ann
2011-01-01
A differential item functioning (DIF) simulation study was conducted to explore the type and level of impact that contamination had on type I error and power rates in DIF analyses when the suspect item favored the same or opposite group as the DIF items in the matching subtest. Type I error and power rates were displayed separately for the…
Factor Structure and Reliability of Test Items for Saudi Teacher Licence Assessment
ERIC Educational Resources Information Center
Alsadaawi, Abdullah Saleh
2017-01-01
The Saudi National Assessment Centre administers the Computer Science Teacher Test for teacher certification. The aim of this study is to explore gender differences in candidates' scores, and investigate dimensionality, reliability, and differential item functioning using confirmatory factor analysis and item response theory. The confirmatory…
ERIC Educational Resources Information Center
Sukin, Tia M.
2010-01-01
The presence of outlying anchor items is an issue faced by many testing agencies. The decision to retain or remove an item is a difficult one, especially when the content representation of the anchor set becomes questionable by item removal decisions. Additionally, the reason for the aberrancy is not always clear, and if the performance of the…
ERIC Educational Resources Information Center
Zhang, Yanling; Dorans, Neil J.; Matthews-López, Joy L.
2005-01-01
Statistical procedures for detecting differential item functioning (DIF) are often used as an initial step to screen items for construct irrelevant variance. This research applies a DIF dissection method and a two-way classification scheme to SAT Reasoning Test™ verbal section data and explores the effects of deleting sizable DIF items on reported…
Modeling the Discrimination Power of Physics Items
ERIC Educational Resources Information Center
Mesic, Vanes
2011-01-01
For the purposes of tailoring physics instruction in accordance with the needs and abilities of the students it is useful to explore the knowledge structure of students of different ability levels. In order to precisely differentiate the successive, characteristic states of student achievement it is necessary to use test items that possess…
Detecting Gender Bias Through Test Item Analysis
NASA Astrophysics Data System (ADS)
González-Espada, Wilson J.
2009-03-01
Many physical science and physics instructors might not be trained in pedagogically appropriate test construction methods. This could lead to test items that do not measure what they are intended to measure. A subgroup of these items might show bias against some groups of students. This paper describes how the author became aware of potentially biased items against females in his examinations, which led to the exploration of fundamental issues related to item validity, gender bias, and differential item functioning, or DIF. A brief discussion of DIF in the context of university courses, as well as practical suggestions to detect possible gender-biased items, follows.
ERIC Educational Resources Information Center
Maddox, Bryan; Zumbo, Bruno D.; Tay-Lim, Brenda; Qu, Demin
2015-01-01
This article explores the potential for ethnographic observations to inform the analysis of test item performance. In 2010, a standardized, large-scale adult literacy assessment took place in Mongolia as part of the United Nations Educational, Scientific and Cultural Organization Literacy Assessment and Monitoring Programme (LAMP). In a novel form…
ERIC Educational Resources Information Center
Dai, Yunyun
2013-01-01
Mixtures of item response theory (IRT) models have been proposed as a technique to explore response patterns in test data related to cognitive strategies, instructional sensitivity, and differential item functioning (DIF). Estimation proves challenging due to difficulties in identification and questions of effect size needed to recover underlying…
Teresi, Jeanne A; Ocepek-Welikson, Katja; Ramirez, Mildred; Kleinman, Marjorie; Ornstein, Katherine; Siu, Albert
2016-01-01
Background: The Family Satisfaction with End-of-Life Care is an internationally used measure of satisfaction with cancer care. However, the Family Satisfaction with End-of-Life Care has not been studied for equivalence of item endorsement across different socio-demographic groups using differential item functioning. Aims: The aims of this secondary data analysis were (1) to examine potential differential item functioning in the family satisfaction item set with respect to type of caregiver, race, and patient age, gender, and education and (2) to provide parameters and documentation of differential item functioning for an item bank. Design: A mixed qualitative and quantitative analysis was conducted. A priori hypotheses regarding potential group differences in item response were established. Item response theory and Wald tests were used for the analyses of differential item functioning, accompanied by magnitude and impact measures. Results: Very little significant differential item functioning was observed for patient's age and gender. For race, 13 items showed differential item functioning after multiple comparison adjustment, 10 with non-uniform differential item functioning. No items evidenced differential item functioning of high magnitude, and the impact was negligible. For education, 5 items evidenced uniform differential item functioning after adjustment, none of high magnitude. Differential item functioning impact was trivial. One item evidenced differential item functioning for the caregiver relationship variable. Conclusion: Differential item functioning was observed primarily for race and education. No differential item functioning of high magnitude was observed for any item, and the overall impact of differential item functioning was negligible. One item, satisfaction with “the patient's pain relief,” might be singled out for further study, given that this item was both hypothesized and observed to show differential item functioning for race and education.
PMID:25160692
Curriculum Type as a Differentiating Factor in Medical Licensing Examinations.
ERIC Educational Resources Information Center
Shen, Linjun
This study assessed the effects of the type of medical curriculum on differential item functioning (DIF) and group differences at the test level in Level 1 of the Comprehensive Osteopathic Medical Licensing Examinations (COMLEX). The study also explored the relationship of the DIF and group differences at the test level. There are generally two…
Neural Differentiation of Incorrectly Predicted Memories.
Kim, Ghootae; Norman, Kenneth A; Turk-Browne, Nicholas B
2017-02-22
When an item is predicted in a particular context but the prediction is violated, memory for that item is weakened (Kim et al., 2014). Here, we explore what happens when such previously mispredicted items are later reencountered. According to prior neural network simulations, this sequence of events (misprediction and subsequent restudy) should lead to differentiation of the item's neural representation from the previous context (on which the misprediction was based). Specifically, misprediction weakens connections in the representation to features shared with the previous context and restudy allows new features to be incorporated into the representation that are not shared with the previous context. This cycle of misprediction and restudy should have the net effect of moving the item's neural representation away from the neural representation of the previous context. We tested this hypothesis using human fMRI by tracking changes in item-specific BOLD activity patterns in the hippocampus, a key structure for representing memories and generating predictions. In left CA2/3/DG, we found greater neural differentiation for items that were repeatedly mispredicted and restudied compared with items from a control condition that was identical except without misprediction. We also measured prediction strength in a trial-by-trial fashion and found that greater misprediction for an item led to more differentiation, further supporting our hypothesis. Therefore, the consequences of prediction error go beyond memory weakening. If the mispredicted item is restudied, the brain adaptively differentiates its memory representation to improve the accuracy of subsequent predictions and to shield it from further weakening. SIGNIFICANCE STATEMENT Competition between overlapping memories leads to weakening of nontarget memories over time, making it easier to access target memories. However, a nontarget memory in one context might become a target memory in another context. How do such memories get restrengthened without increasing competition again? Computational models suggest that the brain handles this by reducing neural connections to the previous context and adding connections to new features that were not part of the previous context. The result is neural differentiation away from the previous context. Here, we provide support for this theory, using fMRI to track neural representations of individual memories in the hippocampus and how they change based on learning. Copyright © 2017 the authors.
ERIC Educational Resources Information Center
Choi, Youn-Jeng; Alexeev, Natalia; Cohen, Allan S.
2015-01-01
The purpose of this study was to explore what may be contributing to differences in performance in mathematics on the Trends in International Mathematics and Science Study 2007. This was done by using a mixture item response theory modeling approach to first detect latent classes in the data and then to examine differences in performance on items…
Goetz, Christopher G; Liu, Yuanyuan; Stebbins, Glenn T; Wang, Lu; Tilley, Barbara C; Teresi, Jeanne A; Merkitch, Douglas; Luo, Sheng
2016-12-01
The objective was to assess MDS-UPDRS items for gender-, age-, and race/ethnicity-based differential item functioning. Assessing differential item functioning is a core rating scale validation step. For the MDS-UPDRS, differential item functioning occurs if item-score probability among people with similar levels of parkinsonism differs according to selected covariates (gender, age, race/ethnicity). If the magnitude of differential item functioning is clinically relevant, item-score interpretation must consider influences by these covariates. Differential item functioning can be nonuniform (covariate variably influences an item-score across different levels of parkinsonism) or uniform (covariate influences an item-score consistently over all levels of parkinsonism). Using the MDS-UPDRS translation database of more than 5,000 PD patients from 14 languages, we tested gender-, age-, and race/ethnicity-based differential item functioning. To designate an item as having clinically relevant differential item functioning, we required statistical confirmation by 2 independent methods, along with a McFadden pseudo-R² magnitude statistic greater than "negligible." Most items showed no gender-, age- or race/ethnicity-based differential item functioning. When differential item functioning was identified, the magnitude statistic was always in the "negligible" range, and the scale-level impact was minimal. The absence of clinically relevant differential item functioning across all items and all parts of the MDS-UPDRS is strong evidence that the scale can be used confidently. As studies of Parkinson's disease increasingly involve multinational efforts and the MDS-UPDRS has several validated non-English translations, the findings support the scale's broad applicability in populations with varying gender, age, and race/ethnicity distributions. © 2016 International Parkinson and Movement Disorder Society.
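The uniform/nonuniform distinction and the McFadden magnitude statistic mentioned above can be written out with a commonly used nested logistic regression formulation. This is an illustrative sketch for a dichotomized item, not necessarily the models fitted in the study; the symbols $u$ (item response), $\theta$ (matching trait level), $g$ (group covariate), and the $\beta$ coefficients are assumptions introduced here.

```latex
\begin{align}
\text{M}_0:\quad \operatorname{logit} P(u = 1) &= \beta_0 + \beta_1 \theta \\
\text{M}_1:\quad \operatorname{logit} P(u = 1) &= \beta_0 + \beta_1 \theta + \beta_2 g \\
\text{M}_2:\quad \operatorname{logit} P(u = 1) &= \beta_0 + \beta_1 \theta + \beta_2 g + \beta_3 \theta g \\[4pt]
R^2_{\text{McFadden}} &= 1 - \frac{\ln L_{\text{model}}}{\ln L_{\text{null}}}
\end{align}
```

Under these assumptions, uniform differential item functioning corresponds to $\beta_2 \neq 0$ with $\beta_3 = 0$ (a constant shift at every trait level), and nonuniform differential item functioning corresponds to $\beta_3 \neq 0$ (the group effect changes with $\theta$). A magnitude statistic of the kind described can be taken as the change in $R^2_{\text{McFadden}}$ between nested models, for example between $\text{M}_0$ and $\text{M}_2$.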
Identifying Country-Specific Cultures of Physics Education: A differential item functioning approach
NASA Astrophysics Data System (ADS)
Mesic, Vanes
2012-11-01
In international large-scale assessments of educational outcomes, student achievement is often represented by unidimensional constructs. This approach allows for drawing general conclusions about country rankings with respect to the given achievement measure, but it typically does not provide the specific diagnostic information that is necessary for systematic comparisons and improvements of educational systems. Useful information could be obtained by exploring the differences in national profiles of student achievement between low-achieving and high-achieving countries. In this study, we aimed to identify the relative weaknesses and strengths of eighth graders' physics achievement in Bosnia and Herzegovina in comparison to the achievement of their peers from Slovenia. For this purpose, we ran a secondary analysis of Trends in International Mathematics and Science Study (TIMSS) 2007 data. The student sample consisted of 4,220 students from Bosnia and Herzegovina and 4,043 students from Slovenia. After analysing the cognitive demands of TIMSS 2007 physics items, the corresponding differential item functioning (DIF)/differential group functioning contrasts were estimated. Approximately 40% of items exhibited large DIF contrasts, indicating significant differences between cultures of physics education in Bosnia and Herzegovina and Slovenia. The relative strength of students from Bosnia and Herzegovina was found to be mainly associated with the topic area 'Electricity and magnetism'. Classes of items which required knowledge of the experimental method, counterintuitive thinking, proportional reasoning and/or the use of complex knowledge structures proved to be differentially easier for students from Slovenia. In the light of the presented results, the common practice of ranking countries with respect to universally established cognitive categories seems to be potentially misleading.
An Effect Size Measure for Raju's Differential Functioning for Items and Tests
ERIC Educational Resources Information Center
Wright, Keith D.; Oshima, T. C.
2015-01-01
This study established an effect size measure for differential functioning for items and tests' noncompensatory differential item functioning (NCDIF). The Mantel-Haenszel parameter served as the benchmark for developing NCDIF's effect size measure for reporting moderate and large differential item functioning in test items. The effect size of…
NASA Astrophysics Data System (ADS)
Chiu, Tina
This dissertation includes three studies that analyze a new set of assessment tasks developed by the Learning Progressions in Middle School Science (LPS) Project. These assessment tasks were designed to measure science content knowledge on the structure of matter domain and scientific argumentation, while following the goals from the Next Generation Science Standards (NGSS). The three studies focus on the evidence available for the success of this design and its implementation, generally labelled as "validity" evidence. I use explanatory item response models (EIRMs) as the overarching framework to investigate these assessment tasks. These models can be useful when gathering validity evidence for assessments as they can help explain student learning and group differences. In the first study, I explore the dimensionality of the LPS assessment by comparing the fit of unidimensional, between-item multidimensional, and Rasch testlet models to see which is most appropriate for this data. By applying multidimensional item response models, multiple relationships can be investigated, and in turn, allow for a more substantive look into the assessment tasks. The second study focuses on person predictors through latent regression and differential item functioning (DIF) models. Latent regression models show the influence of certain person characteristics on item responses, while DIF models test whether one group is differentially affected by specific assessment items, after conditioning on latent ability. Finally, the last study applies the linear logistic test model (LLTM) to investigate whether item features can help explain differences in item difficulties.
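As a brief illustration of the last model named above, the linear logistic test model (LLTM) is usually written as a Rasch model whose item difficulties are constrained to be a weighted sum of item features. The notation below ($\theta_p$ for person ability, $b_i$ for item difficulty, $q_{ik}$ for the value of feature $k$ on item $i$, $\eta_k$ for the feature effect) is a common convention assumed here, not taken from the dissertation, and sign conventions differ between sources.

```latex
\begin{align}
P(X_{pi} = 1 \mid \theta_p) &= \frac{\exp(\theta_p - b_i)}{1 + \exp(\theta_p - b_i)} \\
b_i &= \sum_{k=1}^{K} q_{ik}\, \eta_k
\end{align}
```

In this form, asking whether item features explain differences in item difficulties amounts to asking how well the $K$ feature effects $\eta_k$ reproduce the freely estimated Rasch difficulties.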
Dimensionality and DIF in a Licensure Examination.
ERIC Educational Resources Information Center
Sykes, Robert C.; And Others
The sources of multidimensionality found in several different forms of a licensure examination were studied. The relationship between one source of multidimensionality, differential item functioning (DIF) (or factors producing DIF), and content characteristics was explored in an attempt to isolate aspects of training or curriculum that could…
The Egocentric Reference for Visual Exploration and Orientation
ERIC Educational Resources Information Center
Nico, Daniele; Daprati, Elena
2009-01-01
Clinical signs of damage to the egocentric reference system range from the inability to detect stimuli in the real environment to a defect in recovering items from an internal representation. Despite clinical dissociations, current interpretations consider all symptoms as due to a single perturbation, differentially expressed according to the…
ERIC Educational Resources Information Center
Thurman, Carol
2009-01-01
The increased use of polytomous item formats has led assessment developers to pay greater attention to the detection of differential item functioning (DIF) in these items. DIF occurs when an item performs differently for two contrasting groups of respondents (e.g., males versus females) after controlling for differences in the abilities of the…
Code of Federal Regulations, 2012 CFR
2012-04-01
... 17 Commodity and Securities Exchanges 3 2012-04-01 2012-04-01 false Inclusion of items, differentiation between items and answers, omission of instructions. 260.7a-16 Section 260.7a-16 Commodity and... INDENTURE ACT OF 1939 Formal Requirements § 260.7a-16 Inclusion of items, differentiation between items and...
Code of Federal Regulations, 2014 CFR
2014-04-01
... 17 Commodity and Securities Exchanges 4 2014-04-01 2014-04-01 false Inclusion of items, differentiation between items and answers, omission of instructions. 260.7a-16 Section 260.7a-16 Commodity and... INDENTURE ACT OF 1939 Formal Requirements § 260.7a-16 Inclusion of items, differentiation between items and...
Code of Federal Regulations, 2013 CFR
2013-04-01
... 17 Commodity and Securities Exchanges 3 2013-04-01 2013-04-01 false Inclusion of items, differentiation between items and answers, omission of instructions. 260.7a-16 Section 260.7a-16 Commodity and... INDENTURE ACT OF 1939 Formal Requirements § 260.7a-16 Inclusion of items, differentiation between items and...
Code of Federal Regulations, 2010 CFR
2010-04-01
... 17 Commodity and Securities Exchanges 3 2010-04-01 2010-04-01 false Inclusion of items, differentiation between items and answers, omission of instructions. 260.7a-16 Section 260.7a-16 Commodity and... INDENTURE ACT OF 1939 Formal Requirements § 260.7a-16 Inclusion of items, differentiation between items and...
Code of Federal Regulations, 2011 CFR
2011-04-01
... 17 Commodity and Securities Exchanges 3 2011-04-01 2011-04-01 false Inclusion of items, differentiation between items and answers, omission of instructions. 260.7a-16 Section 260.7a-16 Commodity and... INDENTURE ACT OF 1939 Formal Requirements § 260.7a-16 Inclusion of items, differentiation between items and...
Exploring the impact of disability on self-determination measurement.
Mumbardó-Adam, Cristina; Guàrdia-Olmos, Joan; Giné, Climent
2018-07-01
Self-determination is a psychological construct that applies to both the general population and individuals with disabilities, who can be self-determined with adequate accommodations and opportunities. As the relevance of self-determination-related skills in life has been recently acknowledged, researchers have created a measure to assess self-determination in adolescents and young adults with and without disabilities. The Self-Determination Inventory: Student Report (Spanish interim version) is being empirically validated in Spanish. As this scale is the first assessment addressed to all youth, further exploration of its psychometric properties is required to ensure the reliability of the self-determination measurement and gain further insight into the construct when applied to youth with and without disabilities. More than 600 participants were asked to complete the scale. The impact of disability on the item response distributions across the dimensions of self-determination was explored. Differential item functioning (DIF) was found in only 5 of the scale's 45 items. Differences primarily favored youth without disabilities. The weak presence of DIF across the items supports the instrument's psychometric robustness when measuring self-determination in youth with and without disabilities and provides further understanding of the self-determination construct. Implications and future research directions are also discussed. Copyright © 2018 Elsevier Ltd. All rights reserved.
The Student Risk Screening Scale: Exploring Dimensionality and Differential Item Functioning
ERIC Educational Resources Information Center
Schatschneider, Christopher; Lane, Kathleen Lynne; Oakes, Wendy Peia; Kalberg, Jemma Robertson
2014-01-01
Screening of students at risk for antisocial behaviors in school is an essential step in the implementation of evidence-based supports for academic, behavioral, and social domains at the first sign of concern. This study examined the measurement properties of a free-access systematic behavior screening tool: the Student Risk Screening Scale…
ERIC Educational Resources Information Center
Arbuthnot, Keena
2009-01-01
Although research has extensively documented sources for differential item functioning and stereotype threat--especially among women and black college students--little is known about group differences in test-taking strategies among black adolescent students. In this article, Arbuthnot presents findings from two studies that seek to explore how…
DIF Trees: Using Classification Trees to Detect Differential Item Functioning
ERIC Educational Resources Information Center
Vaughn, Brandon K.; Wang, Qiu
2010-01-01
A nonparametric tree classification procedure is used to detect differential item functioning for items that are dichotomously scored. Classification trees are shown to be an alternative procedure to detect differential item functioning other than the use of traditional Mantel-Haenszel and logistic regression analysis. A nonparametric…
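For context on the traditional Mantel-Haenszel procedure that the abstract above uses as a point of comparison, here is a minimal sketch of the Mantel-Haenszel common odds ratio for a dichotomous item, computed across strata of examinees matched on total score. The function names and the counts are hypothetical; the conversion to the delta scale uses the conventional factor of -2.35 applied to the log odds ratio.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio pooled across matched score strata.

    Each stratum is a tuple of counts:
    (reference_correct, reference_wrong, focal_correct, focal_wrong).
    """
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, foc_right, foc_wrong in strata:
        total = ref_right + ref_wrong + foc_right + foc_wrong
        if total == 0:
            continue  # an empty stratum contributes nothing
        numerator += ref_right * foc_wrong / total
        denominator += ref_wrong * foc_right / total
    # Note: raises ZeroDivisionError if no stratum has both a wrong
    # reference response and a correct focal response.
    return numerator / denominator


def mh_delta(odds_ratio):
    """Express the common odds ratio on the delta scale (-2.35 * ln(alpha))."""
    return -2.35 * math.log(odds_ratio)


# Hypothetical counts for three total-score strata (low, middle, high).
example_strata = [(30, 20, 20, 30), (40, 10, 30, 20), (45, 5, 40, 10)]
alpha = mantel_haenszel_odds_ratio(example_strata)
print(f"MH odds ratio: {alpha:.3f}, delta: {mh_delta(alpha):.3f}")
```

An odds ratio of 1 (delta of 0) indicates no differential item functioning after matching; values above 1 indicate the item favors the reference group in this parameterization.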
LeBouthillier, Daniel M; Thibodeau, Michel A; Alberts, Nicole M; Hadjistavropoulos, Heather D; Asmundson, Gordon J G
2015-04-01
Individuals with medical conditions are likely to have elevated health anxiety; however, research has not demonstrated how medical status impacts response patterns on health anxiety measures. Measurement bias can undermine the validity of a questionnaire by overestimating or underestimating scores in groups of individuals. We investigated whether the Short Health Anxiety Inventory (SHAI), a widely-used measure of health anxiety, exhibits medical condition-based bias on item and subscale levels, and whether the SHAI subscales adequately assess the health anxiety continuum. Data were from 963 individuals with diabetes, breast cancer, or multiple sclerosis, and 372 healthy individuals. Mantel-Haenszel tests and item characteristic curves were used to classify the severity of item-level differential item functioning in all three medical groups compared to the healthy group. Test characteristic curves were used to assess scale-level differential item functioning and whether the SHAI subscales adequately assess the health anxiety continuum. Nine out of 14 items exhibited differential item functioning. Two items exhibited differential item functioning in all medical groups compared to the healthy group. In both Thought Intrusion and Fear of Illness subscales, differential item functioning was associated with mildly deflated scores in medical groups with very high levels of the latent traits. Fear of Illness items poorly discriminated between individuals with low and very low levels of the latent trait. While individuals with medical conditions may respond differentially to some items, clinicians and researchers can confidently use the SHAI with a variety of medical populations without concern of significant bias. Copyright © 2015 Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Ahmadi, Alireza; Bazvand, Ali Darabi
2016-01-01
Differential Item Functioning (DIF) exists when examinees of equal ability from different groups have different probabilities of successful performance in a certain item. This study examined gender differential item functioning across the PhD Entrance Exam of TEFL (PEET) in Iran, using both logistic regression (LR) and one-parameter item response…
Detection of Differential Item Functioning Using the Lasso Approach
ERIC Educational Resources Information Center
Magis, David; Tuerlinckx, Francis; De Boeck, Paul
2015-01-01
This article proposes a novel approach to detect differential item functioning (DIF) among dichotomously scored items. Unlike standard DIF methods that perform an item-by-item analysis, we propose the "LR lasso DIF method": a logistic regression (LR) model is formulated for all item responses. The model contains item-specific intercepts,…
Schultz-Larsen, Kirsten; Kreiner, Svend; Lomholt, Rikke Kirstine
2007-03-01
This study published in two companion papers assesses properties of the Mini-Mental State Examination (MMSE) with the purpose of improving the efficiencies of the methods of screening for cognitive impairment and dementia. An item analysis by conventional and mixed Rasch models was used to explore empirically derived cognitive dimensions of the MMSE, to assess item bias, and to construct diagnostic cut-points. The scores of 1,189 elderly residents were analyzed. Two dimensions of cognitive function, which are statistically and conceptually different from those obtained in previous studies, were derived. The corresponding sum scales were (1) age-correlated MMSE scale (A-MMSE scale: orientation to time, attention/calculation, naming, repetition, and three-stage command) and (2) non-age-correlated MMSE scale (B-MMSE scale: orientation to place, registration, recall, reading, and copying). The "writing" item was not included due to differential effects of age and sex. The analysis also showed that the study sample consisted of two cognitively different groups of elderly. The findings indicate that a two-scale solution is a stable and statistically supported framework for interpreting data obtained by means of the MMSE. Supplementary analyses are presented in the companion paper to explore the performance of this item response theory calibration as a screening test for dementia.
EAFI: Examination of Anomalous Fantasy and Imagination.
Rasmussen, Andreas Rosén; Stephensen, Helene; Parnas, Josef
2018-05-14
The Examination of Anomalous Fantasy and Imagination (EAFI) is an instrument for a semistructured, phenomenological exploration of psychopathology of imagination. The EAFI provides a conceptual-descriptive framework to address such experiences. It consists of 16 main items, sometimes divided into subtypes. We suggest that the anomalies of imagination explored by the EAFI reflect an alteration in the structure of consciousness and belong to a fundamental, generative layer of psychopathology with relevance to differential diagnostic purposes. © 2018 S. Karger AG, Basel.
ERIC Educational Resources Information Center
Huang, Xiaoting; Wilson, Mark; Wang, Lei
2016-01-01
In recent years, large-scale international assessments have been increasingly used to evaluate and compare the quality of education across regions and countries. However, measurement variance between different versions of these assessments often poses threats to the validity of such cross-cultural comparisons. In this study, we investigated the…
ERIC Educational Resources Information Center
Wang, Wen-Chung
2004-01-01
Scale indeterminacy in analysis of differential item functioning (DIF) within the framework of item response theory can be resolved by imposing 3 anchor item methods: the equal-mean-difficulty method, the all-other anchor item method, and the constant anchor item method. In this article, applicability and limitations of these 3 methods are…
ERIC Educational Resources Information Center
Çokluk, Ömay; Gül, Emrah; Dogan-Gül, Çilem
2016-01-01
The study aims to examine whether differential item functioning is displayed in three different test forms that have item orders of random and sequential versions (easy-to-hard and hard-to-easy), based on Classical Test Theory (CTT) and Item Response Theory (IRT) methods and bearing item difficulty levels in mind. In the correlational research, the…
ERIC Educational Resources Information Center
Penfield, Randall D.; Alvarez, Karina; Lee, Okhee
2009-01-01
The assessment of differential item functioning (DIF) in polytomous items addresses between-group differences in measurement properties at the item level, but typically does not inform which score levels may be involved in the DIF effect. The framework of differential step functioning (DSF) addresses this issue by examining between-group…
ERIC Educational Resources Information Center
Fukuhara, Hirotaka; Kamata, Akihito
2011-01-01
A differential item functioning (DIF) detection method for testlet-based data was proposed and evaluated in this study. The proposed DIF model is an extension of a bifactor multidimensional item response theory (MIRT) model for testlets. Unlike traditional item response theory (IRT) DIF models, the proposed model takes testlet effects into…
ERIC Educational Resources Information Center
Tay, Louis; Vermunt, Jeroen K.; Wang, Chun
2013-01-01
We evaluate the item response theory with covariates (IRT-C) procedure for assessing differential item functioning (DIF) without preknowledge of anchor items (Tay, Newman, & Vermunt, 2011). This procedure begins with a fully constrained baseline model, and candidate items are tested for uniform and/or nonuniform DIF using the Wald statistic.…
Item Purification in Differential Item Functioning Using Generalized Linear Mixed Models
ERIC Educational Resources Information Center
Liu, Qian
2011-01-01
For this dissertation, four item purification procedures were implemented onto the generalized linear mixed model for differential item functioning (DIF) analysis, and the performance of these item purification procedures was investigated through a series of simulations. Among the four procedures, forward and generalized linear mixed model (GLMM)…
Haggerty, Jeannie L.; Bouharaoui, Fatima; Santor, Darcy A.
2011-01-01
Evaluating the extent to which groups or subgroups of individuals differ with respect to primary healthcare experience depends on first ruling out the possibility of bias. Objective: To determine whether item or subscale performance differs systematically between French/English, high/low education subgroups and urban/rural residency. Method: A sample of 645 adult users balanced by French/English language (in Quebec and Nova Scotia, respectively), high/low education and urban/rural residency responded to six validated instruments: the Primary Care Assessment Survey (PCAS); the Primary Care Assessment Tool – Short Form (PCAT-S); the Components of Primary Care Index (CPCI); the first version of the EUROPEP (EUROPEP-I); the Interpersonal Processes of Care Survey, version II (IPC-II); and part of the Veterans Affairs National Outpatient Customer Satisfaction Survey (VANOCSS). We normalized subscale scores to a 0-to-10 scale and tested for between-group differences using ANOVA tests. We used a parametric item response model to test for differences between subgroups in item discriminability and item difficulty. We re-examined group differences after removing items with differential item functioning. Results: Experience of care was assessed more positively in the English-speaking (Nova Scotia) than in the French-speaking (Quebec) respondents. We found differential English/French item functioning in 48% of the 153 items: discriminability in 20% and differential difficulty in 28%. English items were more discriminating generally than the French. Removing problematic items did not change the differences in French/English assessments. Differential item functioning by high/low education status affected 27% of items, with items being generally more discriminating in high-education groups. Between-group comparisons were unchanged. In contrast, only 9% of items showed differential item functioning by geography, affecting principally the accessibility attribute. 
Removing problematic items reversed a previously non-significant finding, revealing poorer first-contact access in rural than in urban areas. Conclusion: Differential item functioning does not bias or invalidate French/English comparisons on subscales, but additional development is required to make French and English items equivalent. These instruments are relatively robust by educational status and geography, but results suggest potential differences in the underlying construct in low-education and rural respondents. PMID:23205035
Real and Artificial Differential Item Functioning in Polytomous Items
ERIC Educational Resources Information Center
Andrich, David; Hagquist, Curt
2015-01-01
Differential item functioning (DIF) for an item between two groups is present if, for the same person location on a variable, persons from different groups have different expected values for their responses. Applying only to dichotomously scored items in the popular Mantel-Haenszel (MH) method for detecting DIF in which persons are classified by…
Gender-Based Differential Item Performance in Mathematics Achievement Items.
ERIC Educational Resources Information Center
Doolittle, Allen E.; Cleary, T. Anne
1987-01-01
Eight randomly equivalent samples of high school seniors were each given a unique form of the ACT Assessment Mathematics Usage Test (ACTM). Signed measures of differential item performance (DIP) were obtained for each item in the eight ACTM forms. DIP estimates were analyzed and a significant item category effect was found. (Author/LMO)
ERIC Educational Resources Information Center
Mitchelson, Jacqueline K.; Wicher, Eliza W.; LeBreton, James M.; Craig, S. Bartholomew
2009-01-01
The current study evaluates the measurement precision of the Abridged Big Five Circumplex (AB5C) of personality traits by identifying those items that demonstrate differential item functioning by gender and ethnicity. Differential item functioning is found in 33 of 45 (73%) of the AB5C scales, across gender and ethnic groups (Caucasian vs. African…
The MIMIC Model as a Tool for Differential Bundle Functioning Detection
ERIC Educational Resources Information Center
Finch, W. Holmes
2012-01-01
Increasingly, researchers interested in identifying potentially biased test items are encouraged to use a confirmatory, rather than exploratory, approach. One such method for confirmatory testing is rooted in differential bundle functioning (DBF), where hypotheses regarding potential differential item functioning (DIF) for sets of items (bundles)…
Concreteness effects in short-term memory: a test of the item-order hypothesis.
Roche, Jaclynn; Tolan, G Anne; Tehan, Gerald
2011-12-01
The following experiments explore word length and concreteness effects in short-term memory within an item-order processing framework. This framework asserts order memory is better for those items that are relatively easy to process at the item level. However, words that are difficult to process benefit at the item level for increased attention/resources being applied. The prediction of the model is that differential item and order processing can be detected in episodic tasks that differ in the degree to which item or order memory are required by the task. The item-order account has been applied to the word length effect such that there is a short word advantage in serial recall but a long word advantage in item recognition. The current experiment considered the possibility that concreteness effects might be explained within the same framework. In two experiments, word length (Experiment 1) and concreteness (Experiment 2) are examined using forward serial recall, backward serial recall, and item recognition. These results for word length replicate previous studies showing the dissociation in item and order tasks. The same was not true for the concreteness effect. In all three tasks concrete words were better remembered than abstract words. The concreteness effect cannot be explained in terms of an item-order trade off. PsycINFO Database Record (c) 2011 APA, all rights reserved.
Local context effects during emotional item directed forgetting in younger and older adults.
Gallant, Sara N; Dyson, Benjamin J; Yang, Lixia
2017-09-01
This paper explored the differential sensitivity young and older adults exhibit to the local context of items entering memory. We examined trial-to-trial performance during an item directed forgetting task for positive, negative, and neutral (or baseline) words each cued as either to-be-remembered (TBR) or to-be-forgotten (TBF). This allowed us to focus on how variations in emotional valence (independent of arousal) and instruction (TBR vs. TBF) of the previous item (trial n-1) impacted memory for the current item (trial n) during encoding. Different from research showing impairing effects of emotional arousal, both age groups showed a memorial boost for stimuli when preceded by items high in positive or negative valence relative to those preceded by neutral items. This advantage was particularly prominent for neutral trial n items that followed emotional items suggesting that, regardless of age, neutral memories may be strengthened by a local context that is high in valence. A trending age difference also emerged with older adults showing greater sensitivity when encoding instructions changed between trial n-1 and n. Results are discussed in light of age-related theories of cognitive and emotional processing, highlighting the need to consider the dynamic, moment-to-moment fluctuations of these systems.
Babiar, Tasha Calvert
2011-01-01
Traditionally, women and minorities have not been fully represented in science and engineering. Numerous studies have attributed these differences to gaps in science achievement as measured by various standardized tests. Rather than describe mean group differences in science achievement across multiple cultures, this study focused on an in-depth item-level analysis across two countries: Spain and the United States. This study investigated eighth-grade gender differences on science items across the two countries. A secondary purpose of the study was to explore the nature of gender differences using the many-faceted Rasch Model as a way to estimate gender DIF. A secondary analysis of data from the Third International Mathematics and Science Study (TIMSS) was used to address three questions: 1) Does gender DIF in science achievement exist? 2) Is there a relationship between gender DIF and characteristics of the science items? 3) Do the relationships between item characteristics and gender DIF in science items replicate across countries? Participants included 7,087 eighth-grade students from the United States and 3,855 students from Spain who participated in TIMSS. The Facets program (Linacre and Wright, 1992) was used to estimate gender DIF. The results of the analysis indicate that the content of the item seemed to be related to gender DIF. The analysis also suggests that there is a relationship between gender DIF and item format. No pattern of gender DIF related to cognitive demand was found. The general pattern of gender DIF was similar across the two countries used in the analysis. The strength of item-level analysis as opposed to group mean difference analysis is that gender differences can be detected at the item level, even when no mean differences can be detected at the group level.
Morales, Leo S; Flowers, Claudia; Gutierrez, Peter; Kleinman, Marjorie; Teresi, Jeanne A
2006-11-01
To illustrate the application of the Differential Item and Test Functioning (DFIT) method using English and Spanish versions of the Mini-Mental State Examination (MMSE). Study participants were 65 years of age or older and lived in North Manhattan, New York. Of the 1578 study participants who were administered the MMSE, 665 completed it in Spanish. The MMSE contains 20 items that measure the degree of cognitive impairment in the areas of orientation, attention and calculation, registration, recall and language, as well as the ability to follow verbal and written commands. After assessing the dimensionality of the MMSE scale, item response theory person and item parameters were estimated separately for the English and Spanish sample using Samejima's 2-parameter graded response model. Then the DFIT framework was used to assess differential item functioning (DIF) and differential test functioning (DTF). Nine items were found to show DIF; these were items that ask the respondent to name the correct season, day of the month, city, state, and 2 nearby streets, recall 3 objects, repeat the phrase "no ifs, no ands, no buts," follow the command, "close your eyes," and the command, "take the paper in your right hand, fold the paper in half with both hands, and put the paper down in your lap." At the scale level, however, the MMSE did not show differential functioning. Respondents to the English and Spanish versions of the MMSE are comparable on the basis of scale scores. However, assessments based on individual MMSE items may be misleading.
ERIC Educational Resources Information Center
Tay, Louis; Huang, Qiming; Vermunt, Jeroen K.
2016-01-01
In large-scale testing, the use of multigroup approaches is limited for assessing differential item functioning (DIF) across multiple variables as DIF is examined for each variable separately. In contrast, the item response theory with covariate (IRT-C) procedure can be used to examine DIF across multiple variables (covariates) simultaneously. To…
Applying a Mixed Methods Framework to Differential Item Function Analyses
ERIC Educational Resources Information Center
Hitchcock, John H.; Johanson, George A.
2015-01-01
Understanding the reason(s) for Differential Item Functioning (DIF) in the context of measurement is difficult. Although identifying potential DIF items is typically a statistical endeavor, understanding the reasons for DIF (and item repair or replacement) might require investigations that can be informed by qualitative work. Such work is…
Effect of Differential Item Functioning on Test Equating
ERIC Educational Resources Information Center
Kabasakal, Kübra Atalay; Kelecioglu, Hülya
2015-01-01
This study examines the effect of differential item functioning (DIF) items on test equating through multilevel item response models (MIRMs) and traditional IRMs. The performances of three different equating models were investigated under 24 different simulation conditions, and the variables whose effects were examined included sample size, test…
Ramsay-Curve Differential Item Functioning
ERIC Educational Resources Information Center
Woods, Carol M.
2011-01-01
Differential item functioning (DIF) occurs when an item on a test, questionnaire, or interview has different measurement properties for one group of people versus another, irrespective of true group-mean differences on the constructs being measured. This article is focused on item response theory based likelihood ratio testing for DIF (IRT-LR or…
Differential Item Functioning Analysis Using Rasch Item Information Functions
ERIC Educational Resources Information Center
Wyse, Adam E.; Mapuranga, Raymond
2009-01-01
Differential item functioning (DIF) analysis is a statistical technique used for ensuring the equity and fairness of educational assessments. This study formulates a new DIF analysis method using the information similarity index (ISI). ISI compares item information functions when data fits the Rasch model. Through simulations and an international…
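As an illustrative aside, not taken from the article: under the Rasch model, the information an item provides at ability θ is P(θ)(1 − P(θ)), where P(θ) is the probability of a correct response. The Python sketch below computes that curve for a hypothetical item whose difficulty differs between two groups. The function names and difficulty values are assumptions made for illustration, and the ISI itself is not reproduced because the abstract does not define it.

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def rasch_information(theta, difficulty):
    """Rasch item information at ability theta: P * (1 - P)."""
    p = rasch_probability(theta, difficulty)
    return p * (1.0 - p)


# Hypothetical difficulties for the same item calibrated in two groups.
reference_difficulty = 0.0
focal_difficulty = 0.5

for theta in (-1.0, 0.0, 0.5, 1.0):
    info_ref = rasch_information(theta, reference_difficulty)
    info_foc = rasch_information(theta, focal_difficulty)
    print(f"theta={theta:+.1f}  reference={info_ref:.4f}  focal={info_foc:.4f}")
```

Each curve peaks at 0.25 where θ equals the item difficulty, so a between-group shift in difficulty moves the whole information curve along the ability scale.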
ERIC Educational Resources Information Center
Penfield, Randall D.; Algina, James
2006-01-01
One approach to measuring unsigned differential test functioning is to estimate the variance of the differential item functioning (DIF) effect across the items of the test. This article proposes two estimators of the DIF effect variance for tests containing dichotomous and polytomous items. The proposed estimators are direct extensions of the…
Parker, G; McCraw, S; Hadzi-Pavlovic, D
2015-07-15
Studies suggest that differentiating melancholic from non-melancholic depressive disorders is advanced by use of illness course as well as symptom variables but, in practice, potentially differentiating variables are generally positioned as having equal value. Judging that differentiating features are more likely to vary in their signal intensity, we sought to determine the number of features required to effect differentiation and their hierarchical order. The 24-item clinician-rated Sydney Melancholia Prototype Index (SMPI-CR) was completed for 364 unipolar depressed patients. The sample was divided into two cohorts according to the recruitment period. An RPART classification tree analysis identified the most discriminating SMPI items in the development sample of 197 patients, and examined the sensitivity and specificity of the diagnostic decisions, then sought to replicate findings in a validation sample of 169 patients. Independent analyses of putative SMPI items identified only seven items as required to discriminate those with clinically-diagnosed melancholic or non-melancholic depression when the conditions were examined separately. An RPART analysis considering differentiation of melancholic and non-melancholic depression in the total samples retained five of those items in the classification tree, three of which were non-symptom items, and with 92% sensitivity and 80% specificity in the development sample. This reduced item set showed 93% sensitivity and 82% specificity in the validation sample. Our clinical judgment of melancholic or non-melancholic depression may not correspond with the clinical logic employed by other clinicians. Only five SMPI items were required to derive a succinct and efficient decision tree, comprising high sensitivity and specificity in differentiating melancholic and non-melancholic depression. 
Current study findings provide an empirical model that could enrich clinicians' approach to differentiating melancholic and non-melancholic depression. Copyright © 2015 Elsevier B.V. All rights reserved.
Screening Test Items for Differential Item Functioning
ERIC Educational Resources Information Center
Longford, Nicholas T.
2014-01-01
A method for medical screening is adapted to differential item functioning (DIF). Its essential elements are explicit declarations of the level of DIF that is acceptable and of the loss function that quantifies the consequences of the two kinds of inappropriate classification of an item. Instead of a single level and a single function, sets of…
Differential Item Functioning: Its Consequences. Research Report. ETS RR-10-01
ERIC Educational Resources Information Center
Lee, Yi-Hsuan; Zhang, Jinming
2010-01-01
This report examines the consequences of differential item functioning (DIF) using simulated data. Its impact on total score, item response theory (IRT) ability estimate, and test reliability was evaluated in various testing scenarios created by manipulating the following four factors: test length, percentage of DIF items per form, sample sizes of…
Detection of Gender-Based Differential Item Functioning in a Mathematics Performance Assessment.
ERIC Educational Resources Information Center
Wang, Ning; Lane, Suzanne
This study used three different differential item functioning (DIF) procedures to examine the extent to which items in a mathematics performance assessment functioned differently for matched gender groups. In addition to examining the appropriateness of individual items in terms of DIF with respect to gender, an attempt was made to identify…
Real and Artificial Differential Item Functioning
ERIC Educational Resources Information Center
Andrich, David; Hagquist, Curt
2012-01-01
The literature in modern test theory on procedures for identifying items with differential item functioning (DIF) among two groups of persons includes the Mantel-Haenszel (MH) procedure. Generally, it is not recognized explicitly that if there is real DIF in some items which favor one group, then as an artifact of this procedure, artificial DIF…
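For readers unfamiliar with the Mantel-Haenszel (MH) procedure named in this abstract, the following is a minimal Python sketch of the textbook MH common odds ratio, pooled over strata of matched examinees. It is not the authors' analysis. Matching on total score, the function names, and the counts in the example are assumptions for illustration; the −2.35·ln(α) rescaling is the conventional ETS delta metric.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across matched strata.

    Each stratum is a tuple of counts:
    (reference correct, reference incorrect, focal correct, focal incorrect).
    """
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, foc_right, foc_wrong in strata:
        n = ref_right + ref_wrong + foc_right + foc_wrong
        if n == 0:
            continue  # an empty stratum contributes nothing
        numerator += ref_right * foc_wrong / n
        denominator += ref_wrong * foc_right / n
    if denominator == 0:
        raise ValueError("odds ratio undefined: denominator is zero")
    return numerator / denominator


def mh_delta(odds_ratio):
    """Rescale the odds ratio to the ETS delta metric."""
    return -2.35 * math.log(odds_ratio)


# Hypothetical counts for two score strata (illustration only).
example_strata = [(30, 20, 20, 30), (40, 10, 30, 20)]
alpha = mantel_haenszel_odds_ratio(example_strata)
print(alpha, mh_delta(alpha))
```

An odds ratio above 1 (a negative delta) means the item favors the reference group among examinees in the same stratum; a ratio of 1 indicates no uniform DIF.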
Ayala, Alba; Bilbao, Amaia; Garcia-Perez, Sonia; Escobar, Antonio; Forjaz, Maria João
2018-03-01
The Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) measures the quality of life of patients with osteoarthritis (OA), and there is a specific scale for the physical functioning dimension, the seven-item short version WOMAC-pf. This study describes the application of the Rasch model to explore scale invariance and response stability of the WOMAC-pf short version across affected joint and over time. A sample of 884 patients with OA, from 15 hospitals in Spain, completed the WOMAC-pf before surgery (baseline) and at 3, 6 and 12 months post-surgery of hip or knee. The invariance by joint was explored through the differential item functioning (DIF) analysis of the Rasch model using baseline data, and time stability (DIF by time) was evaluated in stack data (each participant is represented four times, once per time point). Mean age of the patients was 69.13 years (SD 10.01), 59.3% of them were women (n = 524), 59.2% had knee OA (n = 523) and 40.8% hip OA (n = 361). Item "putting on socks" showed DIF by joint and time. Fit to the Rasch model using stack data improved when this item was removed. Good reliability for individual use, local independency and unidimensionality of the models were confirmed. WOMAC-pf 7-item short version was invariant over time and joint when item "putting on socks" was removed. Researchers should carefully evaluate this item as it presents problems in scale invariance and stability, which could affect results when comparing data by joint or when computing change scores.
ERIC Educational Resources Information Center
Grover, Raman K.; Ercikan, Kadriye
2017-01-01
In gender differential item functioning (DIF) research it is assumed that all members of a gender group have similar item response patterns and therefore generalizations from group level to subgroup and individual levels can be made accurately. However, DIF items do not necessarily disadvantage every member of a gender group to the same degree,…
Jang, Yuri; Kwag, Kyung Hwa; Chiriboga, David A.
2010-01-01
Objectives. Given the emphasis on modesty and self-effacement in Asian societies, the present study explored differential item responses for 2 positive affect items (5 = Hopeful and 8 = Happy) on a short form of the Center for Epidemiologic Studies-Depression scale. The samples consisted of elderly non-Hispanic Whites (n = 450), Korean Americans (n = 519), and Koreans (n = 2,030). Method. Multiple Indicator Multiple Cause models were estimated to identify the impact of group membership on responses to the positive affect items while controlling for the latent trait of depressive symptoms. Results. The data revealed that Koreans and Korean Americans were less likely than non-Hispanic Whites to endorse the positive affect items. Compared with Korean Americans who were more acculturated to mainstream American culture, those who were less acculturated were less likely to endorse the positive affect items. Discussion. Our findings support the notion that the way in which people endorse depressive symptoms is substantially influenced by cultural orientation. These findings call into question the common use of simple mean comparisons and a universal cutoff point across diverse cultural groups. PMID:20660026
Jang, Yuri; Kwag, Kyung Hwa; Chiriboga, David A
2010-11-01
Given the emphasis on modesty and self-effacement in Asian societies, the present study explored differential item responses for 2 positive affect items (5 = Hopeful and 8 = Happy) on a short form of the Center for Epidemiologic Studies-Depression scale. The samples consisted of elderly non-Hispanic Whites (n = 450), Korean Americans (n = 519), and Koreans (n = 2,030). Multiple Indicator Multiple Cause models were estimated to identify the impact of group membership on responses to the positive affect items while controlling for the latent trait of depressive symptoms. The data revealed that Koreans and Korean Americans were less likely than non-Hispanic Whites to endorse the positive affect items. Compared with Korean Americans who were more acculturated to mainstream American culture, those who were less acculturated were less likely to endorse the positive affect items. Our findings support the notion that the way in which people endorse depressive symptoms is substantially influenced by cultural orientation. These findings call into question the common use of simple mean comparisons and a universal cutoff point across diverse cultural groups.
Assessment of Differential Item Functioning in Testlet-Based Items Using the Rasch Testlet Model
ERIC Educational Resources Information Center
Wang, Wen-Chung; Wilson, Mark
2005-01-01
This study presents a procedure for detecting differential item functioning (DIF) for dichotomous and polytomous items in testlet-based tests, whereby DIF is taken into account by adding DIF parameters into the Rasch testlet model. Simulations were conducted to assess recovery of the DIF and other parameters. Two independent variables, test type…
The Effects of Testlets on Reliability and Differential Item Functioning
ERIC Educational Resources Information Center
Teker, Gulsen Tasdelen; Dogan, Nuri
2015-01-01
Reliability and differential item functioning (DIF) analyses were conducted on testlets displaying local item dependence in this study. The data set employed in the research was obtained from the answers given by 1,500 students to the 20 items included in six testlets given in English Proficiency Exam by the School of Foreign Languages of a state…
MIMIC Methods for Assessing Differential Item Functioning in Polytomous Items
ERIC Educational Resources Information Center
Wang, Wen-Chung; Shih, Ching-Lin
2010-01-01
Three multiple indicators-multiple causes (MIMIC) methods, namely, the standard MIMIC method (M-ST), the MIMIC method with scale purification (M-SP), and the MIMIC method with a pure anchor (M-PA), were developed to assess differential item functioning (DIF) in polytomous items. In a series of simulations, it appeared that all three methods…
ERIC Educational Resources Information Center
Myers, Nicholas D.; Wolfe, Edward W.; Feltz, Deborah L.; Penfield, Randall D.
2006-01-01
This study (a) provided a conceptual introduction to differential item functioning (DIF), (b) introduced the multifaceted Rasch rating scale model (MRSM) and an associated statistical procedure for identifying DIF in rating scale items, and (c) applied this procedure to previously collected data from American coaches who responded to the coaching…
Differential Item Functioning Analysis of the 2003-04 NHANES Physical Activity Questionnaire
ERIC Educational Resources Information Center
Gao, Yong; Zhu, Weimo
2011-01-01
Using differential item functioning (DIF) analyses, this study examined whether there were any DIF items in the National Health and Nutrition Examination Survey (NHANES) physical activity (PA) questionnaire. A subset of adult data from the 2003-04 NHANES study (n = 3,083) was used. PA items related to respondents' occupational, transportation,…
Identifying Differential Item Functioning in Multi-Stage Computer Adaptive Testing
ERIC Educational Resources Information Center
Gierl, Mark J.; Lai, Hollis; Li, Johnson
2013-01-01
The purpose of this study is to evaluate the performance of CATSIB (Computer Adaptive Testing-Simultaneous Item Bias Test) for detecting differential item functioning (DIF) when items in the matching and studied subtest are administered adaptively in the context of a realistic multi-stage adaptive test (MST). MST was simulated using a 4-item…
ERIC Educational Resources Information Center
Bilir, Mustafa Kuzey
2009-01-01
This study uses a new psychometric model (mixture item response theory-MIMIC model) that simultaneously estimates differential item functioning (DIF) across manifest groups and latent classes. Current DIF detection methods investigate DIF from only one side, either across manifest groups (e.g., gender, ethnicity, etc.), or across latent classes…
A Comparison of Two Area Measures for Detecting Differential Item Functioning.
ERIC Educational Resources Information Center
Kim, Seock-Ho; Cohen, Allan S.
1991-01-01
The exact and closed-interval area measures for detecting differential item functioning are compared for actual data from 1,000 African-American and 1,000 white college students taking a vocabulary test with items intentionally constructed to favor 1 set of examinees. No real differences in detection of biased items were found. (SLD)
ERIC Educational Resources Information Center
Holweger, Nancy; Taylor, Grace
The fifth-grade and eighth-grade science items on a state performance assessment were compared for differential item functioning (DIF) due to gender. The grade 5 sample consisted of 8,539 females and 8,029 males and the grade 8 sample consisted of 7,477 females and 7,891 males. A total of 30 fifth grade items and 26 eighth grade items were…
Woodward, Matthew R; Hafeez, Muhammad Ubaid; Qi, Qianya; Riaz, Ahmed; Benedict, Ralph H B; Yan, Li; Szigeti, Kinga
2018-04-19
To explore whether the ability to recognize specific odorant items is differentially affected in aging versus Alzheimer disease (AD); to refine olfactory identification deficit (OID) as a biomarker of prodromal and early AD. Prospective multicenter cross-sectional study with a longitudinal arm. Outpatient memory diagnostic clinics in New York and Texas. Adults aged 65 and older with amnestic mild cognitive impairment (aMCI) and AD and healthy aging (HA) subjects in the comparison group. Participants completed the University of Pennsylvania Smell Identification Test (UPSIT) and neuropsychological testing. AD-associated odorants (AD-10) were selected based on a model of ordinal logistic regression. Age-associated odorants (Age-10) were identified using a linear model. For the 841 participants (234 HA, 192 aMCI, 415 AD), AD-10 was superior to Age-10 in separating HA and AD. AD-10 was associated with a more widespread cognitive deficit across multiple domains, in contrast to Age-10. The disease- and age-associated odorants clustered separately in age and AD. AD-10 predicted conversion from aMCI to AD. Nonoverlapping UPSIT items were identified that were individually associated with age and disease. Despite a modest predictive value of the AD-specific items for conversion to AD, the AD-specific items may be useful in enriching samples to better identify those at risk for AD. Further studies are needed with monomolecular and unilateral stimulation and orthogonal biomarker validation to further refine disease- and age-associated signals. Copyright © 2018 American Association for Geriatric Psychiatry. Published by Elsevier Inc. All rights reserved.
Pallant, J F; Haines, H M; Green, P; Toohill, J; Gamble, J; Creedy, D K; Fenwick, J
2016-11-21
Fear of childbirth has negative consequences for a woman's physical and emotional wellbeing. The most commonly used measurement tool for childbirth fear is the Wijma Delivery Expectancy Questionnaire (WDEQ-A). Although originally conceptualized as unidimensional, subsequent investigations have suggested it is multidimensional. This study aimed to undertake a detailed psychometric assessment of the WDEQ-A, exploring the dimensionality and identifying possible subscales that may have clinical and research utility. WDEQ-A was administered to a sample of 1410 Australian women in mid-pregnancy. The dimensionality of WDEQ-A was explored using exploratory (EFA) and confirmatory factor analysis (CFA), and Rasch analysis. EFA identified a four factor solution. CFA failed to support the unidimensional structure of the original WDEQ-A, but confirmed the four factor solution identified by EFA. Rasch analysis was used to refine the four subscales (Negative emotions: five items; Lack of positive emotions: five items; Social isolation: four items; Moment of birth: three items). Each WDEQ-A Revised subscale showed good fit to the Rasch model and adequate internal consistency reliability. The correlation between Negative emotions and Lack of positive emotions was strong; however, Moment of birth and Social isolation showed much lower intercorrelations, suggesting they should not be added to create a total score. This study supports the findings of other investigations that suggest the WDEQ-A is multidimensional and should not be used in its original form. The WDEQ-A Revised may provide researchers with a more refined, psychometrically sound tool to explore the differential impact of aspects of childbirth fear.
ERIC Educational Resources Information Center
Drabinová, Adéla; Martinková, Patrícia
2017-01-01
In this article we present a general approach not relying on item response theory models (non-IRT) to detect differential item functioning (DIF) in dichotomous items in the presence of guessing. The proposed nonlinear regression (NLR) procedure for DIF detection is an extension of the method based on logistic regression. As a non-IRT approach, NLR can…
ERIC Educational Resources Information Center
Magis, David; De Boeck, Paul
2011-01-01
We focus on the identification of differential item functioning (DIF) when more than two groups of examinees are considered. We propose to consider items as elements of a multivariate space, where DIF items are outlying elements. Following this approach, the situation of multiple groups is a quite natural case. A robust statistics technique is…
ERIC Educational Resources Information Center
Ajeigbe, Taiwo Oluwafemi; Afolabi, Eyitayo Rufus Ifedayo
2017-01-01
This study assessed unidimensionality and occurrence of Differential Item Functioning (DIF) in Mathematics and English Language items of Osun State Qualifying Examination. The study made use of secondary data. The results showed that OSQ Mathematics (-0.094 ≤ r ≤ 0.236) and English Language items (-0.095 ≤ r ≤ 0.228) were unidimensional. Also,…
Effect of Multiple Testing Adjustment in Differential Item Functioning Detection
ERIC Educational Resources Information Center
Kim, Jihye; Oshima, T. C.
2013-01-01
In a typical differential item functioning (DIF) analysis, a significance test is conducted for each item. As a test consists of multiple items, such multiple testing may increase the possibility of making a Type I error at least once. The goal of this study was to investigate how to control a Type I error rate and power using adjustment…
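The multiple-testing problem described above can be sketched in a few lines. This is only an illustration: the abstract is truncated, so which adjustments the study actually compared is not stated here. Bonferroni and Benjamini-Hochberg are shown as two widely used choices, and the per-item p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag item i when p_i <= alpha / m (controls the familywise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Flag the k smallest p-values, where k is the largest rank
    with p_(k) <= (k / m) * alpha (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    flagged = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            flagged[idx] = True
    return flagged


# Hypothetical p-values from six per-item DIF significance tests.
item_p_values = [0.001, 0.012, 0.030, 0.040, 0.200, 0.700]

unadjusted = [p <= 0.05 for p in item_p_values]
print("unadjusted:        ", unadjusted)
print("Bonferroni:        ", bonferroni(item_p_values))
print("Benjamini-Hochberg:", benjamini_hochberg(item_p_values))
```

With these made-up values, the unadjusted tests flag more items than either adjusted procedure, which is the trade-off between Type I error control and power that the abstract refers to.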
ERIC Educational Resources Information Center
Zebehazy, Kim T.; Zigmond, Naomi; Zimmerman, George J.
2012-01-01
Introduction: This study investigated differential item functioning (DIF) of test items on Pennsylvania's Alternate System of Assessment (PASA) for students with visual impairments and severe cognitive disabilities and what the reasons for the differences may be. Methods: The Wilcoxon signed ranks test was used to analyze differences in the scores…
ERIC Educational Resources Information Center
Cauffman, Elizabeth; MacIntosh, Randall
2006-01-01
The juvenile justice system needs a tool that can identify and assess mental health problems among youths quickly with validity and reliability. The goal of this article is to evaluate the racial/ethnic and gender differential item functioning (DIF) of the Massachusetts Youth Screening Instrument-Second Version (MAYSI-2) using the Rasch Model.…
Reise, Steven P.; Ventura, Joseph; Keefe, Richard S. E.; Baade, Lyle E.; Gold, James M.; Green, Michael F.; Kern, Robert S.; Mesholam-Gately, Raquelle; Nuechterlein, Keith H.; Seidman, Larry J.; Bilder, Robert
2011-01-01
We conducted psychometric analyses of two interview-based measures of cognitive deficits: the 21-item Clinical Global Impression of Cognition in Schizophrenia (CGI-CogS; Ventura et al., 2008), and the 20-item Schizophrenia Cognition Rating Scale (SCoRS; Keefe et al., 2006), which were administered on two occasions to a sample of people with schizophrenia. Traditional psychometrics, bifactor analysis, and item response theory (IRT) methods were used to explore item functioning, dimensionality, and to compare instruments. Despite containing similar item content, responses to the CGI-CogS demonstrated superior psychometric properties (e.g., higher item-intercorrelations, better spread of ratings across response categories), relative to the SCoRS. We argue that these differences arise mainly from the differential use of prompts and how the items are phrased and scored. Bifactor analysis demonstrated that although both measures capture a broad range of cognitive functioning (e.g., working memory, social cognition), the common variance on each is overwhelmingly explained by a single general factor. IRT analyses of the combined pool of 41 items showed that measurement precision is peaked in the mild to moderate range of cognitive impairment. Finally, simulated adaptive testing revealed that only about 10 to 12 items are necessary to achieve latent trait level estimates with reasonably small standard errors for most individuals. This suggests that these interview-based measures of cognitive deficits could be shortened without loss of measurement precision. PMID:21381848
ERIC Educational Resources Information Center
Steacy, Laura M.; Elleman, Amy M.; Lovett, Maureen W.; Compton, Donald L.
2016-01-01
In English, gains in decoding skill do not map directly onto increases in word reading. However, beyond the Self-Teaching Hypothesis, little is known about the transfer of decoding skills to word reading. In this study, we offer a new approach to testing specific decoding elements on transfer to word reading. To illustrate, we modeled word-reading…
Responding to Claims of Misrepresentation
ERIC Educational Resources Information Center
Santelices, Maria Veronica; Wilson, Mark
2010-01-01
In their paper "Unfair Treatment? The Case of Freedle, the SAT, and the Standardization Approach to Differential Item Functioning" (Santelices & Wilson, 2010), the authors studied claims of differential effects of the SAT on Latinos and African Americans through the methodology of differential item functioning (DIF). Previous…
The Usefulness of Differential Item Functioning Methodology in Longitudinal Intervention Studies
USDA-ARS's Scientific Manuscript database
Perceived self-efficacy (SE) for engaging in physical activity (PA) is a key variable mediating PA change in interventions. The purpose of this study is to demonstrate the usefulness of item response modeling-based (IRM) differential item functioning (DIF) in the investigation of group differences ...
DIFAS: Differential Item Functioning Analysis System. Computer Program Exchange
ERIC Educational Resources Information Center
Penfield, Randall D.
2005-01-01
Differential item functioning (DIF) is an important consideration in assessing the validity of test scores (Camilli & Shepard, 1994). A variety of statistical procedures have been developed to assess DIF in tests of dichotomous (Hills, 1989; Millsap & Everson, 1993) and polytomous (Penfield & Lam, 2000; Potenza & Dorans, 1995) items. Some of these…
ERIC Educational Resources Information Center
Beinicke, Andrea; Pässler, Katja; Hell, Benedikt
2014-01-01
The study investigates consequences of eliminating items showing gender-specific differential item functioning (DIF) on the psychometric structure of a standard RIASEC interest inventory. Holland's hexagonal model was tested for structural invariance using a confirmatory methodological approach (confirmatory factor analysis and randomization…
NASA Astrophysics Data System (ADS)
Linn, Marcia C.; de Benedictis, Tina; Delucchi, Kevin; Harris, Abigail; Stage, Elizabeth
The National Assessment of Educational Progress Science Assessment has consistently revealed small gender differences on science content items but not on science inquiry items. This assessment differs from others in that respondents can choose I don't know rather than guessing. This paper examines explanations for the gender differences including (a) differential prior instruction, (b) differential response to uncertainty and use of the I don't know response, (c) differential response to figurally presented items, and (d) different attitudes towards science. Of these possible explanations, the first two received support. Females are more likely to use the I don't know response, especially for items with physical science content or masculine themes such as football. To ameliorate this situation we need more effective science instruction and more gender-neutral assessment items.
Evaluating linguistic equivalence of patient-reported outcomes in a cancer clinical trial.
Hahn, Elizabeth A; Bode, Rita K; Du, Hongyan; Cella, David
2006-01-01
In order to make meaningful cross-cultural or cross-linguistic comparisons of health-related quality of life (HRQL) or to pool international research data, it is essential to create unbiased measures that can detect clinically important differences. When HRQL scores differ between cultural/linguistic groups, it is important to determine whether this reflects real group differences, or is the result of systematic measurement variability. To investigate the linguistic measurement equivalence of a cancer-specific HRQL questionnaire, and to conduct a sensitivity analysis of treatment differences in HRQL in a clinical trial. Patients with newly diagnosed chronic myelogenous leukemia (n = 1049) completed serial HRQL assessments in an international Phase III trial. Two types of differential item functioning (uniform and non-uniform) were evaluated using item response theory and classical test theory approaches. A sensitivity analysis was conducted to compare HRQL between treatment arms using items without evidence of differential functioning. Among 27 items, nine (33%) did not exhibit any evidence of differential functioning in both linguistic comparisons (English versus French, English versus German). Although 18 items functioned differently, there was no evidence of systematic bias. In a sensitivity analysis, adjustment for differential functioning affected the magnitude, but not the direction or interpretation of clinical trial treatment arm differences. Sufficient sample sizes were available for only three of the eight language groups. Identification of differential functioning in two-thirds of the items suggests that current psychometric methods may be too sensitive. Enhanced methodologies are needed to differentiate trivial from substantive differential item functioning. 
Systematic variability in HRQL across different groups can be evaluated for its effect upon clinical trial results; a practice recommended when data are pooled across cultural or linguistic groups to make conclusions about treatment effects.
ERIC Educational Resources Information Center
Ayodele, Alicia Nicole
2017-01-01
Within polytomous items, differential item functioning (DIF) can take on various forms due to the number of response categories. The lack of invariance at this level is referred to as differential step functioning (DSF). The most common DSF methods in the literature are the adjacent category log odds ratio (AC-LOR) estimator and cumulative…
ERIC Educational Resources Information Center
Tan, Xuan; Xiang, Bihua; Dorans, Neil J.; Qu, Yanxuan
2010-01-01
The nature of the matching criterion (usually the total score) in the study of differential item functioning (DIF) has been shown to impact the accuracy of different DIF detection procedures. One of the topics related to the nature of the matching criterion is whether the studied item should be included. Although many studies exist that suggest…
ERIC Educational Resources Information Center
Penfield, Randall D.; Giacobbi, Peter R., Jr.; Myers, Nicholas D.
2007-01-01
One aspect of construct validity is the extent to which the measurement properties of a rating scale are invariant across the groups being compared. An increasingly used method for assessing between-group differences in the measurement properties of items of a scale is the framework of differential item functioning (DIF). In this paper we…
ERIC Educational Resources Information Center
Shih, Ching-Lin; Wang, Wen-Chung
2009-01-01
The multiple indicators, multiple causes (MIMIC) method with a pure short anchor was proposed to detect differential item functioning (DIF). A simulation study showed that the MIMIC method with an anchor of 1, 2, 4, or 10 DIF-free items yielded a well-controlled Type I error rate even when such tests contained as many as 40% DIF items. In general,…
ERIC Educational Resources Information Center
Qi, Cathy Huaqing; Marley, Scott C.
2009-01-01
The study examined whether item bias is present in the "Preschool Language Scale-4" (PLS-4). Participants were 440 children (3-5 years old; 86% English-speaking Hispanic and 14% European American) who were enrolled in Head Start programs. The PLS-4 items were analyzed for differential item functioning (DIF) using logistic regression and…
Deng, Nina; Anatchkova, Milena D; Waring, Molly E; Han, Kyung T; Ware, John E
2015-08-01
The Quality-of-life (QOL) Disease Impact Scale (QDIS(®)) standardizes the content and scoring of QOL impact attributed to different diseases using item response theory (IRT). This study examined the IRT invariance of the QDIS-standardized IRT parameters in an independent sample. The differential functioning of items and test (DFIT) of a static short-form (QDIS-7) was examined across two independent sources: patients hospitalized for acute coronary syndrome (ACS) in the TRACE-CORE study (N = 1,544) and chronically ill US adults in the QDIS standardization sample. "ACS-specific" IRT item parameters were calibrated and linearly transformed to compare to "standardized" IRT item parameters. Differences in IRT model-expected item, scale and theta scores were examined. The DFIT results were also compared in a standard logistic regression differential item functioning analysis. Item parameters estimated in the ACS sample showed lower discrimination parameters than the standardized discrimination parameters, but only small differences were found for thresholds parameters. In DFIT, results on the non-compensatory differential item functioning index (range 0.005-0.074) were all below the threshold of 0.096. Item differences were further canceled out at the scale level. IRT-based theta scores for ACS patients using standardized and ACS-specific item parameters were highly correlated (r = 0.995, root-mean-square difference = 0.09). Using standardized item parameters, ACS patients scored one-half standard deviation higher (indicating greater QOL impact) compared to chronically ill adults in the standardization sample. The study showed sufficient IRT invariance to warrant the use of standardized IRT scoring of QDIS-7 for studies comparing the QOL impact attributed to acute coronary disease and other chronic conditions.
Effect Size Measures for Differential Item Functioning in a Multidimensional IRT Model
ERIC Educational Resources Information Center
Suh, Youngsuk
2016-01-01
This study adapted an effect size measure used for studying differential item functioning (DIF) in unidimensional tests and extended the measure to multidimensional tests. Two effect size measures were considered in a multidimensional item response theory model: signed weighted P-difference and unsigned weighted P-difference. The performance of…
Explaining Crossing DIF in Polytomous Items Using Differential Step Functioning Effects
ERIC Educational Resources Information Center
Penfield, Randall D.
2010-01-01
Crossing, or intersecting, differential item functioning (DIF) is a form of nonuniform DIF that exists when the sign of the between-group difference in expected item performance changes across the latent trait continuum. The presence of crossing DIF presents a problem for many statistics developed for evaluating DIF because positive and negative…
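A minimal sketch of what "crossing" means, under simplifying assumptions: the article concerns polytomous items, whereas this example uses a dichotomous two-parameter logistic (2PL) item for brevity, and the item parameters for the two groups are hypothetical.

```python
import math


def irf_2pl(theta, a, b):
    """Probability of a correct response under a 2PL model
    with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def group_difference(theta, reference, focal):
    """Reference-minus-focal difference in expected item score at theta."""
    return irf_2pl(theta, *reference) - irf_2pl(theta, *focal)


# Hypothetical (a, b) parameters: same difficulty, different discrimination.
reference = (1.5, 0.0)
focal = (0.7, 0.0)

low = group_difference(-1.0, reference, focal)   # focal group favored
high = group_difference(1.0, reference, focal)   # reference group favored
print(f"difference at theta = -1: {low:+.3f}")
print(f"difference at theta = +1: {high:+.3f}")
```

Because the difference is negative at low trait levels and positive at high ones, a statistic that averages signed differences across the continuum can come out near zero even though the item functions differently in the two groups.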
Decisions that Make a Difference in Detecting Differential Item Functioning
ERIC Educational Resources Information Center
Sireci, Stephen G.; Rios, Joseph A.
2013-01-01
There are numerous statistical procedures for detecting items that function differently across subgroups of examinees that take a test or survey. However, in endeavouring to detect items that may function differentially, selection of the statistical method is only one of many important decisions. In this article, we discuss the important decisions…
ERIC Educational Resources Information Center
French, Brian F.; Maller, Susan J.
2007-01-01
Two unresolved implementation issues with logistic regression (LR) for differential item functioning (DIF) detection include ability purification and effect size use. Purification is suggested to control inaccuracies in DIF detection as a result of DIF items in the ability estimate. Additionally, effect size use may be beneficial in controlling…
Testing for Differential Item Functioning with Measures of Partial Association
ERIC Educational Resources Information Center
Woods, Carol M.
2009-01-01
Differential item functioning (DIF) occurs when an item on a test or questionnaire has different measurement properties for one group of people versus another, irrespective of mean differences on the construct. There are many methods available for DIF assessment. The present article is focused on indices of partial association. A family of average…
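As a hedged illustration of conditioning on the construct before comparing groups, the sketch below computes the Mantel-Haenszel common odds ratio across matched score strata. It is chosen here as a familiar stratified association statistic; the abstract is truncated, so it should not be read as one of the specific indices the article studies. The counts are hypothetical.

```python
def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across matched strata.

    Each stratum is a tuple
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    A value near 1 suggests no uniform DIF after matching on total score.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio is undefined for these counts")
    return numerator / denominator


# Hypothetical counts for one item in two total-score strata.
no_dif = [(20, 30, 10, 15), (30, 20, 15, 10)]
with_dif = [(30, 20, 20, 30), (40, 10, 30, 20)]

print("no DIF:  ", mantel_haenszel_odds_ratio(no_dif))
print("with DIF:", mantel_haenszel_odds_ratio(with_dif))
```

In the first data set the groups differ in size but have identical odds of success within each stratum, so the ratio is 1; in the second the reference group has higher odds at the same score level, which is the pattern DIF methods are designed to detect.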
The Impact of Missing Data on the Detection of Nonuniform Differential Item Functioning
ERIC Educational Resources Information Center
Finch, W. Holmes
2011-01-01
Missing information is a ubiquitous aspect of data analysis, including responses to items on cognitive and affective instruments. Although the broader statistical literature describes missing data methods, relatively little work has focused on this issue in the context of differential item functioning (DIF) detection. Such prior research has…
Testing for Nonuniform Differential Item Functioning with Multiple Indicator Multiple Cause Models
ERIC Educational Resources Information Center
Woods, Carol M.; Grimm, Kevin J.
2011-01-01
In extant literature, multiple indicator multiple cause (MIMIC) models have been presented for identifying items that display uniform differential item functioning (DIF) only, not nonuniform DIF. This article addresses, for apparently the first time, the use of MIMIC models for testing both uniform and nonuniform DIF with categorical indicators. A…
Mitchell, Alex J; Smith, Adam B; Al-salihy, Zerak; Rahim, Twana A; Mahmud, Mahmud Q; Muhyaldin, Asma S
2011-10-01
We aimed to redefine the optimal self-report symptoms of depression suitable for creation of an item bank that could be used in computer adaptive testing or to develop a simplified screening tool for DSM-V. Four hundred subjects (200 patients with primary depression and 200 non-depressed subjects), living in Iraqi Kurdistan were interviewed. The Mini International Neuropsychiatric Interview (MINI) was used to define the presence of major depression (DSM-IV criteria). We examined symptoms of depression using four well-known scales delivered in Kurdish. The Partial Credit Model was applied to each instrument. Common-item equating was subsequently used to create an item bank and differential item functioning (DIF) was explored for known subgroups. A symptom-level Rasch analysis reduced the original 45 items to 24 after the exclusion of 21 misfitting items. A further six items (CESD13 and CESD17, HADS-D4, HADS-D5 and HADS-D7, and CDSS3 and CDSS4) were removed due to misfit as the items were added together to form the item bank, and two items were subsequently removed following the DIF analysis by diagnosis (CESD20 and CDSS9, both of which were harder to endorse for women). Therefore the remaining optimal item bank consisted of 17 items and produced an area under the curve (AUC) of 0.987. Using a bank restricted to the optimal nine items revealed only minor loss of accuracy (AUC = 0.989, sensitivity 96%, specificity 95%). Finally, when restricted to only four items accuracy was still high (AUC = 0.976; sensitivity 93%, specificity 96%). An item bank of 17 items may be useful in computer adaptive testing and nine or even four items may be used to develop a simplified screening tool for DSM-V major depressive disorder (MDD). Further examination of this item bank should be conducted in different cultural settings.
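The accuracy figures quoted above (AUC, sensitivity, specificity) can be computed as in the following sketch. The scores, diagnoses, and cutoff are hypothetical and are not taken from the study; the AUC is computed from its pairwise-comparison definition rather than by any method the authors report.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Sensitivity and specificity when score >= cutoff counts as screen-positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in pairs if y == 1 and s < cutoff)
    tn = sum(1 for s, y in pairs if y == 0 and s < cutoff)
    fp = sum(1 for s, y in pairs if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """Probability that a randomly chosen case outscores a randomly chosen
    non-case, counting ties as one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


# Hypothetical screening scores; label 1 = depressed per reference interview.
scores = [9, 8, 7, 5, 6, 3, 2, 1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(scores, labels, cutoff=6)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}, AUC = {auc(scores, labels):.4f}")
```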
Staff Differentiation. An Annotated Bibliography.
ERIC Educational Resources Information Center
Marin County Superintendent of Schools, Corte Madera, CA.
This annotated bibliography reviews selected literature focusing on the concept of staff differentiation. Included are 62 items (dated 1966-1970), along with a list of mailing addresses where copies of individual items can be obtained. Also a list of 31 staff differentiation projects receiving financial assistance from the U.S. Office of Education…
Detection of Uniform and Nonuniform Differential Item Functioning by Item-Focused Trees
ERIC Educational Resources Information Center
Berger, Moritz; Tutz, Gerhard
2016-01-01
Detection of differential item functioning (DIF) by use of the logistic modeling approach has a long tradition. One big advantage of the approach is that it can be used to investigate nonuniform (NUDIF) as well as uniform DIF (UDIF). The classical approach allows one to detect DIF by distinguishing between multiple groups. We propose an…
ERIC Educational Resources Information Center
Dimitrov, Dimiter M.
2017-01-01
This article offers an approach to examining differential item functioning (DIF) under its item response theory (IRT) treatment in the framework of confirmatory factor analysis (CFA). The approach is based on integrating IRT- and CFA-based testing of DIF and using bias-corrected bootstrap confidence intervals with a syntax code in Mplus.
ERIC Educational Resources Information Center
Gomez, Rapson
2012-01-01
Objective: Generalized partial credit model, which is based on item response theory (IRT), was used to test differential item functioning (DIF) for the "Diagnostic and Statistical Manual of Mental Disorders" (4th ed.), inattention (IA), and hyperactivity/impulsivity (HI) symptoms across boys and girls. Method: To accomplish this, parents completed…
Stepwise Analysis of Differential Item Functioning Based on Multiple-Group Partial Credit Model.
ERIC Educational Resources Information Center
Muraki, Eiji
1999-01-01
Extended an Item Response Theory (IRT) method for detection of differential item functioning to the partial credit model and applied the method to simulated data using a stepwise procedure. Then applied the stepwise DIF analysis based on the multiple-group partial credit model to writing trend data from the National Assessment of Educational…
Effects of Differential Item Functioning on Examinees' Test Performance and Reliability of Test
ERIC Educational Resources Information Center
Lee, Yi-Hsuan; Zhang, Jinming
2017-01-01
Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The…
Assessment of Differential Item Functioning under Cognitive Diagnosis Models: The DINA Model Example
ERIC Educational Resources Information Center
Li, Xiaomin; Wang, Wen-Chung
2015-01-01
The assessment of differential item functioning (DIF) is routinely conducted to ensure test fairness and validity. Although many DIF assessment methods have been developed in the context of classical test theory and item response theory, they are not applicable for cognitive diagnosis models (CDMs), as the underlying latent attributes of CDMs are…
ERIC Educational Resources Information Center
Sachse, Karoline A.; Haag, Nicole
2017-01-01
Standard errors computed according to the operational practices of international large-scale assessment studies such as the Programme for International Student Assessment (PISA) or the Trends in International Mathematics and Science Study (TIMSS) may be biased when cross-national differential item functioning (DIF) and item parameter drift are…
ERIC Educational Resources Information Center
Martinková, Patricia; Drabinová, Adéla; Liaw, Yuan-Ling; Sanders, Elizabeth A.; McFarland, Jenny L.; Price, Rebecca M.
2017-01-01
We provide a tutorial on differential item functioning (DIF) analysis, an analytic method useful for identifying potentially biased items in assessments. After explaining a number of methodological approaches, we test for gender bias in two scenarios that demonstrate why DIF analysis is crucial for developing assessments, particularly because…
ERIC Educational Resources Information Center
Doolittle, Allen E.
Differential item performance (DIP) is discussed as a concept that does not necessarily imply item bias or unfairness to subgroups of examinees. With curriculum-based achievement tests, DIP is presented as a valid reflection of group differences in requisite skills and instruction. Using data from a national testing of the ACT Assessment, this…
Martinez-Martin, Pablo; Manuel Rojo-Abuin, Jose; Rizos, Alexandra; Rodriguez-Blazquez, Carmen; Trenkwalder, Claudia; Perkins, Lauren; Sauerbier, Anna; Odin, Per; Antonini, Angelo; Chaudhuri, Kallol Ray
2017-01-01
In Parkinson's disease, pain is a prevalent and complex symptom of diverse origin. The King's Parkinson's disease pain scale assesses different pain syndromes, thus allowing exploration of its differential prevalence and influence on the health-related quality of life of patients. In this post hoc study, data from 178 patients and 83 matched controls participating in the King's Parkinson's disease pain scale validation study were used. For determining the respective distribution, King's Parkinson's disease pain scale item and domain scores of 0 meant absence and scores ≥1 meant presence of the symptom. The regular scores were used for the other analyses. Health-related quality of life was evaluated with EQ-5D-3L and PDQ-8 questionnaires. Parkinson's disease patients experienced more pain modalities than controls. In patients, Pain around joints (King's Parkinson's disease pain scale item 1) and Pain while turning in bed (item 8) were the most prevalent types of pain, whereas Burning mouth syndrome (item 11) and Pain due to grinding teeth (item 10) showed the lowest frequency. The total number of experienced pain modalities closely correlated with the PDQ-8 index, but not with other variables. For all pain types except Pain around joints (item 1) and pain related to Periodic leg movements/RLS (item 7), patients with pain had significantly worse health-related quality of life. The influence of pain, as a whole, on the health-related quality of life was not remarkable after adjustment by other variables. When the particular types of pain were considered, adjusted by sex, age, and Parkinson's disease duration, pain determinants were different for EQ-5D-3L and PDQ-8. The King's Parkinson's disease pain scale allows exploring the distribution of the diverse syndromic pain occurring in Parkinson's disease and its association with health-related quality of life.
McFadden, Estelle; Horton, Mike C; Ford, Helen L; Gilworth, Gill; McFadden, Majella; Tennant, Alan
2012-06-01
Multiple sclerosis (MS) mainly presents amongst those of working age. Depending upon the type of MS, many people embark upon a long period of managing their day-to-day work-related needs in the face of intermittent and sometimes persistent disabling symptoms. The objective of this study was to explore the concept of work instability (WI) following the onset of MS and develop a Work Instability Scale (WIS) specific to this population. WI amongst those with MS in work was explored through qualitative interviews which were then used to generate items for a WIS. Rasch analysis was used to refine the scaling properties of the MS-WIS, which was then validated against expert vocational assessment by occupational health physiotherapists and ergonomists. The resulting measure is a 22-item, self-administered scale which can be scored in three bands indicating low, medium and high risk of WI (job retention) problems. The scale meets modern psychometric requirements for measurement, indicated by adequate fit to the Rasch model with absence of local dependency and differential item functioning (DIF) by age, gender and hours worked. The scale presents an opportunity in routine clinical practice to take positive action to reduce sickness absence and prevent job loss.
A Monte Carlo Study Investigating Missing Data, Differential Item Functioning, and Effect Size
ERIC Educational Resources Information Center
Garrett, Phyllis
2009-01-01
The use of polytomous items in assessments has increased over the years, and as a result, the validity of these assessments has been a concern. Differential item functioning (DIF) and missing data are two factors that may adversely affect assessment validity. Both factors have been studied separately, but DIF and missing data are likely to occur…
Differential Item Functioning Analysis of the Mental, Emotional, and Bodily Toughness Inventory
ERIC Educational Resources Information Center
Gao, Yong; Mack, Mick G.; Ragan, Moira A.; Ragan, Brian
2012-01-01
In this study the authors used differential item functioning analysis to examine if there were items in the Mental, Emotional, and Bodily Toughness Inventory functioning differently across gender and athletic membership. A total of 444 male (56.3%) and female (43.7%) participants (30.9% athletes and 69.1% non-athletes) responded to the Mental,…
ERIC Educational Resources Information Center
Magis, David; Raiche, Gilles; Beland, Sebastien; Gerard, Paul
2011-01-01
We present an extension of the logistic regression procedure to identify dichotomous differential item functioning (DIF) in the presence of more than two groups of respondents. Starting from the usual framework of a single focal group, we propose a general approach to estimate the item response functions in each group and to test for the presence…
NASA Astrophysics Data System (ADS)
Roth, Wolff-Michael; Oliveri, Maria Elena; Dallie Sandilands, Debra; Lyons-Thomas, Juliette; Ercikan, Kadriye
2013-03-01
Even if national and international assessments are designed to be comparable, subsequent psychometric analyses often reveal differential item functioning (DIF). Central to achieving comparability is to examine the presence of DIF, and if DIF is found, to investigate its sources to ensure that differentially functioning items do not lead to bias. In this study, sources of DIF were examined using think-aloud protocols. The think-aloud protocols of expert reviewers were conducted for comparing the English and French versions of 40 items previously identified as DIF (N = 20) and non-DIF (N = 20). Three highly trained and experienced experts in verifying and accepting/rejecting multi-lingual versions of curriculum and testing materials for government purposes participated in this study. Although there is a considerable amount of agreement in the identification of differentially functioning items, experts do not consistently identify and distinguish DIF and non-DIF items. Our analyses of the think-aloud protocols identified particular linguistic, general pedagogical, content-related, and cognitive factors related to sources of DIF. Implications are provided for the process of arriving at the identification of DIF, prior to the actual administration of tests at national and international levels.
Differential item functioning magnitude and impact measures from item response theory models.
Kleinman, Marjorie; Teresi, Jeanne A
2016-01-01
Measures of magnitude and impact of differential item functioning (DIF) at the item and scale level, respectively, are presented and reviewed in this paper. Most measures are based on item response theory models. Magnitude refers to item level effect sizes, whereas impact refers to differences between groups at the scale score level. Reviewed are magnitude measures based on group differences in the expected item scores and impact measures based on differences in the expected scale scores. The similarities among these indices are demonstrated. Various software packages are described that provide magnitude and impact measures, and new software is presented that computes all of the available statistics conveniently in one program with explanations of their relationships to one another.
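The distinction between item-level magnitude and scale-level impact can be sketched as follows. This is an illustration only, assuming dichotomous 2PL items with hypothetical parameters; it is not the software described in the paper, and the specific indices the paper reviews are not reproduced here.

```python
import math


def expected_item_score(theta, a, b):
    """Expected score on a dichotomous 2PL item (probability of endorsement)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_magnitudes(theta, reference_items, focal_items):
    """Reference-minus-focal difference in expected item score, per item."""
    return [
        expected_item_score(theta, *ref) - expected_item_score(theta, *foc)
        for ref, foc in zip(reference_items, focal_items)
    ]


def scale_impact(theta, reference_items, focal_items):
    """Reference-minus-focal difference in expected scale score,
    where the scale score is the sum of expected item scores."""
    ref_total = sum(expected_item_score(theta, *p) for p in reference_items)
    foc_total = sum(expected_item_score(theta, *p) for p in focal_items)
    return ref_total - foc_total


# Hypothetical (a, b) parameters: the two items show DIF in opposite directions.
reference_items = [(1.0, 0.0), (1.0, 0.5)]
focal_items = [(1.0, 0.5), (1.0, 0.0)]

magnitudes = item_magnitudes(0.0, reference_items, focal_items)
impact = scale_impact(0.0, reference_items, focal_items)
print("item-level differences:", [round(m, 4) for m in magnitudes])
print("scale-level difference:", round(impact, 4))
```

In this constructed case each item shows a nonzero difference, but the differences have opposite signs and cancel in the scale score, so sizeable item-level magnitude can coexist with negligible scale-level impact.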
The co-occurrence of PTSD and dissociation: differentiating severe PTSD from dissociative-PTSD.
Armour, Cherie; Karstoft, Karen-Inge; Richardson, J Don
2014-08-01
A dissociative-posttraumatic stress disorder (PTSD) subtype has been included in the DSM-5. However, it is not yet clear whether certain socio-demographic characteristics or psychological/clinical constructs such as comorbid psychopathology differentiate between severe PTSD and dissociative-PTSD. The current study investigated the existence of a dissociative-PTSD subtype and explored whether a number of trauma and clinical covariates could differentiate between severe PTSD alone and dissociative-PTSD. The current study utilized a sample of 432 treatment-seeking Canadian military veterans. Participants were assessed with the Clinician Administered PTSD Scale (CAPS) and self-report measures of traumatic life events, depression, and anxiety. CAPS severity scores were created reflecting the sum of the frequency and intensity items from each of the 17 PTSD and 3 dissociation items. The CAPS severity scores were used as indicators in a latent profile analysis (LPA) to investigate the existence of a dissociative-PTSD subtype. Subsequently, several covariates were added to the model to explore differences between severe PTSD alone and dissociative-PTSD. The LPA identified five classes: one of which constituted a severe PTSD group (30.5 %), and one of which constituted a dissociative-PTSD group (13.7 %). None of the included demographic, trauma, or clinical covariates were significantly predictive of membership in the dissociative-PTSD group compared to the severe PTSD group. In conclusion, a significant proportion of individuals report high levels of dissociation alongside their PTSD, which constitutes a dissociative-PTSD subtype. Further investigation is needed to identify which factors may increase or decrease the likelihood of membership in a dissociative-PTSD subtype group compared to a severe PTSD only group.
Adjusting for cross-cultural differences in computer-adaptive tests of quality of life.
Gibbons, C J; Skevington, S M
2018-04-01
Previous studies using the WHOQOL measures have demonstrated that the relationship between individual items and the underlying quality of life (QoL) construct may differ between cultures. If unaccounted for, these differing relationships can lead to measurement bias which, in turn, can undermine the reliability of results. We used item response theory (IRT) to assess differential item functioning (DIF) in WHOQOL data from diverse language versions collected in the UK, Zimbabwe, Russia, and India (total N = 1332). Data were fitted to the partial credit 'Rasch' model. We used four item banks previously derived from the WHOQOL-100 measure, which provided excellent measurement for physical, psychological, social, and environmental quality of life domains (40 items overall). Cross-cultural differential item functioning was assessed using analysis of variance for item residuals and post hoc Tukey tests. Simulated computer-adaptive tests (CATs) were conducted to assess the efficiency and precision of the four item banks. Splitting item parameters by DIF results in four linked item banks without DIF or other breaches of IRT model assumptions. Simulated CATs were more precise and efficient than longer paper-based alternatives. Assessing differential item functioning using item response theory can identify measurement invariance between cultures which, if uncontrolled, may undermine accurate comparisons in computer-adaptive testing assessments of QoL. We demonstrate how compensating for DIF using item anchoring allowed data from all four countries to be compared on a common metric, thus facilitating assessments which were both sensitive to cultural nuance and comparable between countries.
Item-focussed Trees for the Identification of Items in Differential Item Functioning.
Tutz, Gerhard; Berger, Moritz
2016-09-01
A novel method for the identification of differential item functioning (DIF) by means of recursive partitioning techniques is proposed. We assume an extension of the Rasch model that allows for DIF being induced by an arbitrary number of covariates for each item. Recursive partitioning on the item level results in one tree for each item and leads to simultaneous selection of items and variables that induce DIF. For each item, it is possible to detect groups of subjects with different item difficulties, defined by combinations of characteristics that are not pre-specified. The way a DIF item is determined by covariates is visualized in a small tree and therefore easily accessible. An algorithm is proposed that is based on permutation tests. Various simulation studies, including the comparison with traditional approaches to identify items with DIF, show the applicability and the competitive performance of the method. Two applications illustrate the usefulness and the advantages of the new method.
Paz, Sylvia H; Spritzer, Karen L; Reise, Steven P; Hays, Ron D
2017-06-01
About 70% of Latinos, 5 years old or older, in the United States speak Spanish at home. Measurement equivalence of the PROMIS® pain interference (PI) item bank by language of administration (English versus Spanish) has not been evaluated. A sample of 527 adult Spanish-speaking Latinos completed the Spanish version of the 41-item PROMIS® pain interference item bank. We evaluate dimensionality, monotonicity and local independence of the Spanish-language items. Then we evaluate differential item functioning (DIF) using ordinal logistic regression with item response theory scores estimated from DIF-free "anchor" items. One of the 41 items in the Spanish version of the PROMIS® PI item bank was identified as having significant uniform DIF. English- and Spanish-speaking subjects with the same level of pain interference responded differently to 1 of the 41 items in the PROMIS® PI item bank. This item was not retained due to proprietary issues. The original English language item parameters can be used when estimating PROMIS® PI scores.
Devine, J; Otto, C; Rose, M; Barthel, D; Fischer, F; Mühlan, H; Mülhan, H; Nolte, S; Schmidt, S; Ottova-Jordan, V; Ravens-Sieberer, U
2015-04-01
Assessing health-related quality of life (HRQoL) via Computerized Adaptive Tests (CAT) provides greater measurement precision coupled with a lower test burden compared to conventional tests. Currently, there are no European pediatric HRQoL CATs available. This manuscript aims at describing the development of a HRQoL CAT for children and adolescents: the Kids-CAT, which was developed based on the established KIDSCREEN-27 HRQoL domain structure. The Kids-CAT was developed combining classical test theory and item response theory methods and using large archival data of European KIDSCREEN norm studies (n = 10,577-19,580). Methods were applied in line with the US PROMIS project. Item bank development included the investigation of unidimensionality, local independence, exploration of Differential Item Functioning (DIF), evaluation of Item Response Curves (IRCs), estimation and norming of item parameters as well as first CAT simulations. The Kids-CAT was successfully built covering five item banks (with 26-46 items each) to measure physical well-being, psychological well-being, parent relations, social support and peers, and school well-being. The Kids-CAT item banks demonstrated excellent psychometric properties: high content validity, unidimensionality, local independence, low DIF, and model-conforming IRCs. In CAT simulations, seven items were needed to achieve a measurement precision between .8 and .9 (reliability). It has a child-friendly design, is easily accessible online and gives immediate feedback reports of scores. The Kids-CAT has the potential to advance pediatric HRQoL measurement by making it less burdensome and enhancing the patient-doctor communication.
Estabrook, Ryne; Sadler, Michael E; McGue, Matt
2015-12-01
A long-standing and critical problem in the study of aging and depression is the comparability of measurement across age groups. While psychological measures of depression typically show increased incidence of symptoms with increasing age, rates of depression diagnosis do not show the same age trend. This analysis presents tests of differential item functioning on the depression section of the CAMDEX interview schedule, using factor analysis-derived affective and somatic subscales (McGue & Christensen, 1997). Results for the affective subscale show significant differences in item functioning in the majority of the affective items as a function of age (items "Happy Life," "Lonely," "Nervous," "Worthless," and "Future": χ²(6) = [30.193, 255.971] across items, all p < .0001). Analyses for the somatic subscale show differential item functioning is limited to a single item relating to coping (χ²(6) = 180.754, p < .0001). These results indicate that differences in depression symptoms across age groups are not entirely consistent with a unidimensional depression trait, and that the measurement structure of depression varies over the life span. (c) 2015 APA, all rights reserved.
ERIC Educational Resources Information Center
Ercikan, Kadriye; Arim, Rubab; Law, Danielle; Domene, Jose; Gagnon, France; Lacroix, Serge
2010-01-01
This paper demonstrates and discusses the use of think aloud protocols (TAPs) as an approach for examining and confirming sources of differential item functioning (DIF). The TAPs are used to investigate to what extent surface characteristics of the items that are identified by expert reviews as sources of DIF are supported by empirical evidence…
ERIC Educational Resources Information Center
Gonzalez-Roma, Vicente; Tomas, Ines; Ferreres, Doris; Hernandez, Ana
2005-01-01
The aims of this study were to investigate whether the 6 items of the Physical Appearance Scale (Marsh, Richards, Johnson, Roche, & Tremayne, 1994) show differential item functioning (DIF) across gender groups of adolescents, and to show how this can be done using the multigroup mean and covariance structure (MG-MACS) analysis model. Two samples…
ERIC Educational Resources Information Center
Moses, Tim; Miao, Jing; Dorans, Neil
2010-01-01
This study compared the accuracies of four differential item functioning (DIF) estimation methods, where each method makes use of only one of the following: raw data, logistic regression, loglinear models, or kernel smoothing. The major focus was on the estimation strategies' potential for estimating score-level, conditional DIF. A secondary focus…
Vegetable parenting practices scale. Item response modeling analyses
Chen, Tzu-An; O’Connor, Teresia; Hughes, Sheryl; Beltran, Alicia; Baranowski, Janice; Diep, Cassandra; Baranowski, Tom
2015-01-01
Objective: To evaluate the psychometric properties of a vegetable parenting practices scale using multidimensional polytomous item response modeling which enables assessing item fit to latent variables and the distributional characteristics of the items in comparison to the respondents. We also tested for differences in the ways items function (called differential item functioning) across child’s gender, ethnicity, age, and household income groups. Method: Parents of 3–5 year old children completed a self-reported vegetable parenting practices scale online. Vegetable parenting practices consisted of 14 effective vegetable parenting practices and 12 ineffective vegetable parenting practices items, each with three subscales (responsiveness, structure, and control). Multidimensional polytomous item response modeling was conducted separately on effective vegetable parenting practices and ineffective vegetable parenting practices. Results: One effective vegetable parenting practice item did not fit the model well in the full sample or across demographic groups, and another was a misfit in differential item functioning analyses across child’s gender. Significant differential item functioning was detected across children’s age and ethnicity groups, and more among effective vegetable parenting practices than ineffective vegetable parenting practices items. Wright maps showed items only covered parts of the latent trait distribution. The harder- and easier-to-respond ends of the construct were not covered by items for effective vegetable parenting practices and ineffective vegetable parenting practices, respectively. Conclusions: Several effective vegetable parenting practices and ineffective vegetable parenting practices scale items functioned differently on the basis of child’s demographic characteristics; therefore, researchers should use these vegetable parenting practices scales with caution. Item response modeling should be incorporated in analyses of parenting practice questionnaires to better assess differences across demographic characteristics. PMID:25895694
Paz, Sylvia H; Spritzer, Karen L; Morales, Leo S; Hays, Ron D
2013-03-29
To evaluate the equivalence of the PROMIS® wave 1 physical functioning item bank, by age (50 years or older versus 18-49). A total of 114 physical functioning items with 5 response choices were administered to English- (n=1504) and Spanish-language (n=640) adults. Item frequencies, means and standard deviations, item-scale correlations, and internal consistency reliability were estimated. Differential Item Functioning (DIF) by age was evaluated. Thirty of the 114 items were flagged for DIF based on an R-squared of 0.02 or above criterion. The expected total score was higher for those respondents who were 18-49 than those who were 50 or older. Those who were 50 years or older versus 18-49 years old with the same level of physical functioning responded differently to 30 of the 114 items in the PROMIS® physical functioning item bank. This study yields essential information about the equivalence of the physical functioning items in older versus younger individuals.
Item Purification Does Not Always Improve DIF Detection: A Counterexample with Angoff's Delta Plot
ERIC Educational Resources Information Center
Magis, David; Facon, Bruno
2013-01-01
Item purification is an iterative process that is often advocated as improving the identification of items affected by differential item functioning (DIF). With test-score-based DIF detection methods, item purification iteratively removes the items currently flagged as DIF from the test scores to get purified sets of items, unaffected by DIF. The…
ERIC Educational Resources Information Center
Kostin, Irene
2004-01-01
The purpose of this study is to explore the relationship between a set of item characteristics and the difficulty of TOEFL[R] dialogue items. Identifying characteristics that are related to item difficulty has the potential to improve the efficiency of the item-writing process. The study employed 365 TOEFL dialogue items, which were coded on 49…
ERIC Educational Resources Information Center
Hidalgo, Mª Dolores; Gómez-Benito, Juana; Zumbo, Bruno D.
2014-01-01
The authors analyze the effectiveness of the R² and delta log odds ratio effect size measures when using logistic regression analysis to detect differential item functioning (DIF) in dichotomous items. A simulation study was carried out, and the Type I error rate and power estimates under conditions in which only statistical testing…
Mackus, Marlou; Kruijff, Deborah de; Otten, Leila S; Kraneveld, Aletta D; Garssen, Johan; Verster, Joris C
2017-04-12
Altered immune functioning has been demonstrated in individuals with autism spectrum disorder (ASD). The current study explores the relationship between perceived immune functioning and experiencing ASD traits in healthy young adults. N = 410 students from Utrecht University completed a survey on immune functioning and autistic traits. In addition to a 1-item perceived immune functioning rating, the Immune Function Questionnaire (IFQ) was completed to assess perceived immune functioning. The Dutch translation of the Autism-Spectrum Quotient (AQ) was completed to examine variation in autistic traits, including the domains "social insights and behavior", "difficulties with change", "communication", "phantasy and imagination", and "detail orientation". The 1-item perceived immune functioning score did not significantly correlate with the total AQ score. However, a significant negative correlation was found between perceived immune functioning and the AQ subscale "difficulties with change" (r = -0.119, p = 0.019). In women, 1-item perceived immune functioning correlated significantly with the AQ subscales "difficulties with change" (r = -0.149, p = 0.029) and "communication" (r = -0.145, p = 0.032). In men, none of the AQ subscales significantly correlated with 1-item perceived immune functioning. In conclusion, a modest relationship between perceived immune functioning and several autistic traits was found.
Bjorner, Jakob Bue; Pejtersen, Jan Hyld
2010-02-01
To evaluate the construct validity of the Copenhagen Psychosocial Questionnaire II (COPSOQ II) by means of tests for differential item functioning (DIF) and differential item effect (DIE). We used a Danish general population postal survey (n = 4,732 with 3,517 wage earners) with a one-year register based follow up for long-term sickness absence. DIF was evaluated against age, gender, education, social class, public/private sector employment, and job type using ordinal logistic regression. DIE was evaluated against job satisfaction and self-rated health (using ordinal logistic regression), against depressive symptoms, burnout, and stress (using multiple linear regression), and against long-term sick leave (using a proportional hazards model). We used a cross-validation approach to counter the risk of significant results due to multiple testing. Out of 1,052 tests, we found 599 significant instances of DIF/DIE, 69 of which showed both practical and statistical significance across two independent samples. Most DIF occurred for job type (in 20 cases), while we found little DIF for age, gender, education, social class and sector. DIE seemed to pertain to particular items, which showed DIE in the same direction for several outcome variables. The results allowed a preliminary identification of items that have a positive impact on construct validity and items that have negative impact on construct validity. These results can be used to develop better shortform measures and to improve the conceptual framework, items and scales of the COPSOQ II. We conclude that tests of DIF and DIE are useful for evaluating construct validity.
ERIC Educational Resources Information Center
Chalmers, R. Philip; Counsell, Alyssa; Flora, David B.
2016-01-01
Differential test functioning, or DTF, occurs when one or more items in a test demonstrate differential item functioning (DIF) and the aggregate of these effects is witnessed at the test level. In many applications, DTF can be more important than DIF when the overall effects of DIF at the test level can be quantified. However, optimal statistical…
Weeks, Clinton S; Humphreys, Michael S; Cornwell, T Bettina
2018-02-01
Brands engaged in sponsorship of events commonly have objectives that depend on consumer memory for the sponsor-event relationship (e.g., sponsorship awareness). Consumers, however, often misattribute sponsorships to nonsponsor competitor brands, indicating erroneous memory for these relationships. The current research uses an item and relational memory framework to reveal that sponsor brands may inadvertently foster this misattribution when they communicate relational linkages to events. Effects can be explained via differential roles of communicating item information (information that supports processing item distinctiveness) versus relational information (information that supports processing relationships among items) in contributing to memory outcomes. Experiment 1 uses event-cued brand recall to show that correct memory retrieval is best supported by communicating relational information when sponsorship relationships are not obvious (low congruence). In contrast, correct retrieval is best supported by communicating item information when relationships are obvious (high congruence). Experiment 2 uses brand-cued event recall to show that, against conventional marketing recommendations, relational information increases misattribution, whereas item information guards against misattribution. Results suggest sponsor brands must distinguish between item and relational communications to enhance correct retrieval and limit misattribution. Methodologically, the work shows that choice of cueing direction is critical in differentially revealing patterns of correct and incorrect retrieval with pair relationships. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
A Multilevel Assessment of Differential Item Functioning.
ERIC Educational Resources Information Center
Shen, Linjun
A multilevel approach was proposed for the assessment of differential item functioning and compared with the traditional logistic regression approach. Data from the Comprehensive Osteopathic Medical Licensing Examination for 2,300 freshman osteopathic medical students were analyzed. The multilevel approach used three-level hierarchical generalized…
ERIC Educational Resources Information Center
Zwick, Rebecca
2012-01-01
Differential item functioning (DIF) analysis is a key component in the evaluation of the fairness and validity of educational tests. The goal of this project was to review the status of ETS DIF analysis procedures, focusing on three aspects: (a) the nature and stringency of the statistical rules used to flag items, (b) the minimum sample size…
Effects of Item Exposure for Conventional Examinations in a Continuous Testing Environment.
ERIC Educational Resources Information Center
Hertz, Norman R.; Chinn, Roberta N.
This study explored the effect of item exposure on two conventional examinations administered as computer-based tests. A principal hypothesis was that item exposure would have little or no effect on average difficulty of the items over the course of an administrative cycle. This hypothesis was tested by exploring conventional item statistics and…
ERIC Educational Resources Information Center
Lee, Woo-yeol; Cho, Sun-Joo
2017-01-01
Cross-level invariance in a multilevel item response model can be investigated by testing whether the within-level item discriminations are equal to the between-level item discriminations. Testing the cross-level invariance assumption is important to understand constructs in multilevel data. However, in most multilevel item response model…
An NCME Instructional Module on Latent DIF Analysis Using Mixture Item Response Models
ERIC Educational Resources Information Center
Cho, Sun-Joo; Suh, Youngsuk; Lee, Woo-yeol
2016-01-01
The purpose of this ITEMS module is to provide an introduction to differential item functioning (DIF) analysis using mixture item response models. The mixture item response models for DIF analysis involve comparing item profiles across latent groups, instead of manifest groups. First, an overview of DIF analysis based on latent groups, called…
Gender-Related Differential Item Functioning on a Middle-School Mathematics Performance Assessment.
ERIC Educational Resources Information Center
Lane, Suzanne; And Others
This study examined gender-related differential item functioning (DIF) using a mathematics performance assessment, the QUASAR Cognitive Assessment Instrument (QCAI), administered to middle school students. The QCAI was developed for the Quantitative Understanding: Amplifying Student Achievement and Reading (QUASAR) project, which focuses on…
Using Mixed Methods to Interpret Differential Item Functioning
ERIC Educational Resources Information Center
Benítez, Isabel; Padilla, José-Luis; Hidalgo Montesinos, María Dolores; Sireci, Stephen G.
2016-01-01
Analysis of differential item functioning (DIF) is often used to determine if cross-lingual assessments are equivalent across languages. However, evidence on the causes of cross-lingual DIF is still elusive. Expert appraisal is a qualitative method useful for obtaining detailed information about problematic elements in the different linguistic…
Item Response Theory Using Hierarchical Generalized Linear Models
ERIC Educational Resources Information Center
Ravand, Hamdollah
2015-01-01
Multilevel models (MLMs) are flexible in that they can be employed to obtain item and person parameters, test for differential item functioning (DIF) and capture both local item and person dependence. Papers on the MLM analysis of item response data have focused mostly on theoretical issues where applications have been add-ons to simulation…
Item response theory analyses of the Delis-Kaplan Executive Function System card sorting subtest.
Spencer, Mercedes; Cho, Sun-Joo; Cutting, Laurie E
2018-02-02
In the current study, we examined the dimensionality of the 16-item Card Sorting subtest of the Delis-Kaplan Executive Functioning System assessment in a sample of 264 native English-speaking children between the ages of 9 and 15 years. We also tested for measurement invariance for these items across age and gender groups using item response theory (IRT). Results of the exploratory factor analysis indicated that a two-factor model that distinguished between verbal and perceptual items provided the best fit to the data. Although the items demonstrated measurement invariance across age groups, measurement invariance was violated for gender groups, with two items demonstrating differential item functioning for males and females. Multigroup analysis using all 16 items indicated that the items were more effective for individuals whose IRT scale scores were relatively high. A single-group explanatory IRT model using 14 non-differential item functioning items showed that for perceptual ability, females scored higher than males and that scores increased with age for both males and females; for verbal ability, the observed increase in scores across age differed for males and females. The implications of these findings are discussed.
Shen, Minxue; Hu, Ming; Sun, Zhenqiu
2017-01-01
Objectives: To develop and validate brief scales to measure common emotional and behavioural problems among adolescents in the examination-oriented education system and collectivistic culture of China. Setting: Middle schools in Hunan province. Participants: 5442 middle school students aged 11–19 years were sampled. 4727 valid questionnaires were collected and used for validation of the scales. The final sample included 2408 boys and 2319 girls. Primary and secondary outcome measures: The tools were assessed by the item response theory, classical test theory (reliability and construct validity) and differential item functioning. Results: Four scales to measure anxiety, depression, study problem and sociality problem were established. Exploratory factor analysis showed that each scale had two solutions. Confirmatory factor analysis showed acceptable to good model fit for each scale. Internal consistency and test–retest reliability of all scales were above 0.7. Item response theory showed that all items had acceptable discrimination parameters and most items had appropriate difficulty parameters. 10 items demonstrated differential item functioning with respect to gender. Conclusions: Four brief scales were developed and validated among adolescents in middle schools of China. The scales have good psychometric properties with minor differential item functioning. They can be used in middle school settings, and will help school officials to assess the students’ emotional/behavioural problems. PMID:28062469
Differential item functioning by sex and race in the Hogan Personality Inventory.
Sheppard, Richard; Han, Kyunghee; Colarelli, Stephen M; Dai, Guangdong; King, Daniel W
2006-12-01
The authors examined measurement bias in the Hogan Personality Inventory by investigating differential item functioning (DIF) across sex and two racial groups (Caucasian and Black). The sample consisted of 1,579 Caucasians (1,023 men, 556 women) and 523 Blacks (321 men, 202 women) who were applying for entry-level, unskilled jobs in factories. Although the group mean differences were trivial, more than a third of the items showed DIF by sex (38.4%) and by race (37.3%). A content analysis of potentially biased items indicated that the themes of items displaying DIF were slightly more cohesive for sex than for race. The authors discuss possible explanations for differing clustering tendencies of items displaying DIF and some practical and theoretical implications of DIF in the development and interpretation of personality inventories.
The neural substrates of memory suppression: a FMRI exploration of directed forgetting.
Bastin, Christine; Feyers, Dorothée; Majerus, Steve; Balteau, Evelyne; Degueldre, Christian; Luxen, André; Maquet, Pierre; Salmon, Eric; Collette, Fabienne
2012-01-01
The directed forgetting paradigm is frequently used to determine the ability to voluntarily suppress information. However, little is known about brain areas associated with information to forget. The present study used functional magnetic resonance imaging to determine brain activity during the encoding and retrieval phases of an item-method directed forgetting recognition task with neutral verbal material in order to apprehend all processing stages that information to forget and to remember undergoes. We hypothesized that regions supporting few selective processes, namely recollection and familiarity memory processes, working memory, inhibitory and selection processes should be differentially activated during the processing of to-be-remembered and to-be-forgotten items. Successful encoding and retrieval of items to remember engaged the entorhinal cortex, the hippocampus, the anterior medial prefrontal cortex, the left inferior parietal cortex, the posterior cingulate cortex and the precuneus; this set of regions is well known to support deep and associative encoding and retrieval processes in episodic memory. For items to forget, encoding was associated with higher activation in the right middle frontal and posterior parietal cortex, regions known to intervene in attentional control. Items to forget but nevertheless correctly recognized at retrieval yielded activation in the dorsomedial thalamus, associated with familiarity-based memory processes and in the posterior intraparietal sulcus and the anterior cingulate cortex, involved in attentional processes.
The Act of Answering Questions Elicited Differentiated Responses in a Concealed Information Test.
Otsuka, Takuro; Mizutani, Mitsuyoshi; Yagi, Akihiro; Katayama, Jun'ichi
2018-04-17
The concealed information test (CIT), a psychophysiological detection of deception test, compares physiological responses between crime-related and crime-unrelated items. In previous studies, whether the act of answering questions affected physiological responses was unclear. This study examined effects of both question-related and answer-related processes on physiological responses. Twenty participants received a modified CIT, in which the interval between presentation of questions and answering them was 27 s. Differentiated respiratory movements and cardiovascular responses between items were observed for both questions (items) and answers, while differentiated skin conductance response was observed only for questions. These results suggest that physiological responses to questions reflected orientation to a crime-related item, while physiological responses during answering reflected inhibition of psychological arousal caused by orienting. Regarding the CIT's accuracy, participants' perception of the questions themselves more strongly influenced physiological responses than answering them. © 2018 American Academy of Forensic Sciences.
Factor structure and gender stability in the multidimensional condom attitudes scale.
Starosta, Amy J; Berghoff, Christopher R; Earleywine, Mitch
2015-06-01
Sexually transmitted infections continue to trouble the United States and can be attenuated through increased condom use. Attitudes about condoms are an important multidimensional factor that can affect sexual health choices and have been successfully measured using the Multidimensional Condom Attitudes Scale (MCAS). Such attitudes have the potential to vary between men and women, yet little work has been undertaken to identify if the MCAS accurately captures attitudes without being influenced by underlying gender biases. We examined the factor structure and gender invariance on the MCAS using confirmatory factor analysis and item response theory, within-subscale differential item functioning analyses. More than 770 participants provided data via the Internet. Results of differential item functioning analyses identified three items as differentially functioning between the genders, and removal of these items is recommended. Findings confirmed the previously hypothesized multidimensional nature of condom attitudes and the five-factor structure of the MCAS even after the removal of the three problematic items. In general, comparisons across genders using the MCAS seem reasonable from a methodological standpoint. Results are discussed in terms of improving sexual health research and interventions. © The Author(s) 2014.
ERIC Educational Resources Information Center
Fidalgo, Angel M.; Ferreres, Doris; Muniz, Jose
2004-01-01
Sample-size restrictions limit the contingency table approaches based on asymptotic distributions, such as the Mantel-Haenszel (MH) procedure, for detecting differential item functioning (DIF) in many practical applications. Within this framework, the present study investigated the power and Type I error performance of empirical and inferential…
ERIC Educational Resources Information Center
Lee, HyeSun; Geisinger, Kurt F.
2016-01-01
The current study investigated the impact of matching criterion purification on the accuracy of differential item functioning (DIF) detection in large-scale assessments. The three matching approaches for DIF analyses (block-level matching, pooled booklet matching, and equated pooled booklet matching) were employed with the Mantel-Haenszel…
Power and Sample Size Calculations for Logistic Regression Tests for Differential Item Functioning
ERIC Educational Resources Information Center
Li, Zhushan
2014-01-01
Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…
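For orientation, the logistic regression DIF model that this kind of power analysis usually refers to can be written as below. This is a sketch of the standard textbook formulation, not necessarily the exact parameterization used in the article; the symbols (X for the matching score, G for the group indicator, and the beta coefficients) are assumptions introduced here.

```latex
% u = item response (0/1), X = matching variable (e.g., total score),
% G = group indicator (0 = reference, 1 = focal); all symbols are illustrative.
\begin{align}
  \operatorname{logit} P(u = 1 \mid X, G)
    &= \beta_0 + \beta_1 X + \beta_2 G + \beta_3 X G \\
  \text{uniform DIF:}\quad    & \beta_2 \neq 0,\ \beta_3 = 0 \\
  \text{nonuniform DIF:}\quad & \beta_3 \neq 0 \\
  \text{likelihood ratio statistic:}\quad
    & G^2 = -2\left[\ln L_{\text{reduced}} - \ln L_{\text{full}}\right]
\end{align}
```

Under this formulation, a joint test of uniform and nonuniform DIF compares the full model with the reduced model that drops both group terms, so the statistic is referred to a chi-square distribution with 2 degrees of freedom.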
The MIMIC Method with Scale Purification for Detecting Differential Item Functioning
ERIC Educational Resources Information Center
Wang, Wen-Chung; Shih, Ching-Lin; Yang, Chih-Chien
2009-01-01
This study implements a scale purification procedure onto the standard MIMIC method for differential item functioning (DIF) detection and assesses its performance through a series of simulations. It is found that the MIMIC method with scale purification (denoted as M-SP) outperforms the standard MIMIC method (denoted as M-ST) in controlling…
ERIC Educational Resources Information Center
Finch, W. Holmes; Hernández Finch, Maria E.; French, Brian F.
2016-01-01
Differential item functioning (DIF) assessment is key in score validation. When DIF is present, scores may not accurately reflect the construct of interest for some groups of examinees, leading to incorrect conclusions from the scores. Given rising immigration, and the increased reliance of educational policymakers on cross-national assessments…
Using Multiple-Variable Matching to Identify Cultural Sources of Differential Item Functioning
ERIC Educational Resources Information Center
Wu, Amery D.; Ercikan, Kadriye
2006-01-01
Identifying the sources of differential item functioning (DIF) in international assessments is very challenging, because such sources are often nebulous and intertwined. Even though researchers frequently focus on test translation and content area, few actually go beyond these factors to investigate other cultural sources of DIF. This article…
The impact of gender on the assessment of body checking behavior.
Alfano, Lauren; Hildebrandt, Tom; Bannon, Katie; Walker, Catherine; Walton, Kate E
2011-01-01
Body checking includes any behavior aimed at global or specific evaluations of appearance characteristics. Men and women are believed to express these behaviors differently, possibly reflecting different socialization. However, there has been no empirical test of the impact of gender on body checking. A total of 1024 male and female college students completed two measures of body checking, the Body Checking Questionnaire and the Male Body Checking Questionnaire. Using multiple group confirmatory factor analysis, differential item functioning (DIF) was explored in a composite of these measures. Two global latent factors were identified (female and male body checking severity), and there were expected gender differences in these factors even after controlling for DIF. Ten items were found to be unbiased by gender and provide a suitable brief measure of body checking for mixed gender research. Practical applications for body checking assessment and theoretical implications are discussed. Copyright © 2010 Elsevier Ltd. All rights reserved.
41 CFR 101-30.101-2 - Item of supply.
Code of Federal Regulations, 2013 CFR
2013-07-01
....101-2 Section 101-30.101-2 Public Contracts and Property Management Federal Property Management Regulations System FEDERAL PROPERTY MANAGEMENT REGULATIONS SUPPLY AND PROCUREMENT 30-FEDERAL CATALOG SYSTEM 30... differentiates one item from another item in the Federal Catalog System. Each item of supply is expressed in and...
41 CFR 101-30.101-2 - Item of supply.
Code of Federal Regulations, 2010 CFR
2010-07-01
....101-2 Section 101-30.101-2 Public Contracts and Property Management Federal Property Management Regulations System FEDERAL PROPERTY MANAGEMENT REGULATIONS SUPPLY AND PROCUREMENT 30-FEDERAL CATALOG SYSTEM 30... differentiates one item from another item in the Federal Catalog System. Each item of supply is expressed in and...
41 CFR 101-30.101-2 - Item of supply.
Code of Federal Regulations, 2014 CFR
2014-07-01
....101-2 Section 101-30.101-2 Public Contracts and Property Management Federal Property Management Regulations System FEDERAL PROPERTY MANAGEMENT REGULATIONS SUPPLY AND PROCUREMENT 30-FEDERAL CATALOG SYSTEM 30... differentiates one item from another item in the Federal Catalog System. Each item of supply is expressed in and...
41 CFR 101-30.101-2 - Item of supply.
Code of Federal Regulations, 2011 CFR
2011-07-01
....101-2 Section 101-30.101-2 Public Contracts and Property Management Federal Property Management Regulations System FEDERAL PROPERTY MANAGEMENT REGULATIONS SUPPLY AND PROCUREMENT 30-FEDERAL CATALOG SYSTEM 30... differentiates one item from another item in the Federal Catalog System. Each item of supply is expressed in and...
41 CFR 101-30.101-2 - Item of supply.
Code of Federal Regulations, 2012 CFR
2012-07-01
....101-2 Section 101-30.101-2 Public Contracts and Property Management Federal Property Management Regulations System FEDERAL PROPERTY MANAGEMENT REGULATIONS SUPPLY AND PROCUREMENT 30-FEDERAL CATALOG SYSTEM 30... differentiates one item from another item in the Federal Catalog System. Each item of supply is expressed in and...
An Alternative Approach for the Analyses and Interpretation of Attachment Sort Items
ERIC Educational Resources Information Center
Kirkland, John; Bimler, David; Drawneek, Andrew; McKim, Margaret; Scholmerich, Axel
2004-01-01
Attachment Q-Sort (AQS) is a tool for quantifying observations about toddler/caregiver relationships. Previous studies have applied factor analysis to the full 90 AQS item set to explore the structure underlying them. Here we explore that structure by applying multidimensional scaling (MDS) to judgements of inter-item similarity. AQS items are…
ERIC Educational Resources Information Center
Tian, Wei; Cai, Li; Thissen, David; Xin, Tao
2013-01-01
In item response theory (IRT) modeling, the item parameter error covariance matrix plays a critical role in statistical inference procedures. When item parameters are estimated using the EM algorithm, the parameter error covariance matrix is not an automatic by-product of item calibration. Cai proposed the use of Supplemented EM algorithm for…
Rasch Analysis of the Power as Knowing Participation in Change Tool--the Brazilian version.
Guedes, Erika de Souza; Orozco-Vargas, Luiz Carlos; Turrini, Ruth Natália Teresa; de Sousa, Regina Márcia Cardoso; dos Santos, Mariana Alvina; da Cruz, Diná de Almeida Lopes Monteiro
2013-01-01
The objective of this study was to evaluate the items contained in the Brazilian version of the Power as Knowing Participation in Change Tool (PKPCT). The psychometric properties of the questionnaire were investigated through Rasch analysis. Data from 952 nursing assistants and 627 baccalaureate nurses were analyzed (average age 44.1 (SD=9.5); 13.0% men). The subscales Choices, Awareness, Freedom and Involvement were tested separately and presented unidimensionality; the response categories were collapsed from 7 to 3 levels and the items fit the model well, except for the following/leading item, in which the infit and outfit values were above 1.4; this item also presented Differential Item Functioning (DIF) according to the participant's role. The reliability of the items was 0.99 and the reliability of the participants ranged from 0.80 to 0.84 in the subscales. Items with extremely high levels of difficulty were not identified. The PKPCT should not be viewed as unidimensional, items with extremely high levels of difficulty need to be created for the scale, and the differential functioning of some items has to be further investigated.
Steca, Patrizia; Monzani, Dario; Greco, Andrea; Chiesi, Francesca; Primi, Caterina
2015-06-01
This study is aimed at testing the measurement properties of the Life Orientation Test-Revised (LOT-R) for the assessment of dispositional optimism by employing item response theory (IRT) analyses. The LOT-R was administered to a large sample of 2,862 Italian adults. First, confirmatory factor analyses demonstrated the theoretical conceptualization of the construct measured by the LOT-R as a single bipolar dimension. Subsequently, IRT analyses for polytomous, ordered response category data were applied to investigate the items' properties. The equivalence of the items across gender and age was assessed by analyzing differential item functioning. Discrimination and severity parameters indicated that all items were able to distinguish people with different levels of optimism and adequately covered the spectrum of the latent trait. Additionally, the LOT-R appears to be gender invariant and, with minor exceptions, age invariant. Results provided evidence that the LOT-R is a reliable and valid measure of dispositional optimism. © The Author(s) 2014.
Acculturation and the Center For Epidemiological Studies-Depression Scale for Hispanic women.
McCabe, Brian E; Vermeesch, Amber L; Hall, Rosemary F; Peragallo, Nilda P; Mitrani, Victoria B
2011-01-01
Culturally valid measures of depression for Spanish-speaking Hispanic women are important for developing and implementing effective interventions to reduce health disparities. The Center for Epidemiological Studies-Depression Scale (CES-D) is a widely used measure of depression. Differential item functioning has been studied using language preference as a proxy for acculturation, but it is unknown if the results were due to acculturation or the language of administration. The aim of this study was to evaluate the relationship of acculturation, defined with a dimensional measure, to Spanish CES-D item responses. Spanish-speaking Hispanic women (n = 504) were recruited for a randomized controlled trial of Salud, Educación, Prevención y Autocuidado (Health, Education, Prevention, and Self-Care). Acculturation, an important dimension of variation within the diverse U.S. Hispanic community, was defined by high or low scores on the Americanism subscale of the Bidimensional Acculturation Scale. Differential item functioning for each of the 20 CES-D items between more acculturated and less acculturated women was tested using ordinal logistic regression. No items on the Depressed Affect, Somatic Activity, or Positive Affect subscales showed meaningful differential item functioning, but 1 item ("People were unfriendly") on the Interpersonal subscale had small results (R = 1.1%). The majority of CES-D items performed similarly for Spanish-speaking Hispanic women with high and low acculturation. Less acculturated women responded more positively to "People were unfriendly," despite having an equivalent level of depression, than did more acculturated women. Possibilities for improving this item are proposed.
Comparison of Objective and Subjective Methods on Determination of Differential Item Functioning
ERIC Educational Resources Information Center
Sahin, Melek Gülsah
2017-01-01
The objective of this research is to compare the objective methods often used in the literature for determining differential item functioning (DIF) with the subjective method based on expert opinion, which is used less often in the literature. Mantel-Haenszel (MH), Logistic Regression (LR) and SIBTEST are chosen as objective methods. While the data…
ERIC Educational Resources Information Center
Alavi, Seyed Mohammad; Bordbar, Soodeh
2017-01-01
Differential Item Functioning (DIF) analysis is a key element in evaluating educational test fairness and validity. One of the frequently cited sources of construct-irrelevant variance is gender, which has an important role in the university entrance exam; therefore, it causes bias and consequently undermines test validity. The present study aims…
An Introduction to Missing Data in the Context of Differential Item Functioning
ERIC Educational Resources Information Center
Banks, Kathleen
2015-01-01
This article introduces practitioners and researchers to the topic of missing data in the context of differential item functioning (DIF), reviews the current literature on the issue, discusses implications of the review, and offers suggestions for future research. A total of nine studies were reviewed. All of these studies determined what effect…
Differential Item Functioning By Sex and Race in The Hogan Personality Inventory
ERIC Educational Resources Information Center
Sheppard, Richard; Han, Kyunghee; Colarelli, Stephen M.; Dai, Guangdong; King, Daniel W.
2006-01-01
The authors examined measurement bias in the Hogan Personality Inventory by investigating differential item functioning (DIF) across sex and two racial groups (Caucasian and Black). The sample consisted of 1,579 Caucasians (1,023 men, 556 women) and 523 Blacks (321 men, 202 women) who were applying for entry-level, unskilled jobs in factories.…
Owens, Sherry; Kristjansson, Alfgeir L; Hunte, Haslyn E R
2015-11-05
We investigated whether individual items on the nine-item William's Perceived Everyday Discrimination Scale (EDS) functioned differently by age (<45 vs ≥ 45) within five racial groups in the United States: Asians (n=2,017); Hispanics (n=2,688); Black Caribbeans (n=1,377); African Americans (n=3,434); and Whites (n=854). We used data from the 2001-2003 National Survey of American Lives and the 2001-2003 National Latino and Asian Studies. Multiple-indicator, multiple-cause models (MIMIC) were used to examine differential item functioning (DIF) on the EDS by age within each racial/ethnic group. Overall, Asian and Hispanic respondents reported less discrimination than Whites; on the other hand, African Americans and Black Caribbeans reported more discrimination than Whites. Regardless of race/ethnicity, the younger respondents (aged <45 years) reported less discrimination than the older respondents (aged ≥ 45 years). In terms of age by race/ethnicity, the results were mixed for 19 out of 45 tests of DIF (40%). No differences in item function were observed among Black Caribbeans. "Being called names or insulted" and others acting as "if they are afraid" of the respondents were the only two items that did not exhibit differential item functioning by age across all racial/ethnic groups. Overall, our findings suggest that the EDS scale should be used with caution in multi-age multi-racial/ethnic samples.
Forrest, Christopher B; Devine, Janine; Bevans, Katherine B; Becker, Brandon D; Carle, Adam C; Teneralli, Rachel E; Moon, JeanHee; Tucker, Carole A; Ravens-Sieberer, Ulrike
2018-01-01
To describe the psychometric evaluation and item response theory calibration of the PROMIS Pediatric Life Satisfaction item banks, child-report, and parent-proxy editions. A pool of 55 life satisfaction items was administered to 1992 children 8-17 years old and 964 parents of children 5-17 years old. Analyses included descriptive statistics, reliability, factor analysis, differential item functioning, and assessment of construct validity. Thirteen items were deleted because of poor psychometric performance. An 8-item short form was administered to a national sample of 996 children 8-17 years old, and 1294 parents of children 5-17 years old. The combined sample (2988 children and 2258 parents) was used in item response theory (IRT) calibration analyses. The final item banks were unidimensional, the items were locally independent, and the items were free from impactful differential item functioning. The 8-item and 4-item short form scales showed excellent reliability, convergent validity, and discriminant validity. Life satisfaction decreased with declining socio-economic status, presence of a special health care need, and increasing age for girls, but not boys. After IRT calibration, we found that 4- and 8-item short forms had a high degree of precision (reliability) across a wide range (>4 SD units) of the latent variable. The PROMIS Pediatric Life Satisfaction item banks and their short forms provide efficient, precise, and valid assessments of life satisfaction in children and youth.
Forrest, Christopher B; Ravens-Sieberer, Ulrike; Devine, Janine; Becker, Brandon D; Teneralli, Rachel; Moon, JeanHee; Carle, Adam; Tucker, Carole A; Bevans, Katherine B
2018-03-01
The purpose of this study is to describe the psychometric evaluation and item response theory calibration of the PROMIS Pediatric Positive Affect item bank, child-report and parent-proxy editions. The initial item pool comprising 53 items, previously developed using qualitative methods, was administered to 1,874 children 8-17 years old and 909 parents of children 5-17 years old. Analyses included descriptive statistics, reliability, factor analysis, differential item functioning, and construct validity. A total of 14 items were deleted, because of poor psychometric performance, and an 8-item short form constructed from the remaining 39 items was administered to a national sample of 1,004 children 8-17 years old, and 1,306 parents of children 5-17 years old. The combined sample was used in item response theory (IRT) calibration analyses. The final item bank appeared unidimensional, the items appeared locally independent, and the items were free from differential item functioning. The scales showed excellent reliability and convergent and discriminant validity. Positive affect decreased with children's age and was lower for those with a special health care need. After IRT calibration, we found that 4 and 8 item short forms had a high degree of precision (reliability) across a wide range of the latent trait (>4 SD units). The PROMIS Pediatric Positive Affect item bank and its short forms provide an efficient, precise, and valid assessment of positive affect in children and youth.
Aggregating Polytomous DIF Results over Multiple Test Administrations
ERIC Educational Resources Information Center
Zwick, Rebecca; Ye, Lei; Isham, Steven
2018-01-01
In typical differential item functioning (DIF) assessments, an item's DIF status is not influenced by its status in previous test administrations. An item that has shown DIF at multiple administrations may be treated the same way as an item that has shown DIF in only the most recent administration. Therefore, much useful information about the…
A Comparison of Linking and Concurrent Calibration under the Graded Response Model.
ERIC Educational Resources Information Center
Kim, Seock-Ho; Cohen, Allan S.
Applications of item response theory to practical testing problems including equating, differential item functioning, and computerized adaptive testing, require that item parameter estimates be placed onto a common metric. In this study, two methods for developing a common metric for the graded response model under item response theory were…
A Methodology for Zumbo's Third Generation DIF Analyses and the Ecology of Item Responding
ERIC Educational Resources Information Center
Zumbo, Bruno D.; Liu, Yan; Wu, Amery D.; Shear, Benjamin R.; Olvera Astivia, Oscar L.; Ark, Tavinder K.
2015-01-01
Methods for detecting differential item functioning (DIF) and item bias are typically used in the process of item analysis when developing new measures; adapting existing measures for different populations, languages, or cultures; or more generally validating test score inferences. In 2007 in "Language Assessment Quarterly," Zumbo…
Multidimensional Extension of Multiple Indicators Multiple Causes Models to Detect DIF
ERIC Educational Resources Information Center
Lee, Soo; Bulut, Okan; Suh, Youngsuk
2017-01-01
A number of studies have found multiple indicators multiple causes (MIMIC) models to be an effective tool in detecting uniform differential item functioning (DIF) for individual items and item bundles. A recently developed MIMIC-interaction model is capable of detecting both uniform and nonuniform DIF in the unidimensional item response theory…
Böhnke, Jan R; Croudace, Tim J
2016-08-01
The assessment of 'general health and well-being' in public mental health research stimulates debates around relative merits of questionnaire instruments and their items. Little evidence regarding alignment or differential advantages of instruments or items has appeared to date. Population-based psychometric study of items employed in public mental health narratives. Multidimensional item response theory was applied to General Health Questionnaire (GHQ-12), Warwick-Edinburgh Mental Well-being Scale (WEMWBS) and EQ-5D items (Health Survey for England, 2010-2012; n = 19 290). A bifactor model provided the best account of the data and showed that the GHQ-12 and WEMWBS items assess mainly the same construct. Only one item of the EQ-5D showed relevant overlap with this dimension (anxiety/depression). Findings were corroborated by comparisons with alternative models and cross-validation analyses. The consequences of this lack of differentiation (GHQ-12 v. WEMWBS) for mental health and well-being narratives deserve discussion to enrich debates on priorities in public mental health and its assessment. © The Royal College of Psychiatrists 2015.
ERIC Educational Resources Information Center
Çikirikçi Demirtasli, Nükhet; Ulutas, Seher
2015-01-01
Problem Statement: Item bias occurs when individuals from different groups (different gender, cultural background, etc.) have different probabilities of responding correctly to a test item despite having the same skill levels. It is important that tests or items do not have bias in order to ensure the accuracy of decisions taken according to test…
Assessing the Utility of Item Response Theory Models: Differential Item Functioning.
ERIC Educational Resources Information Center
Scheuneman, Janice Dowd
The current status of item response theory (IRT) is discussed. Several IRT methods exist for assessing whether an item is biased. Focus is on methods proposed by L. M. Rudner (1975), F. M. Lord (1977), D. Thissen et al. (1988) and R. L. Linn and D. Harnisch (1981). Rudner suggested a measure of the area lying between the two item characteristic…
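As an illustration of the area-based idea mentioned in this abstract, the following Python sketch numerically integrates the unsigned area between two item characteristic curves. It is a minimal sketch under stated assumptions, not the procedure from any of the cited papers: the logistic 3PL form without a scaling constant, the finite ability range, the midpoint rule, and the names `icc_3pl` and `unsigned_area` are all choices made here for illustration.

```python
import math


def icc_3pl(theta, a, b, c=0.0):
    """Probability of a correct response under a logistic 3PL model.

    a = discrimination, b = difficulty, c = lower asymptote (assumed form).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def unsigned_area(ref_params, focal_params, lo=-4.0, hi=4.0, steps=800):
    """Midpoint-rule estimate of the area between two ICCs on [lo, hi].

    ref_params and focal_params are (a, b, c) tuples for the reference
    and focal group calibrations of the same item (hypothetical inputs).
    """
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        theta = lo + (i + 0.5) * width
        gap = icc_3pl(theta, *ref_params) - icc_3pl(theta, *focal_params)
        total += abs(gap) * width
    return total
```

An area of zero means the two groups' curves coincide over the chosen range; larger values indicate that examinees of equal ability have different success probabilities depending on group. The result depends on the integration range, so the range should be reported alongside the value.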
Sideridis, Georgios D.; Tsaousis, Ioannis; Al Harbi, Khaleel
2016-01-01
The purpose of the present study was to relate response strategy with person ability estimates. Two behavioral strategies were examined: (a) the strategy to skip items in order to save time on timed tests, and, (b) the strategy to select two responses on an item, with the hope that one of them may be considered correct. Participants were 4,422 individuals who were administered a standardized achievement measure related to math, biology, chemistry, and physics. In the present evaluation, only the physics subscale was employed. Two analyses were conducted: (a) a person-based one to identify differences between groups and potential correlates of those differences, and, (b) a measure-based analysis in order to identify the parts of the measure that were responsible for potential group differentiation. For (a) person abilities the 2-PL model was employed and later the 3-PL and 4-PL models in order to estimate upper and lower asymptotes of person abilities. For (b) differential item functioning, differential test functioning, and differential distractor functioning were investigated. Results indicated that there were significant differences between groups with completers having the highest ability compared to both non-attempters and dual responders. There were no significant differences between non-attempters and dual responders. The present findings have implications for response strategy efficacy and measure evaluation, revision, and construction. PMID:27790174
Adaptation of the Practice Environment Scale for military nurses: a psychometric analysis.
Swiger, Pauline A; Raju, Dheeraj; Breckenridge-Sproat, Sara; Patrician, Patricia A
2017-09-01
The aim of this study was to confirm the psychometric properties of Practice Environment Scale of the Nursing Work Index in a military population. This study also demonstrates association rule analysis, a contemporary exploratory technique. One of the instruments most commonly used to evaluate the nursing practice environment is the Practice Environment Scale of the Nursing Work Index. Although the instrument has been widely used, the reliability, validity and individual item function are not commonly evaluated. Gaps exist with regard to confirmatory evaluation of the subscale factors, individual item analysis and evaluation in the outpatient setting and with non-registered nursing staff. This was a secondary data analysis of existing survey data. Multiple psychometric methods were used for this analysis using survey data collected in 2014. First, descriptive analyses were conducted, including exploration using association rules. Next, internal consistency was tested and confirmatory factor analysis was performed to test the factor structure. The specified factor structure did not hold; therefore, exploratory factor analysis was performed. Finally, item analysis was executed using item response theory. The differential item functioning technique allowed the comparison of responses by care setting and nurse type. The results of this study indicate that responses differ between groups and that several individual items could be removed without altering the psychometric properties of the instrument. The instrument functions moderately well in a military population; however, researchers may want to consider nurse type and care setting during analysis to identify any meaningful variation in responses. © 2017 John Wiley & Sons Ltd.
Solving the measurement invariance anchor item problem in item response theory.
Meade, Adam W; Wright, Natalie A
2012-09-01
The efficacy of tests of differential item functioning (measurement invariance) has been well established. It is clear that when properly implemented, these tests can successfully identify differentially functioning (DF) items when they exist. However, an assumption of these analyses is that the metric for different groups is linked using anchor items that are invariant. In practice, however, it is impossible to be certain which items are DF and which are invariant. This problem of anchor items, or referent indicators, has long plagued invariance research, and a multitude of suggested approaches have been put forth. Unfortunately, the relative efficacy of these approaches has not been tested. This study compares 11 variations on 5 qualitatively different approaches from recent literature for selecting optimal anchor items. A large-scale simulation study indicates that for nearly all conditions, an easily implemented 2-stage procedure recently put forth by Lopez Rivas, Stark, and Chernyshenko (2009) provided optimal power while maintaining nominal Type I error. With this approach, appropriate anchor items can be easily and quickly located, resulting in more efficacious invariance tests. Recommendations for invariance testing are illustrated using a pedagogical example of employee responses to an organizational culture measure.
ERIC Educational Resources Information Center
Lee, HwaYoung; Beretvas, S. Natasha
2014-01-01
Conventional differential item functioning (DIF) detection methods (e.g., the Mantel-Haenszel test) can be used to detect DIF only across observed groups, such as gender or ethnicity. However, research has found that DIF is not typically fully explained by an observed variable. True sources of DIF may include unobserved, latent variables, such as…
ERIC Educational Resources Information Center
Aryadoust, Vahid
2012-01-01
This article investigates a version of the International English Language Testing System (IELTS) listening test for evidence of differential item functioning (DIF) based on gender, nationality, age, and degree of previous exposure to the test. Overall, the listening construct was found to be underrepresented, which is probably an important cause…
ERIC Educational Resources Information Center
Quesen, Sarah
2016-01-01
When studying differential item functioning (DIF) with students with disabilities (SWD), focal groups typically suffer from small sample size, whereas the reference group population is usually large. This makes it possible for a researcher to select a sample from the reference population to be similar to the focal group on the ability scale. Doing…
A Robust Outlier Approach to Prevent Type I Error Inflation in Differential Item Functioning
ERIC Educational Resources Information Center
Magis, David; De Boeck, Paul
2012-01-01
The identification of differential item functioning (DIF) is often performed by means of statistical approaches that consider the raw scores as proxies for the ability trait level. One of the most popular approaches, the Mantel-Haenszel (MH) method, belongs to this category. However, replacing the ability level by the simple raw score is a source…
ERIC Educational Resources Information Center
Laitusis, Cara Cahalan; Maneckshana, Behroz; Monfils, Lora; Ahlgrim-Delzell, Lynn
2009-01-01
The purpose of this study was to examine Differential Item Functioning (DIF) by disability groups on an on-demand performance assessment for students with severe cognitive impairments. Researchers examined the presence of DIF for two comparisons. One comparison involved students with severe cognitive impairments who served as the reference group…
ERIC Educational Resources Information Center
Polikoff, Morgan S.; May, Henry; Porter, Andrew C.; Elliott, Stephen N.; Goldring, Ellen; Murphy, Joseph
2009-01-01
The Vanderbilt Assessment of Leadership in Education is a 360-degree assessment of the effectiveness of principals' learning-centered leadership behaviors. In this report, we present results from a differential item functioning (DIF) study of the assessment. Using data from a national field trial, we searched for evidence of DIF on school level,…
Using a Mixture IRT Model to Understand English Learner Performance on Large-Scale Assessments
ERIC Educational Resources Information Center
Shea, Christine A.
2013-01-01
The purpose of this study was to determine whether an eighth grade state-level math assessment contained items that function differentially (DIF) for English Learner students (EL) as compared to English Only students (EO) and if so, what factors might have caused DIF. To determine this, Differential Item Functioning (DIF) analysis was employed.…
ERIC Educational Resources Information Center
Socha, Alan; DeMars, Christine E.; Zilberberg, Anna; Phan, Ha
2015-01-01
The Mantel-Haenszel (MH) procedure is commonly used to detect items that function differentially for groups of examinees from various demographic and linguistic backgrounds--for example, in international assessments. As in some other DIF methods, the total score is used to match examinees on ability. In thin matching, each of the total score…
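As a rough illustration of thin matching, the sketch below treats every observed total score as its own stratum and pools the resulting 2x2 tables into the Mantel-Haenszel common odds ratio. The function names and the (group, total_score, correct) record layout are assumptions made for this example and are not taken from the study; the -2.35 ln(alpha) conversion is the conventional ETS delta-scale transformation.

```python
import math
from collections import defaultdict


def mh_common_odds_ratio(records):
    """Mantel-Haenszel common odds ratio for one studied item.

    records: iterable of (group, total_score, correct) tuples, where group is
    "ref" or "focal" and correct is 0 or 1. This layout is hypothetical.
    """
    # One 2x2 table per total score (thin matching):
    # [ref correct, ref incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, correct in records:
        column = (0 if group == "ref" else 2) + (0 if correct else 1)
        tables[score][column] += 1

    numerator = denominator = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        numerator += a * d / n    # ref correct * focal incorrect
        denominator += b * c / n  # ref incorrect * focal correct
    if denominator == 0:
        raise ValueError("odds ratio undefined: no informative strata")
    return numerator / denominator


def mh_d_dif(alpha):
    """Convert the common odds ratio to the ETS delta scale."""
    return -2.35 * math.log(alpha)
```

Under this convention an odds ratio above 1 (a negative MH D-DIF) means that reference-group examinees have higher odds of answering correctly than focal-group examinees with the same total score.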
ERIC Educational Resources Information Center
Robitzsch, Alexander; Rupp, Andre A.
2009-01-01
This article describes the results of a simulation study to investigate the impact of missing data on the detection of differential item functioning (DIF). Specifically, it investigates how four methods for dealing with missing data (listwise deletion, zero imputation, two-way imputation, response function imputation) interact with two methods of…
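Two of the four missing-data treatments named above are simple enough to sketch directly. The example below is only an illustration under assumed conventions (responses stored as lists of 0/1 scores with None marking a missing answer; function names invented here); it does not reproduce the study's implementation, and the two imputation-based methods are not shown.

```python
def listwise_deletion(responses):
    """Drop every examinee (row) that has at least one missing response."""
    return [row for row in responses if None not in row]


def zero_imputation(responses):
    """Score every missing response as incorrect (0)."""
    return [[0 if x is None else x for x in row] for row in responses]


# Hypothetical data: three examinees, three dichotomous items.
data = [
    [1, 0, 1],
    [1, None, 0],
    [None, None, 1],
]
complete_cases = listwise_deletion(data)   # keeps only the first examinee
zero_filled = zero_imputation(data)        # keeps all three examinees
```

Listwise deletion shrinks the sample, whereas zero imputation keeps the sample size and lowers the scores of examinees with missing responses.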
ERIC Educational Resources Information Center
Sari, Halil Ibrahim; Huggins, Anne Corinne
2015-01-01
This study compares two methods of defining groups for the detection of differential item functioning (DIF): (a) pairwise comparisons and (b) composite group comparisons. We aim to emphasize and empirically support the notion that the choice of pairwise versus composite group definitions in DIF is a reflection of how one defines fairness in DIF…
ERIC Educational Resources Information Center
Walker, Cindy M.; Gocer Sahin, Sakine
2017-01-01
The theoretical reason for the presence of differential item functioning (DIF) is that data are multidimensional and two groups of examinees differ in their underlying ability distribution for the secondary dimension(s). Therefore, the purpose of this study was to determine how much the secondary ability distributions must differ before DIF is…
Use of multilevel logistic regression to identify the causes of differential item functioning.
Balluerka, Nekane; Gorostiaga, Arantxa; Gómez-Benito, Juana; Hidalgo, María Dolores
2010-11-01
Given that a key function of tests is to serve as evaluation instruments and for decision making in the fields of psychology and education, the possibility that some of their items may show differential behaviour is a major concern for psychometricians. In recent decades, important progress has been made as regards the efficacy of techniques designed to detect this differential item functioning (DIF). However, the findings are scant when it comes to explaining its causes. The present study addresses this problem from the perspective of multilevel analysis. Starting from a case study in the area of transcultural comparisons, multilevel logistic regression is used: 1) to identify the item characteristics associated with the presence of DIF; 2) to estimate the proportion of variation in the DIF coefficients that is explained by these characteristics; and 3) to evaluate alternative explanations of the DIF by comparing the explanatory power or fit of different sequential models. The comparison of these models confirmed one of the two alternatives (familiarity with the stimulus) and rejected the other (the topic area) as being a cause of differential functioning with respect to the compared groups.
Varga, Mihai; Visu-Petra, George; Miclea, Mircea; Visu-Petra, Laura
2015-01-01
Concealing the possession of relevant information represents a complex cognitive process, shaped by contextual demands and individual differences in cognitive and socio-emotional functioning. The Reaction Time-based Concealed Information Test (RT-CIT) is used to detect concealed knowledge based on the difference in RTs between denying recognition of critical (probes) and newly encountered (irrelevant) information. Several research questions were addressed in this scenario implemented after a mock crime. First, we were interested whether the introduction of a social stimulus (facial identity) simulating a virtual investigator would facilitate the process of deception detection. Next, we explored whether his emotional displays (friendly, hostile or neutral) would have a differential impact on speed of responses to probe versus irrelevant items. We also compared the impact of introducing similar stimuli in a working memory (WM) updating context without requirements to conceal information. Finally, we explored the association between deceptive behavior and individual differences in WM updating proficiency or in internalizing problems (state / trait anxiety and depression). Results indicated that the mere presence of a neutral virtual investigator slowed down participants' responses, but not the appended lie-specific time (difference between probes and irrelevants). Emotional expression was shown to differentially affect speed of responses to critical items, with positive displays from the virtual examiner enhancing lie-specific time, compared to negative facial expressions, which had an opposite impact. This valence-specific effect was not visible in the WM updating context. Higher levels of trait / state anxiety were related to faster responses to probes in the negative condition (hostile facial expression) of the RT-CIT. 
These preliminary findings further emphasize the need to take into account motivational and emotional factors when considering the transfer of deception detection techniques from the laboratory to real-life settings. PMID:25699516
Differential emotional processing in concrete and abstract words.
Yao, Bo; Keitel, Anne; Bruce, Gillian; Scott, Graham G; O'Donnell, Patrick J; Sereno, Sara C
2018-02-12
Emotion (positive and negative) words are typically recognized faster than neutral words. Recent research suggests that emotional valence, while often treated as a unitary semantic property, may be differentially represented in concrete and abstract words. Studies that have explicitly examined the interaction of emotion and concreteness, however, have demonstrated inconsistent patterns of results. Moreover, these findings may be limited as certain key lexical variables (e.g., familiarity, age of acquisition) were not taken into account. We investigated the emotion-concreteness interaction in a large-scale, highly controlled lexical decision experiment. A 3 (Emotion: negative, neutral, positive) × 2 (Concreteness: abstract, concrete) design was used, with 45 items per condition and 127 participants. We found a significant interaction between emotion and concreteness. Although positive and negative valenced words were recognized faster than neutral words, this emotion advantage was significantly larger in concrete than in abstract words. We explored potential contributions of participant alexithymia level and item imageability to this interactive pattern. We found that only word imageability significantly modulated the emotion-concreteness interaction. While both concrete and abstract emotion words are advantageously processed relative to comparable neutral words, the mechanisms of this facilitation are paradoxically more dependent on imageability in abstract words.
Cross-Group Equivalence of Interest and Motivation Items in PISA 2012 Turkey Sample
ERIC Educational Resources Information Center
Ardic, Elif Ozlem; Gelbal, Selahattin
2017-01-01
Purpose: The aim of this study was to examine measurement invariance of the interest and motivation related items contained in the PISA 2012 student survey with regard to gender, school type, and statistical regions and to identify the items that show differential item functioning (DIF) across groups. Research Methods: Multiple-group confirmatory…
Detecting DIF in Polytomous Items Using MACS, IRT and Ordinal Logistic Regression
ERIC Educational Resources Information Center
Elosua, Paula; Wells, Craig
2013-01-01
The purpose of the present study was to compare the Type I error rate and power of two model-based procedures, the mean and covariance structure model (MACS) and the item response theory (IRT), and an observed-score based procedure, ordinal logistic regression, for detecting differential item functioning (DIF) in polytomous items. A simulation…
Application of a Method of Estimating DIF for Polytomous Test Items.
ERIC Educational Resources Information Center
Camilli, Gregory; Congdon, Peter
1999-01-01
Demonstrates a method for studying differential item functioning (DIF) that can be used with dichotomous or polytomous items and that is valid for data that follow a partial credit Item Response Theory model. A simulation study shows that positively biased Type I error rates are in accord with results from previous studies. (SLD)
A Monte Carlo Study of an Iterative Wald Test Procedure for DIF Analysis
ERIC Educational Resources Information Center
Cao, Mengyang; Tay, Louis; Liu, Yaowu
2017-01-01
This study examined the performance of a proposed iterative Wald approach for detecting differential item functioning (DIF) between two groups when preknowledge of anchor items is absent. The iterative approach utilizes the Wald-2 approach to identify anchor items and then iteratively tests for DIF items with the Wald-1 approach. Monte Carlo…
Assessment of Differential Item Functioning in the Experiences of Discrimination Index
Cunningham, Timothy J.; Berkman, Lisa F.; Gortmaker, Steven L.; Kiefe, Catarina I.; Jacobs, David R.; Seeman, Teresa E.; Kawachi, Ichiro
2011-01-01
The psychometric properties of instruments used to measure self-reported experiences of discrimination in epidemiologic studies are rarely assessed, especially regarding construct validity. The authors used 2000–2001 data from the Coronary Artery Risk Development in Young Adults (CARDIA) Study to examine differential item functioning (DIF) in 2 versions of the Experiences of Discrimination (EOD) Index, an index measuring self-reported experiences of racial/ethnic and gender discrimination. DIF may confound interpretation of subgroup differences. Large DIF was observed for 2 of 7 racial/ethnic discrimination items: White participants reported more racial/ethnic discrimination for the “at school” item, and black participants reported more racial/ethnic discrimination for the “getting housing” item. The large DIF by race/ethnicity in the index for racial/ethnic discrimination probably reflects item impact and is the result of valid group differences between blacks and whites regarding their respective experiences of discrimination. The authors also observed large DIF by race/ethnicity for 3 of 7 gender discrimination items. This is more likely to have been due to item bias. Users of the EOD Index must consider the advantages and disadvantages of DIF adjustment (omitting items, constructing separate measures, and retaining items). The EOD Index has substantial usefulness as an instrument that can assess self-reported experiences of discrimination. PMID:22038104
Heinemann, Allen W; Kisala, Pamela A; Hahn, Elizabeth A; Tulsky, David S
2015-05-01
To develop a spinal cord injury (SCI)-focused version of PROMIS and Neuro-QOL social domain item banks; evaluate the psychometric properties of items developed for adults with SCI; and report information to facilitate clinical and research use. We used a mixed-methods design to develop and evaluate Ability to Participate in Social Roles and Activities and Satisfaction with Social Roles and Activities items. Focus groups helped define the constructs; cognitive interviews helped revise items; and confirmatory factor analysis and item response theory methods helped calibrate item banks and evaluate differential item functioning related to demographic and injury characteristics. Five SCI Model System sites and one Veterans Administration medical center. The calibration sample consisted of 641 individuals; a reliability sample consisted of 245 individuals residing in the community. A subset of 27 Ability to Participate and 35 Satisfaction items demonstrated good measurement properties and negligible differential item functioning related to demographic and injury characteristics. The SCI-specific measures correlate strongly with the PROMIS and Neuro-QOL versions. Ten item short forms correlate >0.96 with the full banks. Variable-length CATs with a minimum of 4 items, variable-length CATs with a minimum of 8 items, fixed-length CATs of 10 items, and the 10-item short forms demonstrate construct coverage and measurement error that is comparable to the full item bank. The Ability to Participate and Satisfaction with Social Roles and Activities CATs and short forms demonstrate excellent psychometric properties and are suitable for clinical and research applications.
Older and younger adults differently judge the similarity between negative affect terms.
Ready, Rebecca E; Santorelli, Gennarina D; Mather, Molly A
2018-01-02
Theoretical models of aging suggest changes across the adult lifespan in the capacity to differentiate emotions. Greater emotion differentiation is associated with advantages in terms of emotion regulation and emotion resiliency. This study utilized a novel method that directly measures judgments of affect differentiation and does not confound affective experience with knowledge about affect terms. Theoretical predictions that older adults would distinguish more between affect terms than younger persons were tested. Older (n = 27; aged 60-92) and younger (n = 56; aged 18-32) adults rated the difference versus similarity of 16 affect terms from the Kessler and Staudinger ( 2009 ) scales; each of the 16 items was paired with every other item for a total of 120 ratings. Participants provided self-reports of trait emotions, alexithymia, and depressive symptoms. Older adults significantly differentiated more between low arousal and high arousal negative affect (NA) items than younger persons. Depressive symptoms were associated with similarity ratings across and within valence and arousal. Findings offer partial support for theoretical predictions that older adults differentiate more between affect terms than younger persons. To the extent that differentiating between negative affects can aid in emotion regulation, older adults may have an advantage over younger persons. Future research should investigate mechanisms that underlie age group differences in emotion differentiation.
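The figure of 120 ratings follows from pairing each of the 16 terms once with every other term, that is, 16 x 15 / 2 unordered pairs. A minimal check, using placeholder labels because the actual affect terms are not listed in the abstract:

```python
from itertools import combinations

# Placeholder labels; the real 16 affect terms are not given in the abstract.
terms = [f"term_{i}" for i in range(1, 17)]

# Every unordered pair of distinct terms is rated once.
pairs = list(combinations(terms, 2))
n_ratings = len(pairs)  # 16 * 15 / 2 = 120
```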
Lo, Barbara Chuen Yee; Zhao, Yue; Kwok, Alice Wai Yee; Chan, Wai; Chan, Calais Kin Yuen
2017-07-01
The present study applied item response theory to examine the psychometric properties of the Asian Adolescent Depression Scale and to construct a short form among 1,084 teenagers recruited from secondary schools in Hong Kong. Findings suggested that some items of the full form reflected higher levels of severity and were more discriminating than others, and the Asian Adolescent Depression Scale was useful in measuring a broad range of depressive severity in community youths. Differential item functioning emerged in several items where females reported higher depressive severity than males. In the short form construction, preliminary validation suggested that, relative to the 20-item full form, our derived short form offered significantly greater diagnostic performance and stronger discriminatory ability in differentiating depressed and nondepressed groups, and simultaneously maintained adequate measurement precision with a reduced response burden in assessing depression in the Asian adolescents. Cultural variance in depressive symptomatology and clinical implications are discussed.
Zampetakis, Leonidas A.; Bakatsaki, Maria; Litos, Charalambos; Kafetsios, Konstantinos G.; Moustakis, Vassilis
2017-01-01
Over the past years the percentage of female entrepreneurs has increased, yet it is still far below that for males. Although various attempts have been made to explain differences in men’s and women’s entrepreneurial attitudes and intentions, the extent to which those differences are due to self-report biases has not yet been considered. The present study utilized Differential Item Functioning (DIF) to compare men’s and women’s reporting on entrepreneurial intentions. DIF occurs in situations where members of different groups show differing probabilities of endorsing an item despite possessing the same level of the ability that the item is intended to measure. Drawing on the theory of planned behavior (TPB), the present study investigated whether constructs such as entrepreneurial attitudes, perceived behavioral control, subjective norms and intention would show gender differences and whether these gender differences could be explained by DIF. DIF methods applied to a dataset of 1800 Greek participants (50.4% female) indicated that differences at the item level are almost non-existent. Moreover, the differential test functioning (DTF) analysis, which allows assessing the overall impact of DIF effects with all items being taken into account simultaneously, suggested that the effect of DIF across all the items for each scale was negligible. Future research should consider that measurement invariance can be assumed when using TPB constructs for the study of entrepreneurial motivation independent of gender. PMID:28386244
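The verbal definition of DIF given in this abstract can be written compactly. In the notation assumed here (not the authors'), Y_i is the response to item i, theta is the level of the measured trait, and G is group membership; item i is free of DIF when

```latex
P(Y_i = 1 \mid \theta, G = \text{men}) \;=\; P(Y_i = 1 \mid \theta, G = \text{women})
\qquad \text{for all } \theta ,
```

and it shows DIF when the two conditional probabilities differ for at least some values of theta. Group differences in the distribution of theta itself do not count as DIF under this definition.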
ERIC Educational Resources Information Center
Hou, Likun; de la Torre, Jimmy; Nandakumar, Ratna
2014-01-01
Analyzing examinees' responses using cognitive diagnostic models (CDMs) has the advantage of providing diagnostic information. To ensure the validity of the results from these models, differential item functioning (DIF) in CDMs needs to be investigated. In this article, the Wald test is proposed to examine DIF in the context of CDMs. This study…
ERIC Educational Resources Information Center
Paek, Insu; Wilson, Mark
2011-01-01
This study elaborates the Rasch differential item functioning (DIF) model formulation under the marginal maximum likelihood estimation context. Also, the Rasch DIF model performance was examined and compared with the Mantel-Haenszel (MH) procedure in small sample and short test length conditions through simulations. The theoretically known…
ERIC Educational Resources Information Center
Aryadoust, Vahid
2015-01-01
The present study uses a mixture Rasch model to examine latent differential item functioning in English as a foreign language listening tests. Participants (n = 250) took a listening and lexico-grammatical test and completed the metacognitive awareness listening questionnaire comprising problem solving (PS), planning and evaluation (PE), mental…
An analysis of the DuPage County Regional Office of Education physics exam
NASA Astrophysics Data System (ADS)
Muehsler, Hans
In 2009, the DuPage County Regional Office of Education (ROE) tasked volunteer physics teachers with creating a basic skills physics exam reflecting what the participants valued and shared in common across curricula. Mechanics, electricity & magnetism (E&M), and wave phenomena emerged as the primary constructs. The resulting exam was intended for first-exposure physics students. The most recently completed version was psychometrically assessed for unidimensionality within the constructs using a robust WLS structural equation model and for reliability. An item analysis using a 3-PL IRT model was performed on the mechanics items and a 2-PL IRT model was performed on the E&M and waves items; a distractor analysis was also performed on all items. Lastly, differential item functioning (DIF) and differential test functioning (DTF) analyses, using the Mantel-Haenszel procedure, were performed using gender, ethnicity, year in school, ELL, physics level, and math level as groupings.
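For readers unfamiliar with the models named above, the following sketch gives the standard three-parameter logistic item response function, with the two-parameter model as the special case of a zero guessing parameter. The parameter names (a for discrimination, b for difficulty, c for the lower asymptote) follow common IRT usage; the function name is an assumption of this example, and some presentations also include a scaling constant of about 1.7, which is omitted here.

```python
import math


def p_correct_3pl(theta, a, b, c=0.0):
    """Probability of a correct response under the 3-PL model.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: lower asymptote ("guessing"). With c = 0 this is the 2-PL model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# An examinee whose ability equals the item difficulty:
p_2pl = p_correct_3pl(theta=0.0, a=1.0, b=0.0)         # 0.5
p_3pl = p_correct_3pl(theta=0.0, a=1.0, b=0.0, c=0.2)  # 0.2 + 0.8 * 0.5 = 0.6
```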
Church, A Timothy; Alvarez, Juan M; Mai, Nhu T Q; French, Brian F; Katigbak, Marcia S; Ortiz, Fernando A
2011-11-01
Measurement invariance is a prerequisite for confident cross-cultural comparisons of personality profiles. Multigroup confirmatory factor analysis was used to detect differential item functioning (DIF) in factor loadings and intercepts for the Revised NEO Personality Inventory (P. T. Costa, Jr., & R. R. McCrae, 1992) in comparisons of college students in the United States (N = 261), Philippines (N = 268), and Mexico (N = 775). About 40%-50% of the items exhibited some form of DIF and item-level noninvariance often carried forward to the facet level at which scores are compared. After excluding DIF items, some facet scales were too short or unreliable for cross-cultural comparisons, and for some other facets, cultural mean differences were reduced or eliminated. The results indicate that considerable caution is warranted in cross-cultural comparisons of personality profiles.
NASA Astrophysics Data System (ADS)
Rahmani, B. D.
2018-01-01
The purpose of this paper is to evaluate Indonesian senior high school teachers’ pedagogical content knowledge and their perceptions of curriculum change in West Java, Indonesia. The data were derived from a questionnaire survey conducted among teachers in Bandung, West Java. A total of 61 usable responses were collected. Differential item functioning (DIF) analysis was used to test whether item responses differed by gender, educational background, and school location. The results showed no significant differences in item responses by gender or school location, but there was a significant difference by educational background. In conclusion, teachers’ educational background influenced their responses to the questionnaire. The authors therefore suggest that future questionnaire items be constructed to account for differences among participants, particularly in educational background.
Validation of a mobility item bank for older patients in primary care.
Cabrero-García, Julio; Ramos-Pichardo, Juan Diego; Muñoz-Mendoza, Carmen Luz; Cabañero-Martínez, María José; González-Llopis, Lorena; Reig-Ferrer, Abilio
2012-12-05
To develop and validate an item bank to measure mobility in older people in primary care and to analyse differential item functioning (DIF) and differential bundle functioning (DBF) by sex. A pool of 48 mobility items was administered by interview to 593 older people attending primary health care practices. The pool contained four domains based on the International Classification of Functioning: changing and maintaining body position, carrying, lifting and pushing, walking and going up and down stairs. The Late Life Mobility item bank consisted of 35 items, and measured with a reliability of 0.90 or more across the full spectrum of mobility, except at the higher end of better functioning. No evidence was found of non-uniform DIF but uniform DIF was observed, mainly for items in the changing and maintaining body position and carrying, lifting and pushing domains. The walking domain did not display DBF, but the other three domains did, principally the carrying, lifting and pushing items. During the design and validation of an item bank to measure mobility in older people, we found that strength (carrying, lifting and pushing) items formed a secondary dimension that produced DBF. More research is needed to determine how best to include strength items in a mobility measure, or whether it would be more appropriate to design separate measures for each construct.
Anchor Selection Strategies for DIF Analysis: Review, Assessment, and New Approaches
ERIC Educational Resources Information Center
Kopf, Julia; Zeileis, Achim; Strobl, Carolin
2015-01-01
Differential item functioning (DIF) indicates the violation of the invariance assumption, for instance, in models based on item response theory (IRT). For item-wise DIF analysis using IRT, a common metric for the item parameters of the groups that are to be compared (e.g., for the reference and the focal group) is necessary. In the Rasch model,…
Massof, Robert W
2014-10-01
A simple theoretical framework explains patient responses to items in rating scale questionnaires. Fixed latent variables position each patient and each item on the same linear scale. Item responses are governed by a set of fixed category thresholds, one for each ordinal response category. A patient's item responses are magnitude estimates of the difference between the patient variable and the patient's estimate of the item variable, relative to his/her personally defined response category thresholds. Differences between patients in their personal estimates of the item variable and in their personal choices of category thresholds are represented by random variables added to the corresponding fixed variables. Effects of intervention correspond to changes in the patient variable, the patient's response bias, and/or latent item variables for a subset of items. Intervention effects on patients' item responses were simulated by assuming the random variables are normally distributed with a constant scalar covariance matrix. Rasch analysis was used to estimate latent variables from the simulated responses. The simulations demonstrate that changes in the patient variable and changes in response bias produce indistinguishable effects on item responses and manifest as changes only in the estimated patient variable. Changes in a subset of item variables manifest as intervention-specific differential item functioning and as changes in the estimated person variable that equal the average of changes in the item variables. Simulations demonstrate that intervention-specific differential item functioning produces inefficiencies and inaccuracies in computer adaptive testing.
Differential item functioning analysis of the Vanderbilt Expertise Test for cars.
Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W; Van Gulick, Ana Beth; Gauthier, Isabel
2015-01-01
The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge.
Modelling Question Difficulty in an A Level Physics Examination
ERIC Educational Resources Information Center
Crisp, Victoria; Grayson, Rebecca
2013-01-01
"Item difficulty modelling" is a technique used for a number of purposes such as to support future item development, to explore validity in relation to the constructs that influence difficulty and to predict the difficulty of items. This research attempted to explore the factors influencing question difficulty in a general qualification…
Controlling Item Exposure Conditional on Ability in Computerized Adaptive Testing.
ERIC Educational Resources Information Center
Stocking, Martha L.; Lewis, Charles
1998-01-01
Ensuring item and pool security in a continuous testing environment is explored through a new method of controlling exposure rate of items conditional on ability level in computerized testing. Properties of this conditional control on exposure rate, when used in conjunction with a particular adaptive testing algorithm, are explored using simulated…
Choi, Bongkyoo; Kurowski, Alicia; Bond, Meg; Baker, Dean; Clays, Els; De Bacquer, Dirk; Punnett, Laura
2012-01-01
The construct validity of the Job Content Questionnaire (JCQ) psychological demands scale in relationship to physical demands has been inconsistent. This study aims to test quantitatively and qualitatively whether the scale validity differs by occupation. Hierarchical clustering analyses of 10 JCQ psychological and physical demands items were conducted in 61 occupations from two datasets: one of non-faculty workers at a university in the United States (6 occupations with 208 total workers) and the other of a Belgian working population (55 occupations with 13,039 total workers). The psychological and physical demands items overlapped in 13 of 61 occupation-stratified clustering analyses. Most of the overlaps occurred in physically-demanding occupations and involved the two psychological demands items, 'work fast' and 'work hard'. Generally, the scale reliability was low in such occupations. Additionally, interviews with eight university workers revealed that workers interpreted the two psychological demands items differently by the nature of their tasks. The scale validity was occupation-differential. The JCQ psychological job demands scale as a job demand measure has been used worldwide in many studies. This study indicates that the 'work fast' and 'work hard' items of the scale need to be reworded to distinguish mental from physical job demands, so that they measure 'psychological' demands as intended.
Johnson, Jeffrey D; Rugg, Michael D
2006-02-03
Retrieval orientation refers to the differential processing of retrieval cues according to the type of information sought from memory (e.g., words vs. pictures). In the present study, event-related potentials (ERPs) were employed to investigate whether the neural correlates of differential retrieval orientations are sensitive to the specificity of the retrieval demands of the test task. In separate study-test phases, subjects encoded lists of intermixed words and pictures, and then undertook one of two retrieval tests, in both of which the retrieval cues were exclusively words. In the recognition test, subjects performed 'old/new' discriminations on the test items, and old items corresponded to only one class of studied material (words or pictures). In the exclusion test, old items corresponded to both classes of study material, and subjects were required to respond 'old' only to test items corresponding to a designated class of material. Thus, demands for retrieval specificity were greater in the exclusion test than during recognition. ERPs elicited by correctly classified new items in the two types of test were contrasted according to whether words or pictures were the sought-for material. Material-dependent ERP effects were evident in both tests, but the effects onset earlier and offset later in the exclusion test. The findings suggest that differential processing of retrieval cues, and hence the adoption of differential retrieval orientations, varies according to the specificity of the retrieval goal.
Haggerty, Jeannie L; Levesque, Jean-Frédéric
2017-04-01
Patients are the most valid source for evaluating the accessibility of services, but a previous study observed differential psychometric performance of instruments in rural and urban respondents. To validate a measure of organizational accessibility free of differential rural-urban performance that predicts consequences of difficult access for patient-initiated care. Sequential qualitative-quantitative study. Qualitative findings used to adapt or develop evaluative and reporting items. Quantitative validation study. Primary data by telephone from 750 urban, rural and remote respondents in Quebec, Canada; follow-up mailed questionnaire to a subset of 316. Items were developed for barriers along the care trajectory. We used common factor and confirmatory factor analysis to identify constructs and compare models. We used item response theory analysis to test for differential rural-urban performance; examine individual item performance; adjust response options; and exclude redundant or non-discriminatory items. We used logistic regression to examine predictive validity of the subscale on access difficulty (outcome). Initial factor resolution suggested geographic and organizational dimensions, plus consequences of access difficulty. After second administration, organizational accommodation and geographic indicators were integrated into a 6-item subscale of Effective Availability and Accommodation, which demonstrates good variability and internal consistency (α = 0.84) and no differential functioning by geographic area. Each unit increase predicts decreased likelihood of consequences of access difficulties (unmet need and problem aggravation). The new subscale is a practical, valid and reliable measure for patients to evaluate first-contact health services accessibility, yielding valid comparisons between urban and rural contexts. © 2016 The Authors. Health Expectations published by John Wiley & Sons Ltd.
Kedrowicz, April A.; Royal, Kenneth; Flammer, Keven
2016-01-01
Introduction: While social media has the potential to be used to make professional and personal connections, it can also be used inappropriately, with detrimental ramifications for the individual in terms of their professional reputation and even hiring decisions. This research explored students’ and faculty members’ perceptions of the acceptability of various social media postings. Methods: This cross-sectional study was conducted in 2015. All students and faculty members at the College of Veterinary Medicine were invited to participate. The sample size included 140 students and 69 faculty members who completed the Social Media Scale (SMS), a 7-point semantic differential scale. The SMS consisted of 12 items that measured the extent to which a variety of behaviors, using social media, constituted acceptable and unacceptable behaviors. Items appearing on the SMS were an amalgamation of modified items previously presented by Coe, Weijs, Muise et al. (2012) and new items generated specifically for this study. The data were collected during the spring semester of 2015 using Qualtrics online survey software and analyzed using t-tests and ANOVA. Results: The results showed that statistically significant differences existed between the students’ and faculty members’ ratings of acceptable behavior, as well as gender differences and differences across class years. Conclusion: These findings have implications for the development of policy and educational initiatives around professional identity management in the social sphere. PMID:27795965
Harris, Keith M; Aboujaoude, Elias
2016-08-01
Online relationships are increasingly central to many people's lives. As a result, there is a growing need to scientifically examine their psychosocial implications. This study developed and tested the Online Relationship Initiation Scale (ORIS) through classical and item response theory analyses to address this need. An anonymous online survey included 713 adults, aged 18-71 years. The ORIS was tested on psychometric properties and examined for associations with gender and several standardized psychosocial measures. Results demonstrated unidimensionality of nine items, strong factor loadings, and high internal consistency (α = 0.90, ωt = 0.94). All items captured significant information on the latent trait and none showed differential item functioning by sex, age group, or ethnicity. General linear modeling confirmed hypotheses that men were more likely than women to initiate online relationships. Online relationship initiation was not strongly associated with perceived social support, but was positively related to financial distress, and willingness to engage in infidelity or unprotected sex. The ORIS was negatively associated with age and satisfaction with life and showed modest interactions with ethnicity and hours online. This study provided empirical evidence for an interpersonal relationship initiation construct. The ORIS was shown to be a psychometrically sound instrument for evaluating online interpersonal behaviors and their associations with psychosocial and demographic factors. Such psychometrically sound instruments can be useful in exploring online interpersonal behaviors and their significance.
Electronic Quality of Life Assessment Using Computer-Adaptive Testing
2016-01-01
Background: Quality of life (QoL) questionnaires are desirable for clinical practice but can be time-consuming to administer and interpret, making their widespread adoption difficult. Objective: Our aim was to assess the performance of the World Health Organization Quality of Life (WHOQOL)-100 questionnaire as four item banks to facilitate adaptive testing using simulated computer adaptive tests (CATs) for physical, psychological, social, and environmental QoL. Methods: We used data from the UK WHOQOL-100 questionnaire (N=320) to calibrate item banks using item response theory, which included psychometric assessments of differential item functioning, local dependency, unidimensionality, and reliability. We simulated CATs to assess the number of items administered before prespecified levels of reliability were met. Results: The item banks (40 items) all displayed good model fit (P>.01) and were unidimensional (fewer than 5% of t tests significant), reliable (Person Separation Index>.70), and free from differential item functioning (no significant analysis of variance interaction) or local dependency (residual correlations < +.20). When matched for reliability, the item banks were between 45% and 75% shorter than paper-based WHOQOL measures. Across the four domains, a high standard of reliability (alpha>.90) could be gained with a median of 9 items. Conclusions: Using CAT, simulated assessments were as reliable as paper-based forms of the WHOQOL with a fraction of the number of items. These properties suggest that these item banks are suitable for computerized adaptive assessment. These item banks have the potential for international development using existing alternative language versions of the WHOQOL items. PMID:27694100
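The stopping rule mentioned in the abstract above (administer items until a prespecified reliability is reached) can be sketched roughly as follows. This is a simplified illustration, not the study's procedure: it assumes dichotomous Rasch items rather than the polytomous WHOQOL items, a latent trait with unit variance so that reliability is approximately 1 - SE^2, and a fixed, known trait value instead of re-estimating it after each response. The function name `items_needed` and all input values are hypothetical.

```python
import math


def items_needed(theta, difficulties, target_reliability):
    """Count items needed to reach a target reliability at a fixed theta.

    Illustrative assumptions: Rasch items, unit trait variance,
    reliability approximated as 1 - SE**2, and theta treated as known.
    Returns None if the bank is exhausted before the target is met.
    """
    if not 0.0 < target_reliability < 1.0:
        raise ValueError("target_reliability must be in (0, 1)")

    def information(b):
        # Rasch item information at theta: p * (1 - p)
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        return p * (1.0 - p)

    # Greedy selection: administer the most informative items first
    infos = sorted((information(b) for b in difficulties), reverse=True)
    max_se_squared = 1.0 - target_reliability

    total_info = 0.0
    for count, info in enumerate(infos, start=1):
        total_info += info
        # Squared standard error is the reciprocal of accumulated information
        if 1.0 / total_info <= max_se_squared:
            return count
    return None
```

For example, with a hypothetical bank of 20 items all located at the respondent's trait level, `items_needed(0.0, [0.0] * 20, 0.7)` counts how many items are required before the approximate reliability reaches 0.70; a real CAT would interleave this check with trait re-estimation.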
Likelihood-Ratio DIF Testing: Effects of Nonnormality
ERIC Educational Resources Information Center
Woods, Carol M.
2008-01-01
Differential item functioning (DIF) occurs when an item has different measurement properties for members of one group versus another. Likelihood-ratio (LR) tests for DIF based on item response theory (IRT) involve statistically comparing IRT models that vary with respect to their constraints. A simulation study evaluated how violation of the…
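As a rough sketch of the model comparison described above: a likelihood-ratio DIF test compares the log-likelihood of a constrained model (item parameters equal across groups) with that of a freer model (item parameters allowed to differ), and refers the statistic to a chi-square distribution. The function name and the log-likelihood values are hypothetical; in practice they would come from fitted IRT models. To stay within the standard library, this sketch only handles 1 or 2 degrees of freedom, for which the chi-square tail probability has a closed form.

```python
import math


def lr_dif_test(loglik_constrained, loglik_free, df):
    """Likelihood-ratio statistic and p-value for nested models.

    loglik_constrained : log-likelihood with the studied item's
                         parameters held equal across groups
    loglik_free        : log-likelihood with those parameters freed
    df                 : number of freed parameters (1 or 2 here)
    """
    g2 = -2.0 * (loglik_constrained - loglik_free)
    if g2 < 0:
        raise ValueError("free model should fit at least as well")
    if df == 1:
        # Chi-square(1) upper tail: erfc(sqrt(x / 2))
        p_value = math.erfc(math.sqrt(g2 / 2.0))
    elif df == 2:
        # Chi-square(2) upper tail: exp(-x / 2)
        p_value = math.exp(-g2 / 2.0)
    else:
        raise ValueError("this sketch supports df of 1 or 2 only")
    return g2, p_value
```

A small p-value would suggest that freeing the item's parameters improves fit, which is the usual signal of DIF under this approach.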
Assessment of Preference for Edible and Leisure Items in Individuals with Dementia
ERIC Educational Resources Information Center
Ortega, Javier Virues; Iwata, Brian A.; Nogales-Gonzalez, Celia; Frades, Belen
2012-01-01
We conducted 2 studies on reinforcer preference in patients with dementia. Results of preference assessments yielded differential selections by 14 participants. Unlike prior studies with individuals with intellectual disabilities, all participants showed a noticeable preference for leisure items over edible items. Results of a subsequent analysis…
Different Approaches to Covariate Inclusion in the Mixture Rasch Model
ERIC Educational Resources Information Center
Li, Tongyun; Jiao, Hong; Macready, George B.
2016-01-01
The present study investigates different approaches to adding covariates and the impact in fitting mixture item response theory models. Mixture item response theory models serve as an important methodology for tackling several psychometric issues in test development, including the detection of latent differential item functioning. A Monte Carlo…
ERIC Educational Resources Information Center
Benítez, Isabel; Padilla, José-Luis
2014-01-01
Differential item functioning (DIF) can undermine the validity of cross-lingual comparisons. While a lot of efficient statistics for detecting DIF are available, few general findings have been found to explain DIF results. The objective of the article was to study DIF sources by using a mixed method design. The design involves a quantitative phase…
ERIC Educational Resources Information Center
Abd-El-Fattah, Sabry M.; AL-Sinani, Yousra; El Shourbagi, Sahar; Fakhroo, Hessa A.
2014-01-01
This study uses the Rasch model technique to examine the dimensionality structure and differential item functioning of the Arabic version of the Perceived Physical Ability Scale for Children (PPASC). A sample of 220 Omani fourth graders (120 males and 100 females) responded to an Arabic translated version of the PPASC. Data on students'…
ERIC Educational Resources Information Center
French, Brian F.; Gotch, Chad M.
2013-01-01
The Brigance Comprehensive Inventory of Basic Skills-II (CIBS-II) is a diagnostic battery intended for children in grades 1 through 6. The aim of this study was to test for item invariance, or differential item functioning (DIF), of the CIBS-II across sex in the standardization sample through the use of item response theory DIF detection…
ERIC Educational Resources Information Center
Mayes, Susan D.
2018-01-01
The smallest subset of items from the 30-item Checklist for Autism Spectrum Disorder (CASD) that differentiated 607 referred children (3-17 years) with and without autism with 100% accuracy was identified. This 6-item subset (CASD-Short Form) was cross-validated on an independent sample of 397 referred children (1-18 years) with and without autism…
Cross-Cultural Validation of the Quality of Life in Hand Eczema Questionnaire (QOLHEQ).
Ofenloch, Robert F; Oosterhaven, Jart A F; Susitaival, Päivikki; Svensson, Åke; Weisshaar, Elke; Minamoto, Keiko; Onder, Meltem; Schuttelaar, Marie Louise A; Bulbul Baskan, Emel; Diepgen, Thomas L; Apfelbacher, Christian
2017-07-01
The Quality of Life in Hand Eczema Questionnaire (QOLHEQ) is the only instrument assessing disease-specific health-related quality of life in patients with hand eczema. It is available in eight language versions. In this study we assessed if the items of different language versions of the QOLHEQ yield comparable values across countries. An international multicenter study was conducted with participating centers in Finland, Germany, Japan, The Netherlands, Sweden, and Turkey. Methods of item response theory were applied to each subscale to assess differential item functioning for items among countries. Overall, 662 hand eczema patients were recruited into the study. Single items were removed or split according to the item response theory model by country to resolve differential item functioning. After this adjustment, none of the four subscales of the QOLHEQ showed significant misfit to the item response theory model (P < 0.01), and a Person Separation Index of greater than 0.7 showed good internal consistency for each subscale. By adapting the scoring of the QOLHEQ using the methods of item response theory, it was possible to obtain QOLHEQ values that are comparable across countries. Cross-cultural variations in the interpretation of single items were resolved. The QOLHEQ is now ready to be used in international studies assessing the health-related quality of life impact of hand eczema. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Fischer, H Felix; Wahl, Inka; Nolte, Sandra; Liegl, Gregor; Brähler, Elmar; Löwe, Bernd; Rose, Matthias
2017-12-01
To investigate differential item functioning (DIF) of PROMIS Depression items between US and German samples we compared data from the US PROMIS calibration sample (n = 780), a German general population survey (n = 2,500) and a German clinical sample (n = 621). DIF was assessed in an ordinal logistic regression framework, with 0.02 as criterion for R²-change and 0.096 for Raju's non-compensatory DIF. Item parameters were initially fixed to the PROMIS Depression metric; we used plausible values to account for uncertainty in depression estimates. Only four items showed DIF. Accounting for DIF led to negligible effects for the full item bank as well as a post hoc simulated computer-adaptive test (< 0.1 point on the PROMIS metric [mean = 50, standard deviation = 10]), while the effect on the short forms was small (< 1 point). The mean depression severity (43.6) in the German general population sample was considerably lower compared to the US reference value of 50. Overall, we found little evidence for language DIF between US and German samples, which could be addressed by either replacing the DIF items by items not showing DIF or by scoring the short form in German samples with the corrected item parameters reported. Copyright © 2016 John Wiley & Sons, Ltd.
Exploring the Manifestations of Anxiety in Children with Autism Spectrum Disorders
ERIC Educational Resources Information Center
Hallett, Victoria; Lecavalier, Luc; Sukhodolsky, Denis G.; Cipriano, Noreen; Aman, Michael G.; McCracken, James T.; McDougle, Christopher J.; Tierney, Elaine; King, Bryan H.; Hollander, Eric; Sikich, Linmarie; Bregman, Joel; Anagnostou, Evdokia; Donnelly, Craig; Katsovich, Lily; Dukes, Kimberly; Vitiello, Benedetto; Gadow, Kenneth; Scahill, Lawrence
2013-01-01
This study explores the manifestation and measurement of anxiety symptoms in 415 children with ASDs on a 20-item, parent-rated, DSM-IV referenced anxiety scale. In both high and low-functioning children (IQ above vs. below 70), commonly endorsed items assessed restlessness, tension and sleep difficulties. Items requiring verbal expression of worry…
The clinical content of NHS trust board meetings: an initial exploration.
Watkins, Mary; Jones, Ray; Lindsey, Laura; Sheaff, Rod
2008-09-01
To differentiate between English NHS trust board meetings according to the percentage of clinical content and to explore which characteristics of board meetings might explain this. Definition of scoring system for clinical content. Scoring of minutes for a random sample of 60 trusts. Qualitative analysis of a sub-sample, generated hypotheses about factors leading to higher percentage of clinical items was undertaken; testing of hypotheses in a longitudinal sample of minutes from 24 trusts over 1 year. Clinical content varied from 2% to 30%. Boards with a more clinical focus tended to link other issues including finance to clinical issues; have non-executive directors able to question board executives openly; make less use of acronyms in minutes; had more liaison with social services; and accepted questions from the public. Counting items in board minutes has prima facie validity as a means of defining how clinically focussed board meetings are, although more research is required to refine the method. The present method of analysing board minutes may provide one way of assessing board culture. Directors of nursing can help focus trust board meetings on clinical matters. Further research is required to determine whether greater clinical content in trust board meetings has impacts on clinical practice or organizational performance.
Peterson, Alexander C; Sutherland, Jason M; Liu, Guiping; Crump, R Trafford; Karimuddin, Ahmer A
2018-06-01
The Fecal Incontinence Quality of Life Scale (FIQL) is a commonly used patient-reported outcome measure for fecal incontinence, often used in clinical trials, yet has not been validated in English since its initial development. This study uses modern methods to thoroughly evaluate the psychometric characteristics of the FIQL and its potential for differential functioning by gender. This study analyzed prospectively collected patient-reported outcome data from a sample of patients prior to colorectal surgery. Patients were recruited from 14 general and colorectal surgeons in Vancouver Coastal Health hospitals in Vancouver, Canada. Confirmatory factor analysis was used to assess construct validity. Item response theory was used to evaluate test reliability, describe item-level characteristics, identify local item dependence, and test for differential functioning by gender. 236 patients were included for analysis, with mean age 58 and approximately half female. Factor analysis failed to identify the lifestyle, coping, depression, and embarrassment domains, suggesting lack of construct validity. Items demonstrated low difficulty, indicating that the test has the highest reliability among individuals who have low quality of life. Five items are suggested for removal or replacement. Differential test functioning was minimal. This study has identified specific improvements that can be made to each domain of the Fecal Incontinence Quality of Life Scale and to the instrument overall. Formatting, scoring, and instructions may be simplified, and items with higher difficulty developed. The lifestyle domain can be used as is. The embarrassment domain should be significantly revised before use.
Evidence-based HIV pilot program for Chinese college students: Differences by gender.
Tung, Wei-Chen; Serratt, Teresa D; Lu, Minggen
2015-06-01
This study explored gender differences in the effectiveness of the translated VOICES (Video Opportunities for Condom Education and Safer Sex) intervention on the condom use intention, perceived benefits and barriers to condom use, condom use self-efficacy, and HIV/AIDS knowledge among Chinese students in a US university. We utilized a pretest/post-test quasi-experimental design and recruited 67 Chinese students at the local university. Participants viewed a 20-min video with Chinese subtitles, attended one 25-min small group discussion and condom interactive educational activity. Female participants showed significantly greater mean scores of perceived benefits and condom use self-efficacy, in comparison with male participants. Female participants also reported significantly higher scores than male participants in five of the perceived benefits items and one self-efficacy item. These study results provide important information for developing more differentiated intervention strategies specific to gender for HIV and STI education programs. © 2014 Wiley Publishing Asia Pty Ltd.
Williams, Helen L; Moulin, Chris J A
2015-01-01
In the Remember-Know paradigm whether a Know response is defined as a high-confidence state of certainty or a low-confidence state based on familiarity varies across researchers and can influence participants' responses. The current experiment was designed to explore differences between the states of Know and Familiar. Participants studied others' justification statements to "Know" recognition decisions and separated them into two types. Crucially, participants were not provided definitions of Know and Familiar on which to sort the items--their judgements were based solely on the phenomenology described in the justifications. Participants' sorting decisions were shown to reliably map onto expert classification of Know and Familiar. Post-task questionnaire responses demonstrated that both the level of memory detail and confidence expressed in the justifications were central to how participants categorised the items. In sum, given no instructions to do so, participants classify Familiar and Know according to two dimensions: confidence and amount of information retrieved.
Hart, Dennis L; Werneke, Mark W; George, Steven Z; Matheson, James W; Wang, Ying-Chih; Cook, Karon F; Mioduski, Jerome E; Choi, Seung W
2009-08-01
Screening people for elevated levels of fear-avoidance beliefs is uncommon, but elevated levels of fear could worsen outcomes. Developing short screening tools might reduce the data collection burden and facilitate screening, which could prompt further testing or management strategy modifications to improve outcomes. The purpose of this study was to develop efficient yet accurate screening methods for identifying elevated levels of fear-avoidance beliefs regarding work or physical activities in people receiving outpatient rehabilitation. A secondary analysis of data collected prospectively from people with a variety of common neuromusculoskeletal diagnoses was conducted. Intake Fear-Avoidance Beliefs Questionnaire (FABQ) data were collected from 17,804 people who had common neuromusculoskeletal conditions and were receiving outpatient rehabilitation in 121 clinics in 26 states (in the United States). Item response theory (IRT) methods were used to analyze the FABQ data, with particular emphasis on differential item functioning among clinically logical groups of subjects, and to identify screening items. The accuracy of screening items for identifying subjects with elevated levels of fear was assessed with receiver operating characteristic analyses. Three items for fear of physical activities and 10 items for fear of work activities represented unidimensional scales with adequate IRT model fit. Differential item functioning was negligible for variables known to affect functional status outcomes: sex, age, symptom acuity, surgical history, pain intensity, condition severity, and impairment. Items that provided maximum information at the median for the FABQ scales were selected as screening items to dichotomize subjects by high versus low levels of fear. The accuracy of the screening items was supported for both scales. This study represents a retrospective analysis, which should be replicated using prospective designs. 
Future prospective studies should assess the reliability and validity of using one FABQ item to screen people for high levels of fear-avoidance beliefs. The lack of differential item functioning in the FABQ scales in the sample tested in this study suggested that FABQ screening could be useful in routine clinical practice and allowed the development of single-item screening for fear-avoidance beliefs that accurately identified subjects with elevated levels of fear. Because screening was accurate and efficient, single IRT-based FABQ screening items are recommended to facilitate improved evaluation and care of heterogeneous populations of people receiving outpatient rehabilitation.
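The receiver operating characteristic analysis mentioned above can be illustrated with a short sketch. It assumes each person has a screening-item score and a binary criterion (1 = elevated fear on the full scale, 0 = not elevated); the function names, data layout, and cutoff are hypothetical and are not taken from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties = 0.5).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, cutoff):
    """Classify score >= cutoff as a positive screen."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)
```

Sweeping `cutoff` over the possible item scores and plotting sensitivity against 1 - specificity traces the ROC curve; the area under it summarizes how well a single item separates high from low fear.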
Garcia-Burgos, David; Lao, Junpeng; Munsch, Simone; Caldara, Roberto
2017-07-01
Although attentional biases towards food cues may play a critical role in food choices and eating behaviours, it remains largely unexplored which specific food attribute governs visual attentional deployment. The allocation of visual attention might be modulated by anticipatory postingestive consequences, from taste sensations derived from eating itself, or both. Therefore, in order to obtain a comprehensive understanding of the attentional mechanisms involved in the processing of food-related cues, we recorded the eye movements to five categories of well-standardised pictures: neutral non-food, high-calorie, good taste, distaste and dangerous food. In particular, forty-four healthy adults of both sexes were assessed with an antisaccade paradigm (which requires the generation of a voluntary saccade and the suppression of a reflex one) and a free viewing paradigm (which implies the free visual exploration of two images). The results showed that observers directed their initial fixations more often and faster on items with high survival relevance such as nutrients and possible dangers; although an increase in antisaccade error rates was only detected for high-calorie items. We also found longer prosaccade fixation duration and initial fixation duration bias score related to maintained attention towards high-calorie, good taste and danger categories; while shorter reaction times to correct an incorrect prosaccade related to fewer difficulties in inhibiting distasteful images. Altogether, these findings suggest that visual attention is differentially modulated by both the accepted and rejected food attributes, but also that normal-weight, non-eating disordered individuals exhibit enhanced approach to food's postingestive effects and avoidance of distasteful items (such as bitter vegetables or pungent products). Copyright © 2017 Elsevier Ltd. All rights reserved.
2012-01-01
Background: The mini-Mental Adjustment to Cancer Scale (mini-MAC) is a well-recognised, popular measure of coping in psycho-oncology and assesses five cancer-specific coping strategies. It has been suggested that these five subscales could be grouped to form the over-arching adaptive and maladaptive coping subscales to facilitate the interpretation and clinical application of the scale. Despite the popularity of the mini-MAC, few studies have examined its psychometric properties among long-term cancer survivors, and further validation of the mini-MAC is needed to substantiate its use with the growing population of survivors. Therefore, this study examined the psychometric properties and dimensionality of the mini-MAC in a sample of long-term cancer survivors using Rasch analysis. Methods: RUMM 2030 was used to analyse the mini-MAC data (n=851). Separate Rasch analyses were conducted for each of the original mini-MAC subscales as well as the over-arching adaptive and maladaptive coping subscales to examine summary and individual model fit statistics, person separation index (PSI), response format, local dependency, targeting, item bias (or differential item functioning, DIF), and dimensionality. Results: For the fighting spirit, fatalism, and helplessness-hopelessness subscales, a revised three-point response format seemed more optimal than the original four-point response. To achieve model fit, items were deleted from four of the five subscales – Anxious Preoccupation items 7, 25, and 29; Cognitive Avoidance items 11 and 17; Fighting Spirit item 18; and Helplessness-Hopelessness items 16 and 20. For those subscales with sufficient items, analyses supported unidimensionality. Combining items to form the adaptive and maladaptive subscales was partially supported. Conclusions: The original five subscales required item deletion and/or rescaling to improve goodness of fit to the Rasch model. 
While evidence was found for overarching subscales of adaptive and maladaptive coping, extensive modifications were necessary to achieve this result. Further exploration and validation of over-arching subscales assessing adaptive and maladaptive coping is necessary with cancer survivors. PMID:22607052
Martinková, Patrícia; Drabinová, Adéla; Liaw, Yuan-Ling; Sanders, Elizabeth A.; McFarland, Jenny L.; Price, Rebecca M.
2017-01-01
We provide a tutorial on differential item functioning (DIF) analysis, an analytic method useful for identifying potentially biased items in assessments. After explaining a number of methodological approaches, we test for gender bias in two scenarios that demonstrate why DIF analysis is crucial for developing assessments, particularly because simply comparing two groups’ total scores can lead to incorrect conclusions about test fairness. First, a significant difference between groups on total scores can exist even when items are not biased, as we illustrate with data collected during the validation of the Homeostasis Concept Inventory. Second, item bias can exist even when the two groups have exactly the same distribution of total scores, as we illustrate with a simulated data set. We also present a brief overview of how DIF analysis has been used in the biology education literature to illustrate the way DIF items need to be reevaluated by content experts to determine whether they should be revised or removed from the assessment. Finally, we conclude by arguing that DIF analysis should be used routinely to evaluate items in developing conceptual assessments. These steps will ensure more equitable—and therefore more valid—scores from conceptual assessments. PMID:28572182
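The point that DIF analysis compares groups after matching on ability, rather than comparing raw totals, can be illustrated with the Mantel-Haenszel common odds ratio, one widely used DIF statistic. This is a minimal sketch and is not claimed to be the specific procedure used in the tutorial; the function name, the group labels "ref" and "focal", and the data layout are assumptions made for illustration.

```python
from collections import defaultdict


def mantel_haenszel_odds_ratio(item, group, total):
    """Common odds ratio for one dichotomous item across score strata.

    item  : list of 0/1 responses to the studied item
    group : list of labels, "ref" or "focal" (hypothetical labels)
    total : list of matching scores (e.g. total test score)
    A value near 1 suggests no uniform DIF; values above 1 favor
    the reference group among examinees with the same total score.
    """
    # Per stratum counts: [ref correct, ref incorrect,
    #                      focal correct, focal incorrect]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for y, g, s in zip(item, group, total):
        idx = (0 if y == 1 else 1) + (0 if g == "ref" else 2)
        strata[s][idx] += 1

    num = den = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ValueError("odds ratio undefined for these data")
    return num / den
```

Because the comparison is made within each total-score stratum, an item can be flagged even when the two groups have identical total-score distributions, and a group difference in totals alone does not produce a flag.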
Use of Automated Scoring Features to Generate Hypotheses Regarding Language-Based DIF
ERIC Educational Resources Information Center
Shermis, Mark D.; Mao, Liyang; Mulholland, Matthew; Kieftenbeld, Vincent
2017-01-01
This study uses the feature sets employed by two automated scoring engines to determine if a "linguistic profile" could be formulated that would help identify items that are likely to exhibit differential item functioning (DIF) based on linguistic features. Sixteen items were administered to 1200 students where demographic information…
Differential Item Functioning Amplification and Cancellation in a Reading Test
ERIC Educational Resources Information Center
Bao, Han; Dayton, C. Mitchell; Hendrickson, Amy B.
2009-01-01
When testlet effects and item idiosyncratic features are both considered to be the reasons for DIF in educational tests using testlets (Wainer & Kiely, 1987) or item bundles (Rosenbaum, 1988), it is interesting to investigate the phenomena of DIF amplification and cancellation due to the interactive effects of these two factors. This research…
A Proposed System of "Project Management" for Study Items.
ERIC Educational Resources Information Center
Worcester Public Schools, MA.
The purposes of the proposed system are to provide a standard operating procedure for a systematic and effective handling of project-type study items as differentiated from informational-type items; to assign definite singular responsibility for projects; to suggest specific sequential steps to be taken in the preparation of the project report;…
Donati, Maria Anna; Chiesi, Francesca; Izzo, Viola A; Primi, Caterina
2017-01-01
As there is a lack of evidence attesting to equivalent item functioning across genders for the instruments most commonly used to measure pathological gambling in adolescence, the present study aimed to test the gender invariance of the Gambling Behavior Scale for Adolescents (GBS-A), a new measurement tool to assess the severity of Gambling Disorder (GD) in adolescents. The equivalence of the items across genders was assessed by analyzing Differential Item Functioning within an Item Response Theory framework. The GBS-A was administered to 1,723 adolescents, and the graded response model was employed. The results attested the measurement equivalence of the GBS-A when administered to male and female adolescent gamblers. Overall, findings provided evidence that the GBS-A is an effective measurement tool of the severity of GD in male and female adolescents and that the scale was unbiased and able to reveal true gender differences. As such, the GBS-A can be profitably used in educational interventions and clinical treatments with young people.
ERIC Educational Resources Information Center
Anagnostopoulou, Kyriaki; Hatzinikita, Vassilia; Christidou, Vasilia; Dimopoulos, Kostas
2013-01-01
The paper explores the relationship of the global and the local assessment discourses as expressed by Programme for International Student Assessment (PISA) test items and school-based examinations, respectively. To this end, the paper compares PISA test items related to living systems and the context of life, health, and environment, with Greek…
Janulis, Patrick; Newcomb, Michael E; Sullivan, Patrick; Mustanski, Brian
2018-01-01
Knowledge about the transmission, prevention, and treatment of HIV remains a critical element in psychosocial models of HIV risk behavior and is commonly used as an outcome in HIV prevention interventions. However, most HIV knowledge questions have not undergone rigorous psychometric testing such as using item response theory. The current study used data from six studies of men who have sex with men (MSM; n = 3565) to (1) examine the item properties of HIV knowledge questions, (2) test for differential item functioning on commonly studied characteristics (i.e., age, race/ethnicity, and HIV risk behavior), (3) select items with the optimal item characteristics, and (4) leverage this combined dataset to examine the potential moderating effect of age on the relationship between condomless anal sex (CAS) and HIV knowledge. Findings indicated that existing questions tend to poorly differentiate those with higher levels of HIV knowledge, but items were relatively robust across diverse individuals. Furthermore, age moderated the relationship between CAS and HIV knowledge with older MSM having the strongest association. These findings suggest that additional items are required in order to capture a more nuanced understanding of HIV knowledge and that the association between CAS and HIV knowledge may vary by age.
Differential item functioning analysis of the Vanderbilt Expertise Test for cars
Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W.; Van Gulick, Ana Beth; Gauthier, Isabel
2015-01-01
The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge. PMID:26418499
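As an illustration of the three-parameter logistic model named in this abstract, the sketch below evaluates the item response function for one item in two groups. The parameter values and group labels are hypothetical and are not taken from the VETcar study. It shows how a difficulty shift between groups produces a gap in success probability at the same ability level.

```python
import math


def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: ability, a: discrimination, b: difficulty, c: guessing floor.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical parameters for the same item calibrated in two groups.
younger = {"a": 1.2, "b": -0.3, "c": 0.2}
older = {"a": 1.2, "b": 0.4, "c": 0.2}

for theta in (-1.0, 0.0, 1.0):
    gap = p_correct_3pl(theta, **younger) - p_correct_3pl(theta, **older)
    print(f"theta={theta:+.1f}  gap in P(correct)={gap:.3f}")
```

Because only the difficulty differs here, the gap has the same sign at every ability level, which is the pattern usually called uniform DIF.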
Negative Symptom Dimensions of the Positive and Negative Syndrome Scale Across Geographical Regions
Liharska, Lora; Harvey, Philip D.; Atkins, Alexandra; Ulshen, Daniel; Keefe, Richard S.E.
2017-01-01
Objective: Recognizing the discrete dimensions that underlie negative symptoms in schizophrenia and how these dimensions are understood across localities might result in better understanding and treatment of these symptoms. To this end, the objectives of this study were to 1) identify the Positive and Negative Syndrome Scale negative symptom dimensions of expressive deficits and experiential deficits and 2) analyze performance on these dimensions over 15 geographical regions to determine whether the items defining them manifest similar reliability across these regions. Design: Data were obtained for the baseline Positive and Negative Syndrome Scale visits of 6,889 subjects across 15 geographical regions. Using confirmatory factor analysis, we examined whether a two-factor negative symptom structure that is found in schizophrenia (experiential deficits and expressive deficits) would be replicated in our sample, and using differential item functioning, we tested the degree to which specific items from each negative symptom subfactor performed across geographical regions in comparison with the United States. Results: The two-factor negative symptom solution was replicated in this sample. Most geographical regions showed moderate-to-large differential item functioning for Positive and Negative Syndrome Scale expressive deficit items, especially N3 Poor Rapport, as compared with Positive and Negative Syndrome Scale experiential deficit items, showing that these items might be interpreted or scored differently in different regions. Across countries, except for India, the differential item functioning values did not favor raters in the United States. Conclusion: These results suggest that the Positive and Negative Syndrome Scale negative symptom factor can be better represented by a two-factor model than by a single-factor model. 
Additionally, the results show significant differences in responses to items representing the Positive and Negative Syndrome Scale expressive factors, but not the experiential factors, across regions. This could be due to a lack of equivalence between the original and translated versions, cultural differences with the interpretation of items, dissimilarities in rater training, or diversity in the understanding of scoring anchors. Knowing which items are challenging for raters across regions can help to guide Positive and Negative Syndrome Scale training and improve the results of international clinical trials aimed at negative symptoms. PMID:29410935
Item Response Theory Applied to Factors Affecting the Patient Journey Towards Hearing Rehabilitation
Chenault, Michelene; Berger, Martijn; Kremer, Bernd; Anteunis, Lucien
2016-01-01
To develop a tool for use in hearing screening and to evaluate the patient journey towards hearing rehabilitation, responses to three hearing aid rehabilitation questionnaire scales were evaluated with item response theory: aid stigma, pressure, and aid unwanted, addressing hearing aid stigma, experienced pressure from others, and perceived hearing aid benefit, respectively. The sample comprised 212 persons aged 55 years or more: 63 were hearing aid users, 64 were persons with hearing impairment, and 85 were persons without hearing impairment according to guidelines for hearing aid reimbursement in the Netherlands. Bias was investigated relative to hearing aid use and hearing impairment within the differential test functioning framework. Items compromising model fit or demonstrating differential item functioning were dropped. The aid stigma scale was reduced from 6 to 4, the pressure scale from 7 to 4, and the aid unwanted scale from 5 to 4 items. This procedure resulted in bias-free scales ready for screening purposes and application to further understand the help-seeking process of the hearing impaired. PMID:28028428
Langer, Michelle M.; Hill, Cheryl D.; Thissen, David; Burwinkle, Tasha M.; Varni, James W.; DeWalt, Darren A.
2008-01-01
Objective: To demonstrate the value of item response theory (IRT) and differential item functioning (DIF) methods in examining a health-related quality of life (HRQOL) measure in children and adolescents. Study Design and Setting: This illustration uses data from 5,429 children using the four subscales of the PedsQL™ 4.0 Generic Core Scales. The IRT model-based likelihood ratio test was used to detect and evaluate DIF between healthy children and children with a chronic condition. Results: DIF was detected for a majority of items but cancelled out at the total test score level due to opposing directions of DIF. Post-hoc analysis indicated that this pattern of results may be due to multidimensionality. We discuss issues in detecting and handling DIF. Conclusion: This paper describes how to perform DIF analyses in validating a questionnaire to ensure that scores have equivalent meaning across subgroups. It offers insight into ways information gained through the analysis can be used to evaluate an existing scale. PMID:18226750
Palmiero, Massimiliano; Di Matteo, Rosalia; Belardinelli, Marta Olivetti
2014-05-01
Two experiments comparing imaginative processing in different modalities and semantic processing were carried out to investigate whether conceptual knowledge can be represented in different formats. Participants were asked to judge the similarity between visual images, auditory images, and olfactory images in the imaginative block, and whether two items belonged to the same category in the semantic block. Items were verbally cued in both experiments. The degree of similarity between the imaginative and semantic items was changed across experiments. Experiment 1 showed that the semantic processing was faster than the visual and the auditory imaginative processing, whereas no differentiation was possible between the semantic processing and the olfactory imaginative processing. Experiment 2 revealed that only the visual imaginative processing could be differentiated from the semantic processing in terms of accuracy. These results showed that the visual and auditory imaginative processing can be differentiated from the semantic processing, although both visual and auditory images strongly rely on semantic representations. On the contrary, no differentiation is possible within the olfactory domain. Results are discussed in the frame of the imagery debate.
Psychometric properties of the Triarchic Psychopathy Measure: An item response theory approach.
Shou, Yiyun; Sellbom, Martin; Xu, Jing
2018-05-01
There is cumulative evidence for the cross-cultural validity of the Triarchic Psychopathy Measure (TriPM; Patrick, 2010) among non-Western populations. Recent studies using correlational and regression analyses show promising construct validity of the TriPM in Chinese samples. However, little is known about the efficiency of items in TriPM in assessing the proposed latent traits. The current study evaluated the psychometric properties of the Chinese TriPM at the item level using item response theory analyses. It also examined the measurement invariance of the TriPM between the Chinese and the U.S. student samples by applying differential item functioning analyses under the item response theory framework. The results supported the unidimensional nature of the Disinhibition and Meanness scales. Both scales had a greater level of precision in the respective underlying constructs at the positive ends. The two scales, however, had several items that were weakly associated with their respective latent traits in the Chinese student sample. Boldness, on the other hand, was found to be multidimensional, and reflected a more normally distributed range of variation. The examination of measurement bias via differential item functioning analyses revealed that a number of items of the TriPM were not equivalent across the Chinese and the U.S. Some modification and adaptation of items might be considered for improving the precision of the TriPM for Chinese participants. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Haroz, E E; Bolton, P; Gross, A; Chan, K S; Michalopoulos, L; Bass, J
2016-07-01
Prevalence estimates of depression vary between countries, possibly due to differential functioning of items between settings. This study compared the performance of the widely used Hopkins symptom checklist 15-item depression scale (HSCL-15) across multiple settings using item response theory analyses. Data came from adult populations in the low and middle income countries (LMIC) of Colombia, Indonesia, Kurdistan Iraq, Rwanda, Iraq, Thailand (Burmese refugees), and Uganda (N = 4732). Item parameters based on a graded response model were compared across LMIC settings. Differential item functioning (DIF) by setting was evaluated using multiple indicators multiple causes (MIMIC) models. Most items performed well across settings except items related to suicidal ideation and "loss of sexual interest or pleasure," which had low discrimination parameters (suicide: a = 0.31 in Thailand to a = 2.49 in Indonesia; sexual interest: a = 0.74 in Rwanda to a = 1.26 in one region of Kurdistan). Most items showed some degree of DIF, but DIF only impacted aggregate scale-level scores in Indonesia. Thirteen of the 15 HSCL depression items performed well across diverse settings, with most items showing a strong relationship to the underlying trait of depression. The results support the cross-cultural applicability of most of these depression symptoms across LMIC settings. DIF impacted aggregate depression scores in one setting illustrating a possible source of measurement invariance in prevalence estimates.
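To make the reported discrimination parameters concrete, the sketch below computes category probabilities under the graded response model. The two discrimination values (a = 0.31 and a = 2.49) are the suicide-item extremes quoted in the abstract; the thresholds and the use of four response categories are assumptions made only for illustration.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under the graded response model.

    thresholds must be increasing; K-1 thresholds give K categories.
    """
    # Cumulative probabilities P(X >= k), padded with 1 (lowest) and 0 (beyond highest).
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum.append(0.0)
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


thresholds = (-1.0, 0.0, 1.0)  # hypothetical, not from the study
for a in (0.31, 2.49):         # discrimination values quoted in the abstract
    low = grm_category_probs(-2.0, a, thresholds)[-1]
    high = grm_category_probs(2.0, a, thresholds)[-1]
    print(f"a={a}: P(top category) rises from {low:.3f} to {high:.3f}")
```

With the low discrimination value, the probability of endorsing the top category changes comparatively little across a wide range of the latent trait, which is why such an item carries little information about depression severity.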
NASA Astrophysics Data System (ADS)
Greenberg, Ariela Caren
Differential item functioning (DIF) and differential distractor functioning (DDF) are methods used to screen for item bias (Camilli & Shepard, 1994; Penfield, 2008). Using an applied empirical example, this mixed-methods study examined the congruency and relationship of DIF and DDF methods in screening multiple-choice items. Data for Study I were drawn from item responses of 271 female and 236 male low-income children on a preschool science assessment. Item analyses employed a common statistical approach of the Mantel-Haenszel log-odds ratio (MH-LOR) to detect DIF in dichotomously scored items (Holland & Thayer, 1988), and extended the approach to identify DDF (Penfield, 2008). Findings demonstrated that using MH-LOR to detect DIF and DDF supported the theoretical relationship that the magnitude and form of DIF are dependent on the DDF effects, and demonstrated the advantages of studying DIF and DDF in multiple-choice items. A total of 4 items with DIF and DDF and 5 items with only DDF were detected. Study II incorporated an item content review, an important but often overlooked and under-published step of DIF and DDF studies (Camilli & Shepard). Interviews with 25 female and 22 male low-income preschool children and an expert review helped to interpret the DIF and DDF results and their comparison, and determined that a content review process of studied items can reveal reasons for potential item bias that are often congruent with the statistical results. Patterns emerged and are discussed in detail. The quantitative and qualitative analyses were conducted in an applied framework of examining the validity of the preschool science assessment scores for evaluating science programs serving low-income children; however, the techniques can be generalized for use with measures across various disciplines of research.
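The Mantel-Haenszel log-odds ratio mentioned above can be sketched as follows for a single dichotomous item. The stratum counts are invented for illustration, and the function name is hypothetical. The sketch covers only the DIF statistic, not the DDF extension, and it assumes at least one stratum contributes to the denominator.

```python
import math


def mh_log_odds_ratio(strata):
    """Mantel-Haenszel common log-odds ratio across matched score strata.

    strata: iterable of (ref_correct, ref_wrong, focal_correct, focal_wrong),
    one tuple per level of the matching variable (e.g. total score).
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # empty stratum carries no information
        num += a * d / n
        den += b * c / n
    return math.log(num / den)


# Invented counts: three total-score strata for one item.
strata = [(12, 18, 8, 22), (20, 10, 15, 15), (27, 3, 24, 6)]
print(f"MH-LOR = {mh_log_odds_ratio(strata):.3f}")
```

A value near zero indicates that, among examinees matched on the total score, the two groups have similar odds of answering correctly; positive values here favor the reference group.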
ERIC Educational Resources Information Center
van der Linden, Wim J.; Scrams, David J.; Schnipke, Deborah L.
This paper proposes an item selection algorithm that can be used to neutralize the effect of time limits in computer adaptive testing. The method is based on a statistical model for the response-time distributions of the test takers on the items in the pool that is updated each time a new item has been administered. Predictions from the model are…
DIF Detection Using Multiple-Group Categorical CFA with Minimum Free Baseline Approach
ERIC Educational Resources Information Center
Chang, Yu-Wei; Huang, Wei-Kang; Tsai, Rung-Ching
2015-01-01
The aim of this study is to assess the efficiency of using the multiple-group categorical confirmatory factor analysis (MCCFA) and the robust chi-square difference test in differential item functioning (DIF) detection for polytomous items under the minimum free baseline strategy. While testing for DIF items, despite the strong assumption that all…
ERIC Educational Resources Information Center
Flowers, Claudia P.; Raju, Nambury S.; Oshima, T. C.
Current interest in the assessment of measurement equivalence emphasizes two methods of analysis: linear and nonlinear procedures. This study simulated data using the graded response model to examine the performance of linear (confirmatory factor analysis or CFA) and nonlinear (item-response-theory-based differential item function or IRT-Based…
ERIC Educational Resources Information Center
Suh, Youngsuk; Talley, Anna E.
2015-01-01
This study compared and illustrated four differential distractor functioning (DDF) detection methods for analyzing multiple-choice items. The log-linear approach, two item response theory-model-based approaches with likelihood ratio tests, and the odds ratio approach were compared to examine the congruence among the four DDF detection methods.…
IRT-LR-DIF with Estimation of the Focal-Group Density as an Empirical Histogram
ERIC Educational Resources Information Center
Woods, Carol M.
2008-01-01
Item response theory-likelihood ratio-differential item functioning (IRT-LR-DIF) is used to evaluate the degree to which items on a test or questionnaire have different measurement properties for one group of people versus another, irrespective of group-mean differences on the construct. Usually, the latent distribution is presumed normal for both…
Detecting a Gender-Related Differential Item Functioning Using Transformed Item Difficulty
ERIC Educational Resources Information Center
Abedalaziz, Nabeel; Leng, Chin Hai; Alahmadi, Ahlam
2014-01-01
The purpose of the study was to examine gender differences in performance on a multiple-choice mathematical ability test, administered within the context of a high school graduation test that was designed to match the eleventh grade curriculum. The transformed item difficulty (TID) was used to detect gender-related DIF. A random sample of 1400 eleventh…
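The transformed item difficulty approach is commonly implemented as a delta plot, and a minimal sketch under that assumption follows. Each proportion correct is converted to the delta scale (13 + 4z, with z the normal deviate of the proportion incorrect), a principal axis line is fitted through the two groups' deltas, and each item's perpendicular distance from that line is reported. The proportions and the function names are hypothetical; any flagging cutoff would be a separate convention that the abstract does not state.

```python
from statistics import NormalDist, mean


def delta(p):
    """Transformed item difficulty on the delta scale (mean 13, SD 4)."""
    return 13.0 + 4.0 * NormalDist().inv_cdf(1.0 - p)


def delta_plot_distances(p_ref, p_focal):
    """Signed perpendicular distance of each item from the principal axis line."""
    x = [delta(p) for p in p_ref]
    y = [delta(p) for p in p_focal]
    mx, my = mean(x), mean(y)
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    # Principal (major) axis slope and intercept.
    slope = (syy - sxx + ((syy - sxx) ** 2 + 4.0 * sxy ** 2) ** 0.5) / (2.0 * sxy)
    intercept = my - slope * mx
    norm = (slope ** 2 + 1.0) ** 0.5
    return [(slope * u + intercept - v) / norm for u, v in zip(x, y)]


# Hypothetical proportions correct per item for two groups.
p_boys = [0.90, 0.75, 0.60, 0.45, 0.30]
p_girls = [0.88, 0.74, 0.40, 0.46, 0.31]
for i, dist in enumerate(delta_plot_distances(p_boys, p_girls), start=1):
    print(f"item {i}: distance = {dist:+.2f}")
```

Items far from the line are relatively harder for one group than the overall pattern of item difficulties predicts, which is what the method treats as evidence of DIF.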
Item Analysis and Differential Item Functioning of a Brief Conduct Problem Screen
ERIC Educational Resources Information Center
Wu, Johnny; King, Kevin M.; Witkiewitz, Katie; Racz, Sarah Jensen; McMahon, Robert J.
2012-01-01
Research has shown that boys display higher levels of childhood conduct problems than girls, and Black children display higher levels than White children, but few studies have tested for scalar equivalence of conduct problems across gender and race. The authors conducted a 2-parameter item response theory (IRT) model to examine item…
Examining the Effectiveness of Test Accommodation Using DIF and a Mixture IRT Model
ERIC Educational Resources Information Center
Cho, Hyun-Jeong; Lee, Jaehoon; Kingston, Neal
2012-01-01
This study examined the validity of test accommodation in third-eighth graders using differential item functioning (DIF) and mixture IRT models. Two data sets were used for these analyses. With the first data set (N = 51,591) we examined whether item type (i.e., story, explanation, straightforward) or item features were associated with item…
ERIC Educational Resources Information Center
Zheng, Yinggan; Gierl, Mark J.; Cui, Ying
2010-01-01
This study combined the kernel smoothing procedure and a nonparametric differential item functioning statistic--Cochran's Z--to statistically test the difference between the kernel-smoothed item response functions for reference and focal groups. Simulation studies were conducted to investigate the Type I error and power of the proposed…
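A kernel-smoothed item response function can be sketched as a weighted moving average of item responses along an ability proxy. The version below uses a Gaussian kernel and a fixed bandwidth, both of which are assumptions; the abstract does not specify them, and the Cochran's Z comparison of the two groups' curves is not implemented here. The data are invented.

```python
import math


def kernel_smoothed_irf(abilities, responses, grid, bandwidth=0.5):
    """Nadaraya-Watson estimate of P(correct) at each grid point.

    abilities: ability proxy per examinee (e.g. a standardized rest score).
    responses: 0/1 item scores in the same order.
    """
    curve = []
    for t in grid:
        weights = [math.exp(-0.5 * ((t - a) / bandwidth) ** 2) for a in abilities]
        weighted_hits = sum(w * r for w, r in zip(weights, responses))
        curve.append(weighted_hits / sum(weights))
    return curve


# Invented data for one group: abilities and scored responses to one item.
abilities = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
responses = [0, 0, 1, 0, 1, 1, 1]
grid = [-1.0, 0.0, 1.0]
print([round(p, 3) for p in kernel_smoothed_irf(abilities, responses, grid)])
```

Running the same function separately on reference and focal group data gives two curves on a common grid, and the difference between them is the quantity a nonparametric DIF statistic would then test.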
ERIC Educational Resources Information Center
Feldt, Leonard S.
2004-01-01
In some settings, the validity of a battery composite or a test score is enhanced by weighting some parts or items more heavily than others in the total score. This article describes methods of estimating the total score reliability coefficient when differential weights are used with items or parts.
Interest Inventory Items as Reinforcing Stimuli: A Test of the A-R-D Theory.
ERIC Educational Resources Information Center
Staats, Arthur W.; And Others
An experiment was conducted to test the hypothesis that interest inventory items would function as reinforcing stimuli in a visual discrimination task. When previously rated liked and disliked items from the Strong Vocational Interest Blank were differentially presented following one of two responses, subjects learned to respond to the stimulus…
Gender Differences in Figural Matrices: The Moderating Role of Item Design Features
ERIC Educational Resources Information Center
Arendasy, Martin E.; Sommer, Markus
2012-01-01
There is a heated debate on whether observed gender differences in some figural matrices in adults can be attributed to gender differences in inductive reasoning/G[subscript f] or differential item functioning and/or test bias. Based on previous studies we hypothesized that three specific item design features moderate the effect size of the gender…
ERIC Educational Resources Information Center
Oliveri, Maria Elena; Olson, Brent F.; Ercikan, Kadriye; Zumbo, Bruno D.
2012-01-01
In this study, the Canadian English and French versions of the Problem-Solving Measure of the Programme for International Student Assessment 2003 were examined to investigate their degree of measurement comparability at the item- and test-levels. Three methods of differential item functioning (DIF) were compared: parametric and nonparametric item…
Effects of Ignoring Item Interaction on Item Parameter Estimation and Detection of Interacting Items
ERIC Educational Resources Information Center
Chen, Cheng-Te; Wang, Wen-Chung
2007-01-01
This study explores the effects of ignoring item interaction on item parameter estimation and the efficiency of using the local dependence index Q[subscript 3] and the SAS NLMIXED procedure to detect item interaction under the three-parameter logistic model and the generalized partial credit model. Through simulations, it was found that ignoring…
Applying Hierarchical Model Calibration to Automatically Generated Items.
ERIC Educational Resources Information Center
Williamson, David M.; Johnson, Matthew S.; Sinharay, Sandip; Bejar, Isaac I.
This study explored the application of hierarchical model calibration as a means of reducing, if not eliminating, the need for pretesting of automatically generated items from a common item model prior to operational use. Ultimately the successful development of automatic item generation (AIG) systems capable of producing items with highly similar…
Landfeldt, Erik; Mayhew, Anna; Straub, Volker; Bushby, Katharine; Lochmüller, Hanns; Lindgren, Peter
2017-12-18
To explore the psychometric properties of the full 22-item English (UK and US) version of the Zarit Caregiver Burden Interview administered to caregivers to patients with Duchenne muscular dystrophy. Caregivers to patients with Duchenne muscular dystrophy from the United Kingdom and the United States, recruited through the TREAT-NMD network, completed the Zarit Caregiver Burden Interview online. The psychometric properties of the Zarit Caregiver Burden Interview were examined using Rasch analysis. A total of 475 caregivers completed the Zarit Caregiver Burden Interview. Model misfit was identified for 9 of 22 items (mean item fit residual 0.061, SD: 2.736) and 13 of 22 items displayed disordered thresholds. The overall item-trait interaction chi-square value was 499 (198 degrees of freedom, p < 0.001). The mean person fit residual was estimated at -0.213 (SD: 1.235). The Person Separation Index and Cronbach's α were estimated at 0.902 and 0.914, respectively. Item dependency was low and we found no significant differential item functioning by country or sex. Our Rasch analysis shows that the Zarit Caregiver Burden Interview fails to fully operationalize a quantitative conceptualization of caregiver burden among caregivers to patients with Duchenne muscular dystrophy from the United Kingdom and the United States. Further research is needed to understand the psychometric properties of the Zarit Caregiver Burden Interview in other populations and settings. Implications for Rehabilitation Duchenne muscular dystrophy is a terminal disease characterized by progressive muscle degeneration resulting in substantial disability and a significant burden on family caregivers. The Zarit Caregiver Burden Interview is one of the most widely applied measures of caregiver burden. Our Rasch analysis suggests that the Zarit Caregiver Burden Interview is not fit for purpose to measure burden in UK and US caregivers to patients with Duchenne muscular dystrophy. 
Clinicians and decision-makers should interpret Zarit Caregiver Burden Interview data from these populations with caution.
Kulich, Károly; Keininger, Dorothy L; Tiplady, Brian; Banerji, Donald
2015-01-01
Symptoms, particularly dyspnea, and activity limitation, have an impact on the health status and the ability to function normally in patients with chronic obstructive pulmonary disease (COPD). To develop an electronic patient diary (eDiary), qualitative patient interviews were conducted from 2009 to 2010 to identify relevant symptoms and degree of bother due to symptoms. The eDiary was completed by a subset of 209 patients with moderate-to-severe COPD in the 26-week QVA149 SHINE study. Two morning assessments (since awakening and since the last assessment) and one evening assessment were made each day. Assessments covered five symptoms ("shortness of breath," "phlegm/mucus," "chest tightness," "wheezing," and "coughing") and two impact items ("bothered by COPD" and "difficulty with activities") and were scored on a 10-point numeric scale. Patient compliance with the eDiary was 90.4% at baseline and 81.3% at week 26. Correlations between shortness of breath and impact items were >0.95. Regression analysis showed that shortness of breath was a highly significant (P<0.0001) predictor of impact items. Exploratory factor analysis gave a single factor comprising all eDiary items, including both symptoms and impact items. Shortness of breath, the total score (including five symptoms and two impact items), and the five-item symptom score from the eDiary performed well, with good consistency and reliability. The eDiary showed good sensitivity to change, with a 0.6 points reduction in the symptoms scores (on a 0-10 point scale) representing a meaningful change. The eDiary was found to be valid, reliable, and responsive. The high correlations obtained between "shortness of breath" and the ratings of "bother" and "difficulty with activities" confirmed the relevance of this symptom in patients with COPD. Future studies will be required to explore further psychometric properties and their ability to differentiate between COPD treatments.
Hu, Jinxiang; Ward, Michael M
2017-09-01
To determine if persons with arthritis differ systematically from persons without arthritis in how they respond to questions on three depression questionnaires, which include somatic items such as fatigue and sleep disturbance. We extracted data on the Centers for Epidemiological Studies Depression (CES-D) scale, the Patient Health Questionnaire-9 (PHQ-9), and the Kessler-6 (K-6) scale from three large population-based national surveys. We assessed items on these questionnaires for differential item functioning (DIF) between persons with and without self-reported physician-diagnosed arthritis using multiple indicator multiple cause models, which controlled for the underlying level of depression and important confounders. We also examined if DIF by arthritis status was similar between women and men. Although five items of the CES-D, one item of the PHQ-9, and five items of the K-6 scale had evidence of DIF based on statistical comparisons, the magnitude of each difference was less than the threshold of a small effect. The statistical differences were a function of the very large sample sizes in the surveys. Effect sizes for DIF were similar between women and men except for two items on the Patient Health Questionnaire-9. For each questionnaire, DIF accounted for 8% or less of the arthritis-depression association, and excluding items with DIF did not reduce the difference in depression scores between those with and without arthritis. Persons with arthritis respond to items on the CES-D, PHQ-9, and K-6 depression scales similarly to persons without arthritis, despite the inclusion of somatic items in these scales.
Neural mechanisms of cue-approach training
Bakkour, Akram; Lewis-Peacock, Jarrod A.; Poldrack, Russell A.; Schonberg, Tom
2016-01-01
Biasing choices may prove a useful way to implement behavior change. Previous work has shown that a simple training task (the cue-approach task), which does not rely on external reinforcement, can robustly influence choice behavior by biasing choice toward items that were targeted during training. In the current study, we replicate previous behavioral findings and explore the neural mechanisms underlying the shift in preferences following cue-approach training. Given recent successes in the development and application of machine learning techniques to task-based fMRI data, which have advanced understanding of the neural substrates of cognition, we sought to leverage the power of these techniques to better understand neural changes during cue-approach training that subsequently led to a shift in choice behavior. Contrary to our expectations, we found that machine learning techniques applied to fMRI data during non-reinforced training were unsuccessful in elucidating the neural mechanism underlying the behavioral effect. However, univariate analyses during training revealed that the relationship between BOLD and choices for Go items increases as training progresses compared to choices of NoGo items primarily in lateral prefrontal cortical areas. This new imaging finding suggests that preferences are shifted via differential engagement of task control networks that interact with value networks during cue-approach training. PMID:27677231
Murray, Aja Louise; Allison, Carrie; Smith, Paula L; Baron-Cohen, Simon; Booth, Tom; Auyeung, Bonnie
2017-05-01
Diagnostic bias is a concern in autism spectrum conditions (ASC) where prevalence and presentation differ by sex. To ensure that females with ASC are not under-identified, it is important that ASC screening tools do not systematically underestimate autistic traits in females relative to males. We evaluated whether the AQ-10, a brief screen for ASC recommended by the National Institute of Clinical Excellence in cases of suspected ASC, exhibits such a bias. Using an item response theory approach, we evaluated differential item functioning and differential test functioning. We found that although individual items showed some sex bias, these biases at times favored males and at other times favored females. Thus, at the level of test scores the item-level biases cancelled out to give an unbiased overall score. Results support the continued use of the AQ-10 sum score in its current form; however, they suggest that caution should be exercised when interpreting responses to individual items. The nature of the item level biases could serve as a guide for future research into how ASC affects males and females differently. Autism Res 2017, 10: 790-800. © 2016 International Society for Autism Research, Wiley Periodicals, Inc.
Jafari, Peyman; Bagheri, Zahra; Hashemi, Seyyedeh Zahra; Shalileh, Keivan
2013-06-06
Few studies have examined the effect of differential item functioning (DIF) on comparing health related quality of life (HRQoL) scores across child self-reports and parent proxy-reports. This study aims to determine whether parents and children respond differently to the items in the Persian version of the PedsQL™ 4.0 measure. The PedsQL™ 4.0 Generic Core Scales was completed by 938 child-parent dyads. The graded response model (GRM) was used to detect DIF between parents and children. The IRT analyses were conducted using IRTPRO 2.1. On the whole, our findings showed that 50% (4 out of 8) of the items in the physical subscale and 40% (2 out of 5) in both emotional and school subscales were flagged with DIF. Among the DIF items, 62.5% (5 out of 8) were uniform and the remaining 37.5% (3 out of 8) were non-uniform. Parents and children interpret certain items of the PedsQL™ 4.0 in different ways, except for the social subscale. Hence, we should be cautious about using parent proxy-report as a substitute for a child's ratings.
Multiple determinants of lifespan memory differences.
Henson, Richard N; Campbell, Karen L; Davis, Simon W; Taylor, Jason R; Emery, Tina; Erzinclioglu, Sharon; Kievit, Rogier A
2016-09-07
Memory problems are among the most common complaints as people grow older. Using structural equation modeling of commensurate scores of anterograde memory from a large (N = 315), population-derived sample (www.cam-can.org), we provide evidence for three memory factors that are supported by distinct brain regions and show differential sensitivity to age. Associative memory and item memory are dramatically affected by age, even after adjusting for education level and fluid intelligence, whereas visual priming is not. Associative memory and item memory are differentially affected by emotional valence, and the age-related decline in associative memory is faster for negative than for positive or neutral stimuli. Gray-matter volume in the hippocampus, parahippocampus and fusiform cortex, and a white-matter index for the fornix, uncinate fasciculus and inferior longitudinal fasciculus, show differential contributions to the three memory factors. Together, these data demonstrate the extent to which differential ageing of the brain leads to differential patterns of memory loss.
Armed Services Vocational Aptitude Battery: Differential Item Functioning on the High School Form.
1988-04-01
AD-RI93 693. Armed Services Vocational Aptitude Battery: Differential Item Functioning on the High School Form (U). Robert L. Linn, C. Nicholas Hastings, Pei-Hua Gillian Hu, Katherine E. Ryan. Universal Energy Systems, Inc., Dayton, OH. Period: October 1985 to 1987. Approved for public release; distribution is unlimited. Air Force Systems Command.
Zachariae, Robert; O'Connor, Maja; Lassesen, Berit; Olesen, Martin; Kjær, Louise Binow; Thygesen, Marianne; Mørcke, Anne Mette
2015-09-15
Patient-centered communication is a core competency in modern health care and is associated with higher levels of patient satisfaction, improved patient health outcomes, and lower levels of burnout among physicians. The objective of the present study was to develop a questionnaire assessing medical student and physician self-efficacy in patient-centeredness (SEPCQ) and explore its psychometric properties. A preliminary 88-item questionnaire (SEPCQ-88) was developed based on a review of the literature and medical student portfolios and completed by 448 medical students from Aarhus University. Exploratory principal component analysis resulted in a 27-item version (SEPCQ-27) with three underlying self-efficacy factors: 1) Exploring the patient perspective, 2) Sharing information and power, and 3) Dealing with communicative challenges. The SEPCQ-27 was completed by an independent sample of 291 medical students from 2 medical schools and 101 hospital physicians. Internal consistencies of total and subscales were acceptable for both students and physicians (Cronbach's alpha (range): 0.74-0.95). There were no overall indications of gender-related differential item functioning (DIF), and a confirmatory factor analysis (CFA) indicated good fit (CFI = 0.98; NNFI = 0.98; RMSEA = 0.05; SRMR = 0.07). Responsiveness was indicated by increases in SEPCQ scores after a course in communication and peer-supervision (Cohen's d (range): 0.21 to 0.73; p: 0.053 to 0.001). Furthermore, positive associations were found between increases in SEPCQ-scores and course-related motivation to learn (medical students) and between SEPCQ scores and years of clinical experience (physicians). The final SEPCQ-27 showed satisfactory psychometric properties, and preliminary support was found for its construct validity, indicating that the SEPCQ-27 may be a valuable measure in future patient centered communication training and research.
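For readers unfamiliar with the two summary statistics reported above, the following is a minimal sketch of how Cronbach's alpha and Cohen's d are commonly computed. The abstract does not state which variant of Cohen's d was used, so the pooled-standard-deviation form shown here is an assumption, and the function names and toy scores are hypothetical.

```python
from statistics import mean, variance

def cronbach_alpha(item_scores):
    """Internal consistency; item_scores holds one list per item,
    aligned so that position i is always the same respondent."""
    k = len(item_scores)
    totals = [sum(vals) for vals in zip(*item_scores)]
    item_var = sum(variance(col) for col in item_scores)
    return (k / (k - 1)) * (1 - item_var / variance(totals))

def cohens_d(before, after):
    """Standardized mean difference using the pooled standard deviation
    (an assumed variant; paired-sample forms also exist)."""
    n1, n2 = len(before), len(after)
    pooled = (((n1 - 1) * variance(before) + (n2 - 1) * variance(after))
              / (n1 + n2 - 2)) ** 0.5
    return (mean(after) - mean(before)) / pooled

# Toy data, invented for illustration only.
items = [[3, 4, 2, 5], [3, 5, 2, 4], [2, 4, 3, 5]]
print(cronbach_alpha(items))
print(cohens_d([3.0, 3.5, 4.0], [3.5, 4.0, 4.5]))
```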
Hogge, Michaël; Adam, Stéphane; Collette, Fabienne
2008-07-01
The directed forgetting effect obtained with the item method is supposed to depend on both selective rehearsal of to-be-remembered (TBR) items and attentional inhibition of to-be-forgotten (TBF) items. In this study, we investigated the locus of the directed forgetting deficit in older adults by exploring the influence of recollection and familiarity-based retrieval processes on age-related differences in directed forgetting. Moreover, we explored the influence of processing speed, short-term memory capacity, thought suppression tendencies, and sensitivity to proactive interference on performance. The results indicated that older adults' directed forgetting difficulties are due to decreased recollection of TBR items, associated with increased automatic retrieval of TBF items. Moreover, processing speed and proactive interference appeared to be responsible for the decreased recall of TBR items.
ERIC Educational Resources Information Center
Turner, Brandon M.; Betz, Nancy E.; Edwards, Michael C.; Borgen, Fred H.
2010-01-01
The psychometric properties of measures of self-efficacy for the six themes of Holland's theory were examined using item response theory. Item and scale quality were compared across levels of the trait continuum; all the scales were highly reliable but differentiated better at some levels of the continuum than others. Applications for adaptive…
ERIC Educational Resources Information Center
Lynch, Mervin D.; Chaves, John
Items from Piers-Harris and Coopersmith self-concept tests were evaluated against independent measures on three self-constructs, idealized, empathic, and worth. Construct measurements were obtained with the semantic differential and D statistic. Ratings were obtained from 381 children, grades 4-6. For each test, item ratings and construct measures…
Item Discrimination and Type I Error in the Detection of Differential Item Functioning
ERIC Educational Resources Information Center
Li, Yanju; Brooks, Gordon P.; Johanson, George A.
2012-01-01
In 2009, DeMars stated that when impact exists there will be Type I error inflation, especially with larger sample sizes and larger discrimination parameters for items. One purpose of this study is to present the patterns of Type I error rates using Mantel-Haenszel (MH) and logistic regression (LR) procedures when the mean ability between the…
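As background for the Mantel-Haenszel (MH) procedure named above, here is a minimal sketch of the MH common odds ratio for one dichotomous item, with examinees matched on their sum score. The function names and the toy data are hypothetical; the conversion to the ETS delta metric uses the standard constant -2.35. This is an illustration of the statistic, not the simulation design used in the study.

```python
import math
from collections import defaultdict

def mh_odds_ratio(responses, groups, item):
    """Mantel-Haenszel common odds ratio for one dichotomous item.

    responses: list of 0/1 lists (one per examinee); groups: "ref" or "focal".
    Examinees are stratified on their sum score, the usual matching variable.
    """
    # Per stratum: [ref correct, ref incorrect, focal correct, focal incorrect]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for resp, grp in zip(responses, groups):
        offset = 0 if grp == "ref" else 2
        strata[sum(resp)][offset + (0 if resp[item] == 1 else 1)] += 1
    num = den = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den if den > 0 else float("nan")

def mh_delta(alpha):
    """ETS delta metric; negative values mean the item favors the reference group."""
    return -2.35 * math.log(alpha)

# Toy data, invented for illustration: six examinees, two items.
responses = [[1, 0], [1, 0], [0, 1], [1, 0], [0, 1], [0, 1]]
groups = ["ref", "ref", "ref", "focal", "focal", "focal"]
alpha = mh_odds_ratio(responses, groups, item=0)
print(alpha, mh_delta(alpha))
```

An odds ratio near 1 (delta near 0) indicates no DIF on the matched comparison; Type I error studies such as the one above ask how often the associated test rejects when that is in fact true.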
ERIC Educational Resources Information Center
Finch, Holmes
2011-01-01
Methods of uniform differential item functioning (DIF) detection have been extensively studied in the complete data case. However, less work has been done examining the performance of these methods when missing item responses are present. Research that has been done in this regard appears to indicate that treating missing item responses as…
ERIC Educational Resources Information Center
Immekus, Jason C.; Maller, Susan J.
2009-01-01
The Kaufman Adolescent and Adult Intelligence Test (KAIT[TM]) is an individually administered test of intelligence for individuals ranging in age from 11 to 85+ years. The item response theory-likelihood ratio procedure, based on the two-parameter logistic model, was used to detect differential item functioning (DIF) in the KAIT across males and…
Modeling Item-Position Effects within an IRT Framework
ERIC Educational Resources Information Center
Debeer, Dries; Janssen, Rianne
2013-01-01
Changing the order of items between alternate test forms to prevent copying and to enhance test security is a common practice in achievement testing. However, these changes in item order may affect item and test characteristics. Several procedures have been proposed for studying these item-order effects. The present study explores the use of…
RhinAsthma patient perspective: A Rasch validation study.
Molinengo, Giorgia; Baiardini, Ilaria; Braido, Fulvio; Loera, Barbara
2018-02-01
In daily practice, Health-Related Quality of Life (HRQoL) tools are useful for supplementing clinical data with the patient's perspective. To encourage their use by clinicians, the availability of tools that can quickly provide valid results is crucial. A new HRQoL tool has been proposed for patients with asthma and rhinitis: the RhinAsthma Patient Perspective-RAPP. The aim of this study was to evaluate the psychometric robustness of the RAPP using the Item Response Theory (IRT) approach, to evaluate the scalability of items and test whether or not patients use the items response scale correctly. 155 patients (53.5% women, mean age 39.1, range 16-76) were recruited during a multicenter study. RAPP metric properties were investigated using IRT models. Differential item functioning (DIF) was used for gender, age, and asthma control test (ACT). The RAPP adequately fitted the Rating Scale model, demonstrating the equality of the rating scale structure for all items. All statistics on items were satisfactory. The RAPP had adequate internal reliability and showed good ability to discriminate among different groups of participants. DIF analysis indicated that there were no differential item functioning issues for gender. One item showed a DIF by age and four items by ACT. The psychometric evaluation performed using IRT models demonstrated that the RAPP met all the criteria to be considered a reliable and valid method of measurement. From a clinical perspective, this will allow physicians to confidently interpret scores as good indicators of Quality of Life of patients with asthma.
Watt, Torquil; Groenvold, Mogens; Hegedüs, Laszlo; Bonnema, Steen Joop; Rasmussen, Åse Krogh; Feldt-Rasmussen, Ulla; Bjorner, Jakob Bue
2014-02-01
To evaluate the extent of differential item functioning (DIF) within the thyroid-specific quality of life patient-reported outcome measure, ThyPRO, according to sex, age, education and thyroid diagnosis. A total of 838 patients with benign thyroid diseases completed the ThyPRO questionnaire (84 five-point items, 13 scales). Uniform and nonuniform DIF were investigated using ordinal logistic regression, testing for both statistical significance and magnitude (ΔR² > 0.02). Scale level was estimated by the sum score, after purification. Twenty instances of DIF in 17 of the 84 items were found. Eight were according to diagnosis, where the goiter scale was the one most affected, possibly due to differing perceptions in patients with auto-immune thyroid diseases compared to patients with simple goiter. Eight instances of DIF according to age were found, of which 5 were in positively worded items, which younger patients were more likely to endorse; one was according to gender (women were more likely to report crying), and three were according to educational level. The vast majority of DIF had only minor influence on the scale scores (0.1-2.3 points on the 0-100 scales), but two instances of DIF corresponded to a difference of 4.6 and 9.8, respectively. Ordinal logistic regression identified DIF in 17 of 84 items. The potential impact of this on the present scales was low, but items displaying DIF could be avoided when developing abbreviated scales, where the potential impact of DIF (due to fewer items) will be larger.
Crane, Paul K; Gibbons, Laura E; Jolley, Lance; van Belle, Gerald
2006-11-01
We present an ordinal logistic regression model for identification of items with differential item functioning (DIF) and apply this model to a Mini-Mental State Examination (MMSE) dataset. We employ item response theory ability estimation in our models. Three nested ordinal logistic regression models are applied to each item. Model testing begins with examination of the statistical significance of the interaction term between ability and the group indicator, consistent with nonuniform DIF. Then we turn our attention to the coefficient of the ability term in models with and without the group term. If including the group term has a marked effect on that coefficient, we declare that it has uniform DIF. We examined DIF related to language of test administration in addition to self-reported race, Hispanic ethnicity, age, years of education, and sex. We used PARSCALE for IRT analyses and STATA for ordinal logistic regression approaches. We used an iterative technique for adjusting IRT ability estimates on the basis of DIF findings. Five items were found to have DIF related to language. These same items also had DIF related to other covariates. The ordinal logistic regression approach to DIF detection, when combined with IRT ability estimates, provides a reasonable alternative for DIF detection. There appear to be several items with significant DIF related to language of test administration in the MMSE. More attention needs to be paid to the specific criteria used to determine whether an item has DIF, not just the technique used to identify DIF.
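The three nested models described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses a binary item and plain logistic regression rather than the ordinal model, simulated ability values rather than IRT estimates, a hand-written Newton-Raphson fitter, and a hypothetical 10% cutoff for a "marked" change in the ability coefficient. All names and simulation settings are invented.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def prob(beta, row):
    """Logistic response probability for one design-matrix row."""
    return 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row))))

def fit_logistic(X, y, n_iter=12):
    """Newton-Raphson fit; returns (coefficients, log-likelihood)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for row, yi in zip(X, y):
            p = prob(beta, row)
            for i in range(k):
                grad[i] += (yi - p) * row[i]
                for j in range(k):
                    hess[i][j] += p * (1 - p) * row[i] * row[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    ll = sum(math.log(prob(beta, r)) if yi else math.log(1 - prob(beta, r))
             for r, yi in zip(X, y))
    return beta, ll

# Simulated data (assumption): groups differ in mean ability, and the item
# is uniformly harder for group 1 at every ability level.
random.seed(0)
group = [random.randint(0, 1) for _ in range(2000)]
theta = [random.gauss(1.0 * g, 1.0) for g in group]
y = [1 if random.random() < 1 / (1 + math.exp(-(1.2 * t - 0.8 * g))) else 0
     for t, g in zip(theta, group)]

m1, ll1 = fit_logistic([[1, t] for t in theta], y)                        # ability only
m2, ll2 = fit_logistic([[1, t, g] for t, g in zip(theta, group)], y)      # + group
m3, ll3 = fit_logistic([[1, t, g, t * g] for t, g in zip(theta, group)], y)  # + interaction

# Nonuniform DIF: likelihood-ratio test of the interaction (chi-square, 1 df; 3.84 at 0.05).
lr_interaction = 2 * (ll3 - ll2)
# Uniform DIF: relative change in the ability coefficient when group is added.
rel_change = abs(m2[1] - m1[1]) / abs(m1[1])
UNIFORM_DIF_THRESHOLD = 0.10  # hypothetical cutoff, for illustration only
print(lr_interaction > 3.84, rel_change > UNIFORM_DIF_THRESHOLD)
```

The final sentence of the abstract applies directly here: the flag an item receives depends on the cutoff chosen for `rel_change` as much as on the modelling technique.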
Oude Voshaar, Martijn A H; Ten Klooster, Peter M; Vonkeman, Harald E; van de Laar, Mart A F J
2017-11-01
Traditional patient-reported physical function instruments often poorly differentiate patients with mild-to-moderate disability. We describe the development and psychometric evaluation of a generic item bank for measuring everyday activity limitations in outpatient populations. Seventy-two items generated from patient interviews and mapped to the International Classification of Functioning, Disability and Health (ICF) domestic life chapter were administered to 1128 adults representative of the Dutch population. The partial credit model was fitted to the item responses and evaluated with respect to its assumptions, model fit, and differential item functioning (DIF). Measurement performance of a computerized adaptive testing (CAT) algorithm was compared with the SF-36 physical functioning scale (PF-10). A final bank of 41 items was developed. All items demonstrated acceptable fit to the partial credit model and measurement invariance across age, sex, and educational level. Five- and ten-item CAT simulations were shown to have high measurement precision, which exceeded that of SF-36 physical functioning scale across the physical function continuum. Floor effects were absent for a 10-item empirical CAT simulation, and ceiling effects were low (13.5%) compared with SF-36 physical functioning (38.1%). CAT also discriminated better than SF-36 physical functioning between age groups, number of chronic conditions, and respondents with or without rheumatic conditions. The Rasch assessment of everyday activity limitations (REAL) item bank will hopefully prove a useful instrument for assessing everyday activity limitations. T-scores obtained using derived measures can be used to benchmark physical function outcomes against the general Dutch adult population.
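As a brief illustration of the model named above, the sketch below computes category probabilities for a single item under the partial credit model and converts a latent score to a T-score (mean 50, standard deviation 10 in the reference population). The threshold values, the reference mean and standard deviation, and the function names are hypothetical and are not taken from the REAL item bank.

```python
import math

def pcm_probabilities(theta, thresholds):
    """Category probabilities (0..m) for one item under the partial credit model.

    thresholds: step difficulties delta_1..delta_m; category 0 has an empty sum.
    """
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    weights = [math.exp(c) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

def t_score(theta, ref_mean=0.0, ref_sd=1.0):
    """Rescale a latent score so the reference population has mean 50, SD 10."""
    return 50.0 + 10.0 * (theta - ref_mean) / ref_sd

# Hypothetical three-category item with step difficulties -0.5 and 0.8.
probs = pcm_probabilities(0.3, [-0.5, 0.8])
print(probs, t_score(0.3))
```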
Purcell, Susan E; Rhea, Karen; Maier, Philip; First, Michael; Zweede, Lisa; Sinisterra, Manuela; Nunn, M Brad; Austin, Marie-Paule; Brodey, Inger S
2018-01-01
Background The Structured Clinical Interview for DSM (SCID) is considered the gold standard assessment for accurate, reliable psychiatric diagnoses; however, because of its length, complexity, and training required, the SCID is rarely used outside of research. Objective This paper aims to describe the development and initial validation of a Web-based, self-report screening instrument (the Screening Assessment for Guiding Evaluation-Self-Report, SAGE-SR) based on the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) and the SCID-5-Clinician Version (CV) intended to make accurate, broad-based behavioral health diagnostic screening more accessible within clinical care. Methods First, study staff drafted approximately 1200 self-report items representing individual granular symptoms in the diagnostic criteria for the 8 primary SCID-CV modules. An expert panel iteratively reviewed, critiqued, and revised items. The resulting items were iteratively administered and revised through 3 rounds of cognitive interviewing with community mental health center participants. In the first 2 rounds, the SCID was also administered to participants to directly compare their Likert self-report and SCID responses. A second expert panel evaluated the final pool of items from cognitive interviewing and criteria in the DSM-5 to construct the SAGE-SR, a computerized adaptive instrument that uses branching logic from a screener section to administer appropriate follow-up questions to refine the differential diagnoses. The SAGE-SR was administered to healthy controls and outpatient mental health clinic clients to assess test duration and test-retest reliability. Cutoff scores for screening into follow-up diagnostic sections and criteria for inclusion of diagnoses in the differential diagnosis were evaluated. 
Results The expert panel reduced the initial 1200 test items to 664 items that panel members agreed collectively represented the SCID items from the 8 targeted modules and DSM criteria for the covered diagnoses. These 664 items were iteratively submitted to 3 rounds of cognitive interviewing with 50 community mental health center participants; the expert panel reviewed session summaries and agreed on a final set of 661 clear and concise self-report items representing the desired criteria in the DSM-5. The SAGE-SR constructed from this item pool took an average of 14 min to complete in a nonclinical sample versus 24 min in a clinical sample. Responses to individual items can be combined to generate DSM criteria endorsements and differential diagnoses, as well as provide indices of individual symptom severity. Preliminary measures of test-retest reliability in a small, nonclinical sample were promising, with good to excellent reliability for screener items in 11 of 13 diagnostic screening modules (intraclass correlation coefficient [ICC] or kappa coefficients ranging from .60 to .90), with mania achieving fair test-retest reliability (ICC=.50) and other substance use endorsed too infrequently for analysis. Conclusions The SAGE-SR is a computerized adaptive self-report instrument designed to provide rigorous differential diagnostic information to clinicians. PMID:29572204
Ashley, Laura; Smith, Adam B; Keding, Ada; Jones, Helen; Velikova, Galina; Wright, Penny
2013-12-01
To provide new insights into the psychometrics of the revised Illness Perception Questionnaire (IPQ-R) in cancer patients. To undertake, for the first time using data from breast, colorectal and prostate cancer patients, a confirmatory factor analysis (CFA) to assess the validity of the IPQ-R's core seven-factor structure. Also, for the first time in any illness group, to undertake Rasch analysis to explore the extent to which the IPQ-R factors form unidimensional scales, with linear measurement properties and no Differential Item Functioning (DIF). Patients with potentially curable breast, colorectal or prostate cancer, within 6 months post-diagnosis, completed the IPQ-R online (N=531). CFA was conducted, including multi-sample analysis, and for each IPQ-R factor fit to the Rasch model was assessed by examining, amongst other things, item fit, DIF and unidimensionality. The CFA showed a moderate fit of the data to the IPQ-R model, and stability across diagnosis, although fit was significantly improved following the removal of selected items. All seven factors achieved fit to the Rasch model, and exhibited unidimensionality and minimal DIF, although in most cases this was after some item rescoring and/or deletion. In both analyses, IPQ-R items 12, 18 and 24 were indicated as misfitting and removed. Given the rigorous standard of Rasch measurement, and the generic nature of the IPQ-R, it stood up well to the demands of the Rasch model in this study. Importantly, the results show that with some relatively minor, pragmatic modifications the IPQ-R could possess Rasch-standard measurement in cancer patients. © 2013.
Xu, Hui; Tracey, Terence J G
2017-03-01
The current study developed an abbreviated version of the Career Indecision Profile-65 (CIP-65; Hacker, Carr, Abrams, & Brown, 2013) by using item response theory. In order to improve the efficiency of the CIP-65 in measuring career indecision, the individual item performance of the CIP-65 was examined with respect to the ordering of response occurrence and gender differential item functioning. The best 5 items of each scale of the CIP-65 (i.e., neuroticism/negative affectivity, choice/commitment anxiety, lack of readiness, and interpersonal conflicts) were retained in the CIP-Short using a sample of 588 college students. A validation sample (N = 174) supported the reliability and structural validity of the CIP-Short. The convergent and divergent validity of the CIP-Short was additionally supported in the findings of a hypothesized differential relational pattern in a separate sample (N = 360). While the current study supported the CIP-Short being a sound brief measure of career indecision, the limitations of this study and suggestions for future research were discussed as well. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Development and Validation of the Homeostasis Concept Inventory
McFarland, Jenny L.; Price, Rebecca M.; Wenderoth, Mary Pat; Martinková, Patrícia; Cliff, William; Michael, Joel; Modell, Harold; Wright, Ann
2017-01-01
We present the Homeostasis Concept Inventory (HCI), a 20-item multiple-choice instrument that assesses how well undergraduates understand this critical physiological concept. We used an iterative process to develop a set of questions based on elements in the Homeostasis Concept Framework. This process involved faculty experts and undergraduate students from associate’s colleges, primarily undergraduate institutions, regional and research-intensive universities, and professional schools. Statistical results provided strong evidence for the validity and reliability of the HCI. We found that graduate students performed better than undergraduates, biology majors performed better than nonmajors, and students performed better after receiving instruction about homeostasis. We used differential item analysis to assess whether students from different genders, races/ethnicities, and English language status performed differently on individual items of the HCI. We found no evidence of differential item functioning, suggesting that the items do not incorporate cultural or gender biases that would impact students’ performance on the test. Instructors can use the HCI to guide their teaching and student learning of homeostasis, a core concept of physiology. PMID:28572177
Thibodeau, Michel A; Leonard, Rachel C; Abramowitz, Jonathan S; Riemann, Bradley C
2015-12-01
The Dimensional Obsessive-Compulsive Scale (DOCS) is a promising measure of obsessive-compulsive disorder (OCD) symptoms but has received minimal psychometric attention. We evaluated the utility and reliability of DOCS scores. The study included 832 students and 300 patients with OCD. Confirmatory factor analysis supported the originally proposed four-factor structure. DOCS total and subscale scores exhibited good to excellent internal consistency in both samples (α = .82 to α = .96). Patient DOCS total scores reduced substantially during treatment (t = 16.01, d = 1.02). DOCS total scores discriminated between students and patients (sensitivity = 0.76, 1 - specificity = 0.23). The measure did not exhibit gender-based differential item functioning as tested by Mantel-Haenszel chi-square tests. Expected response options for each item were plotted as a function of item response theory and demonstrated that DOCS scores incrementally discriminate OCD symptoms ranging from low to extremely high severity. Incremental differences in DOCS scores appear to represent unbiased and reliable differences in true OCD symptom severity. © The Author(s) 2014.
Khan, Anzalee; Liharska, Lora; Harvey, Philip D; Atkins, Alexandra; Ulshen, Daniel; Keefe, Richard S E
2017-12-01
Objective: Recognizing the discrete dimensions that underlie negative symptoms in schizophrenia and how these dimensions are understood across localities might result in better understanding and treatment of these symptoms. To this end, the objectives of this study were to 1) identify the Positive and Negative Syndrome Scale negative symptom dimensions of expressive deficits and experiential deficits and 2) analyze performance on these dimensions over 15 geographical regions to determine whether the items defining them manifest similar reliability across these regions. Design: Data were obtained for the baseline Positive and Negative Syndrome Scale visits of 6,889 subjects across 15 geographical regions. Using confirmatory factor analysis, we examined whether a two-factor negative symptom structure that is found in schizophrenia (experiential deficits and expressive deficits) would be replicated in our sample, and using differential item functioning, we tested the degree to which specific items from each negative symptom subfactor performed across geographical regions in comparison with the United States. Results: The two-factor negative symptom solution was replicated in this sample. Most geographical regions showed moderate-to-large differential item functioning for Positive and Negative Syndrome Scale expressive deficit items, especially N3 Poor Rapport, as compared with Positive and Negative Syndrome Scale experiential deficit items, showing that these items might be interpreted or scored differently in different regions. Across countries, except for India, the differential item functioning values did not favor raters in the United States. Conclusion: These results suggest that the Positive and Negative Syndrome Scale negative symptom factor can be better represented by a two-factor model than by a single-factor model. 
Additionally, the results show significant differences in responses to items representing the Positive and Negative Syndrome Scale expressive factors, but not the experiential factors, across regions. This could be due to a lack of equivalence between the original and translated versions, cultural differences with the interpretation of items, dissimilarities in rater training, or diversity in the understanding of scoring anchors. Knowing which items are challenging for raters across regions can help to guide Positive and Negative Syndrome Scale training and improve the results of international clinical trials aimed at negative symptoms.
He, Qiwei; Glas, Cees A W; Veldkamp, Bernard P
2014-06-01
This article explores the generalizability of the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) diagnostic criteria for post-traumatic stress disorder (PTSD) to various subpopulations. Besides identifying the differential symptom functioning (also referred to as differential item functioning [DIF]) related to various background variables such as gender, marital status and educational level, this study emphasizes the importance of evaluating the impact of DIF on population inferences as made in health surveys and clinical trials, and on the diagnosis of individual patients. Using a sample from the National Comorbidity Study-Replication (NCS-R), four symptoms for gender, one symptom for marital status, and three symptoms for educational level were significantly flagged as DIF, but their impact on diagnosis was fairly small. We conclude that the DSM-IV diagnostic criteria for PTSD do not produce substantially biased results in the investigated subpopulations, and there should be few reservations regarding their use. Further, although the impact of DIF (i.e. the influence of differential symptom functioning on diagnostic results) was found to be quite small in the current study, we recommend that diagnosticians always perform a DIF analysis of various subpopulations using the methodology presented here to ensure the diagnostic criteria are valid in their own studies. Copyright © 2014 John Wiley & Sons, Ltd.
Michaelides, Michalis P.
2010-01-01
Many studies have investigated the topic of change or drift in item parameter estimates in the context of item response theory (IRT). Content effects, such as instructional variation and curricular emphasis, as well as context effects, such as the wording, position, or exposure of an item have been found to impact item parameter estimates. The issue becomes more critical when items with estimates exhibiting differential behavior across test administrations are used as common for deriving equating transformations. This paper reviews the types of effects on IRT item parameter estimates and focuses on the impact of misbehaving or aberrant common items on equating transformations. Implications relating to test validity and the judgmental nature of the decision to keep or discard aberrant common items are discussed, with recommendations for future research into more informed and formal ways of dealing with misbehaving common items. PMID:21833230
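To make the concern about aberrant common items concrete, the sketch below applies mean-sigma linking, one standard way of deriving an equating transformation from common-item difficulty estimates. The abstract does not name a specific linking method, so this choice is an assumption, and the difficulty values and function name are invented. The example compares the linking constants with and without a single drifted common item.

```python
from statistics import mean, pstdev

def mean_sigma(b_old, b_new):
    """Slope A and intercept B such that b_old is approximated by A * b_new + B."""
    A = pstdev(b_old) / pstdev(b_new)
    B = mean(b_old) - A * mean(b_new)
    return A, B

# Hypothetical common-item difficulties on two administrations.
# The last item has drifted (1.0 on the old form, 1.8 on the new form).
b_old = [-1.0, -0.5, 0.0, 0.5, 1.0]
b_new = [-1.0, -0.5, 0.0, 0.5, 1.8]

with_aberrant = mean_sigma(b_old, b_new)
without_aberrant = mean_sigma(b_old[:4], b_new[:4])
print(with_aberrant, without_aberrant)
```

Whether to keep or discard the drifted item is the judgmental decision the paper discusses; the sketch only shows that the decision changes the transformation applied to every examinee.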
Type I Error Inflation in DIF Identification with Mantel-Haenszel: An Explanation and a Solution
ERIC Educational Resources Information Center
Magis, David; De Boeck, Paul
2014-01-01
It is known that sum score-based methods for the identification of differential item functioning (DIF), such as the Mantel-Haenszel (MH) approach, can be affected by Type I error inflation in the absence of any DIF effect. This may happen when the items differ in discrimination and when there is item impact. On the other hand, outlier DIF methods…
Steacy, Laura M; Elleman, Amy M; Lovett, Maureen W; Compton, Donald L
2016-01-01
In English, gains in decoding skill do not map directly onto increases in word reading. However, beyond the Self-Teaching Hypothesis (Share, 1995), little is known about the transfer of decoding skills to word reading. In this study, we offer a new approach to testing specific decoding elements on transfer to word reading. To illustrate, we modeled word-reading gains among children with reading disability (RD) enrolled in Phonological and Strategy Training (PHAST) or Phonics for Reading (PFR). Conditions differed in sublexical training with PHAST stressing multi-level connections and PFR emphasizing simple grapheme-phoneme correspondences. Thirty-seven children with RD, 3rd-6th grade, were randomly assigned 60 lessons of PHAST or PFR. Crossed random-effects models allowed us to identify specific intervention elements that differentially impacted word-reading performance at posttest, with children in PHAST better able to read words with variant vowel pronunciations. Results suggest that sublexical emphasis influences transfer gains to word reading.
Effect of congenital blindness on the semantic representation of some everyday concepts.
Connolly, Andrew C; Gleitman, Lila R; Thompson-Schill, Sharon L
2007-05-15
This study explores how the lack of first-hand experience with color, as a result of congenital blindness, affects implicit judgments about "higher-order" concepts, such as "fruits and vegetables" (FV), but not others, such as "household items" (HHI). We demonstrate how the differential diagnosticity of color across our test categories interacts with visual experience to produce, in effect, a category-specific difference in implicit similarity. Implicit pair-wise similarity judgments were collected by using an odd-man-out triad task. Pair-wise similarities for both FV and for HHI were derived from this task and were compared by using cluster analysis and regression analyses. Color was found to be a significant component in the structure of implicit similarity for FV for sighted participants but not for blind participants; and this pattern remained even when the analysis was restricted to blind participants who had good explicit color knowledge of the stimulus items. There was also no evidence that either subject group used color knowledge in making decisions about HHI, nor was there an indication of any qualitative differences between blind and sighted subjects' judgments on HHI.
Grzadzinski, Rebecca; Dick, Catherine; Lord, Catherine; Bishop, Somer
2016-01-01
Children with attention deficit/hyperactivity disorder (ADHD) often present with social difficulties, though the extent to which these clearly overlap with symptoms of autism spectrum disorder (ASD) is not well understood. We explored parent-reported and directly-observed ASD symptoms on the Autism Diagnostic Interview-Revised (ADI-R) and the Autism Diagnostic Observation Schedule (ADOS) in children referred to ASD-specialty clinics who received diagnoses of either ADHD (n = 48) or ASD (n = 164). Of the ADHD sample, 21% met ASD cut-offs on the ADOS and 30% met ASD cut-offs on all domains of the ADI-R. Four social communication ADOS items (Quality of Social Overtures, Unusual Eye Contact, Facial Expressions Directed to Examiner, and Amount of Reciprocal Social Communication) adequately differentiated the groups while none of the items on the ADI-R met the criteria for adequate discrimination. Results of this work highlight the challenges that clinicians and researchers face when distinguishing ASD from other disorders in verbally fluent, school-age children.
Jäger, B; Schmid-Ott, G; Ernst, G; Dölle-Lange, E; Sack, M
2012-06-01
The aim of this study was to construct and validate a short self-rating questionnaire for the assessment of ego functions and ability of self regulation. An item pool of 120 items covering 6 postulated dimensions was reduced in two steps in independent samples (n = 136 + 470) via factor and item analyses to the final version consisting of 35 items. The 5 resulting questionnaire scales "interpersonal disturbances", "frustration tolerance and impulse control", "identity disturbances", "affect differentiation and affect tolerance" and "self-esteem" were well interpretable and showed in confirmatory factor analysis the best fit to the data (χ²/df = 3.48; RMSEA = 0.73). Total scores were found to differentiate well between diagnostic groups of patients with more or less ego pathology (F (ANOVA) = 9.8; df = 11; p < 0.001), thus proving good concurrent validity. Reliability was shown by testing internal consistency and test-retest correlations. The "Hannover self-regulation questionnaire" (HSRQ) evidently is an appropriate and reliable screening instrument for assessing ego functions and capacities of self regulation in an economical and user-friendly way. The scale structure allows differentiated diagnostics of weak vs. stable ego functions and may be used for detailed therapy planning. © Georg Thieme Verlag KG Stuttgart · New York.
Barnett, Carolina; Merkies, Ingemar S J; Katzberg, Hans; Bril, Vera
2015-09-02
The Quantitative Myasthenia Gravis Score and the Myasthenia Gravis Composite are two commonly used outcome measures in Myasthenia Gravis. So far, their measurement properties have not been compared, so we aimed to study their psychometric properties using the Rasch model. 251 patients with stable myasthenia gravis were assessed with both scales, and 211 patients returned for a second assessment. We studied fit to the Rasch model at the first visit, and compared item fit, thresholds, differential item functioning, local dependence, person separation index, and tests for unidimensionality. We also assessed test-retest reliability and estimated the Minimal Detectable Change. Neither scale fit the Rasch model (χ² p < 0.05). The Myasthenia Gravis Composite had lower discrimination properties than the Quantitative Myasthenia Gravis Score (Person Separation Index: 0.14 and 0.7). There was local dependence in both scales, as well as differential item functioning for ocular and generalized disease. Disordered thresholds were found in 6 (60%) items of the Myasthenia Gravis Composite and in 4 (31%) of the Quantitative Myasthenia Gravis Score. Both tools had adequate test-retest reliability (ICCs >0.8). The minimally detectable change was 4.9 points for the Myasthenia Gravis Composite and 4.3 points for the Quantitative Myasthenia Gravis Score. Neither scale fulfilled Rasch model expectations. The Quantitative Myasthenia Gravis Score has higher discrimination than the Myasthenia Gravis Composite. Both tools have items with disordered thresholds, differential item functioning and local dependency. There was evidence of multidimensionality in the QMGS. The minimal detectable change values are higher than previous studies on the minimal significant change. These findings might inform future modifications of these tools.
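The abstract above reports Minimal Detectable Change values but does not say how they were computed. One widely used convention, assumed here rather than taken from the study, derives the value from the standard error of measurement: SEM = SD * sqrt(1 - ICC) and MDC95 = 1.96 * sqrt(2) * SEM. The sketch below illustrates that convention; the function names and the input values are hypothetical.

```python
import math


def standard_error_of_measurement(sd: float, icc: float) -> float:
    """SEM = SD * sqrt(1 - ICC), using a test-retest reliability coefficient."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sd: float, icc: float, z: float = 1.96) -> float:
    """MDC = z * sqrt(2) * SEM; z = 1.96 gives the 95% confidence version."""
    return z * math.sqrt(2.0) * standard_error_of_measurement(sd, icc)


if __name__ == "__main__":
    # Hypothetical inputs, not values reported in the abstract.
    print(round(minimal_detectable_change(sd=5.0, icc=0.84), 2))
```

The sqrt(2) factor reflects that a change score combines measurement error from two assessments.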
ERIC Educational Resources Information Center
Vorstenbosch, Marc A. T. M.; Klaassen, Tim P. F. M.; Kooloos, Jan G. M.; Bolhuis, Sanneke M.; Laan, Roland F. J. M.
2013-01-01
Anatomists often use images in assessments and examinations. This study aims to investigate the influence of different types of images on item difficulty and item discrimination in written assessments. A total of 210 of 460 students volunteered for an extra assessment in a gross anatomy course. This assessment contained 39 test items grouped in…
Using the Rasch Measurement Model in Psychometric Analysis of the Family Effectiveness Measure
McCreary, Linda L.; Conrad, Karen M.; Conrad, Kendon J.; Scott, Christy K; Funk, Rodney R.; Dennis, Michael L.
2013-01-01
Background: Valid assessment of family functioning can play a vital role in optimizing client outcomes. Because family functioning is influenced by family structure, socioeconomic context, and culture, existing measures of family functioning--primarily developed with nuclear, middle class European American families--may not be valid assessments of families in diverse populations. The Family Effectiveness Measure was developed to address this limitation. Objectives: To test the Family Effectiveness Measure with data from a primarily low-income African American convenience sample, using the Rasch measurement model. Method: A sample of 607 adult women completed the measure. Rasch analysis was used to assess unidimensionality, response category functioning, item fit, person reliability, differential item functioning by race and parental status, and item hierarchy. Criterion-related validity was tested using correlations with five other variables related to family functioning. Results: The Family Effectiveness Measure measures two separate constructs: The effective family functioning construct was a psychometrically sound measure of the target construct that was more efficient due to the deletion of 22 items. The ineffective family functioning construct consisted of 16 of those deleted items but was not as strong psychometrically. Items in both constructs evidenced no differential item functioning by race. Criterion-related validity was supported for both. Discussion: In contrast to the prevailing conceptualization that family functioning is a single construct, assessed by positively and negatively worded items, use of the Rasch analysis suggested the existence of two constructs. While the effective family functioning is a strong and efficient measure of family functioning, the ineffective family functioning will require additional item development and psychometric testing. PMID:23636342
Finders keepers: the features differentiating hoarding disorder from normative collecting.
Nordsletten, Ashley E; Fernández de la Cruz, Lorena; Billotti, Danielle; Mataix-Cols, David
2013-04-01
A new diagnostic category called Hoarding Disorder (HD) has been proposed for inclusion in DSM-5. It is paramount that this addition does not result in an over-pathologization of normative behavior. Collectors constitute a valid population within which to test the diagnostic boundaries of HD. The current study explored the features that differentiate pathological hoarding from normative collecting. Participants were 29 individuals with a diagnosis of HD and 20 individuals who self-identified as collectors who enrolled in the London Field Trial for HD. A series of semi-structured interviews (often in the participants' homes) were conducted, including a detailed assessment of the typical elements of the collecting process. Participants also completed a battery of self-report questionnaires. Collectors were more likely than those with HD to be male, partnered, and free of psychiatric conditions or medication. Like those with HD, collectors reported the acquisition of, attachment to, and reluctance to discard objects. However, the resulting clutter and impairment were minimal in this group and ultimately insufficient to garner an HD diagnosis. Collectors were, additionally, more focused in their acquisitions (e.g., confining their accumulations to a narrow range of items), more selective (e.g., planning and purchasing only pre-determined items), more likely to organize their possessions and less likely to accumulate in an excessive manner. There are important quantitative and qualitative differences between HD and normative collecting. For this reason, collectors are unlikely to be inappropriately pathologized by the introduction of HD. Copyright © 2013 Elsevier Inc. All rights reserved.
Mueller, Anne E; Segal, Daniel L; Gavett, Brandon; Marty, Meghan A; Yochim, Brian; June, Andrea; Coolidge, Frederick L
2015-07-01
The Geriatric Anxiety Scale (GAS; Segal, D. L., June, A., Payne, M., Coolidge, F. L. and Yochim, B. (2010). Journal of Anxiety Disorders, 24, 709-714. doi:10.1016/j.janxdis.2010.05.002) is a self-report measure of anxiety that was designed to address unique issues associated with anxiety assessment in older adults. This study is the first to use item response theory (IRT) to examine the psychometric properties of a measure of anxiety in older adults. A large sample of older adults (n = 581; mean age = 72.32 years, SD = 7.64 years, range = 60 to 96 years; 64% women; 88% European American) completed the GAS. IRT properties were examined. The presence of differential item functioning (DIF) or measurement bias by age and sex was assessed, and a ten-item short form of the GAS (called the GAS-10) was created. All GAS items had discrimination parameters of 1.07 or greater. Items from the somatic subscale tended to have lower discrimination parameters than items on the cognitive or affective subscales. Two items were flagged for DIF, but the impact of the DIF was negligible. Women scored significantly higher than men on the GAS and its subscales. Participants in the young-old group (60 to 79 years old) scored significantly higher on the cognitive subscale than participants in the old-old group (80 years old and older). Results from the IRT analyses indicated that the GAS and GAS-10 have strong psychometric properties among older adults. We conclude by discussing implications and future research directions.
NASA Astrophysics Data System (ADS)
Ilich, Maria O.
Psychometricians and test developers evaluate standardized tests for potential bias against groups of test-takers by using differential item functioning (DIF). English language learners (ELLs) are a diverse group of students whose native language is not English. While they are still learning the English language, they must take their standardized tests for their school subjects, including science, in English. In this study, linguistic complexity was examined as a possible source of DIF that may result in test scores that confound science knowledge with a lack of English proficiency among ELLs. Two years of fifth-grade state science tests were analyzed for evidence of DIF using two DIF methods, Simultaneous Item Bias Test (SIBTest) and logistic regression. The tests presented a unique challenge in that the test items were grouped together into testlets (groups of items referring to a scientific scenario) to measure knowledge of different science content or skills. Very large samples of 10,256 students in 2006 and 13,571 students in 2007 were examined. Half of each sample was composed of Spanish-speaking ELLs; the balance comprised native English speakers. The two DIF methods were in agreement about the items that favored non-ELLs and the items that favored ELLs. Logistic regression effect sizes were all negligible, while SIBTest flagged items with low to high DIF. A decrease in socioeconomic status and Spanish-speaking ELL diversity may have led to inconsistent SIBTest effect sizes for items used in both testing years. The DIF results for the testlets suggested that ELLs lacked sufficient opportunity to learn science content. The DIF results further suggest that those constructed response test items requiring the student to draw a conclusion about a scientific investigation or to plan a new investigation tended to favor ELLs.
Development and validation of a vision-specific quality-of-life questionnaire for Timor-Leste.
du Toit, Rènée; Palagyi, Anna; Ramke, Jacqueline; Brian, Garry; Lamoureux, Ecosse L
2008-10-01
To develop and determine the reliability and validity of a vision-specific quality-of-life instrument (TL-VSQOL) designed to assess the impact of distance and near vision impairment in adults living in Timor-Leste. A vision-specific quality-of-life questionnaire was developed, piloted, and administered to 704 Timorese aged ≥40 years during a population-based eye health rapid assessment. Rasch analysis was performed on the data of 457 participants with presenting near vision worse than N8 (78.5%) and/or distance vision worse than 6/18 (69.8%). Unidimensionality, item fit to the model, response category performance, differential item functioning, and targeting of items to participants were assessed. Initially, the questionnaire lacked fit to the Rasch model. Removal of two items concerning emotional well-being resulted in a fit of the data (overall item-trait interaction: χ² (df) = 81 (51); mean (SD) person and item fit residual values: -0.30 (1.02) and -0.32 (1.46)), and good targeting of person ability and item difficulty was evident. Poorer distance and near visual acuities were significantly associated with worse quality-of-life scores (P < 0.001). Person separation reliability was substantial (0.93), indicating that the instrument can discriminate between groups with normal and impaired vision. All 17 items were free of differential item functioning, and there was no evidence of multidimensionality. This 17-item TL-VSQOL has high reliability, construct, and criterion validity and effective targeting. It can effectively assess the impact on quality of life of adult Timorese with distance and near vision impairment. The TL-VSQOL could be adapted for use in other low-resource settings.
ERIC Educational Resources Information Center
Wang, Wen-Chung; Su, Ya-Hui
2004-01-01
In this study we investigated the effects of the average signed area (ASA) between the item characteristic curves of the reference and focal groups and three test purification procedures on the uniform differential item functioning (DIF) detection via the Mantel-Haenszel (M-H) method through Monte Carlo simulations. The results showed that ASA,…
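For readers unfamiliar with the Mantel-Haenszel (M-H) method named above, the following is a minimal sketch of its core statistic only; it does not reproduce the simulation design or the test purification procedures studied in the article. Examinees are stratified by a matching score, a 2x2 table (group by correct/incorrect) is formed per stratum, and a common odds ratio is pooled across strata. The conversion to the ETS delta scale, -2.35 * ln(alpha), is the conventional one. The record layout and function names are assumptions made for illustration.

```python
import math
from collections import defaultdict


def mantel_haenszel_odds_ratio(records):
    """Pooled odds ratio across matching-score strata.

    records: iterable of (group, matching_score, correct), where group is
    "ref" or "focal" and correct is 0 or 1 (hypothetical layout).
    """
    # Per stratum: [ref correct, ref incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, correct in records:
        cell = (0 if correct else 1) + (0 if group == "ref" else 2)
        tables[score][cell] += 1

    numerator = denominator = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio undefined: no discordant cells")
    return numerator / denominator


def mh_d_dif(alpha: float) -> float:
    """ETS delta-scale index; negative values indicate the item favors the reference group."""
    return -2.35 * math.log(alpha)


if __name__ == "__main__":
    # Hypothetical data: two score strata with ten examinees per group.
    data = (
        [("ref", 1, 1)] * 8 + [("ref", 1, 0)] * 2
        + [("focal", 1, 1)] * 6 + [("focal", 1, 0)] * 4
        + [("ref", 2, 1)] * 9 + [("ref", 2, 0)] * 1
        + [("focal", 2, 1)] * 8 + [("focal", 2, 0)] * 2
    )
    alpha = mantel_haenszel_odds_ratio(data)
    print(alpha, mh_d_dif(alpha))
```

An odds ratio of 1 (delta of 0) corresponds to no uniform DIF for examinees matched on the total score.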
ERIC Educational Resources Information Center
Bancroft, Tyler D.; Hockley, William E.; Farquhar, Riley
2013-01-01
The effects of the duration of remember and forget cues were examined to test the differential rehearsal account of item-based directed forgetting. In Experiments 1 and 2, cues were shown for 300, 600, or 900 ms, and a directed forgetting effect (better recognition of remember than forget items) was found at each duration. In addition, recognition…
ERIC Educational Resources Information Center
Jin, Ying; Myers, Nicholas D.; Ahn, Soyeon
2014-01-01
Previous research has demonstrated that differential item functioning (DIF) methods that do not account for multilevel data structure could result in too frequent rejection of the null hypothesis (i.e., no DIF) when the intraclass correlation coefficient (ρ) of the studied item was the same as the ρ of the total score. The current study extended…
Bender, Andrew R.; Raz, Naftali
2012-01-01
Ability to form new associations between unrelated items is particularly sensitive to aging, but the reasons for such differential vulnerability are unclear. In this study, we examined the role of objective and subjective factors (working memory and beliefs about memory strategies) on differential relations of age with recognition of items and associations. Healthy adults (N = 100, age 21 to 79) studied word pairs, completed item and association recognition tests, and rated the effectiveness of shallow (e.g., repetition) and deep (e.g., imagery or sentence generation) encoding strategies. Advanced age was associated with reduced working memory (WM) capacity and poorer associative recognition. In addition, reduced WM capacity, beliefs in the utility of ineffective encoding strategies, and lack of endorsement of effective ones were independently associated with impaired associative memory. Thus, maladaptive beliefs about memory in conjunction with reduced cognitive resources account in part for differences in associative memory commonly attributed to aging. PMID:22251381
Differential Performance by English Language Learners on an Inquiry-Based Science Assessment
NASA Astrophysics Data System (ADS)
Turkan, Sultan; Liu, Ou Lydia
2012-10-01
The performance of English language learners (ELLs) has been a concern given the rapidly changing demographics in US K-12 education. This study aimed to examine whether students' English language status has an impact on their inquiry science performance. Differential item functioning (DIF) analysis was conducted with regard to ELL status on an inquiry-based science assessment, using a multifaceted Rasch DIF model. A total of 1,396 seventh- and eighth-grade students took the science test, including 313 ELL students. The results showed that, overall, non-ELLs significantly outperformed ELLs. Of the four items that showed DIF, three favored non-ELLs while one favored ELLs. The item that favored ELLs provided a graphic representation of a science concept within a family context. There is some evidence that constructed-response items may help ELLs articulate scientific reasoning using their own words. Assessment developers and teachers should pay attention to the possible interaction between linguistic challenges and science content when designing assessment for and providing instruction to ELLs.
Disparities in Sense of Community: True race differences or differential item functioning?
Coffman, Donna L.; BeLue, Rhonda
2009-01-01
The sense of community index (SCI) has been widely used to measure psychological sense of community (SOC). Furthermore, SOC has been found to differ among racial groups. Since different ethnic groups have different cultural and historical experiences that may lead to different interpretations of measurement items, it is important to know whether the instrument used to measure the construct of interest has equivalency in measurement across groups or if the instrument exhibits differential item functioning (DIF). Examining DIF in the SCI helps assure that subgroup comparisons identify true differences in SOC between Blacks and Whites. We did not find DIF between races but we did find that the SCI question ‘I feel at home in my neighborhood’ was a more reliable measure of SOC for Whites than for Blacks. In other words, this item has less measurement error for Whites than for Blacks. Therefore, differences on the SCI may be attributable to true differences in SOC between races rather than DIF. PMID:19890462
Paap, Muirne C S; Braeken, Johan; Pedersen, Geir; Urnes, Øyvind; Karterud, Sigmund; Wilberg, Theresa; Hummelen, Benjamin
2017-12-01
This study aims at evaluating the psychometric properties of the antisocial personality disorder (ASPD) criteria in a large sample of patients, most of whom had one or more personality disorders (PD). PD diagnoses were assessed by experienced clinicians using the Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders, 4th edition, Axis II PDs. Analyses were performed within an item response theory framework. Results of the analyses indicated that ASPD is a unidimensional construct that can be measured reliably at the upper range of the latent trait scale. Differential item functioning across gender was restricted to two criteria and had little impact on the latent ASPD trait level. Patients fulfilling both the adult ASPD criteria and the conduct disorder criteria had similar latent trait distributions as patients fulfilling only the adult ASPD criteria. Overall, the ASPD items fit the purpose of a diagnostic instrument well, that is, distinguishing patients with moderate from those with high antisocial personality scores.
Sharafi, Zahra
2017-01-01
Background: The purpose of this study was to evaluate the effectiveness of two methods of detecting differential item functioning (DIF) in the presence of multilevel data and polytomously scored items. The assessment of DIF with multilevel data (e.g., patients nested within hospitals, hospitals nested within districts) from large-scale assessment programs has received considerable attention but very few studies evaluated the effect of hierarchical structure of data on DIF detection for polytomously scored items. Methods: The ordinal logistic regression (OLR) and hierarchical ordinal logistic regression (HOLR) were utilized to assess DIF in simulated and real multilevel polytomous data. Six factors (DIF magnitude, grouping variable, intraclass correlation coefficient, number of clusters, number of participants per cluster, and item discrimination parameter) with a fully crossed design were considered in the simulation study. Furthermore, data of Pediatric Quality of Life Inventory™ (PedsQL™) 4.0 collected from 576 healthy school children were analyzed. Results: Overall, results indicate that both methods performed equivalently in terms of controlling Type I error and detection power rates. Conclusions: The current study showed negligible difference between OLR and HOLR in detecting DIF with polytomously scored items in a hierarchical structure. Implications and considerations while analyzing real data were also discussed. PMID:29312463
Sharafi, Zahra; Mousavi, Amin; Ayatollahi, Seyyed Mohammad Taghi; Jafari, Peyman
2017-01-01
The purpose of this study was to evaluate the effectiveness of two methods of detecting differential item functioning (DIF) in the presence of multilevel data and polytomously scored items. The assessment of DIF with multilevel data (e.g., patients nested within hospitals, hospitals nested within districts) from large-scale assessment programs has received considerable attention but very few studies evaluated the effect of hierarchical structure of data on DIF detection for polytomously scored items. The ordinal logistic regression (OLR) and hierarchical ordinal logistic regression (HOLR) were utilized to assess DIF in simulated and real multilevel polytomous data. Six factors (DIF magnitude, grouping variable, intraclass correlation coefficient, number of clusters, number of participants per cluster, and item discrimination parameter) with a fully crossed design were considered in the simulation study. Furthermore, data of Pediatric Quality of Life Inventory™ (PedsQL™) 4.0 collected from 576 healthy school children were analyzed. Overall, results indicate that both methods performed equivalently in terms of controlling Type I error and detection power rates. The current study showed negligible difference between OLR and HOLR in detecting DIF with polytomously scored items in a hierarchical structure. Implications and considerations while analyzing real data were also discussed.
Multiple determinants of lifespan memory differences
Henson, Richard N.; Campbell, Karen L.; Davis, Simon W.; Taylor, Jason R.; Emery, Tina; Erzinclioglu, Sharon; Tyler, Lorraine K.; Brayne, Carol; Bullmore, Edward T.; Calder, Andrew C.; Cusack, Rhodri; Dalgleish, Tim; Duncan, John; Matthews, Fiona E.; Marslen-Wilson, William D.; Rowe, James B.; Shafto, Meredith A.; Cheung, Teresa; Geerligs, Linda; McCarrey, Anna; Mustafa, Abdur; Price, Darren; Samu, David; Treder, Matthias; Tsvetanov, Kamen A.; van Belle, Janna; Williams, Nitin; Bates, Lauren; Gadie, Andrew; Gerbase, Sofia; Georgieva, Stanimira; Hanley, Claire; Parkin, Beth; Troy, David; Auer, Tibor; Correia, Marta; Gao, Lu; Green, Emma; Henriques, Rafael; Allen, Jodie; Amery, Gillian; Amunts, Liana; Barcroft, Anne; Castle, Amanda; Dias, Cheryl; Dowrick, Jonathan; Fair, Melissa; Fisher, Hayley; Goulding, Anna; Grewa, Adarsh; Hale, Geoff; Hilton, Andrew; Johnson, Frances; Johnston, Patricia; Kavanagh-Williamson, Thea; Kwasniewska, Magdalena; McMinn, Alison; Norman, Kim; Penrose, Jessica; Roby, Fiona; Rowland, Diane; Sargeant, John; Squire, Maggie; Stevens, Beth; Stoddart, Aldabra; Stone, Cheryl; Thompson, Tracy; Yazlik, Ozlem; Barnes, Dan; Dixon, Marie; Hillman, Jaya; Mitchell, Joanne; Villis, Laura; Kievit, Rogier A.
2016-01-01
Memory problems are among the most common complaints as people grow older. Using structural equation modeling of commensurate scores of anterograde memory from a large (N = 315), population-derived sample (www.cam-can.org), we provide evidence for three memory factors that are supported by distinct brain regions and show differential sensitivity to age. Associative memory and item memory are dramatically affected by age, even after adjusting for education level and fluid intelligence, whereas visual priming is not. Associative memory and item memory are differentially affected by emotional valence, and the age-related decline in associative memory is faster for negative than for positive or neutral stimuli. Gray-matter volume in the hippocampus, parahippocampus and fusiform cortex, and a white-matter index for the fornix, uncinate fasciculus and inferior longitudinal fasciculus, show differential contributions to the three memory factors. Together, these data demonstrate the extent to which differential ageing of the brain leads to differential patterns of memory loss. PMID:27600595
Superficial Priming in Episodic Recognition
ERIC Educational Resources Information Center
Dopkins, Stephen; Sargent, Jesse; Ngo, Catherine T.
2010-01-01
We explored the effect of superficial priming in episodic recognition and found it to be different from the effect of semantic priming in episodic recognition. Participants made recognition judgments to pairs of items, with each pair consisting of a prime item and a test item. Correct positive responses to the test item were impeded if the prime…
A Generative Approach to the Development of Hidden-Figure Items.
ERIC Educational Resources Information Center
Bejar, Issac I.; Yocom, Peter
This report explores an approach to item development and psychometric modeling which explicitly incorporates knowledge about the mental models used by examinees in the solution of items into a psychometric model that characterizes performances on a test, as well as incorporating that knowledge into the item development process. The paper focuses on…
Cross-Classification and Category Representation in Children's Concepts
ERIC Educational Resources Information Center
Nguyen, Simone P.
2007-01-01
Items commonly belong to many categories. Cross-classification is the classification of a single item into more than one category. This research explored 2- to 6-year-old children's use of 2 different category systems for cross-classification: script (e.g., school-time items, birthday party items) and taxonomic (e.g., animals, clothes). The…
Affective Outcomes of Schooling: Full-Information Item Factor Analysis of a Student Questionnaire.
ERIC Educational Resources Information Center
Muraki, Eiji; Engelhard, George, Jr.
Recent developments in dichotomous factor analysis based on multidimensional item response models (Bock and Aitkin, 1981; Muthen, 1978) provide an effective method for exploring the dimensionality of questionnaire items. Implemented in the TESTFACT program, this "full information" item factor analysis accounts not only for the pairwise joint…
Preti, Antonio; Vellante, Marcello; Petretto, Donatella R
2017-05-01
The "Reading the Mind in the Eyes" Test (hereafter: Eyes Test) is considered an advanced task of the Theory of Mind aimed at assessing the performance of the participant in perspective-taking, that is, the ability to sense or understand other people's cognitive and emotional states. In this study, the item response theory analysis was applied to the adult version of the Eyes Test. The Italian version of the Eyes Test was administered to 200 undergraduate students of both genders (males = 46%). Modified parallel analysis (MPA) was used to test unidimensionality. Marginal maximum likelihood estimation was used to fit the 1-, 2-, and 3-parameter logistic (PL) model to the data. Differential Item Functioning (DIF) due to gender was explored with five independent methods. MPA provided evidence in favour of unidimensionality. The Rasch model (1-PL) was superior to the other two models in explaining participants' responses to the Eyes Test. There was no robust evidence of gender-related DIF in the Eyes Test, although some differences may exist for some items as a reflection of real differences by group. The study results support a one-factor model of the Eyes Test. Performance on the Eyes Test is defined by the participant's ability in perspective-taking. Researchers should cease using arbitrarily selected subscores in assessing the performance of participants to the Eyes Test. Lack of gender-related DIF favours the use of the Eyes Test in the investigation of gender differences concerning empathy and social cognition.
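The 1-, 2-, and 3-parameter logistic models compared in the abstract above share one item response function and differ only in which parameters are free. The sketch below shows the standard textbook form, P(theta) = c + (1 - c) / (1 + exp(-a(theta - b))); it is not the estimation procedure used in the study, and the function name and example values are illustrative assumptions.

```python
import math


def item_response_probability(theta: float, b: float, a: float = 1.0, c: float = 0.0) -> float:
    """Probability of a correct response under the 3PL model.

    theta: person ability; b: item difficulty;
    a: item discrimination (held equal across items in the 1PL/Rasch case);
    c: lower asymptote or pseudo-guessing parameter (0 in the 1PL and 2PL).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


if __name__ == "__main__":
    # Hypothetical item evaluated at ability equal to its difficulty.
    print(item_response_probability(0.0, b=0.0))                # 1PL
    print(item_response_probability(0.0, b=0.0, a=2.0))         # 2PL
    print(item_response_probability(0.0, b=0.0, a=2.0, c=0.2))  # 3PL
```

When ability equals difficulty, the 1PL and 2PL give a probability of one half, while the 3PL gives a value halfway between c and 1.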
ConTour: Data-Driven Exploration of Multi-Relational Datasets for Drug Discovery.
Partl, Christian; Lex, Alexander; Streit, Marc; Strobelt, Hendrik; Wassermann, Anne-Mai; Pfister, Hanspeter; Schmalstieg, Dieter
2014-12-01
Large scale data analysis is nowadays a crucial part of drug discovery. Biologists and chemists need to quickly explore and evaluate potentially effective yet safe compounds based on many datasets that are in relationship with each other. However, there is a lack of tools that support them in these processes. To remedy this, we developed ConTour, an interactive visual analytics technique that enables the exploration of these complex, multi-relational datasets. At its core ConTour lists all items of each dataset in a column. Relationships between the columns are revealed through interaction: selecting one or multiple items in one column highlights and re-sorts the items in other columns. Filters based on relationships enable drilling down into the large data space. To identify interesting items in the first place, ConTour employs advanced sorting strategies, including strategies based on connectivity strength and uniqueness, as well as sorting based on item attributes. ConTour also introduces interactive nesting of columns, a powerful method to show the related items of a child column for each item in the parent column. Within the columns, ConTour shows rich attribute data about the items as well as information about the connection strengths to other datasets. Finally, ConTour provides a number of detail views, which can show items from multiple datasets and their associated data at the same time. We demonstrate the utility of our system in case studies conducted with a team of chemical biologists, who investigate the effects of chemical compounds on cells and need to understand the underlying mechanisms.
ERIC Educational Resources Information Center
Quaigrain, Kennedy; Arhin, Ato Kwamina
2017-01-01
Item analysis is essential in improving items which will be used again in later tests; it can also be used to eliminate misleading items in a test. The study focused on item and test quality and explored the relationship between difficulty index (p-value) and discrimination index (DI) with distractor efficiency (DE). The study was conducted among…
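The two indices named in the abstract above can be illustrated with a short sketch. The difficulty index is the proportion of examinees answering an item correctly. For the discrimination index, the sketch assumes the common upper-lower convention (difference in proportion correct between the top and bottom 27% of examinees ranked by total score); the truncated abstract does not state which formula the study used, and the function names and data are hypothetical.

```python
def difficulty_index(item_scores):
    """Proportion of examinees who answered the item correctly (scores are 0 or 1)."""
    return sum(item_scores) / len(item_scores)


def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-group minus lower-group proportion correct.

    Groups are the top and bottom `fraction` of examinees ranked by total
    test score (an assumed convention, not taken from the abstract).
    """
    if len(item_scores) != len(total_scores):
        raise ValueError("item_scores and total_scores must have equal length")
    n = len(item_scores)
    k = max(1, round(n * fraction))
    ranked = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = ranked[:k], ranked[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    return p_upper - p_lower


if __name__ == "__main__":
    # Hypothetical data: ten examinees with total scores 1 to 10.
    totals = list(range(1, 11))
    item = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
    print(difficulty_index(item), discrimination_index(item, totals))
```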
Brodey, Benjamin; Purcell, Susan E; Rhea, Karen; Maier, Philip; First, Michael; Zweede, Lisa; Sinisterra, Manuela; Nunn, M Brad; Austin, Marie-Paule; Brodey, Inger S
2018-03-23
The Structured Clinical Interview for DSM (SCID) is considered the gold standard assessment for accurate, reliable psychiatric diagnoses; however, because of its length, complexity, and training required, the SCID is rarely used outside of research. This paper aims to describe the development and initial validation of a Web-based, self-report screening instrument (the Screening Assessment for Guiding Evaluation-Self-Report, SAGE-SR) based on the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) and the SCID-5-Clinician Version (CV) intended to make accurate, broad-based behavioral health diagnostic screening more accessible within clinical care. First, study staff drafted approximately 1200 self-report items representing individual granular symptoms in the diagnostic criteria for the 8 primary SCID-CV modules. An expert panel iteratively reviewed, critiqued, and revised items. The resulting items were iteratively administered and revised through 3 rounds of cognitive interviewing with community mental health center participants. In the first 2 rounds, the SCID was also administered to participants to directly compare their Likert self-report and SCID responses. A second expert panel evaluated the final pool of items from cognitive interviewing and criteria in the DSM-5 to construct the SAGE-SR, a computerized adaptive instrument that uses branching logic from a screener section to administer appropriate follow-up questions to refine the differential diagnoses. The SAGE-SR was administered to healthy controls and outpatient mental health clinic clients to assess test duration and test-retest reliability. Cutoff scores for screening into follow-up diagnostic sections and criteria for inclusion of diagnoses in the differential diagnosis were evaluated. 
The expert panel reduced the initial 1200 test items to 664 items that panel members agreed collectively represented the SCID items from the 8 targeted modules and DSM criteria for the covered diagnoses. These 664 items were iteratively submitted to 3 rounds of cognitive interviewing with 50 community mental health center participants; the expert panel reviewed session summaries and agreed on a final set of 661 clear and concise self-report items representing the desired criteria in the DSM-5. The SAGE-SR constructed from this item pool took an average of 14 min to complete in a nonclinical sample versus 24 min in a clinical sample. Responses to individual items can be combined to generate DSM criteria endorsements and differential diagnoses, as well as provide indices of individual symptom severity. Preliminary measures of test-retest reliability in a small, nonclinical sample were promising, with good to excellent reliability for screener items in 11 of 13 diagnostic screening modules (intraclass correlation coefficient [ICC] or kappa coefficients ranging from .60 to .90), with mania achieving fair test-retest reliability (ICC=.50) and other substance use endorsed too infrequently for analysis. The SAGE-SR is a computerized adaptive self-report instrument designed to provide rigorous differential diagnostic information to clinicians. ©Benjamin Brodey, Susan E Purcell, Karen Rhea, Philip Maier, Michael First, Lisa Zweede, Manuela Sinisterra, M Brad Nunn, Marie-Paule Austin, Inger S Brodey. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 23.03.2018.
Kassam, Aliya; Donnon, Tyrone; Rigby, Ian
2014-03-01
There is a question of whether a single assessment tool can assess the key competencies of residents as mandated by the Royal College of Physicians and Surgeons of Canada CanMEDS roles framework. The objective of the present study was to investigate the reliability and validity of an emergency medicine (EM) in-training evaluation report (ITER). ITER data from 2009 to 2011 were combined for residents across the 5 years of the EM residency training program. An exploratory factor analysis with varimax rotation was used to explore the construct validity of the ITER. A total of 172 ITERs were completed on residents across their first to fifth year of training. A combined, 24-item ITER yielded a five-factor solution measuring the CanMEDS role subscales Medical Expert/Scholar, Communicator/Collaborator, Professional, Health Advocate, and Manager. The factor solution accounted for 79% of the variance, and reliability coefficients (Cronbach alpha) ranged from α = 0.90 to 0.95 for each subscale and α = 0.97 overall. The combined, 24-item ITER used to assess residents' competencies in the EM residency program showed strong reliability and evidence of construct validity for assessment of the CanMEDS roles. Further research is needed to develop and test ITER items that will differentiate each CanMEDS role exclusively.
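The reliability coefficient reported above, Cronbach's alpha, can be sketched as follows. This is the standard textbook formula applied to a respondents-by-items score matrix; the function name `cronbach_alpha` is hypothetical, and the sketch is not the study's own analysis code.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a 2-D array (rows = respondents, columns = items).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # sample variance of summed score
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)
```

Alpha approaches 1 when the items covary strongly relative to their individual variances, which is the pattern the subscale values of 0.90 to 0.95 describe.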
Cradle-to-Grave Logistic Technologies for Exploration Missions
NASA Technical Reports Server (NTRS)
Broyan, James L.; Ewert, Michael K.; Shull, Sarah
2013-01-01
Human exploration missions under study are very limited by the launch mass capacity of existing and planned vehicles. The logistical mass of crew items is typically considered separate from the vehicle structure, habitat outfitting, and life support systems. Consequently, crew item logistical mass is typically competing with vehicle systems for mass allocation. NASA's Advanced Exploration Systems (AES) Logistics Reduction and Repurposing (LRR) Project is developing four logistics technologies guided by a systems engineering cradle-to-grave approach to enable used crew items to augment vehicle systems. Specifically, AES LRR is investigating the direct reduction of clothing mass, the repurposing of logistical packaging, the processing of spent crew items to benefit radiation shielding and water recovery, and the conversion of trash to propulsion supply gases. The systematic implementation of these types of technologies will increase launch mass efficiency by enabling items to be used for secondary purposes and improve the habitability of the vehicle as the mission duration increases. This paper describes the four technologies under development, their benefits and challenges, and the status of progress at the midpoint of the three-year AES project.
Logistics Reduction Technologies for Exploration Missions
NASA Technical Reports Server (NTRS)
Broyan, James L., Jr.; Ewert, Michael K.; Fink, Patrick W.
2014-01-01
Human exploration missions under study are very limited by the launch mass capacity of existing and planned vehicles. The logistical mass of crew items is typically considered separate from the vehicle structure, habitat outfitting, and life support systems. Consequently, crew item logistical mass is typically competing with vehicle systems for mass allocation. NASA's Advanced Exploration Systems (AES) Logistics Reduction and Repurposing (LRR) Project is developing five logistics technologies guided by a systems engineering cradle-to-grave approach to enable used crew items to augment vehicle systems. Specifically, AES LRR is investigating the direct reduction of clothing mass, the repurposing of logistical packaging, the use of autonomous logistics management technologies, the processing of spent crew items to benefit radiation shielding and water recovery, and the conversion of trash to propulsion gases. The systematic implementation of these types of technologies will increase launch mass efficiency by enabling items to be used for secondary purposes and improve the habitability of the vehicle as the mission duration increases. This paper provides a description and the challenges of the five technologies under development and the estimated overall mission benefits of each technology.
ERIC Educational Resources Information Center
Goldston, J. W.
This unit introduces analytic solutions of ordinary differential equations. The objective is to enable the student to decide whether a given function solves a given differential equation. Examples of problems from biology and chemistry are covered. Problem sets, quizzes, and a model exam are included, and answers to all items are provided. The…
Item Analyses of Memory Differences
Salthouse, Timothy A.
2017-01-01
Objective: Although performance on memory and other cognitive tests is usually assessed with a score aggregated across multiple items, potentially valuable information is also available at the level of individual items. Method: The current study illustrates how analyses of variance with item as one of the factors, and memorability analyses in which item accuracy in one group is plotted as a function of item accuracy in another group, can provide a more detailed characterization of the nature of group differences in memory. Data are reported for two memory tasks, word recall and story memory, across age, ability, repetition, delay, and longitudinal contrasts. Results: The item-level analyses revealed evidence for largely uniform differences across items in the age, ability, and longitudinal contrasts, but differential patterns across items in the repetition contrast, and unsystematic item relations in the delay contrast. Conclusion: Analyses at the level of individual items have the potential to indicate the manner by which group differences in the aggregate test score are achieved. PMID:27618285
Examining the validity and reliability of the Taita symptom checklist using Rasch analysis.
Chen, Yun-Ling; Pan, Ay-Woan; Chung, LyInn; Chen, Tsyr-Jang
2015-03-01
The Taita symptom checklist (TSCL) is a standardized self-rating psychiatric symptom scale for outpatients with mental illness in Taiwan. This study aimed to examine the validity and reliability of the TSCL using Rasch analysis. The TSCL was given to 583 healthy people and 479 people with mental illness. Rasch analysis was used to examine the appropriateness of the rating scale, the unidimensionality of the scale, the differential item functioning across sex and diagnosis, and the Rasch cut-off score of the scale. Rasch analysis confirmed that the revised 37 items with a three-point rating scale of the TSCL demonstrated good internal consistency and met criteria for unidimensionality. The person and item reliability indices were high. The TSCL could reliably measure healthy participants and patients with mental illness. Differential item functioning due to sex or psychiatric diagnosis was evident for three items. A Rasch cut-off score for TSCL was produced for detecting participants' psychiatric symptoms based on an eight-level classification. The TSCL is a reliable and valid assessment to evaluate the participants' perceived disturbance of psychiatric symptoms based on Rasch analysis. Copyright © 2013. Published by Elsevier B.V.
An Exploration of Paradox: High School and College Students' Self-Reported Motivations for Smoking.
ERIC Educational Resources Information Center
Austin, Megan K.; Brosh, Joanne; Chambliss, Catherine
This study explored experiential factors underlying cigarette smoking by administering a questionnaire consisting of the Rosenberg Self-esteem scale and items assessing smoking habits and motivations to 115 college students and 108 high school students. Directionally adjusted items were totaled to create summary scores for the four hypothesized…
ERIC Educational Resources Information Center
Baylor, Carolyn; McAuliffe, Megan J.; Hughes, Louise E.; Yorkston, Kathryn; Anderson, Tim; Jiseon, Kim; Amtmann, Dagmar
2014-01-01
Purpose: To examine the cross-cultural applicability of the Communicative Participation Item Bank (CPIB) through a comparison of respondents with Parkinson's disease (PD) from the United States and New Zealand. Method: A total of 428 respondents--218 from the United States and 210 from New Zealand--completed the self-report CPIB and a series of…
ERIC Educational Resources Information Center
Monahan, Patrick O.; Ankenmann, Robert D.
2010-01-01
When the matching score is either less than perfectly reliable or not a sufficient statistic for determining latent proficiency in data conforming to item response theory (IRT) models, Type I error (TIE) inflation may occur for the Mantel-Haenszel (MH) procedure or any differential item functioning (DIF) procedure that matches on summed-item…
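For readers unfamiliar with the Mantel-Haenszel (MH) procedure mentioned above, the following is a minimal sketch of the MH common odds ratio for one dichotomous item, with examinees matched on a summed score. The function names and the "ref"/"focal" group labels are illustrative assumptions. The sketch computes only the point estimate and the ETS delta-scale transformation, not the MH chi-square test whose Type I error rate the study examines.

```python
import math
from collections import defaultdict

def mantel_haenszel_odds_ratio(item, group, matching_score):
    """MH common odds ratio for one 0/1 item.

    item: 0/1 responses; group: "ref" or "focal" per examinee;
    matching_score: the summed score used to stratify examinees.
    Values above 1 indicate the item favors the reference group.
    """
    # Per stratum: [ref correct, ref incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for y, g, s in zip(item, group, matching_score):
        tables[s][(0 if y == 1 else 1) + (0 if g == "ref" else 2)] += 1

    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if den == 0.0:
        raise ValueError("odds ratio undefined: no informative strata")
    return num / den

def mh_d_dif(odds_ratio):
    """ETS delta-scale DIF index: -2.35 * ln(alpha_MH)."""
    return -2.35 * math.log(odds_ratio)
```

An odds ratio of 1 (delta of 0) means matched reference and focal examinees have equal odds of answering correctly; the abstract's concern is that an unreliable matching score can push the statistic away from this value even when no DIF exists.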
ERIC Educational Resources Information Center
Elder, Catherine; McNamara, Tim; Congdon, Peter
2003-01-01
Used Rasch analytic procedures to study item bias or differential item functioning in both dichotomous and scalar items on a test of English for academic purposes. Results for 139 college students on a pilot English language test model the approach and illustrate the measurement challenges posed by a diagnostic instrument to measure English…
Item Response Theory and Health Outcomes Measurement in the 21st Century
Hays, Ron D.; Morales, Leo S.; Reise, Steve P.
2006-01-01
Item response theory (IRT) has a number of potential advantages over classical test theory in assessing self-reported health outcomes. IRT models yield invariant item and latent trait estimates (within a linear transformation), standard errors conditional on trait level, and trait estimates anchored to item content. IRT also facilitates evaluation of differential item functioning, inclusion of items with different response formats in the same scale, and assessment of person fit and is ideally suited for implementing computer adaptive testing. Finally, IRT methods can be helpful in developing better health outcome measures and in assessing change over time. These issues are reviewed, along with a discussion of some of the methodological and practical challenges in applying IRT methods. PMID:10982088
Using Shaping to Increase Foods Consumed by Children with Autism
ERIC Educational Resources Information Center
Hodges, Abby; Davis, Tonya; Crandall, Madison; Phipps, Laura; Weston, Regan
2017-01-01
The current study used differential reinforcement and shaping to increase the variety of foods accepted by children with autism who demonstrated significant feeding inflexibility. Participants were introduced to four new food items via a hierarchical exposure, which involved systematically increasing the desired response with the food item. Level…
Comparing DIF Methods for Data with Dual Dependency
ERIC Educational Resources Information Center
Jin, Ying; Kang, Minsoo
2016-01-01
Background: The current study compared four differential item functioning (DIF) methods to examine their performances in terms of accounting for dual dependency (i.e., person and item clustering effects) simultaneously by a simulation study, which is not sufficiently studied under the current DIF literature. The four methods compared are logistic…
A Rasch Analysis of the Junior Metacognitive Awareness Inventory with Singapore Students
ERIC Educational Resources Information Center
Ning, Hoi Kwan
2018-01-01
The psychometric properties of the 2 versions of the Junior Metacognitive Awareness Inventory were examined with Singapore student samples. Other than 2 misfitting items and an underutilized response scale, Rasch analysis demonstrated that the instruments have good measurement precision, and no differential item functioning was detected across…
Are Teacher Course Evaluations Biased against Faculty That Teach Quantitative Methods Courses?
ERIC Educational Resources Information Center
Royal, Kenneth D.; Stockdale, Myrah R.
2015-01-01
The present study investigated graduate students' responses to teacher/course evaluations (TCE) to determine if students' responses were inherently biased against faculty who teach quantitative methods courses. Item response theory (IRT) and Differential Item Functioning (DIF) techniques were utilized for data analysis. Results indicate students…
A Comparison of Strategies for Estimating Conditional DIF
ERIC Educational Resources Information Center
Moses, Tim; Miao, Jing; Dorans, Neil J.
2010-01-01
In this study, the accuracies of four strategies were compared for estimating conditional differential item functioning (DIF), including raw data, logistic regression, log-linear models, and kernel smoothing. Real data simulations were used to evaluate the estimation strategies across six items, DIF and No DIF situations, and four sample size…
IRTs of the ABCs: Children's Letter Name Acquisition
ERIC Educational Resources Information Center
Phillips, Beth M.; Piasta, Shayne B.; Anthony, Jason L.; Lonigan, Christopher J.; Francis, David J.
2012-01-01
We examined the developmental sequence of letter name knowledge acquisition by children from 2 to 5 years of age. Data from 2 samples representing diverse regions, ethnicity, and socioeconomic backgrounds (ns=1074 and 500) were analyzed using item response theory (IRT) and differential item functioning techniques. Results from factor analyses…
Lin, Chung-Ying; Griffiths, Mark D; Pakpour, Amir H
2018-03-01
Background and aims: Research examining problematic mobile phone use has increased markedly over the past 5 years and has been related to "no mobile phone phobia" (so-called nomophobia). The 20-item Nomophobia Questionnaire (NMP-Q) is the only instrument that assesses nomophobia with an underlying theoretical structure and robust psychometric testing. This study aimed to confirm the construct validity of the Persian NMP-Q using Rasch and confirmatory factor analysis (CFA) models. Methods: After ensuring the linguistic validity, Rasch models were used to examine the unidimensionality of each Persian NMP-Q factor among 3,216 Iranian adolescents and CFAs were used to confirm its four-factor structure. Differential item functioning (DIF) and multigroup CFA were used to examine whether males and females interpreted the NMP-Q similarly, including item content and NMP-Q structure. Results: Each factor was unidimensional according to the Rasch findings, and the four-factor structure was supported by CFA. Two items did not quite fit the Rasch models (Item 14: "I would be nervous because I could not know if someone had tried to get a hold of me;" Item 9: "If I could not check my smartphone for a while, I would feel a desire to check it"). No DIF items were found across gender and measurement invariance was supported in multigroup CFA across gender. Conclusions: Due to the satisfactory psychometric properties, it is concluded that the Persian NMP-Q can be used to assess nomophobia among adolescents. Moreover, NMP-Q users may compare its scores between genders in the knowledge that there are no score differences contributed by different understandings of NMP-Q items.
Automated Scoring of an Interactive Geometry Item: A Proof-of-Concept
ERIC Educational Resources Information Center
Masters, Jessica
2010-01-01
An online interactive geometry item was developed to explore students' abilities to create prototypical and "tilted" rectangles out of line segments. The item was administered to 1,002 students. The responses to the item were hand-coded as correct, incorrect, or incorrect with possible evidence of a misconception. A variation of the nearest…
Examination of the PROMIS upper extremity item bank.
Hung, Man; Voss, Maren W; Bounsanga, Jerry; Crum, Anthony B; Tyser, Andrew R
Clinical measurement. The psychometric properties of the PROMIS v1.2 UE item bank were tested on various samples prior to its release, but have not been fully evaluated among the orthopaedic population. This study assesses the performance of the UE item bank within the UE orthopaedic patient population. The UE item bank was administered to 1197 adult patients presenting to a tertiary orthopaedic clinic specializing in hand and UE conditions and was examined using traditional statistics and Rasch analysis. The UE item bank fits a unidimensional model (outfit MNSQ range from 0.64 to 1.70) and has adequate reliabilities (person = 0.84; item = 0.82) and local independence (item residual correlations range from -0.37 to 0.34). Only one item exhibits gender differential item functioning. Most items target low levels of function. The UE item bank is a useful clinical assessment tool. Additional items covering higher functions are needed to enhance validity. Supplemental testing is recommended for patients at higher levels of function until more high function UE items are developed. 2c. Copyright © 2016 Hanley & Belfus. Published by Elsevier Inc. All rights reserved.
Item Response Theory Analyses of the Cambridge Face Memory Test (CFMT)
Cho, Sun-Joo; Wilmer, Jeremy; Herzmann, Grit; McGugin, Rankin; Fiset, Daniel; Van Gulick, Ana E.; Ryan, Katie; Gauthier, Isabel
2014-01-01
We evaluated the psychometric properties of the Cambridge face memory test (CFMT; Duchaine & Nakayama, 2006). First, we assessed the dimensionality of the test with a bi-factor exploratory factor analysis (EFA). This EFA analysis revealed a general factor and three specific factors clustered by targets of CFMT. However, the three specific factors appeared to be minor factors that can be ignored. Second, we fit a unidimensional item response model. This item response model showed that the CFMT items could discriminate individuals at different ability levels and covered a wide range of the ability continuum. We found the CFMT to be particularly precise for a wide range of ability levels. Third, we implemented item response theory (IRT) differential item functioning (DIF) analyses for each gender group and two age groups (Age ≤ 20 versus Age > 21). This DIF analysis suggested little evidence of consequential differential functioning on the CFMT for these groups, supporting the use of the test to compare older to younger, or male to female, individuals. Fourth, we tested for a gender difference on the latent facial recognition ability with an explanatory item response model. We found a significant but small gender difference on the latent ability for face recognition, which was higher for women than men by 0.184, at age mean 23.2, controlling for linear and quadratic age effects. Finally, we discuss the practical considerations of the use of total scores versus IRT scale scores in applications of the CFMT. PMID:25642930
Setodji, Claude M; Elliott, Marc N; Abel, Gary; Burt, Jenni; Roland, Martin; Campbell, John
2015-09-01
To evaluate two 5-item patient experience scales from the English General Practice (GP) Patient Survey for evidence of differential item functioning (DIF) given prior evidence of substantially worse reported health care experiences for South Asian compared with white British respondents. A national survey of English patients' primary care experiences. We used classic test and item response theory analysis to examine the possibility of DIF by patient ethnicity (South Asian, white British) after controlling for age, sex, health status, and quality of life in the English GP Patient Survey conducted in 2011/2012. Data were available for 873,051 respondents (818,219 white British/54,832 South Asian from 7795 English practices) who answered items relating to experiences of GP or nurses' care. Internal consistency reliability was high and similar for South Asian and white British patients. White British patients reported better average experiences than South Asians, but there was no evidence of DIF or different item response curves for white British and South Asian respondents, even in sensitivity analyses using matched samples. All communication items in the English GP Patient Survey showed similar South Asian versus white British differences, with no evidence of DIF. In contrast, differences due to scale use or expectations are typically variable rather than constant across scales. While other possibilities remain, these findings increase the likelihood that the observed negative responses of South Asian patients to this national survey reflect true differences in their experiences of care.
Neural correlates of differential retrieval orientation: Sustained and item-related components.
Woodruff, C Chad; Uncapher, Melina R; Rugg, Michael D
2006-01-01
Retrieval orientation refers to a cognitive state that biases processing of retrieval cues in service of a specific goal. The present study used a mixed fMRI design to investigate whether adoption of different retrieval orientations - as indexed by differences in the activity elicited by retrieval cues corresponding to unstudied items - is associated with differences in the state-related activity sustained across a block of test trials sharing a common retrieval goal. Subjects studied mixed lists comprising visually presented words and pictures. They then undertook a series of short test blocks in which all test items were visually presented words. The blocks varied according to whether the test items were used to cue retrieval of studied words or studied pictures. In several regions, neural activity elicited by correctly classified new items differed according to whether words or pictures were the targeted material. The loci of these effects suggest that one factor driving differential cue processing is modulation of the degree of overlap between cue and targeted memory representations. In addition to these item-related effects, neural activity sustained throughout the test blocks also differed according to the nature of the targeted material. These findings indicate that the adoption of different retrieval orientations is associated with distinct neural states. The loci of these sustained effects were distinct from those where new item activity varied, suggesting that the effects may play a role in biasing retrieval cue processing in favor of the current retrieval goal.
Survey Page Length and Progress Indicators: What Are Their Relationships to Item Nonresponse?
ERIC Educational Resources Information Center
Bowman, Nicholas A.; Herzog, Serge; Sarraf, Shimon; Tukibayeva, Malika
2014-01-01
The popularity of online student surveys has been associated with greater item nonresponse. This chapter presents research aimed at exploring what factors might help minimize item nonresponse, such as altering online survey page length and using progress indicators.
Hagquist, Curt; Andrich, David
2017-09-19
Rasch analysis with a focus on Differential Item Functioning (DIF) is increasingly used for examination of psychometric properties of health outcome measures. To take account of DIF in order to retain precision of measurement, split of DIF-items into separate sample specific items has become a frequently used technique. The purpose of the paper is to present and summarise recent advances of analysis of DIF in a unified methodology. In particular, the paper focuses on the use of analysis of variance (ANOVA) as a method to simultaneously detect uniform and non-uniform DIF, the need to distinguish between real and artificial DIF and the trade-off between reliability and validity. An illustrative example from health research is used to demonstrate how DIF, in this case between genders, can be identified, quantified and under specific circumstances accounted for using the Rasch model. Rasch analyses of DIF were conducted of a composite measure of psychosomatic problems using Swedish data from the Health Behaviour in School-aged Children study for grade 9 students collected during the 1985-2014 time periods. The procedures demonstrate how DIF can be identified efficiently by ANOVA of residuals, and how the magnitude of DIF can be quantified and potentially accounted for by resolving items according to identifiable groups and using principles of test equating on the resolved items. The results of the analysis also show that the real DIF in some items does affect person measurement estimates. Firstly, in order to distinguish between real and artificial DIF, the items showing DIF initially should not be resolved simultaneously but sequentially. Secondly, while resolving instead of deleting a DIF item may retain reliability, both options may affect the content validity negatively. Resolving items with DIF is not justified if the source of the DIF is relevant for the content of the variable; then resolving DIF may deteriorate the validity of the instrument. 
Generally, decisions on resolving items to deal with DIF should also rely on external information.
Lix, Lisa M; Wu, Xiuyun; Hopman, Wilma; Mayo, Nancy; Sajobi, Tolulope T; Liu, Juxin; Prior, Jerilynn C; Papaioannou, Alexandra; Josse, Robert G; Towheed, Tanveer E; Davison, K Shawn; Sawatzky, Richard
2016-01-01
Self-reported health status measures, like the Short Form 36-item Health Survey (SF-36), can provide rich information about the overall health of a population and its components, such as physical, mental, and social health. However, differential item functioning (DIF), which arises when population sub-groups with the same underlying (i.e., latent) level of health have different measured item response probabilities, may compromise the comparability of these measures. The purpose of this study was to test for DIF on the SF-36 physical functioning (PF) and mental health (MH) sub-scale items in a Canadian population-based sample. Study data were from the prospective Canadian Multicentre Osteoporosis Study (CaMos), which collected baseline data in 1996-1997. DIF was tested using a multiple indicators multiple causes (MIMIC) method. Confirmatory factor analysis defined the latent variable measurement model for the item responses and latent variable regression with demographic and health status covariates (i.e., sex, age group, body weight, self-perceived general health) produced estimates of the magnitude of DIF effects. The CaMos cohort consisted of 9423 respondents; 69.4% were female and 51.7% were less than 65 years. Eight of 10 items on the PF sub-scale and four of five items on the MH sub-scale exhibited DIF. Large DIF effects were observed on PF sub-scale items about vigorous and moderate activities, lifting and carrying groceries, walking one block, and bathing or dressing. On the MH sub-scale items, all DIF effects were small or moderate in size. SF-36 PF and MH sub-scale scores were not comparable across population sub-groups defined by demographic and health status variables due to the effects of DIF, although the magnitude of this bias was not large for most items. We recommend testing and adjusting for DIF to ensure comparability of the SF-36 in population-based investigations.
Chan, Kitty S; Gross, Alden L; Pezzin, Liliana E; Brandt, Jason; Kasper, Judith D
2015-12-01
To harmonize measures of cognitive performance using item response theory (IRT) across two international aging studies. Data for persons ≥65 years from the Health and Retirement Study (HRS, N = 9,471) and the English Longitudinal Study of Aging (ELSA, N = 5,444). Cognitive performance measures varied (HRS fielded 25, ELSA 13); 9 were in common. Measurement precision was examined for IRT scores based on (a) common items, (b) common items adjusted for differential item functioning (DIF), and (c) DIF-adjusted all items. Three common items (day of date, immediate word recall, and delayed word recall) demonstrated DIF by survey. Adding survey-specific items improved precision but mainly for HRS respondents at lower cognitive levels. IRT offers a feasible strategy for harmonizing cognitive performance measures across other surveys and for other multi-item constructs of interest in studies of aging. Practical implications depend on sample distribution and the difficulty mix of in-common and survey-specific items. © The Author(s) 2015.
Detecting Differential Person Functioning in Emotional Intelligence
ERIC Educational Resources Information Center
Alsmadi, Yahia M.; Alsmadi, Abdalla A.
2009-01-01
Differential Item Functioning (DIF) is a widely used term in test development literature. It is very important to analyze test data for DIF because it is a serious threat to validity. If the same data matrix is transposed, a similar analysis can be carried out for Differential Person Functioning (DPF). The purpose of this paper is to introduce and…
ERIC Educational Resources Information Center
Marsh, Herbert W.; Nagengast, Benjamin; Morin, Alexandre J. S.; Parada, Roberto H.; Craven, Rhonda G.; Hamilton, Linda R.
2011-01-01
Existing research posits multiple dimensions of bullying and victimization but has not identified well-differentiated facets of these constructs that meet standards of good measurement: goodness of fit, measurement invariance, lack of differential item functioning, and well-differentiated factors that are not so highly correlated as to detract…
Application of Item Response Theory to Tests of Substance-related Associative Memory
Shono, Yusuke; Grenard, Jerry L.; Ames, Susan L.; Stacy, Alan W.
2015-01-01
A substance-related word association test (WAT) is one of the commonly used indirect tests of substance-related implicit associative memory and has been shown to predict substance use. This study applied an item response theory (IRT) modeling approach to evaluate psychometric properties of the alcohol- and marijuana-related WATs and their items among 775 ethnically diverse at-risk adolescents. After examining the IRT assumptions, item fit, and differential item functioning (DIF) across gender and age groups, the original 18 WAT items were reduced to 14 and 15 items in the alcohol- and marijuana-related WATs, respectively. Thereafter, unidimensional one- and two-parameter logistic models (1PL and 2PL models) were fitted to the revised WAT items. The results demonstrated that both alcohol- and marijuana-related WATs have good psychometric properties. These results were discussed in light of the framework of a unified concept of construct validity (Messick, 1975, 1989, 1995). PMID:25134051
Measurement of Women's Agency in Egypt: A National Validation Study.
Yount, Kathryn M; VanderEnde, Kristin E; Dodell, Sylvie; Cheong, Yuk Fai
2016-09-01
Despite widespread assumptions about women's empowerment and agency in the Arab Middle East, psychometric research of these constructs is limited. Using national data from 6214 married women ages 16-49 who took part in the 2006 Egypt Labor Market Panel Survey, we applied factor analysis to explore and then to test the factor structure of women's agency. We then used multiple indicator multiple cause structural equations models to test for differential item functioning (DIF) by women's age at first marriage, a potential resource for women's agency. Our results confirm that women's agency in Egypt is multi-dimensional and comprised of their (1) influence in family decisions, including those reserved for men, (2) freedom of movement in public spaces, and (3) attitudes about gender, specifically violence against wives. These dimensions confirm those explored previously in selected rural areas of Egypt and South Asia. Yet, three items showed significant uniform DIF by women's categorical age at first marriage, with and without a control for women's age in years. Models adjusting for DIF and women's age in years showed that women's older age at first marriage was positively associated with the factor means for family decision-making and gender-violence attitudes, but not freedom of movement. Our findings reveal the value of our analytical strategy for research on the dimensions and determinants of women's agency. Our approach offers a promising model to discern "hierarchies of evidence" for social policies and programs to enhance women's empowerment.
Obbarius, Nina; Fischer, Felix; Obbarius, Alexander; Nolte, Sandra; Liegl, Gregor; Rose, Matthias
2018-04-10
To develop the first item bank to measure Stress Resilience (SR) in clinical populations. Qualitative item development resulted in an initial pool of 131 items covering a broad theoretical SR concept. These items were tested in n=521 patients at a psychosomatic outpatient clinic. Exploratory and Confirmatory Factor Analysis (CFA), as well as other state-of-the-art item analyses and IRT were used for item evaluation and calibration of the final item bank. Out of the initial item pool of 131 items, we excluded 64 items (54 factor loading <.5, 4 residual correlations >.3, 2 non-discriminative Item Response Curves, 4 Differential Item Functioning). The final set of 67 items indicated sufficient model fit in CFA and IRT analyses. Additionally, a 10-item short form with high measurement precision (SE≤.32 in a theta range between -1.8 and +1.5) was derived. Both the SR item bank and the SR short form were highly correlated with an existing static legacy tool (Connor-Davidson Resilience Scale). The final SR item bank and 10-item short form showed good psychometric properties. When further validated, they will be ready to be used within a framework of Computer-Adaptive Tests for a comprehensive assessment of the Stress-Construct. Copyright © 2018. Published by Elsevier Inc.
Harpole, Jared K; Levinson, Cheri A; Woods, Carol M; Rodebaugh, Thomas L; Weeks, Justin W; Brown, Patrick J; Heimberg, Richard G; Menatti, Andrew R; Blanco, Carlos; Schneier, Franklin; Liebowitz, Michael
2015-06-01
The Brief Fear of Negative Evaluation Scale (BFNE; Leary, Personality and Social Psychology Bulletin, 9, 371-375, 1983) assesses fear and worry about receiving negative evaluation from others. Rodebaugh et al. (Psychological Assessment, 16, 169-181, 2004) found that the BFNE is composed of a reverse-worded factor (BFNE-R) and a straightforwardly-worded factor (BFNE-S). Further, they found the BFNE-S to have better psychometric properties and provide more information than the BFNE-R. Currently there is a lack of research regarding the measurement invariance of the BFNE-S across gender and ethnicity with respect to item thresholds. The present study uses item response theory (IRT) to test the BFNE-S for differential item functioning (DIF) related to gender and ethnicity (White, Asian, and Black). Six data sets consisting of clinical, community, and undergraduate participants were utilized (N = 2,109). The factor structure of the BFNE-S was confirmed using categorical confirmatory factor analysis, IRT model assumptions were tested, and the BFNE-S was evaluated for DIF. Item nine demonstrated significant non-uniform DIF between White and Black participants. No other items showed significant uniform or non-uniform DIF across gender or ethnicity. Results suggest the BFNE-S can be used reliably with men and women and Asian and White participants. More research is needed to understand the implications of using the BFNE-S with Black participants.
Pedraza, Otto; Graff-Radford, Neill R.; Smith, Glenn E.; Ivnik, Robert J.; Willis, Floyd B.; Petersen, Ronald C.; Lucas, John A.
2010-01-01
Scores on the Boston Naming Test (BNT) are frequently lower for African American adults than for Caucasian adults. Although demographically-based norms can mitigate the impact of this discrepancy on the likelihood of erroneous diagnostic impressions, a growing consensus suggests that group norms do not sufficiently address or advance our understanding of the underlying psychometric and sociocultural factors that lead to between-group score discrepancies. Using item response theory and methods to detect differential item functioning (DIF), the current investigation moves beyond comparisons of the summed total score to examine whether the conditional probability of responding correctly to individual BNT items differs between African American and Caucasian adults. Participants included 670 adults age 52 and older who took part in Mayo's Older Americans and Older African Americans Normative Studies. Under a 2-parameter logistic IRT framework and after correction for the false discovery rate, 12 items were shown to demonstrate DIF. Six of these 12 items (“dominoes,” “escalator,” “muzzle,” “latch,” “tripod,” and “palette”) were also identified in additional analyses using hierarchical logistic regression models and represent the strongest evidence for race/ethnicity-based DIF. These findings afford a finer characterization of the psychometric properties of the BNT and expand our understanding of between-group performance. PMID:19570311
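The abstract above mentions a correction for the false discovery rate but does not name the procedure. As a minimal sketch, assuming the common Benjamini-Hochberg step-up rule (an assumption, not something the abstract confirms), per-item DIF p-values could be screened as follows. The function name and the p-values are hypothetical and purely illustrative.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per p-value: True where the test is rejected at FDR level q.

    Assumes the Benjamini-Hochberg step-up rule; the study's actual
    procedure is not stated in the abstract.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical per-item DIF p-values (not taken from the study).
p_values = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
flags = benjamini_hochberg(p_values, q=0.05)
print(flags)
```

With these illustrative values only the first two items are flagged: 0.039 and 0.041 would pass an uncorrected 0.05 threshold, but they exceed their rank-based thresholds of 0.025 and about 0.033.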
Yau, David T W; Wong, May C M; Lam, K F; McGrath, Colman
2015-08-19
Four-factor structure of the two 8-item short forms of Child Perceptions Questionnaire CPQ11-14 (RSF:8 and ISF:8) has been confirmed. However, the sum scores are typically reported in practice as a proxy of Oral health-related Quality of Life (OHRQoL), which implied a unidimensional structure. This study first assessed the unidimensionality of 8-item short forms of CPQ11-14. Item response theory (IRT) was employed to offer an alternative and complementary approach of validation and to overcome the limitations of classical test theory assumptions. A random sample of 649 12-year-old school children in Hong Kong was analyzed. Unidimensionality of the scale was tested by confirmatory factor analysis (CFA), principal component analysis (PCA) and local dependency (LD) statistic. Graded response model was fitted to the data. Contribution of each item to the scale was assessed by item information function (IIF). Reliability of the scale was assessed by test information function (TIF). Differential item functioning (DIF) across gender was identified by Wald test and expected score functions. Both CPQ11-14 RSF:8 and ISF:8 did not deviate much from the unidimensionality assumption. Results from CFA indicated acceptable fit of the one-factor model. PCA indicated that the first principal component explained >30 % of the total variation with high factor loadings for both RSF:8 and ISF:8. Almost all LD statistics were <10, indicating the absence of local dependency. Flat and low IIFs were observed in the oral symptoms items suggesting little contribution of information to the scale and item removal caused little practical impact. Comparing the TIFs, RSF:8 showed slightly better information than ISF:8. In addition to oral symptoms items, the item "Concerned with what other people think" demonstrated a uniform DIF (p < 0.001). The expected score functions were not much different between boys and girls.
Items related to oral symptoms were not informative to OHRQoL and deletion of these items is suggested. The impact of DIF across gender on the overall score was minimal. CPQ11-14 RSF:8 performed slightly better than ISF:8 in measurement precision. The 6-item short forms suggested by IRT validation should be further investigated to ensure their robustness, responsiveness and discriminative performance.
Early-Emerging Social Adaptive Skills in Toddlers with Autism Spectrum Disorders: An Item Analysis
ERIC Educational Resources Information Center
Ventola, Pamela; Saulnier, Celine A.; Steinberg, Elizabeth; Chawarska, Katarzyna; Klin, Ami
2014-01-01
Individuals with ASD have significant impairments in adaptive skills, particularly adaptive socialization skills. The present study examined the extent to which 20 items from the Vineland Adaptive Behavior Scales-Socialization Domain differentiated between ASD and developmentally delayed (DD) groups. Participants included 108 toddlers with ASD or…
Effects of Learning Experience on Forgetting Rates of Item and Associative Memories
ERIC Educational Resources Information Center
Yang, Jiongjiong; Zhan, Lexia; Wang, Yingying; Du, Xiaoya; Zhou, Wenxi; Ning, Xueling; Sun, Qing; Moscovitch, Morris
2016-01-01
Are associative memories forgotten more quickly than item memories, and does the level of original learning differentially influence forgetting rates? In this study, we addressed these questions by having participants learn single words and word pairs once (Experiment 1), three times (Experiment 2), and six times (Experiment 3) in a massed…
Fairness in Computerized Testing: Detecting Item Bias Using CATSIB with Impact Present
ERIC Educational Resources Information Center
Chu, Man-Wai; Lai, Hollis
2013-01-01
In educational assessment, there is an increasing demand for tailoring assessments to individual examinees through computer adaptive tests (CAT). As such, it is particularly important to investigate the fairness of these adaptive testing processes, which require the investigation of differential item functioning (DIF) to yield information about item…
Comparison of IRT Likelihood Ratio Test and Logistic Regression DIF Detection Procedures
ERIC Educational Resources Information Center
Atar, Burcu; Kamata, Akihito
2011-01-01
The Type I error rates and the power of IRT likelihood ratio test and cumulative logit ordinal logistic regression procedures in detecting differential item functioning (DIF) for polytomously scored items were investigated in this Monte Carlo simulation study. For this purpose, 54 simulation conditions (combinations of 3 sample sizes, 2 sample…
Effect of Purification Procedures on DIF Analysis in IRTPRO
ERIC Educational Resources Information Center
Fikis, David R. J.; Oshima, T. C.
2017-01-01
Purification of the test has been a well-accepted procedure in enhancing the performance of tests for differential item functioning (DIF). As defined by Lord, purification requires reestimation of ability parameters after removing DIF items before conducting the final DIF analysis. IRTPRO 3 is a recently updated program for analyses in item…
A Comparison of Lord's Chi Square and Raju's Area Measures in Detection of DIF.
ERIC Educational Resources Information Center
Cohen, Allan S.; Kim, Seock-Ho
1993-01-01
The effectiveness of two statistical tests of the area between item response functions (exact signed area and exact unsigned area) estimated in different samples, a measure of differential item functioning (DIF), was compared with Lord's chi square. Lord's chi square was found the most effective in determining DIF. (SLD)
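To make the two area measures concrete, the sketch below approximates the signed and unsigned areas between two item response functions by numerical integration. This is an illustration only: it assumes 2PL items with no guessing parameter and uses a midpoint rule, whereas the study above evaluated exact (closed-form) area statistics. All function names and parameter values are hypothetical.

```python
import math


def irf_2pl(theta, a, b):
    """2PL item response function: P(correct | theta) with discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def areas_between_irfs(a_ref, b_ref, a_foc, b_foc, lo=-20.0, hi=20.0, steps=40000):
    """Approximate the signed and unsigned areas between two IRFs (midpoint rule).

    Numerical stand-in for the exact area measures; not the closed-form
    statistics themselves.
    """
    width = (hi - lo) / steps
    signed = 0.0
    unsigned = 0.0
    for k in range(steps):
        theta = lo + (k + 0.5) * width
        diff = irf_2pl(theta, a_ref, b_ref) - irf_2pl(theta, a_foc, b_foc)
        signed += diff * width
        unsigned += abs(diff) * width
    return signed, unsigned


# Uniform DIF: equal discriminations, item 0.5 logits harder for the focal group.
uniform_signed, uniform_unsigned = areas_between_irfs(1.2, 0.0, 1.2, 0.5)

# Crossing (non-uniform) DIF: same difficulty, different discriminations.
crossing_signed, crossing_unsigned = areas_between_irfs(0.8, 0.0, 1.6, 0.0)
```

In the uniform case both areas are close to the difficulty difference (0.5). In the crossing case the signed area is near zero because the positive and negative regions cancel, while the unsigned area stays clearly positive, which is why an unsigned measure is needed to detect crossing DIF.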
Dynamic switching between semantic and episodic memory systems.
Kompus, Kristiina; Olsson, Carl-Johan; Larsson, Anne; Nyberg, Lars
2009-09-01
It has been suggested that episodic and semantic long-term memory systems interact during retrieval. Here we examined the flexibility of memory retrieval in an associative task taxing memories of different strength, assumed to differentially engage episodic and semantic memory. Healthy volunteers were pre-trained on a set of 36 face-name pairs over a 6-week period. Another set of 36 items was shown only once during the same time period. About 3 months after the training period all items were presented in a randomly intermixed order in an event-related fMRI study of face-name memory. Once presented items differentially activated anterior cingulate cortex and a right prefrontal region that previously have been associated with episodic retrieval mode. High-familiar items were associated with stronger activation of posterior cortices and a left frontal region. These findings fit a model of memory retrieval by which early processes determine, on a trial-by-trial basis, if the task can be solved by the default semantic system. If not, there is a dynamic shift to cognitive control processes that guide retrieval from episodic memory.
Bozzay, Melanie L; O'Leary, Kimberly N; De Nadai, Alessandro S; Gryglewicz, Kim; Romero, Gabriela; Karver, Marc S
2017-04-01
The present study examined differences in symptom presentation in screening for pediatric depression via evaluation of the Patient Health Questionnaire-9 (PHQ-9). In particular, we examined whether PHQ-9 items function differentially among deaf and hard-of-hearing (DHH; n = 75) and hearing (n = 75) youth based on participants recruited from crisis assessment services. Multiple indicators multiple causes models were used to examine whether items of the PHQ-9 functioned differently between groups as well as whether there were group differences in the mean severity of depressive symptoms. Results indicate that DHH youth were more likely to endorse psychosomatic items, and less likely to endorse an affective item. These findings indicate that the PHQ-9 functions differently when used with DHH youth. Implications of these findings are discussed, including both for future work with the PHQ-9 and with regard to the conceptualization of depression across hearing groups. © The Author 2017. Published by Oxford University Press. All rights reserved.
The Exploration of the Relationship between Guessing and Latent Ability in IRT Models
ERIC Educational Resources Information Center
Gao, Song
2011-01-01
This study explored the relationship between successful guessing and latent ability in IRT models. A new IRT model was developed with a guessing function integrating probability of guessing an item correctly with the examinee's ability and the item parameters. The conventional 3PL IRT model was compared with the new 2PL-Guessing model on…
ERIC Educational Resources Information Center
Shumway, Jessica F.; Moyer-Packenham, Patricia S.; Baker, Joseph M.; Westenskow, Arla; Anderson-Pence, Katie L.; Tucker, Stephen I.; Boyer-Thurgood, Jennifer; Jordan, Kerry E.
2016-01-01
The purpose of this study was to explore the relationship between instructional modality used for teaching fractions and third- and fourth-grade students' responses and strategies to open-response fraction items. The participants were 155 third-grade and 200 fourth-grade students from 17 public school classrooms. Students within each class were…
Reeve, Bryce B; Stover, Angela M; Alfano, Catherine M; Smith, Ashley Wilder; Ballard-Barbash, Rachel; Bernstein, Leslie; McTiernan, Anne; Baumgartner, Kathy B; Piper, Barbara F
2012-11-01
Brief, valid measures of fatigue, a prevalent and distressing cancer symptom, are needed for use in research. This study's primary aim was to create a shortened version of the revised Piper Fatigue Scale (PFS-R) based on data from a diverse cohort of breast cancer survivors. A secondary aim was to determine whether the PFS captured multiple distinct aspects of fatigue (a multidimensional model) or a single overall fatigue factor (a unidimensional model). Breast cancer survivors (n = 799; stages in situ through IIIa; ages 29-86 years) were recruited through three SEER registries (New Mexico, Western Washington, and Los Angeles, CA) as part of the Health, Eating, Activity, and Lifestyle (HEAL) study. Fatigue was measured approximately 3 years post-diagnosis using the 22-item PFS-R that has four subscales (Behavior, Affect, Sensory, and Cognition). Confirmatory factor analysis was used to compare unidimensional and multidimensional models. Six criteria were used to make item selections to shorten the PFS-R: scale's content validity, items' relationship with fatigue, content redundancy, differential item functioning by race and/or education, scale reliability, and literacy demand. Factor analyses supported the original 4-factor structure. There was also evidence from the bi-factor model for a dominant underlying fatigue factor. Six items tested positive for differential item functioning between African-American and Caucasian survivors. Four additional items either showed poor association, local dependence, or content validity concerns. After removing these 10 items, the reliability of the PFS-12 subscales ranged from 0.87 to 0.89, compared to 0.90-0.94 prior to item removal. The newly developed PFS-12 can be used to assess fatigue in African-American and Caucasian breast cancer survivors and reduces response burden without compromising reliability or validity. 
This is the first study to determine PFS literacy demand and to compare PFS-R responses in African-Americans and Caucasian breast cancer survivors. Further testing in diverse populations is warranted.
Hill, Bridget; Pallant, Julie; Williams, Gavin; Olver, John; Ferris, Scott; Bialocerkowski, Andrea
2016-12-01
To evaluate the internal construct validity and dimensionality of a new patient-reported outcome measure for people with traumatic brachial plexus injury (BPI) based on the International Classification of Functioning, Disability and Health definition of activity. Cross-sectional study. Outpatient clinics. Adults (age range, 18-82y) with a traumatic BPI (N=106). There were 106 people with BPI who completed a 51-item 5-response questionnaire. Responses were analyzed in 4 phases (missing responses, item correlations, exploratory factor analysis, and Rasch analysis) to evaluate the properties of fit to the Rasch model, threshold response, local dependency, dimensionality, differential item functioning, and targeting. Not applicable, as this study addresses the development of an outcome measure. Six items were deleted for missing responses, and 10 were deleted for high interitem correlations >.81. The remaining 35 items, while demonstrating fit to the Rasch model, showed evidence of local dependency and multidimensionality. Items were divided into 3 subscales: dressing and grooming (8 items), arm and hand (17 items), and no hand (6 items). All 3 subscales demonstrated fit to the model with no local dependency, minimal disordered thresholds, unidimensionality, and no differential item functioning for age, time postinjury, or self-selected dominance. Subscales were combined into 3 subtests and demonstrated fit to the model, no misfit, and unidimensionality, allowing calculation of a summary score. This preliminary analysis supports the internal construct validity of the Brachial Assessment Tool, a unidimensional targeted 4-response patient-reported outcome measure designed to solely assess activity after traumatic BPI regardless of level of injury, age at recruitment, premorbid limb dominance, and time postinjury. Further examination is required to determine test-retest reliability and responsiveness. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Bleau Lavigne, Maude; Reeves, Isabelle; Sasseville, Marie-Josée; Loignon, Christine
The primary purpose of this study was to develop 2 survey tools to explore factors influencing adoption of best practices for diabetic foot ulcer offloading treatment in primary health care settings. One survey was intended for the patients receiving care for a diabetic foot ulcer in primary health care settings and the other was intended for the health professionals providing treatment. The second purpose of this study was to evaluate the psychometric properties of the 2 surveys. Development and validation of survey instruments. Two surveys were developed using a published guide. Following review of pertinent literature and identification of variables to be measured, a bank of items was developed and pretested to determine clarity of the item and responses. Psychometric testing comprised measurement of content validity index (CVI) and intraclass correlation coefficient (ICC). Only items obtaining satisfactory CVI and ICC scores were included in the final version of the surveys. The final version of the patient survey contained 41 items and the final version of the survey for health care professionals contained 21 items. The patient-intended survey's items demonstrated high content validity scores and satisfactory test-retest reliability scores. The overall CVI score was 0.98. Forty of the 49 items eligible for testing obtained satisfactory ICC scores. One item's test-retest reliability could not be tested but it was retained based on its high CVI. The health professional-intended survey had an overall CVI score of 0.91, but its items had lower ICC scores: 63% (31 of the 49 items) did not achieve a satisfactory ICC score for inclusion in the final instrument. This project led to development of 2 instruments designed to identify and explore factors influencing adoption of best practices for diabetic foot ulcer offloading treatment in the primary health care setting. Future research and testing are required to translate these French surveys into English and additional languages, in order to reach a broader population.
ERIC Educational Resources Information Center
Ye, Meng; Xin, Tao
2014-01-01
The authors explored the effects of drifting common items on vertical scaling within the higher order framework of item parameter drift (IPD). The results showed that if IPD occurred between a pair of test levels, the scaling performance started to deviate from the ideal state, as indicated by bias of scaling. When there were two items drifting…
Examining Multiple Sources of Differential Item Functioning on the Clinician & Group CAHPS® Survey
Rodriguez, Hector P; Crane, Paul K
2011-01-01
Objective: To evaluate psychometric properties of a widely used patient experience survey. Data Sources: English-language responses to the Clinician & Group Consumer Assessment of Healthcare Providers and Systems (CG-CAHPS®) survey (n = 12,244) from a 2008 quality improvement initiative involving eight southern California medical groups. Methods: We used an iterative hybrid ordinal logistic regression/item response theory differential item functioning (DIF) algorithm to identify items with DIF related to patient sociodemographic characteristics, duration of the physician–patient relationship, number of physician visits, and self-rated physical and mental health. We accounted for all sources of DIF and determined its cumulative impact. Principal Findings: The upper end of the CG-CAHPS® performance range is measured with low precision. With sensitive settings, some items were found to have DIF. However, overall DIF impact was negligible, as 0.14 percent of participants had salient DIF impact. Latinos who spoke predominantly English at home had the highest prevalence of salient DIF impact at 0.26 percent. Conclusions: The CG-CAHPS® functions similarly across commercially insured respondents from diverse backgrounds. Consequently, previously documented racial and ethnic group differences likely reflect true differences rather than measurement bias. The impact of low precision at the upper end of the scale should be clarified. PMID:22092021
Cameron, Isobel M; Scott, Neil W; Adler, Mats; Reid, Ian C
2014-12-01
It is important for clinical practice and research that measurement scales of well-being and quality of life exhibit only minimal differential item functioning (DIF). DIF occurs where different groups of people endorse items in a scale to different extents after being matched by the intended scale attribute. We investigate the equivalence or otherwise of common methods of assessing DIF. Three methods of measuring age- and sex-related DIF (ordinal logistic regression, Rasch analysis and Mantel χ² procedure) were applied to Hospital Anxiety Depression Scale (HADS) data pertaining to a sample of 1,068 patients consulting primary care practitioners. Three items were flagged by all three approaches as having either age- or sex-related DIF with a consistent direction of effect; a further three items identified did not meet stricter criteria for important DIF using at least one method. When applying strict criteria for significant DIF, ordinal logistic regression was slightly less sensitive. Ordinal logistic regression, Rasch analysis and contingency table methods yielded consistent results when identifying DIF in the HADS depression and HADS anxiety scales. Regardless of methods applied, investigators should use a combination of statistical significance, magnitude of the DIF effect and investigator judgement when interpreting the results.
People's Intuitions about Randomness and Probability: An Empirical Study
ERIC Educational Resources Information Center
Lecoutre, Marie-Paule; Rovira, Katia; Lecoutre, Bruno; Poitevineau, Jacques
2006-01-01
What people mean by randomness should be taken into account when teaching statistical inference. This experiment explored subjective beliefs about randomness and probability through two successive tasks. Subjects were asked to categorize 16 familiar items: 8 real items from everyday life experiences, and 8 stochastic items involving a repeatable…
The Research Identity Scale: Psychometric Analyses and Scale Refinement
ERIC Educational Resources Information Center
Jorgensen, Maribeth F.; Schweinle, William E.
2018-01-01
The 68-item Research Identity Scale (RIS) was informed through qualitative exploration of research identity development in master's-level counseling students and practitioners. Classical psychometric analyses revealed the items had strong validity and reliability and a single factor. A one-parameter Rasch analysis and item review was used to…
Wang, Wei-Chun; Giovanello, Kelly S
2016-06-01
Considerable neuropsychological and neuroimaging work indicates that the medial temporal lobes are critical for both item and relational memory retrieval. However, there remain outstanding issues in the literature, namely the extent to which medial temporal lobe regions are differentially recruited during incidental and intentional retrieval of item and relational information, and the extent to which aging may affect these neural substrates. The current fMRI study sought to address these questions; participants incidentally encoded word pairs embedded in sentences and incidental item and relational retrieval were assessed through speeded reading of intact, rearranged, and new word-pair sentences, while intentional item and relational retrieval were assessed through old/new associative recognition of a separate set of intact, rearranged, and new word pairs. Results indicated that, in both younger and older adults, anterior hippocampus and perirhinal cortex indexed incidental and intentional item retrieval in the same manner. In contrast, posterior hippocampus supported incidental and intentional relational retrieval in both age groups and an adjacent cluster in posterior hippocampus was recruited during both forms of relational retrieval for older, but not younger, adults. Our findings suggest that while medial temporal lobe regions do not differentiate between incidental and intentional forms of retrieval, there are distinct roles for anterior and posterior medial temporal lobe regions during retrieval of item and relational information, respectively, and further indicate that posterior regions may, under certain conditions, be over-recruited in healthy aging. © 2016 Wiley Periodicals, Inc.
Mowla, Arash; Kalantarhormozi, Mohammad Reza; Khazraee, Samaneh
2011-01-01
Differentiating major depressive disorder (MDD) without hypothyroidism from MDD associated with hypothyroidism can be challenging. Therefore some authors have suggested that thyroid function should be tested in all depressed patients. This study compared the clinical characteristics of patients with MDD associated with hypothyroidism with those of patients with MDD without hypothyroidism. Thyroid function tests were administered to 75 patients (60 female and 15 male) who met DSM-IV criteria for MDD. The 15 patients with hypothyroidism (8 with subclinical hypothyroidism and 7 with overt hypothyroidism) were compared with the other 60 patients with regard to depressive characteristics. The primary measure of depressive signs and symptoms used to assess depression severity and symptoms was the Hamilton Rating Scale for Depression, first 17 items (Ham-D-17). Baseline demographic data, including age and sex, were also compared. The two groups did not differ significantly in severity of overall depression at baseline, as measured by total score on the Ham-D-17 (P=0.471, Z=0.970). Patients with MDD without hypothyroidism had worse scores on item 1 (depressed mood), item 2 (feelings of guilt), item 3 (suicidality), item 6 (late insomnia), and item 16 (loss of weight). In contrast, depressed patients with hypothyroidism had more severe anxiety symptoms and greater agitation (items 9, 10, and 11). Our results may help clinicians differentiate MDD associated with hypothyroidism from MDD without hypothyroidism. Depressed patients with hypothyroidism had more anxiety symptoms and greater agitation, but they had fewer severe core depressive symptoms and biological signs of MDD. (Journal of Psychiatric Practice. 2011;17:67-71).
Malec, James F; Kean, Jacob; Altman, Irwin M; Swick, Shannon
2012-12-01
(1) To evaluate the measurement reliability and construct validity of the Mayo-Portland Adaptability Inventory, 4th revision (MPAI-4) in a sample consisting exclusively of patients with cerebrovascular accident (CVA) using single parameter (Rasch) item-response methods; (2) to examine the differential item functioning (DIF) by sex within the CVA population; and (3) to examine DIF and differential test functioning (DTF) across traumatic brain injury (TBI) and CVA samples. Retrospective psychometric analysis of rating scale data. Home- and community-based brain injury rehabilitation program. Individuals post-CVA (n=861) and individuals with TBI (n=603). Not applicable. MPAI-4. Item data on admission to community-based rehabilitation were submitted to Rasch, DIF, and DTF analyses. The final calibration in the CVA sample revealed satisfactory reliability/separation for persons (.91/3.16) and items (1.00/23.64). DIF showed that items for pain, anger, audition, and memory were associated with higher levels of disability for CVA than TBI patients; whereas, self-care, mobility, and use of hands indicated greater overall disability for TBI patients. DTF analyses showed a high degree of association between the 2 sets of items (R = .92; R² = .85) and, at most, a 3.7 point difference in raw scores. The MPAI-4 demonstrates satisfactory psychometric properties for use with individuals with CVA applying for interdisciplinary posthospital rehabilitation. DIF reveals clinically meaningful differences between CVA and TBI groups that should be considered in results at the item and subscale level. Copyright © 2012 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
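The paired reliability/separation values reported above follow the standard Rasch relationship between the separation index G and reliability R, namely R = G² / (1 + G²). The short sketch below checks the reported pairs against that relationship. The function name is hypothetical, and the code only illustrates the formula; it does not reproduce the authors' software output.

```python
def reliability_from_separation(g):
    """Rasch reliability implied by a separation index g: R = g^2 / (1 + g^2)."""
    return g ** 2 / (1 + g ** 2)


# Separation values as reported in the abstract for the CVA sample.
person_reliability = reliability_from_separation(3.16)
item_reliability = reliability_from_separation(23.64)

print(round(person_reliability, 2))  # 0.91, matching the reported person reliability
print(round(item_reliability, 2))    # 1.0, matching the reported item reliability
```

A person separation of about 3 means the instrument can statistically distinguish roughly three or more strata of disability in the sample, which is why the value is read alongside the reliability coefficient rather than in place of it.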
A bait we cannot avoid: Food-induced motor distractibility.
Foroni, Francesco; Rumiati, Raffaella I; Coricelli, Carol; Ambron, Elisabetta
2016-12-01
Food is so central to humans' life that keeping our mind away from it is not an easy task. Because of its strong motivational value, food cues attract our attention. However, often food is truly not relevant to our on-going activities. In the present study we investigated the distracting role that task-irrelevant foods (natural and manufactured) and food-cues play in performing goal-directed reaching movements. We explored whether spatial and temporal parameters of reaching movement were influenced by the presence of task-irrelevant stimuli (i.e., distractor effect), and whether this effect was modulated by participants' implicit and explicit ratings of food items and participants' tendency to restrain their diet. First we found that the movement trajectory veered consistently toward food items and food-related distractors. Second, we found that participants' own evaluation of natural and manufactured food played a differential predicting role of the magnitude of temporal and spatial parameters of the distractor effect induced by these types of food. We conclude that perceptual and attentional systems provide preferential access to stimuli in the environment with high significance for organisms. Copyright © 2015 Elsevier Inc. All rights reserved.
Measuring romantic love: psychometric properties of the infatuation and attachment scales.
Langeslag, Sandra J E; Muris, Peter; Franken, Ingmar H A
2013-01-01
Romantic love is ubiquitous and has major influences on people's lives. Because romantic love consists of infatuation and attachment, researchers need to be able to differentiate between these constructs when examining the behavioral, affective, cognitive, and physiological correlates of this intriguing phenomenon. Existing love questionnaires appear less suitable for measuring the two-dimensional construct of romantic love. We present here the new 20-item Infatuation and Attachment Scales (IAS) questionnaire. In Study 1, exploratory factor analyses in a Dutch-speaking sample (n = 162) revealed a clear-cut two-factor structure, with 10 infatuation and 10 attachment items loading on separate components. This two-factor structure was confirmed in a new Dutch-speaking sample (n = 214, Study 2), and in an English-speaking sample (n = 183, Study 3). In all studies, it was additionally shown that both scales possessed good convergent and discriminant validity, as well as excellent internal consistency and test-retest reliability. We argue that the IAS is a widely applicable, psychometrically sound instrument that will be useful in future research exploring the effects of infatuation and attachment on behavior, emotion, cognition, peripheral physiology, and brain functioning.
Development and Validation of the Consumer Health Activation Index.
Wolf, Michael S; Smith, Samuel G; Pandit, Anjali U; Condon, David M; Curtis, Laura M; Griffith, James; O'Conor, Rachel; Rush, Steven; Bailey, Stacy C; Kaplan, Gordon; Haufle, Vincent; Martin, David
2018-04-01
Although there has been increasing interest in patient engagement, few measures are publicly available and suitable for patients with limited health literacy. We sought to develop a Consumer Health Activation Index (CHAI) for use among diverse patients. Expert opinion, a systematic literature review, focus groups, and cognitive interviews with patients were used to create and revise a potential set of items. Psychometric testing guided by item response theory was then conducted among 301 English-speaking, community-dwelling adults. This included differential item functioning analyses to evaluate item performance across participant health literacy levels. To determine construct validity, CHAI scores were compared to scales measuring similar personality constructs. Associations between the CHAI and physical and mental health established predictive validity. A second study among 9,478 adults was used to confirm CHAI associations with health outcomes. Exploratory factor analyses revealed a single-factor solution with a 10-item scale. The CHAI showed good internal consistency (alpha = 0.81) and moderate test-retest reliability (ICC = 0.53). Reading grade level was found to be at the 6th grade. Moderate to strong correlations were found with similar constructs (Multidimensional Health Locus of Control, r = 0.38, P < 0.001; Conscientiousness, r = 0.41, P < 0.001). Predictive validity was demonstrated through associations with functional health status measures (depression, r = -0.28, P < 0.001; anxiety, r = -0.22, P < 0.001; and physical functioning, r = 0.22, P < 0.001). In the validation sample, the CHAI was significantly associated with self-reported physical and mental health (r = 0.31 and 0.32 respectively; both P < 0.001). The CHAI appears to be a valid, reliable, and easily administered tool that can be used to assess health activation among adults, including those with limited health literacy. Future studies should test the tool in actual use and explore further applications.
ERIC Educational Resources Information Center
Wu, Li-Tzy; Ringwalt, Christopher L.; Yang, Chongming; Reeve, Bryce B.; Pan, Jeng-Jong; Blazer, Dan G.
2009-01-01
DSM-IV's hierarchical distinction between abuse of and dependence on prescription opioids is not supported since the symptoms of abuse in adolescents are not less severe than dependence. The finding is based on the examination of the DSM-IV criteria for opioid use disorders using item response theory.
ERIC Educational Resources Information Center
Donovan, Courtney; Green, Kathy E.; Seidel, Kent
2017-01-01
Core competencies essential for effective teaching were identified via a literature review and a review of standards for teacher education, and vetted by state groups with interests in teacher education. Survey items based on these competencies asked teacher candidates, graduates, and teacher education program faculty how well the program prepared…
ERIC Educational Resources Information Center
Edelen, Maria Orlando; McCaffrey, Daniel F.; Marshall, Grant N.; Jaycox, Lisa H.
2009-01-01
Accurate assessment of attitudes about intimate partner violence is important for evaluation of prevention and early intervention programs. Assessment of attitudes about cross-gender interactions is particularly susceptible to bias because it requires specifying the gender of the perpetrator and the victim. As it is likely that respondents will…
Testing the Item-Order Account of Design Effects Using the Production Effect
ERIC Educational Resources Information Center
Jonker, Tanya R.; Levene, Merrick; MacLeod, Colin M.
2014-01-01
A number of memory phenomena evident in recall in within-subject, mixed-lists designs are reduced or eliminated in between-subject, pure-list designs. The item-order account (McDaniel & Bugg, 2008) proposes that differential retention of order information might underlie this pattern. According to this account, order information may be encoded…
Testing for DIF in a Model with Single Peaked Item Characteristic Curves: The PARELLA Model.
ERIC Educational Resources Information Center
Hoijtink, Herbert; Molenaar, Ivo W.
1992-01-01
The PARallELogram Analysis (PARELLA) model is a probabilistic parallelogram model that can be used for the measurement of latent attitudes or latent preferences. A method is presented for testing for differential item functioning (DIF) for the PARELLA model using the approach of D. Thissen and others (1988). (SLD)
Disparities in Sense of Community: True Race Differences or Differential Item Functioning?
ERIC Educational Resources Information Center
Coffman, Donna L.; BeLue, Rhonda
2009-01-01
The sense of community index (SCI) has been widely used to measure psychological sense of community (SOC). Furthermore, SOC has been found to differ among racial groups. Because different ethnic groups have different cultural and historical experiences that may lead to different interpretations of measurement items, it is important to know whether…
Small-Sample DIF Estimation Using SIBTEST, Cochran's Z, and Log-Linear Smoothing
ERIC Educational Resources Information Center
Lei, Pui-Wa; Li, Hongli
2013-01-01
Minimum sample sizes of about 200 to 250 per group are often recommended for differential item functioning (DIF) analyses. However, there are times when sample sizes for one or both groups of interest are smaller than 200 due to practical constraints. This study attempts to examine the performance of Simultaneous Item Bias Test (SIBTEST),…
Examination of a Social-Networking Site Activities Scale (SNSAS) Using Rasch Analysis
ERIC Educational Resources Information Center
Alhaythami, Hassan; Karpinski, Aryn; Kirschner, Paul; Bolden, Edward
2017-01-01
This study examined the psychometric properties of a social-networking site (SNS) activities scale (SNSAS) using Rasch Analysis. Items were also examined with Rasch Principal Components Analysis (PCA) and Differential Item Functioning (DIF) across groups of university students (i.e., males and females from the United States [US] and Europe; N =…
Examining Gender DIF on a Multiple-Choice Test of Mathematics: A Confirmatory Approach.
ERIC Educational Resources Information Center
Ryan, Katherine E.; Fan, Meichu
1996-01-01
Results for 3,244 female and 3,033 male junior high school students from the Second International Mathematics Study show that applied items in algebra, geometry, and computation were easier for males but arithmetic items were differentially easier for females. Implications of these findings for assessment and instruction are discussed. (SLD)
A Comparison of Uniform DIF Effect Size Estimators under the MIMIC and Rasch Models
ERIC Educational Resources Information Center
Jin, Ying; Myers, Nicholas D.; Ahn, Soyeon; Penfield, Randall D.
2013-01-01
The Rasch model, a member of a larger group of models within item response theory, is widely used in empirical studies. Detection of uniform differential item functioning (DIF) within the Rasch model typically employs null hypothesis testing with a concomitant consideration of effect size (e.g., signed area [SA]). Parametric equivalence between…
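As a rough illustration of the signed-area effect size mentioned in this abstract, the sketch below numerically integrates the gap between two Rasch item characteristic curves. It assumes the standard Rasch form P(θ) = 1 / (1 + exp(-(θ - b))) and defines signed area as the reference curve minus the focal curve; that sign convention, the function names, and the difficulty values are illustrative assumptions, not details taken from the study.

```python
import math


def rasch_prob(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def signed_area(b_ref: float, b_focal: float,
                lo: float = -40.0, hi: float = 40.0, steps: int = 8000) -> float:
    """Trapezoid-rule integral of P_ref(theta) - P_focal(theta) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        theta = lo + i * h
        diff = rasch_prob(theta, b_ref) - rasch_prob(theta, b_focal)
        # End points get half weight in the trapezoid rule.
        total += diff * (0.5 if i == 0 or i == steps else 1.0)
    return total * h


# Hypothetical item: difficulty 0.0 for the reference group, 0.5 for the focal group.
area = signed_area(0.0, 0.5)
print(round(area, 4))
```

With equal slopes, as in the Rasch model, the signed area reduces to the difference in item difficulties (focal minus reference under the convention assumed here), which is why uniform DIF can be summarized by a single difficulty shift.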
Evaluation of MIMIC-Model Methods for DIF Testing with Comparison to Two-Group Analysis
ERIC Educational Resources Information Center
Woods, Carol M.
2009-01-01
Differential item functioning (DIF) occurs when an item on a test or questionnaire has different measurement properties for 1 group of people versus another, irrespective of mean differences on the construct. This study focuses on the use of multiple-indicator multiple-cause (MIMIC) structural equation models for DIF testing, parameterized as item…
To Sum or Not to Sum: Taxometric Analysis with Ordered Categorical Assessment Items
ERIC Educational Resources Information Center
Walters, Glenn D.; Ruscio, John
2009-01-01
Meehl's taxometric method has been shown to differentiate between categorical and dimensional data, but there are many ways to implement taxometric procedures. When analyzing the ordered categorical data typically provided by assessment instruments, summing items to form input indicators has been a popular practice for more than 20 years. A Monte…
ERIC Educational Resources Information Center
Raykov, Tenko; Dimitrov, Dimiter M.; Marcoulides, George A.; Li, Tatyana; Menold, Natalja
2018-01-01
A latent variable modeling method for studying measurement invariance when evaluating latent constructs with multiple binary or binary scored items with no guessing is outlined. The approach extends the continuous indicator procedure described by Raykov and colleagues, utilizes similarly the false discovery rate approach to multiple testing, and…
ERIC Educational Resources Information Center
Lee, Soo; Suh, Youngsuk
2018-01-01
Lord's Wald test for differential item functioning (DIF) has not been studied extensively in the context of the multidimensional item response theory (MIRT) framework. In this article, Lord's Wald test was implemented using two estimation approaches, marginal maximum likelihood estimation and Bayesian Markov chain Monte Carlo estimation, to detect…
ERIC Educational Resources Information Center
Kaplan, Randy M.; Bennett, Randy Elliot
This study explores the potential for using a computer-based scoring procedure for the formulating-hypotheses (F-H) item. This item type presents a situation and asks the examinee to generate explanations for it. Each explanation is judged right or wrong, and the number of creditable explanations is summed to produce an item score. Scores were…
ERIC Educational Resources Information Center
McLeod, Lori D.; Lewis, Charles; Thissen, David
With the increased use of computerized adaptive testing, which allows for continuous testing, new concerns about test security have evolved, one being the assurance that items in an item pool are safeguarded from theft. In this paper, the risk of score inflation and procedures to detect test takers using item preknowledge are explored. When test…
Lambert, Michael Canute; Ferguson, Gail M; Rowan, George T
2016-03-01
Cross-national study of adolescents' psychological adjustment requires measures that permit reliable and valid assessment across informants and nations, but such measures are virtually nonexistent. Item-response-theory-based linking is a promising yet underutilized methodological procedure that permits more accurate assessment across informants and nations. To demonstrate this procedure, the Resilience Scale of the Behavioral Assessment for Children of African Heritage (Lambert et al., 2005) was administered to 250 African American and 294 Jamaican nonreferred adolescents and their caregivers. Multiple items without significant differential item functioning emerged, allowing scale linking across informants and nations. Calibrating item parameters via item response theory linking can permit cross-informant cross-national assessment of youth. (c) 2016 APA, all rights reserved.
Multiple Hypnotizabilities: Differentiating the Building Blocks of Hypnotic Response
ERIC Educational Resources Information Center
Woody, Erik Z.; Barnier, Amanda J.; McConkey, Kevin M.
2005-01-01
Although hypnotizability can be conceptualized as involving component subskills, standard measures do not differentiate them from a more general unitary trait, partly because the measures include limited sets of dichotomous items. To overcome this, the authors applied full-information factor analysis, a sophisticated analytic approach for…
Tree-Based Global Model Tests for Polytomous Rasch Models
ERIC Educational Resources Information Center
Komboz, Basil; Strobl, Carolin; Zeileis, Achim
2018-01-01
Psychometric measurement models are only valid if measurement invariance holds between test takers of different groups. Global model tests, such as the well-established likelihood ratio (LR) test, are sensitive to violations of measurement invariance, such as differential item functioning and differential step functioning. However, these…
Kisala, Pamela A.; Victorson, David; Pace, Natalie; Heinemann, Allen W.; Choi, Seung W.; Tulsky, David S.
2015-01-01
Objective: To describe the development and psychometric properties of the SCI-QOL Psychological Trauma item bank and short form. Design: Using a mixed-methods design, we developed and tested a Psychological Trauma item bank with patient and provider focus groups, cognitive interviews, and item response theory based analytic approaches, including tests of model fit, differential item functioning (DIF) and precision. Setting: We tested a 31-item pool at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Veterans Administration hospital. Participants: A total of 716 individuals with SCI completed the trauma items. Results: The 31 items fit a unidimensional model (CFI=0.952; RMSEA=0.061) and demonstrated good precision (theta range between 0.6 and 2.5). Nine items demonstrated negligible DIF with little impact on score estimates. The final calibrated item bank contains 19 items. Conclusion: The SCI-QOL Psychological Trauma item bank is a psychometrically robust measurement tool from which a short form and a computer adaptive test (CAT) version are available. PMID:26010967
Assessing psychological well-being: self-report instruments for the NIH Toolbox.
Salsman, John M; Lai, Jin-Shei; Hendrie, Hugh C; Butt, Zeeshan; Zill, Nicholas; Pilkonis, Paul A; Peterson, Christopher; Stoney, Catherine M; Brouwers, Pim; Cella, David
2014-02-01
Psychological well-being (PWB) has a significant relationship with physical and mental health. As a part of the NIH Toolbox for the Assessment of Neurological and Behavioral Function, we developed self-report item banks and short forms to assess PWB. Expert feedback and literature review informed the selection of PWB concepts and the development of item pools for positive affect, life satisfaction, and meaning and purpose. Items were tested with a community-dwelling US Internet panel sample of adults aged 18 and above (N = 552). Classical and item response theory (IRT) approaches were used to evaluate unidimensionality, fit of items to the overall measure, and calibrations of those items, including differential item function (DIF). IRT-calibrated item banks were produced for positive affect (34 items), life satisfaction (16 items), and meaning and purpose (18 items). Their psychometric properties were supported based on the results of factor analysis, fit statistics, and DIF evaluation. All banks measured the concepts precisely (reliability ≥0.90) for more than 98% of participants. These adult scales and item banks for PWB provide the flexibility, efficiency, and precision necessary to promote future epidemiological, observational, and intervention research on the relationship of PWB with physical and mental health.
Restricted interests and teacher presentation of items.
Stocco, Corey S; Thompson, Rachel H; Rodriguez, Nicole M
2011-01-01
Restricted and repetitive behavior (RRB) is more pervasive, prevalent, frequent, and severe in individuals with autism spectrum disorders (ASDs) than in their typical peers. One subtype of RRB is restricted interests in items or activities, which is evident in the manner in which individuals engage with items (e.g., repetitious wheel spinning), the types of items or activities they select (e.g., preoccupation with a phone book), or the range of items or activities they select (i.e., narrow range of items). We sought to describe the relation between restricted interests and teacher presentation of items. Overall, we observed 5 teachers interacting with 2 pairs of students diagnosed with an ASD. Each pair included 1 student with restricted interests. During these observations, teachers were free to present any items from an array of 4 stimuli selected by experimenters. We recorded student responses to teacher presentation of items and analyzed the data to determine the relation between teacher presentation of items and the consequences for presentation provided by the students. Teacher presentation of items corresponded with differential responses provided by students with ASD, and those with restricted preferences experienced a narrower array of items.
Packham, Tara L; Cappelleri, Joseph C; Sadosky, Alesia; MacDermid, Joy C; Brunner, Florian
2017-03-04
painDETECT (PD-Q) is a self-reported assessment of pain qualities developed as a screening tool for pain of neuropathic origin. Rasch analysis is a strategy for examining the measurement characteristics of a scale using a form of item response theory. We conducted a Rasch analysis to consider if the scoring and measurement properties of PD-Q would support its use as an outcome measure. Rasch analysis was conducted on PD-Q scores drawn from a cross-sectional study of the burden and costs of neuropathic pain (NeP). The analysis followed an iterative process based on recommendations in the literature, including examination of sequential scoring categories, unidimensionality, reliability and differential item function. Data from 624 persons with a diagnosis of painful diabetic polyneuropathy, small fibre neuropathy, and neuropathic pain associated with chronic low back pain, spinal cord injury, HIV-related pain, or chronic post-surgical pain were used for this analysis. PD-Q demonstrated fit to the Rasch model after adjustments of scoring categories for four items, and omission of the time course and radiating questions. The resulting seven-item scale of pain qualities demonstrated good reliability with a person-separation index of 0.79. No scoring bias (differential item functioning) was found for this version. Rasch modelling suggests the seven pain-qualities items from PD-Q may be used as an outcome measure. Further research is required to confirm validity and responsiveness in a clinical setting.
ERIC Educational Resources Information Center
Lee, Hee-Sun; Liu, Ou Lydia; Linn, Marcia C.
2011-01-01
This study explores measurement of a construct called knowledge integration in science using multiple-choice and explanation items. We use construct and instructional validity evidence to examine the role multiple-choice and explanation items play in measuring students' knowledge integration ability. For construct validity, we analyze item…
77 FR 66872 - Records Schedules; Availability and Request for Comments
Federal Register 2010, 2011, 2012, 2013, 2014
2012-11-07
... lease and exploration. 4. Department of Justice, Agency-wide (DAA-0060-2012-0017, 1 item, 1 temporary.... Department of Justice, Federal Bureau of Investigation (N1-65-12-1, 3 items, 3 temporary items). Master... victims of child pornography. 6. Department of Justice, Federal Bureau of Investigation (N1-65-12-4, 5...
Memory for Multiple Visual Ensembles in Infancy
ERIC Educational Resources Information Center
Zosh, Jennifer M.; Halberda, Justin; Feigenson, Lisa
2011-01-01
The number of individual items that can be maintained in working memory is limited. One solution to this problem is to store representations of ensembles that contain summary information about large numbers of items (e.g., the approximate number or cumulative area of a group of many items). Here we explored the developmental origins of ensemble…
Automatic Item Generation via Frame Semantics: Natural Language Generation of Math Word Problems.
ERIC Educational Resources Information Center
Deane, Paul; Sheehan, Kathleen
This paper is an exploration of the conceptual issues that have arisen in the course of building a natural language generation (NLG) system for automatic test item generation. While natural language processing techniques are applicable to general verbal items, mathematics word problems are particularly tractable targets for natural language…
Measuring Knowledge of Introductory Psychology: What Are the Relevant Constructs?
ERIC Educational Resources Information Center
Milewski, Glenn B.; Patelis, Thanos
The 1999 Advanced Placement[R] (AP[R]) Psychology Examination contains items drawn from 13 factors related to the study of psychology. This factor structure had not been explored previously. This study focuses on evaluating the fit of confirmatory factor analysis (CFA) models to examination items. Since examination items were dichotomous and…
Using Kernel Equating to Assess Item Order Effects on Test Scores
ERIC Educational Resources Information Center
Moses, Tim; Yang, Wen-Ling; Wilson, Christine
2007-01-01
This study explored the use of kernel equating for integrating and extending two procedures proposed for assessing item order effects in test forms that have been administered to randomly equivalent groups. When these procedures are used together, they can provide complementary information about the extent to which item order effects impact test…
Generalized Mantel-Haenszel Methods for Differential Item Functioning Detection
ERIC Educational Resources Information Center
Fidalgo, Angel M.; Madeira, Jaqueline M.
2008-01-01
Mantel-Haenszel methods comprise a highly flexible methodology for assessing the degree of association between two categorical variables, whether they are nominal or ordinal, while controlling for other variables. The versatility of Mantel-Haenszel analytical approaches has made them very popular in the assessment of the differential functioning…
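To make the basic idea concrete, here is a minimal sketch of the classical two-group, dichotomous-item Mantel-Haenszel common odds ratio, which the generalized methods described in this abstract extend to nominal and ordinal variables. The stratum counts are invented for illustration, the function names are hypothetical, and the delta-scale conversion (-2.35 times the natural log of the odds ratio) is the commonly cited ETS convention rather than anything stated in the abstract.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across matched score levels.

    Each stratum is (a, b, c, d):
      a = reference correct, b = reference incorrect,
      c = focal correct,     d = focal incorrect.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level contributes nothing
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ValueError("odds ratio undefined: no discordant counts in any stratum")
    return num / den


def mh_d_dif(alpha):
    """Odds ratio expressed on the delta scale (assumed ETS convention)."""
    return -2.35 * math.log(alpha)


# Hypothetical counts at three matched total-score levels.
strata = [(30, 20, 20, 30), (40, 10, 30, 20), (45, 5, 40, 10)]
alpha = mantel_haenszel_odds_ratio(strata)
delta = mh_d_dif(alpha)
print(round(alpha, 3), round(delta, 3))
```

An odds ratio above 1 (negative delta under the assumed convention) indicates the item favors the reference group after matching on total score; a value of 1 indicates no uniform DIF.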
ERIC Educational Resources Information Center
Gaitas, Sérgio; Alves Martins, Margarida
2017-01-01
This study analyses teacher perceived difficulty in implementing differentiated instructional strategies in regular classes. The participants were 273 Portuguese primary school teachers with teaching experience ranging from 1 to 33 years. A 39-item questionnaire was used to evaluate teacher perceived difficulty in relation to different…
Item Response Theory Analysis of the Psychopathic Personality Inventory-Revised.
Eichenbaum, Alexander E; Marcus, David K; French, Brian F
2017-06-01
This study examined item and scale functioning in the Psychopathic Personality Inventory-Revised (PPI-R) using an item response theory analysis. PPI-R protocols from 1,052 college student participants (348 male, 704 female) were analyzed. Analyses were conducted on the 131 self-report items comprising the PPI-R's eight content scales, using a graded response model. Scales collected a majority of their information about respondents possessing higher than average levels of the traits being measured. Each scale contained at least some items that evidenced limited ability to differentiate between respondents with differing levels of the trait being measured. Moreover, 80 items (61.1%) yielded significantly different responses between men and women presumably possessing similar levels of the trait being measured. Item performance was also influenced by the scoring format (directly scored vs. reverse-scored) of the items. Overall, the results suggest that the PPI-R, despite identifying psychopathic personality traits in individuals possessing high levels of those traits, may not identify these traits equally well for men and women, and scores are likely influenced by the scoring format of the individual item and scale.
Cupani, Marcos; Zamparella, Tatiana Castro; Piumatti, Gisella; Vinculado, Grupo
The calibration of item banks provides the basis for computerized adaptive testing that ensures high diagnostic precision and minimizes participants' test burden. This study aims to develop a bank of items to measure the level of Knowledge on Biology using the Rasch model. The sample consisted of 1219 participants that studied in different faculties of the National University of Cordoba (mean age = 21.85 years, SD = 4.66; 66.9% are women). The items were organized in different forms and into separate subtests, with some common items across subtests. The students were told they had to answer 60 questions of knowledge on biology. Evaluation of Rasch model fit (Zstd >|2.0|), differential item functioning, dimensionality, local independence, item and person separation (>2.0), and reliability (>.80) resulted in a bank of 180 items with good psychometric properties. The bank provides items with a wide range of content coverage and may serve as a sound basis for computerized adaptive testing applications. The contribution of this work is significant in the field of educational assessment in Argentina.
ERIC Educational Resources Information Center
Sachse, Karoline A.; Roppelt, Alexander; Haag, Nicole
2016-01-01
Trend estimation in international comparative large-scale assessments relies on measurement invariance between countries. However, cross-national differential item functioning (DIF) has been repeatedly documented. We ran a simulation study using national item parameters, which required trends to be computed separately for each country, to compare…
ERIC Educational Resources Information Center
Moreno-Martinez, F. Javier; Laws, Keith R.
2007-01-01
There is a consensus that Alzheimer's disease (AD) impairs semantic information, with one of the first markers being anomia, i.e., an impaired ability to name items. Doubts remain, however, about whether this naming impairment differentially affects items from the living and nonliving knowledge domains. Most studies have reported an impairment for…
ERIC Educational Resources Information Center
Wei, Tianlan; Chesnut, Steven R.; Barnard-Brak, Lucy; Stevens, Tara; Olivárez, Arturo, Jr.
2014-01-01
As the United States has begun to lag behind other developed countries in performance on mathematics and science, researchers have sought to explain this with theories of teaching, knowledge, and motivation. We expand this examination by further analyzing a measure of interest that has been linked to student performance in mathematics and…
ERIC Educational Resources Information Center
Puhan, Gautam; Moses, Tim P.; Yu, Lei; Dorans, Neil J.
2007-01-01
The purpose of the current study was to examine whether log-linear smoothing of observed score distributions in small samples results in more accurate differential item functioning (DIF) estimates under the simultaneous item bias test (SIBTEST) framework. Data from a teacher certification test were analyzed using White candidates in the reference…
ERIC Educational Resources Information Center
Monahan, Patrick O.; McHorney, Colleen A.; Stump, Timothy E.; Perkins, Anthony J.
2007-01-01
Previous methodological and applied studies that used binary logistic regression (LR) for detection of differential item functioning (DIF) in dichotomously scored items either did not report an effect size or did not employ several useful measures of DIF magnitude derived from the LR model. Equations are provided for these effect size indices.…
ERIC Educational Resources Information Center
Fidalgo, Angel M.
2011-01-01
Mantel-Haenszel (MH) methods constitute one of the most popular nonparametric differential item functioning (DIF) detection procedures. GMHDIF has been developed to provide an easy-to-use program for conducting DIF analyses. Some of the advantages of this program are that (a) it performs two-stage DIF analyses in multiple groups simultaneously;…
ERIC Educational Resources Information Center
Arikan, Serkan; van de Vijver, Fons J. R.; Yagmur, Kutlay
2018-01-01
We examined Differential Item Functioning (DIF) and the size of cross-cultural performance differences in the Programme for International Student Assessment (PISA) 2012 mathematics data before and after application of propensity score matching. The mathematics performance of Indonesian, Turkish, Australian, and Dutch students on released items was…
Investigating Causal DIF via Propensity Score Methods
ERIC Educational Resources Information Center
Liu, Yan; Zumbo, Bruno D.; Gustafson, Paul; Huang, Yi; Kroc, Edward; Wu, Amery D.
2016-01-01
A variety of differential item functioning (DIF) methods have been proposed and used for ensuring that a test is fair to all test takers in a target population in the situations of, for example, a test being translated to other languages. However, once a method flags an item as DIF, it is difficult to conclude that the grouping variable (e.g.,…
ERIC Educational Resources Information Center
Zhao, Jing
2012-01-01
The purpose of the study is to further investigate the validity of instruments used for collecting preservice teachers' perceptions of self-efficacy adapting the three-level IRT model described in Cheong's study (2006). The focus of the present study is to investigate whether the polytomously-scored items on the preservice teachers' self-efficacy…
ERIC Educational Resources Information Center
Braeken, Johan; Blömeke, Sigrid
2016-01-01
Using data from the international Teacher Education and Development Study: Learning to Teach Mathematics (TEDS-M), the measurement equivalence of teachers' beliefs across countries is investigated for the case of "mathematics-as-a fixed-ability". Measurement equivalence is a crucial topic in all international large-scale assessments and…
Baylor, Carolyn; Yorkston, Kathryn; Eadie, Tanya; Kim, Jiseon; Chung, Hyewon; Amtmann, Dagmar
2015-01-01
Purpose: The purpose of this study was to calibrate the items for the Communicative Participation Item Bank (CPIB) using Item Response Theory (IRT). One overriding objective was to examine if the IRT item parameters would be consistent across different diagnostic groups, thereby allowing creation of a disorder-generic instrument. The intended outcomes were the final item bank and a short form ready for clinical and research applications. Methods: Self-report data were collected from 701 individuals representing four diagnoses: multiple sclerosis, Parkinson’s disease, amyotrophic lateral sclerosis and head and neck cancer. Participants completed the CPIB and additional self-report questionnaires. CPIB data were analyzed using the IRT Graded Response Model (GRM). Results: The initial set of 94 candidate CPIB items was reduced to an item bank of 46 items demonstrating unidimensionality, local independence, good item fit, and good measurement precision. Differential item function (DIF) analyses detected no meaningful differences across diagnostic groups. A 10-item, disorder-generic short form was generated. Conclusions: The CPIB provides speech-language pathologists with a unidimensional, self-report outcomes measurement instrument dedicated to the construct of communicative participation. This instrument may be useful to clinicians and researchers wanting to implement measures of communicative participation in their work. PMID:23816661
Jones, Richard N
2006-11-01
Knowledge of the extent to which measurement of adult cognitive functioning differs between Spanish and English language administrations of the Mini-Mental State Examination (MMSE) is critical for inclusive, representative, and valid research of older adults in the United States. We sought to demonstrate the use of an item response theory (IRT) based structural equation model, that is, the MIMIC model (multiple indicators, multiple causes), to evaluate MMSE responses for evidence of differential item functioning (DIF) attributable to language of administration. We studied participants in a dementia case registry study (n = 1546), 42% of whom were examined with the Spanish language MMSE. Twelve of 21 items were identified as having significant uniform DIF. The 4 most discrepant included orientation to season, orientation to state, repeat phrase, and follow command. DIF accounted for two-thirds of the observed difference in underlying level of cognitive functioning between Spanish- and English-language administration groups. Failing to account for measurement differences may lead to spurious inferences regarding language group differences in underlying level of cognitive functioning. The MIMIC model can be used to detect and adjust for such measurement differences in substantive research.
The Psychometric Structure of Items Assessing Autogynephilia.
Hsu, Kevin J; Rosenthal, A M; Bailey, J Michael
2015-07-01
Autogynephilia, or paraphilic sexual arousal in a man to the thought or image of himself as a woman, manifests in a variety of different behaviors and fantasies. We examined the psychometric structure of 22 items assessing five known types of autogynephilia by subjecting them to exploratory factor analysis in a sample of 149 autogynephilic men. Results of oblique factor analyses supported the ability to distinguish five group factors with suitable items. Results of hierarchical factor analyses suggest that the five group factors were strongly underlain by a general factor of autogynephilia. Because the general factor accounted for a much greater amount of the total variance of the 22 items than did the group factors, the types of autogynephilia that a man has seem less important than the degree to which he has autogynephilia. However, the five types of autogynephilia remain conceptually useful because meaningful distinctions were found among them, including differential rates of endorsement and differential ability to predict other relevant variables like gender dysphoria. Factor-derived scales and subscales demonstrated good internal consistency reliabilities, and validity, with large differences found between autogynephilic men and heterosexual male controls. Future research should attempt to replicate our findings, which were mostly exploratory.
Basch, Corey Hannah; Ethan, Danna; Rajan, Sonali
2013-08-25
Legislation in NYC requires chain restaurants to post calorie information on menu boards in an effort to help consumers make more informed decisions about food and beverage items they are purchasing. While this is a step in the right direction in light of the current obesity epidemic, there are other issues that warrant attention in a fast food setting, namely the pricing of healthy food options, promotional strategies, and access to comprehensive nutrition information. This study focused on a popular fast-food chain in NYC. The study's aims were threefold: (1) to determine the cost differential between the healthiest meal item on the chain's general menu and meal items available specifically on a reduced cost menu for one dollar (US$1.00); (2) to identify and describe the promotions advertised in the windows of these restaurants, as well as the nutrition content of promoted items; and (3) to ascertain availability of comprehensive nutrition information to consumers within the restaurants. We found the healthiest meal item to be significantly higher in price than less nutritious meal items available for $1.00 (t=146.9, p<.001), with the mean cost differential equal to $4.33 (95% CI: $4.27, $4.39). Window promotions generally advertised less healthful menu items, which may aid in priming customers to purchase these versus more healthful options. Comprehensive nutrition information beyond calorie counts was not readily accessible prior to purchasing. In addition to improving access to comprehensive nutrition information, advertising more of and lowering the prices of nutritious options may encourage consumers to purchase healthier foods in a fast food setting. Additional research in this area is needed in other geographic locations and restaurant chains.
Peipert, John D; Bentler, Peter; Klicko, Kristi; Hays, Ron D
2018-05-14
Black dialysis patients report better health-related quality of life (HRQOL) than White patients, which may be explained if Black and White patients respond systematically differently to HRQOL survey items. We examined differential item functioning (DIF) of the Kidney Disease Quality of Life 36-item (KDQOL™-36) Burden of Kidney Disease, Symptoms and Problems with Kidney Disease, and Effects of Kidney Disease scales between Black (n = 18,404) and White (n = 21,439) dialysis patients. We fit multiple group confirmatory factor analysis models with increasing invariance: a Configural model (invariant factor structure), a Metric model (invariant factor loadings), and a Scalar model (invariant intercepts). Criteria for invariance included non-significant χ² tests, > 0.002 difference in the models' CFI, and > 0.015 difference in RMSEA and SRMR. Next, starting with a fully invariant model, we freed loadings and intercepts item-by-item to determine if DIF impacted estimated KDQOL™-36 scale means. ΔCFI was 0.006 between the metric and scalar models but was reduced to 0.001 when we freed intercepts for the burdens and symptoms and problems of kidney disease scales. In comparison to standardized means of 0 in the White group, those for the Black group on the Burdens, Symptoms and Problems, and Effects of Kidney Disease scales were 0.218, 0.061, and 0.161, respectively. When loadings and thresholds were released sequentially, differences in means between models ranged between 0.001 and 0.048. Despite some DIF, impacts on KDQOL™-36 responses appear to be minimal. We conclude that the KDQOL™-36 is appropriate to make substantive comparisons of HRQOL between Black and White dialysis patients.
Chin, Kelly M; Gomberg-Maitland, Mardi; Channick, Richard N; Cuttica, Michael J; Fischer, Aryeh; Frantz, Robert P; Hunsche, Elke; Kleinman, Leah; McConnell, John W; McLaughlin, Vallerie V; Miller, Chad E; Zamanian, Roham T; Zastrow, Michael S; Badesch, David B
2018-04-26
Disease-specific patient-reported outcome (PRO) instruments are important in assessing the impact of disease and treatment. PAH-SYMPACT® is the first questionnaire for quantifying pulmonary arterial hypertension (PAH) symptoms and impacts developed following the 2009 FDA PRO guidance; previous qualitative research with PAH patients supported its initial content validity. Content finalization and psychometric validation were conducted using data from SYMPHONY, a single-arm, 16-week study with macitentan 10 mg in US patients with PAH. Item performance, Rasch, and factor analyses were used to select final item content of the PRO and define its domain structure. Internal consistency, test-retest reliability, known-group and construct validity, sensitivity to change, and influence of oxygen on item performance were evaluated. Data from 278 patients (79% female, mean age 60 years) were analyzed. Following removal of redundant/misfitting items, the final questionnaire has 11 symptom items across 2 domains (cardiopulmonary and cardiovascular symptoms) and 11 impact items across 2 domains (physical and cognitive/emotional impacts). Differential item function analysis confirmed PRO scoring is unaffected by oxygen use. For all 4 domains, internal consistency reliability was high (Cronbach's alpha >0.80) and scores were highly reproducible in stable patients (intra-class correlation coefficient 0.84-0.94). Correlations with CAMPHOR and SF-36 were moderate-to-high (|r| = 0.34-0.80). The questionnaire differentiated well between patients with different disease severity levels, and was sensitive to improvements in clinician- and patient-reported disease severity. The PAH-SYMPACT® is a brief, disease-specific PRO instrument possessing good psychometric properties which can be administered in clinical practice and clinical studies. Copyright © 2018. Published by Elsevier Inc.
Souza, Mariana Angélica Peixoto; Coster, Wendy Jane; Mancini, Marisa Cotta; Dutra, Fabiana Caetano Martins Silva; Kramer, Jessica; Sampaio, Rosana Ferreira
2017-12-08
A person's participation is acknowledged as an important outcome of the rehabilitation process. The Participation Scale (P-Scale) is an instrument that was designed to assess the participation of individuals with a health condition or disability. The scale was developed in an effort to better describe the participation of people living in middle-income and low-income countries. The aim of this study was to use Rasch analysis to examine whether the Participation Scale is suitable to assess the perceived ability to take part in participation situations by patients with diverse levels of function. The sample comprised 302 patients from a public rehabilitation services network. Participants had orthopaedic or neurological health conditions, were at least 18 years old, and completed the Participation Scale. Rasch analysis was conducted using the Winsteps software. The mean age of all participants was 45.5 years (standard deviation = 14.4), 52% were male, 86% had orthopaedic conditions, and 52% had chronic symptoms. Rasch analysis was performed using a dichotomous rating scale, and only one item showed misfit. Dimensionality analysis supported the existence of only one Rasch dimension. The person separation index was 1.51, and the item separation index was 6.38. Items N2 and N14 showed Differential Item Functioning between men and women. Items N6 and N12 showed Differential Item Functioning between acute and chronic conditions. The item difficulty range was -1.78 to 2.09 logits, while the sample ability range was -2.41 to 4.61 logits. The P-Scale was found to be useful as a screening tool for participation problems reported by patients in a rehabilitation context, despite some issues that should be addressed to further improve the scale.
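To put the separation indices and logit ranges reported above in context, the following minimal sketch converts a separation index G into the corresponding reliability using the standard relation R = G² / (1 + G²), and evaluates the dichotomous Rasch response probability at the extremes of the reported ranges. The function names are hypothetical and the reliabilities are derived here for illustration; they are not values reported in the abstract.

```python
import math


def separation_to_reliability(g):
    """Convert a Rasch separation index G to reliability, R = G^2 / (1 + G^2)."""
    return g * g / (1.0 + g * g)


def rasch_probability(theta, b):
    """Dichotomous Rasch model: probability of a positive response for a
    person at theta (logits) on an item of difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Separation indices quoted in the abstract.
person_reliability = separation_to_reliability(1.51)
item_reliability = separation_to_reliability(6.38)

# Extremes of the reported ranges: the lowest person measure against the
# hardest item, and the highest person measure against the easiest item.
p_low = rasch_probability(-2.41, 2.09)
p_high = rasch_probability(4.61, -1.78)

print(f"person reliability ~ {person_reliability:.3f}")
print(f"item reliability   ~ {item_reliability:.3f}")
print(f"P(lowest person, hardest item)  = {p_low:.4f}")
print(f"P(highest person, easiest item) = {p_high:.4f}")
```

Because the person range extends well above the hardest item, the second probability is close to 1, which is one way to see why persons at the top of the range are measured less precisely by this item set.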
Basch, Corey Hannah; Ethan, Danna; Rajan, Sonali
2013-01-01
Legislation in NYC requires chain restaurants to post calorie information on menu boards in an effort to help consumers make more informed decisions about food and beverage items they are purchasing. While this is a step in the right direction in light of the current obesity epidemic, there are other issues that warrant attention in a fast food setting, namely the pricing of healthy food options, promotional strategies, and access to comprehensive nutrition information. This study focused on a popular fast-food chain in NYC. The study’s aims were threefold: (1) to determine the cost differential between the healthiest meal item on the chain’s general menu and meal items available specifically on a reduced cost menu for one dollar (US$1.00); (2) to identify and describe the promotions advertised in the windows of these restaurants, as well as the nutrition content of promoted items; and (3) to ascertain availability of comprehensive nutrition information to consumers within the restaurants. We found the healthiest meal item to be significantly higher in price than less nutritious meal items available for $1.00 (t = 146.9, p < .001), with the mean cost differential equal to $4.33 (95% CI $4.27, $4.39). Window promotions generally advertised less healthful menu items, which may aid in priming customers to purchase these versus more healthful options. Comprehensive nutrition information beyond calorie counts was not readily accessible prior to purchasing. In addition to improving access to comprehensive nutrition information, advertising more of and lowering the prices of nutritious options may encourage consumers to purchase healthier foods in a fast food setting. Additional research in this area is needed in other geographic locations and restaurant chains. PMID:24171876
[Differential item functioning: a bibliometric analysis of journals published in Spanish].
Guilera, Georgina; Gómez, Juana; Hidalgo, M Dolores
2006-11-01
This study aims to provide an overview of scientific productivity with respect to articles published in Spanish on the issue of DIF. The documents included in the study were identified using the Psicodoc database, as well as the Science Citation Index and Social Science Citation Index from the Web of Science. The analyses carried out are focused mainly on presenting the frequencies and percentages of publications with respect to various bibliometric indicators. The results reveal that interest in the issue of DIF has increased, and that the universities are the most productive institutions. The majority of articles have been published in the journal Psicothema.
Are Atypical Things More Popular?
Berger, Jonah; Packard, Grant
2018-04-01
Why do some cultural items become popular? Although some researchers have argued that success is random, we suggest that how similar items are to each other plays an important role. Using natural language processing of thousands of songs, we examined the relationship between lyrical differentiation (i.e., atypicality) and song popularity. Results indicated that the more different a song's lyrics are from its genre, the more popular it becomes. This relationship is weaker in genres where lyrics matter less (e.g., dance) or where differentiation matters less (e.g., pop) and occurs for lyrical topics but not style. The results shed light on cultural dynamics, why things become popular, and the psychological foundations of culture more broadly.
Using the Item Response Theory (IRT) for Educational Evaluation through Games
ERIC Educational Resources Information Center
Euzébio Batista, Marcelo Henrique; Victória Barbosa, Jorge Luis; da Rosa Tavares, João Elison; Hackenhaar, Jonathan Luis
2013-01-01
This article shows the application of Item Response Theory (IRT) for educational evaluation using games. The article proposes a computational model to create user profiles, called Psychometric Profile Generator (PPG). PPG uses the IRT mathematical model for exploring the levels of skills and behaviors in the form of items and/or stimuli. The model…
ERIC Educational Resources Information Center
Anderson, Daniel; Kahn, Joshua D.; Tindal, Gerald
2017-01-01
Unidimensionality and local independence are two common assumptions of item response theory. The former implies that all items measure a common latent trait, while the latter implies that responses are independent, conditional on respondents' location on the latent trait. Yet, few tests are truly unidimensional. Unmodeled dimensions may result in…
ERIC Educational Resources Information Center
Zhang, Danhui; Orrill, Chandra; Campbell, Todd
2015-01-01
The purpose of this study was to investigate whether mixture Rasch models followed by qualitative item-by-item analysis of selected Programme for International Student Assessment (PISA) mathematics and science items offered insight into knowledge students invoke in mathematics and science separately and combined. The researchers administered an…
Sensitivity of Equated Aggregate Scores to the Treatment of Misbehaving Common Items
ERIC Educational Resources Information Center
Michaelides, Michalis P.
2010-01-01
The delta-plot method (Angoff, 1972) is a graphical technique used in the context of test equating for identifying common items with aberrant changes in their item difficulties across administrations or alternate forms. This brief research report explores the effects on equated aggregate scores when delta-plot outliers are either retained in or…
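For readers unfamiliar with the delta-plot technique named above, here is a minimal sketch under stated assumptions: proportion-correct values are mapped to the delta scale (Δ = 13 − 4·Φ⁻¹(p)), a principal-axis line is fitted through the paired deltas, and each item's signed perpendicular distance from that line is computed. The proportions are hypothetical, the function names are illustrative, and this is not the procedure or data used in the report.

```python
import math
from statistics import NormalDist, mean


def delta(p):
    """Map a proportion correct to the delta scale (mean 13, SD 4).
    Harder items (lower p) receive larger deltas."""
    return 13.0 - 4.0 * NormalDist().inv_cdf(p)


def delta_plot_distances(p_first, p_second):
    """Signed perpendicular distance of each common item from the
    principal-axis line through the (delta_first, delta_second) points."""
    x = [delta(p) for p in p_first]
    y = [delta(p) for p in p_second]
    mx, my = mean(x), mean(y)
    n = len(x)
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    # Principal (major) axis slope and intercept.
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    intercept = my - slope * mx
    norm = math.sqrt(slope ** 2 + 1.0)
    return [(slope * xi + intercept - yi) / norm for xi, yi in zip(x, y)]


# Hypothetical proportions correct for six common items on two administrations;
# the fourth item (index 3) becomes much harder on the second administration.
p_first = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40]
p_second = [0.88, 0.79, 0.71, 0.35, 0.52, 0.41]

distances = delta_plot_distances(p_first, p_second)
most_aberrant = max(range(len(distances)), key=lambda i: abs(distances[i]))
print("distances:", [round(d, 2) for d in distances])
print("item with largest |distance|:", most_aberrant)
```

In practice the absolute distances would be compared against a cutoff chosen by the analyst; the choice of cutoff is outside the scope of this sketch.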
Vafaei, Afshin; Alvarado, Beatriz; Tomás, Concepcion; Muro, Carmen; Martinez, Beatriz; Zunzunegui, Maria Victoria
2014-01-01
The Bem Sex Role Inventory (BSRI) is the most commonly used and validated gender role measurement tool across countries and age groups. However, it has been rarely validated in older adults and sporadically used in aging and health studies. Perceived gender role is a crucial part of a person's identity and an established determinant of health. The androgyny model suggests that those with high levels of both masculinity and femininity (androgynous) are more adaptive and hence have better health. Our objectives were to explore the validity of the BSRI in an older Spanish population, to compare different standard methods of measuring gender roles, and to examine their impact on health indicators. The BSRI and health indicator questions were completed by 120 community-dwelling adults aged 65+ living in Aragon, Spain. Exploratory factor analysis was performed to examine psychometric properties of the BSRI. Androgyny was measured by three approaches: geometric mean, t-ratio, and traditional four-gender groups classification. Relationships between health indicators and gender roles were explored. Factor analysis resulted in a two-factor solution consistent with the original masculine and feminine items with high loadings and good reliability. There were no associations between biological sex and gender roles. Different gender role measurement approaches classified participants differently into gender role groups. Overall, androgyny was associated with better mobility and physical and mental health. The traditional four groups approach showed higher compatibility with the androgyny model and was better able to disentangle the differential impact of gender roles on health. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Measurement of Women’s Agency in Egypt: A National Validation Study
VanderEnde, Kristin E.; Dodell, Sylvie; Cheong, Yuk Fai
2015-01-01
Despite widespread assumptions about women’s empowerment and agency in the Arab Middle East, psychometric research of these constructs is limited. Using national data from 6214 married women ages 16–49 who took part in the 2006 Egypt Labor Market Panel Survey, we applied factor analysis to explore and then to test the factor structure of women’s agency. We then used multiple indicator multiple cause structural equations models to test for differential item functioning (DIF) by women’s age at first marriage, a potential resource for women’s agency. Our results confirm that women’s agency in Egypt is multi-dimensional and comprised of their (1) influence in family decisions, including those reserved for men, (2) freedom of movement in public spaces, and (3) attitudes about gender, specifically violence against wives. These dimensions confirm those explored previously in selected rural areas of Egypt and South Asia. Yet, three items showed significant uniform DIF by women’s categorical age at first marriage, with and without a control for women’s age in years. Models adjusting for DIF and women’s age in years showed that women’s older age at first marriage was positively associated with the factor means for family decision-making and gender-violence attitudes, but not freedom of movement. Our findings reveal the value of our analytical strategy for research on the dimensions and determinants of women’s agency. Our approach offers a promising model to discern “hierarchies of evidence” for social policies and programs to enhance women’s empowerment. PMID:27597801
Cubaka, Vincent Kalumire; Schriver, Michael; Vedsted, Peter; Makoul, Gregory; Kallestrup, Per
2018-04-23
To identify, adapt and validate a measure for providers' communication and interpersonal skills in Rwanda. After selection, translation and piloting of the measure, structural validity, test-retest reliability, and differential item functioning were assessed. Identification and adaptation: The 14-item Communication Assessment Tool (CAT) was selected and adapted. Content validation found all items highly relevant in the local context except two, which were retained upon understanding the reasoning applied by patients. Eleven providers and 291 patients were involved in the field-testing. Confirmatory factor analysis showed a good fit for the original one factor model. Test-retest reliability assessment revealed a mean quadratic weighted Kappa = 0.81 (range: 0.69-0.89, N = 57). The average proportion of excellent scores was 15.7% (SD: 24.7, range: 9.9-21.8%, N = 180). Differential item functioning was not observed except for item 1, which focuses on greetings, for age groups (p = 0.02, N = 180). The Kinyarwanda version of CAT (K-CAT) is a reliable and valid patient-reported measure of providers' communication and interpersonal skills. K-CAT was validated on nurses and its use on other types of providers may require further validation. K-CAT is expected to be a valuable feedback tool for providers in practice and in training. Copyright © 2018 Elsevier B.V. All rights reserved.
Kalibatseva, Z; Leong, F T L; Ham, E H
2014-09-01
Theoretical and clinical publications suggest the existence of cultural differences in the expression and experience of depression. Measurement non-equivalence remains a potential methodological explanation for the lower prevalence of depression among Asian Americans compared to European Americans. This study compared DSM-IV depressive symptoms among Asian Americans and European Americans using secondary data analysis of the Collaborative Psychiatric Epidemiology Surveys (CPES). The Composite International Diagnostic Interview (CIDI) was used for the assessment of depressive symptoms. Of the entire sample, 310 Asian Americans and 1974 European Americans reported depressive symptoms and were included in the analyses. Measurement variance was examined with an item response theory differential item functioning (IRT DIF) analysis. χ² analyses indicated that, compared to Asian Americans, European American participants more frequently endorsed affective symptoms such as 'feeling depressed', 'feeling discouraged' and 'cried more often'. The IRT analysis detected DIF for four out of the 15 depression symptom items. At equal levels of depression, Asian Americans endorsed feeling worthless and appetite changes more easily than European Americans, and European Americans endorsed feeling nervous and crying more often than Asian Americans. Asian Americans did not seem to over-report somatic symptoms; however, European Americans seemed to report more affective symptoms than Asian Americans. The results suggest that there was measurement variance in a few of the depression items.
Optimal segmentation and packaging process
Kostelnik, Kevin M.; Meservey, Richard H.; Landon, Mark D.
1999-01-01
A process for improving packaging efficiency uses three-dimensional, computer-simulated models with various optimization algorithms to determine the optimal segmentation process and packaging configurations based on constraints including container limitations. The present invention is applied to a process for decontaminating, decommissioning (D&D), and remediating a nuclear facility involving the segmentation and packaging of contaminated items in waste containers in order to minimize the number of cuts, maximize packaging density, and reduce worker radiation exposure. A three-dimensional, computer-simulated facility model of the contaminated items is created. The contaminated items are differentiated. The optimal location, orientation and sequence of the segmentation and packaging of the contaminated items are determined using the simulated model, the algorithms, and various constraints including container limitations. The cut locations and orientations are transposed to the simulated model. The contaminated items are actually segmented and packaged. The segmentation and packaging may be simulated beforehand. In addition, the contaminated items may be cataloged and recorded.
Measurement equivalence and differential item functioning in family psychology.
Bingenheimer, Jeffrey B; Raudenbush, Stephen W; Leventhal, Tama; Brooks-Gunn, Jeanne
2005-09-01
Several hypotheses in family psychology involve comparisons of sociocultural groups. Yet the potential for cross-cultural inequivalence in widely used psychological measurement instruments threatens the validity of inferences about group differences. Methods for dealing with these issues have been developed via the framework of item response theory. These methods deal with an important type of measurement inequivalence, called differential item functioning (DIF). The authors introduce DIF analytic methods, linking them to a well-established framework for conceptualizing cross-cultural measurement equivalence in psychology (C.H. Hui and H.C. Triandis, 1985). They illustrate the use of DIF methods using data from the Project on Human Development in Chicago Neighborhoods (PHDCN). Focusing on the Caregiver Warmth and Environmental Organization scales from the PHDCN's adaptation of the Home Observation for Measurement of the Environment Inventory, the authors obtain results that exemplify the range of outcomes that may result when these methods are applied to psychological measurement instruments. (c) 2005 APA, all rights reserved
[Evaluation of the factorial and metric equivalence of the Sexual Assertiveness Scale (SAS) by sex].
Sierra, Juan Carlos; Santos-Iglesias, Pablo; Vallejo-Medina, Pablo
2012-05-01
Sexual assertiveness refers to the ability to initiate sexual activity, refuse unwanted sexual activity, and use contraceptive methods to avoid sexually transmitted diseases, developing healthy sexual behaviors. The Sexual Assertiveness Scale (SAS) assesses these three dimensions. The purpose of this study is to evaluate, using structural equation modeling and differential item functioning, the equivalence of the scale between men and women. Standard scores are also provided. A total of 4,034 participants from 21 Spanish provinces took part in the study. A quota sampling method was used. Results indicate a strictly equivalent dimensionality of the Sexual Assertiveness Scale across sexes. One item was flagged by differential item functioning, although it does not affect the scale. Therefore, there is no significant bias in the scale when comparing across sexes. Standard scores show similar Initiation assertiveness scores for men and women, and higher scores on Refusal and Sexually Transmitted Disease Prevention for women. This scale can be used on men and women with sufficient psychometric guarantees.
Jafari, Peyman; Sharafi, Zahra; Bagheri, Zahra; Shalileh, Sara
2014-06-01
Measurement equivalence is a necessary assumption for meaningful comparison of pediatric quality of life rated by children and parents. In this study, differential item functioning (DIF) analysis is used to examine whether children and their parents respond consistently to the items in the KINDer Lebensqualitätsfragebogen (KINDL; in German, Children Quality of Life Questionnaire). Two DIF detection methods, graded response model (GRM) and ordinal logistic regression (OLR), were applied for comparability. The KINDL was completed by 1,086 school children and 1,061 of their parents. While the GRM revealed that 12 out of the 24 items were flagged with DIF, the OLR identified 14 out of the 24 items with DIF. Seven items with DIF and five items without DIF were common across the two methods, yielding a total agreement rate of 50%. This study revealed that parent proxy-reports cannot be used as a substitute for a child's ratings in the KINDL.
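The agreement figure above can be reproduced from the counts given in the abstract. The sketch below rebuilds the 2 × 2 cross-classification of the 24 items and, as an added illustration that the abstract does not report, also computes Cohen's kappa to show how much of the raw agreement would be expected by chance. Variable names are illustrative.

```python
# Counts taken from the abstract.
n_items = 24
grm_flagged = 12      # items flagged with DIF by the graded response model
olr_flagged = 14      # items flagged with DIF by ordinal logistic regression
both_flagged = 7      # flagged by both methods
neither_flagged = 5   # flagged by neither method

# Remaining cells of the 2 x 2 table follow from the margins.
grm_only = grm_flagged - both_flagged
olr_only = olr_flagged - both_flagged
assert both_flagged + neither_flagged + grm_only + olr_only == n_items

# Raw agreement: items on which the two methods reach the same decision.
observed_agreement = (both_flagged + neither_flagged) / n_items

# Cohen's kappa (illustrative; not reported in the abstract).
p_grm = grm_flagged / n_items
p_olr = olr_flagged / n_items
expected_agreement = p_grm * p_olr + (1 - p_grm) * (1 - p_olr)
kappa = (observed_agreement - expected_agreement) / (1 - expected_agreement)

print(f"observed agreement = {observed_agreement:.2f}")
print(f"chance agreement   = {expected_agreement:.2f}")
print(f"kappa              = {kappa:.2f}")
```

With these margins the chance-expected agreement equals the observed agreement, so the two methods agree on individual items no more often than chance would predict.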
Arias González, Víctor B; Crespo Sierra, María Teresa; Arias Martínez, Benito; Martínez-Molina, Agustín; Ponce, Fernando P
2015-09-23
The Connor-Davidson Resilience Scale (CD-RISC) is inarguably one of the best-known instruments in the field of resilience assessment. However, the criteria for the psychometric quality of the instrument were based only on classical test theory. This paper calibrates the CD-RISC with a nonclinical sample of 444 adults using the Rasch-Andrich Rating Scale Model, in order to clarify its structure and analyze its psychometric properties at the item level. Two items showed misfit to the model and were eliminated. The remaining 22 items essentially form a unidimensional scale. The CD-RISC has good psychometric properties. The fit of both the items and the persons to the Rasch model was good, and the response categories were functioning properly. Two of the items showed differential item functioning. The CD-RISC has an obvious ceiling effect, which suggests that more difficult items should be included in future versions of the scale.
Computer-adaptive test to measure community reintegration of Veterans.
Resnik, Linda; Tian, Feng; Ni, Pengsheng; Jette, Alan
2012-01-01
The Community Reintegration of Injured Service Members (CRIS) measure consists of three scales measuring extent of, perceived limitations in, and satisfaction with community reintegration. Length of the CRIS may be a barrier to its widespread use. Using item response theory (IRT) and computer-adaptive test (CAT) methodologies, this study developed and evaluated a briefer community reintegration measure called the CRIS-CAT. Large item banks for each CRIS scale were constructed. A convenience sample of 517 Veterans responded to all items. Exploratory and confirmatory factor analyses (CFAs) were used to identify the dimensionality within each domain, and IRT methods were used to calibrate items. Accuracy and precision of CATs of different lengths were compared with the full-item bank, and data were examined for differential item functioning (DIF). CFAs supported unidimensionality of scales. Acceptable item fit statistics were found for final models. Accuracy of 10-, 15-, 20-, and variable-item CATs for all three scales was 0.88 or above. CAT precision increased with number of items administered and decreased at the upper ranges of each scale. Three items exhibited moderate DIF by sex. The CRIS-CAT demonstrated promising measurement properties and is recommended for use in community reintegration assessment.
The Curiosity and Exploration Inventory-II: Development, Factor Structure, and Psychometrics
Kashdan, Todd B.; Gallagher, Matthew W.; Silvia, Paul J.; Winterstein, Beate P.; Breen, William E.; Terhar, Daniel; Steger, Michael F.
2009-01-01
Given curiosity’s fundamental role in motivation, learning, and well-being, we sought to refine the measurement of trait curiosity with an improved version of the Curiosity and Exploration Inventory (CEI; Kashdan, Rose, & Fincham, 2004). A preliminary pool of 36 items was administered to 311 undergraduate students, who also completed measures of emotion, emotion regulation, personality, and well-being. Factor analyses indicated a two factor model—motivation to seek out knowledge and new experiences (Stretching; 5 items) and a willingness to embrace the novel, uncertain, and unpredictable nature of everyday life (Embracing; 5 items). In two additional samples (ns = 150 and 119), we cross-validated this factor structure and provided initial evidence for construct validity. This includes positive correlations with personal growth, openness to experience, autonomy, purpose in life, self-acceptance, psychological flexibility, positive affect, and positive social relations, among others. Applying item response theory (IRT) to these samples (n = 578), we showed that the items have good discrimination and a desirable breadth of difficulty. The item information functions and test information function were centered near zero, indicating that the scale assesses the mid-range of the latent curiosity trait most reliably. The findings thus far provide good evidence for the psychometric properties of the 10-item CEI-II. PMID:20160913
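The statement that the information functions are centered near zero can be made concrete with a small sketch. The abstract does not say which IRT model was fitted, so the dichotomous two-parameter logistic model is assumed here purely for illustration, and the item parameters are hypothetical.

```python
import math


def p_2pl(theta, a, b):
    """Probability of endorsing an item under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P * (1 - P), maximal at theta = b."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def scale_information(theta, items):
    """Information of the whole scale: the sum of the item informations."""
    return sum(item_information(theta, a, b) for a, b in items)


# Hypothetical (discrimination, difficulty) pairs clustered around zero.
items = [(1.2, -0.5), (1.5, 0.0), (1.0, 0.5)]
info_mid = scale_information(0.0, items)
info_high = scale_information(3.0, items)
```

With difficulties clustered around zero, the summed information is largest in the mid-range of the trait and falls off toward the extremes, which is the pattern the abstract describes.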
Scientific literacy: Factor structure and gender differences
NASA Astrophysics Data System (ADS)
Manhart, James Joseph
The purpose of this study was to investigate the factor structure of scientific literacy and to document any gender differences with respect to each factor. Participants included 1139 students (574 females, 565 males) in grades 9 through 12 who were taking a science class at one of four Midwestern high schools. Based on National Science Education Standards, a 100 item multiple-choice test was constructed to assess scientific literacy. Confirmatory factor analysis of item parcels suggested a three factor model was the best way to explain the data resulting from the administration of this test. The factors were labeled constructs of science, abilities necessary to do scientific inquiry, and social aspects of science. Gender differences with respect to these factors were examined using analysis of variance procedures. Because differential enrollment in science classes could cause gender differences in grades 11 and 12, parallel analyses were conducted on the grades 9 and 10 subsample and the grades 11 and 12 subsample. However, the results of the two analyses were similar. The most consistent gender difference observed was that females performed better than males on the social aspects of science factor. Males tended to perform better than females on the constructs of science factor, although no consistent gender difference was noted for items dealing with life science. With respect to the abilities necessary to do scientific inquiry factor, females tended to perform better than males in grades 9 and 10, while no consistent gender difference was observed in grades 11 and 12. Gender differences were also examined using the Mantel-Haenszel procedure to flag individual items that functioned differently for females and males of the same ability. Twelve items were flagged for grades 9 and 10 (8 in favor of females, 4 in favor of males). Fourteen items were flagged for grades 11 and 12 (7 in favor of females, 7 in favor of males). 
All of the flagged items exhibited only small to moderate differential item functioning (DIF). Only three items were similarly flagged in both subsamples, one item from each factor.
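The Mantel-Haenszel procedure mentioned in the abstract above can be sketched as follows. Examinees are stratified by matched ability (typically total score), a 2x2 table of group by correct/incorrect is formed in each stratum, and a common odds ratio is pooled across strata. The sketch also reports the ETS delta-scale transformation, MH D-DIF = -2.35 ln(alpha). The function name, the table layout, and the counts are hypothetical, and which group serves as reference is an assumption rather than something stated in the abstract.

```python
import math


def mantel_haenszel_dif(strata):
    """Pooled odds ratio and ETS delta for one item.

    strata: list of (a, b, c, d) counts per matched score level, where
      a = reference group correct,  b = reference group incorrect,
      c = focal group correct,      d = focal group incorrect.
    Returns (alpha_mh, mh_d_dif). Assumes at least one stratum has b * c > 0.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    alpha = numerator / denominator
    # Negative delta values indicate an item that favors the reference group.
    return alpha, -2.35 * math.log(alpha)


# Hypothetical counts for three score levels.
alpha, delta = mantel_haenszel_dif([(30, 20, 25, 25), (40, 10, 35, 15), (45, 5, 40, 10)])
```

An odds ratio near 1 (delta near 0) means the two groups perform alike on the item once ability is matched; the abstract's "small to moderate" label refers to the size of this departure.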
NASA Astrophysics Data System (ADS)
Ding, Lin
2014-02-01
Discipline-based science concept assessments are powerful tools to measure learners' disciplinary core ideas. Among many such assessments, the Brief Electricity and Magnetism Assessment (BEMA) has been broadly used to gauge student conceptions of key electricity and magnetism (E&M) topics in college-level introductory physics courses. Differing from typical concept inventories that focus only on one topic of a subject area, BEMA covers a broad range of topics in the electromagnetism domain. In spite of this fact, prior studies exclusively used a single aggregate score to represent individual students' overall understanding of E&M without explicating the construct of this assessment. Additionally, BEMA has been used to compare traditional physics courses with a reformed course entitled Matter and Interactions (M&I). While prior findings were in favor of M&I, no empirical evidence was sought to rule out possible differential functioning of BEMA that may have inadvertently advantaged M&I students. In this study, we used Rasch analysis to seek two missing pieces regarding the construct and differential functioning of BEMA. Results suggest that although BEMA items generally can function together to measure the same construct of application and analysis of E&M concepts, several items may need further revision. Additionally, items that demonstrate differential functioning for the two courses are detected. Issues such as item contextual features and student familiarity with question settings may underlie these findings. This study highlights often overlooked threats in science concept assessments and provides an exemplar for using evidence-based reasoning to make valid inferences and arguments.
What you say matters: exploring visual-verbal interactions in visual working memory.
Mate, Judit; Allen, Richard J; Baqués, Josep
2012-01-01
The aim of this study was to explore whether the content of a simple concurrent verbal load task determines the extent of its interference on memory for coloured shapes. The task consisted of remembering four visual items while repeating aloud a pair of words that varied in terms of imageability and relatedness to the task set. At test, a cue appeared that was either the colour or the shape of one of the previously seen objects, with participants required to select the object's other feature from a visual array. During encoding and retention, there were four verbal load conditions: (a) a related, shape-colour pair (from outside the experimental set, i.e., "pink square"); (b) a pair of unrelated but visually imageable, concrete, words (i.e., "big elephant"); (c) a pair of unrelated and abstract words (i.e., "critical event"); and (d) no verbal load. Results showed differential effects of these verbal load conditions. In particular, imageable words (concrete and related conditions) interfered to a greater degree than abstract words. Possible implications for how visual working memory interacts with verbal memory and long-term memory are discussed.
Three Classes of Nonparametric Differential Step Functioning Effect Estimators
ERIC Educational Resources Information Center
Penfield, Randall D.
2008-01-01
The examination of measurement invariance in polytomous items is complicated by the possibility that the magnitude and sign of lack of invariance may vary across the steps underlying the set of polytomous response options, a concept referred to as differential step functioning (DSF). This article describes three classes of nonparametric DSF effect…
ERIC Educational Resources Information Center
Pierce, W. David; Sydie, R. A.; Stratkotter, Rainer
2003-01-01
Male and female participants (N = 274) made judgments about the social concepts of "feminist," "man," and "woman" on 63 semantic differential items. Factor analysis identified three basic dimensions termed evaluative, potency, and activity as well as two secondary factors called expressiveness and sexuality. Results for the evaluative dimension…
ERIC Educational Resources Information Center
Mendes-Barnett, Sharon; Ercikan, Kadriye
2006-01-01
This study contributes to understanding sources of gender differential item functioning (DIF) on mathematics tests. This study focused on identifying sources of DIF and differential bundle functioning for boys and girls on the British Columbia Principles of Mathematics Exam (Grade 12) using a confirmatory SIBTEST approach based on a…
Sex Differential Item Functioning in the Inventory of Early Development III Social-Emotional Skills
ERIC Educational Resources Information Center
Beaver, Jessica L.; French, Brian F.; Finch, W. Holmes; Ullrich-French, Sarah C.
2014-01-01
Social-emotional (SE) skills in the early developmental years of children influence outcomes in psychological, behavioral, and learning domains. The adult ratings of a child's SE skills can be influenced by sex stereotypes. These rating differences could lead to differential conclusions about developmental progress or risk. To ensure that…
Comparing the Lexical Features of EAP Students' Essays by Prompt and Rating
ERIC Educational Resources Information Center
Lavallée, Maxime; McDonough, Kim
2015-01-01
Previous research has shown that high frequency lexical items, such as AWL words and formulaic expressions, may differentiate between texts written by expert and novice writers (Chen & Baker, 2010; Hancioglu, 2009), and that lexical features related to breadth, depth, and accessibility differentiate among texts from L2 writers of different…
Child-rearing in the context of childhood cancer: perspectives of parents and professionals.
Long, Kristin A; Keeley, Lauren; Reiter-Purtill, Jennifer; Vannatta, Kathryn; Gerhardt, Cynthia A; Noll, Robert B
2014-02-01
Elevated distress has been well documented among parents of children with cancer. Family systems theories suggest that cancer-related stressors and parental distress have the potential to affect child-rearing practices, but this topic has received limited empirical attention. The present work examined self-reported child-rearing practices among mothers and fathers of children with cancer and matched comparisons. Medical and psychosocial professionals with expertise in pediatric oncology selected items from the Child-Rearing Practices Report (CRPR) likely to differentiate parents of children with cancer from matched comparison parents. Then, responses on these targeted items were compared between parents of children with cancer (94 mothers, 67 fathers) and matched comparisons (98 mothers, 75 fathers). Effect sizes of between-group differences were compared for mothers versus fathers. Pediatric oncology healthcare providers predicted that 14 items would differentiate child-rearing practices of parents of children with cancer from parents of typically developing children. Differences emerged on six of the 14 CRPR items. Parents of children with cancer reported higher levels of spoiling and concern about their child's health and development than comparison parents. Items assessing overprotection and emotional responsiveness did not distinguish the two groups of parents. The effect size for the group difference between mothers in the cancer versus comparison groups was significantly greater than that for fathers on one item related to worry about the child's health. Parents of children with cancer report differences in some, but not all, domains of child-rearing, as predicted by healthcare professionals. © 2013 Wiley Periodicals, Inc.
Erschens, Rebecca; Herrmann-Werner, Anne; Keifenheim, Katharina Eva; Loda, Teresa; Bugaj, Till Johannes; Nikendei, Christoph; Lammerding-Köppel, Maria; Zipfel, Stephan; Junne, Florian
2018-01-01
Numerous studies from diverse contexts have confirmed high stress levels and stress-associated health impairment in medical students. This study aimed to explore the differential association of perceived stress with private and training-related stressors in medical students according to their stage of medical education. Participants were high-school graduates who plan to study medicine and students in their first, third, sixth, or ninth semester of medical school or in practical medical training. The self-administered questionnaire included items addressing demographic information, the Perceived Stress Questionnaire, and items addressing potential private and training-related stressors. Results confirmed a substantial burden of perceived stress in students at different stages of their medical education. In particular, 10-28% of students in their third or ninth semesters of medical school showed the highest values for perceived stress. Training-related stressors were most strongly associated with perceived stress, although specific stressors that determined perceived stress varied across different stages of students' medical education. High-school graduates highly interested in pursuing medical education showed specific stressors similar to those of medical students in their third, sixth, or ninth semesters of medical school, as well as stress structures with heights of general stress rates similar to those of medical students at the beginning of practical medical training. High-school graduates offer new, interesting information about students' fears and needs before they begin medical school. Medical students and high-school graduates need open, comprehensive information about possible stressors at the outset of and during medical education. Programmes geared toward improving resilience behaviour and teaching new, functional coping strategies are recommended.
La Porta, Fabio; Caselli, Serena; Ianes, Aladar Bruno; Cameli, Olivia; Lino, Mario; Piperno, Roberto; Sighinolfi, Antonella; Lombardi, Francesco; Tennant, Alan
2013-03-01
Objectives: (1) To appraise, by means of Rasch analysis, the internal validity and reliability of the Coma Recovery Scale-Revised (CRS-R) in a sample of patients with disorder of consciousness (DOC); and (2) to provide information about the comparability of CRS-R scores across persons with DOC across different settings and groups, including different etiologies. Design: Multicenter observational prospective study. Setting: Two rehabilitation wards, 1 intermediate care facility, and 2 nursing homes in Italy. Participants: Consecutively admitted patients (N=129) for whom assessments at 2 different time points were available, giving a total sample of 258 observations. Interventions: Not applicable. Main Outcome Measure: CRS-R. Results: After controlling for any possible dependency between persons' measures collected at different time points, and for uniform differential item functioning by etiology shown by the visual subscale, Rasch analysis demonstrated adequate satisfaction of all the model's requirements, including adequate ordering of scoring categories, unidimensionality, local independence, invariance (χ²(21)=27.798, P=.146), and absence of differential item functioning across patients' sex, age, time, and setting. The reliability (person separation index=.896) was adequate for individual person measurement. We devised practical raw-score-to-measure conversion tables based on the CRS-R calibrations. Conclusions: The CRS-R is a psychometrically sound and robust measurement tool. The linear measures of ability derived from the CRS-R total scores do satisfy all the principles of scientific measurement and are sufficiently reliable for high stakes assessments, such as the diagnosis of the level of consciousness in individual patients. Future studies are needed to directly explore the capabilities of the CRS-R measures to reduce the risk of vegetative state misdiagnosis. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Rasch analysis of the Edmonton Symptom Assessment System and research implications.
Cheifetz, O; Packham, T L; Macdermid, J C
2014-04-01
Reliable and valid assessment of the disease burden across all forms of cancer is critical to the evaluation of treatment effectiveness and patient progress. The Edmonton Symptom Assessment System (ESAS) is used for routine evaluation of people attending for cancer care. In the present study, we used Rasch analysis to explore the measurement properties of the ESAS and to determine the effect of using Rasch-proposed interval-level ESAS scoring compared with traditional scoring when evaluating the effects of an exercise program for cancer survivors. Polytomous Rasch analysis (Andrich's rating-scale model) was applied to data from 26,645 ESAS questionnaires completed at the Juravinski Cancer Centre. The fit of the ESAS to the polytomous Rasch model was investigated, including evaluations of differential item functioning for sex, age, and disease group. The research implication was investigated by comparing the results of an observational research study previously analysed using a traditional approach with the results obtained by Rasch-proposed interval-level ESAS scoring. The Rasch reliability index was 0.73, falling short of the desired 0.80-0.90 level. However, the ESAS was found to fit the Rasch model, including the criteria for uni-dimensional data. The analysis suggests that the current ESAS scoring system of 0-10 could be collapsed to a 6-point scale. Use of the Rasch-proposed interval-level scoring yielded results that were different from those calculated using summarized ordinal-level ESAS scores. Differential item functioning was not found for sex, age, or diagnosis groups. The ESAS is a moderately reliable uni-dimensional measure of cancer disease burden and can provide interval-level scaling with Rasch-based scoring. Further, our study indicates that, compared with the traditional scoring metric, Rasch-based scoring could result in substantive changes to conclusions.
Herrmann–Werner, Anne; Keifenheim, Katharina Eva; Loda, Teresa; Bugaj, Till Johannes; Nikendei, Christoph; Lammerding–Köppel, Maria; Zipfel, Stephan; Junne, Florian
2018-01-01
Objective: Numerous studies from diverse contexts have confirmed high stress levels and stress-associated health impairment in medical students. This study aimed to explore the differential association of perceived stress with private and training-related stressors in medical students according to their stage of medical education. Methods: Participants were high-school graduates who plan to study medicine and students in their first, third, sixth, or ninth semester of medical school or in practical medical training. The self-administered questionnaire included items addressing demographic information, the Perceived Stress Questionnaire, and items addressing potential private and training-related stressors. Results: Results confirmed a substantial burden of perceived stress in students at different stages of their medical education. In particular, 10–28% of students in their third or ninth semesters of medical school showed the highest values for perceived stress. Training-related stressors were most strongly associated with perceived stress, although specific stressors that determined perceived stress varied across different stages of students’ medical education. High-school graduates highly interested in pursuing medical education showed specific stressors similar to those of medical students in their third, sixth, or ninth semesters of medical school, as well as stress structures with heights of general stress rates similar to those of medical students at the beginning of practical medical training. Conclusions: High-school graduates offer new, interesting information about students’ fears and needs before they begin medical school. Medical students and high-school graduates need open, comprehensive information about possible stressors at the outset of and during medical education. Programmes geared toward improving resilience behaviour and teaching new, functional coping strategies are recommended. PMID:29385180
ERIC Educational Resources Information Center
Ögretmen, Tuncay
2015-01-01
The purpose of this study is to carry out differential item functioning (DIF) analysis for content areas of a reading comprehension subtest using four area indices within Item Response Theory (IRT) framework. The differences in the magnitudes of the area indices were compared based on the subject areas. The DIF analysis was carried out across…
ERIC Educational Resources Information Center
Woods, Carol M.; Cai, Li; Wang, Mian
2013-01-01
Differential item functioning (DIF) occurs when the probability of responding in a particular category to an item differs for members of different groups who are matched on the construct being measured. The identification of DIF is important for valid measurement. This research evaluates an improved version of Lord's χ²…
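For orientation, the classic form of Lord's chi-square for a single item parameter is a Wald test: the squared difference between the two groups' parameter estimates divided by the sum of their sampling variances, referred to a chi-square distribution with one degree of freedom. The sketch below shows that classic single-parameter form only; it is not the improved version evaluated in the article, and the function name and numbers are hypothetical.

```python
import math


def lord_chi_square_1df(b_ref, var_ref, b_foc, var_foc):
    """Wald-type test that one item parameter is equal in two groups.

    b_ref, b_foc: parameter estimates (e.g. difficulty) on a common metric;
    var_ref, var_foc: their squared standard errors.
    Returns (statistic, p_value) for a chi-square with 1 degree of freedom.
    """
    statistic = (b_foc - b_ref) ** 2 / (var_ref + var_foc)
    # For 1 df the upper-tail probability equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical difficulty estimates already linked to a common scale.
stat, p = lord_chi_square_1df(b_ref=0.0, var_ref=0.02, b_foc=0.5, var_foc=0.03)
```

The test presumes that the two groups' estimates are expressed on the same metric; linking is a separate step that this sketch leaves out.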
ERIC Educational Resources Information Center
Pae, Hye K.; Greenberg, Daphne; Williams, Rihana S.
2012-01-01
This study examines the Peabody Picture Vocabulary Test-IIIB (PPVT-IIIB) performance of 130 adults identified as struggling readers, in comparison to 175 third-grade children. Response patterns to the items on the PPVT-IIIB by these two groups were investigated, focusing on items, semantic categories, and lexical features, including word length,…
ERIC Educational Resources Information Center
Paek, Insu
2010-01-01
Conservative bias in rejection of a null hypothesis from using the continuity correction in the Mantel-Haenszel (MH) procedure was examined through simulation in a differential item functioning (DIF) investigation context in which statistical testing uses a prespecified level α for the decision on an item with respect to DIF. The standard MH…
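The effect of the continuity correction can be seen directly in the MH chi-square formula: the correction subtracts 0.5 from the absolute deviation between the observed and expected number of correct responses in the reference group before squaring. The following is a minimal sketch with hypothetical counts; the function name and the guard that keeps the corrected deviation from going below zero are assumptions of this illustration.

```python
def mh_chi_square(strata, continuity=True):
    """Mantel-Haenszel chi-square (1 df) for one item.

    strata: list of (a, b, c, d) counts per matched score level, where
      a = reference correct, b = reference incorrect,
      c = focal correct,     d = focal incorrect.
    """
    sum_observed = 0.0
    sum_expected = 0.0
    sum_variance = 0.0
    for a, b, c, d in strata:
        n_ref, n_foc = a + b, c + d
        m_right, m_wrong = a + c, b + d
        n = n_ref + n_foc
        if n < 2:
            continue  # the variance term is undefined for n < 2
        sum_observed += a
        sum_expected += n_ref * m_right / n
        sum_variance += n_ref * n_foc * m_right * m_wrong / (n * n * (n - 1))
    deviation = abs(sum_observed - sum_expected)
    if continuity:
        deviation = max(deviation - 0.5, 0.0)
    return deviation * deviation / sum_variance


# Hypothetical single-stratum table.
with_cc = mh_chi_square([(40, 10, 20, 30)], continuity=True)
without_cc = mh_chi_square([(40, 10, 20, 30)], continuity=False)
```

Because the corrected statistic is never larger than the uncorrected one, it rejects less often at the same nominal level, which is the conservative tendency the abstract sets out to quantify.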
ERIC Educational Resources Information Center
Samejima, Fumiko
A method is proposed that increases the accuracies of estimation of the operating characteristics of discrete item responses, especially when the true operating characteristic is represented by a steep curve, and also at the lower and upper ends of the ability distribution where the estimation tends to be inaccurate because of the smaller number…
Latent Class Analysis of Differential Item Functioning on the Peabody Picture Vocabulary Test-III
ERIC Educational Resources Information Center
Webb, Mi-young Lee; Cohen, Allan S.; Schwanenflugel, Paula J.
2008-01-01
This study investigated the use of latent class analysis for the detection of differences in item functioning on the Peabody Picture Vocabulary Test-Third Edition (PPVT-III). A two-class solution for a latent class model appeared to be defined in part by ability because Class 1 was lower in ability than Class 2 on both the PPVT-III and the…
ERIC Educational Resources Information Center
Criss, Amy H.
2006-01-01
When items on one list receive more encoding than items on another list, the improvement in performance usually manifests as an increase in the hit rate and a decrease in the false alarm rate (FAR). A common account of this strength based mirror effect is that participants adopt a more strict criterion following a strongly than weakly encoded list…
ERIC Educational Resources Information Center
Puhan, Gautam; Boughton, Keith A.; Kim, Sooyeon
2005-01-01
The study evaluated the comparability of two versions of a teacher certification test: a paper-and-pencil test (PPT) and computer-based test (CBT). Standardized mean difference (SMD) and differential item functioning (DIF) analyses were used as measures of comparability at the test and item levels, respectively. Results indicated that effect sizes…
ERIC Educational Resources Information Center
Seo, Dong Gi; Hao, Shiqi
2016-01-01
Differential item/test functioning (DIF/DTF) are routine procedures to detect item/test unfairness as an explanation for group performance difference. However, unequal sample sizes and small sample sizes have an impact on the statistical power of the DIF/DTF detection procedures. Furthermore, DIF/DTF cannot be used for two test forms without…
Silva, Soraia Micaela; Corrêa, Fernanda Ishida; Pereira, Gabriela Santos; Faria, Christina Danielli Coelho de Morais; Corrêa, João Carlos Ferrari
2018-01-01
Analyze the construct validity and internal consistency of the Stroke Specific Quality of Life (SS-QOL) items that address the participation component of the ICF as well as analyze the ceiling and floor effects. One hundred subjects were analyzed: 85 community-dwelling and 15 institutionalized individuals. The analysis of construct validity was performed using classic psychometrics: (1) the comparison of known groups (individuals without restriction to participation vs. those with restriction to participation) using the Mann-Whitney test and (2) convergent validity - correlation between the scores on the SS-QOL items that address participation and the subscale scores of measures used to evaluate the similar constructs and concepts [the Short-Form Health Survey (SF-36), Functional Independence Measure (FIM) and grip strength test]. Spearman's correlation coefficients were calculated for this analysis. Cronbach's α was used for the analysis of internal consistency and both the ceiling and floor effects were analyzed. The level of significance for all analyses was α = 0.05. The a priori hypotheses regarding construct validity were partially demonstrated, as only five of the eight domains exhibited positive moderate to strong correlations (r > 0.40) with measures that address constructs similar to those addressed on the SS-QOL questionnaire. The items demonstrated adequate internal consistency and are capable of differentiating individuals with and without restriction to participation. The ceiling and floor effects were considered adequate for the total SS-QOL score, but beyond acceptable standards for some domains. The 26 items of the SS-QOL questionnaire measure a multidimensional construct and therefore do not only address participation. However, the items demonstrated adequate internal consistency and are capable of differentiating individuals with and without restriction to participation. 
Implications for rehabilitation: The 26 items of the SS-QOL questionnaire demonstrated adequate internal consistency and are capable of differentiating individuals with and without restriction to participation. The present findings can guide healthcare professionals regarding the selection of an assessment tool for the evaluation of post-stroke participation. The findings can lead to consistent and standardized evaluations, which facilitate comparisons and discussion on functional health and social participation after stroke.
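The internal-consistency coefficient used in the study above, Cronbach's α, is computed from the item variances and the variance of the total score. A minimal sketch follows; the function name and the response data are hypothetical, and sample variances (n - 1 denominator) are assumed.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items score matrix.

    rows: list of per-respondent lists, one score per item.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score)).
    """
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical responses of four people to two items.
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 4], [4, 3]])
```

The coefficient approaches 1 when items rise and fall together across respondents and drops as the items become less correlated.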
A Primer on the 2- and 3-Parameter Item Response Theory Models.
ERIC Educational Resources Information Center
Thornton, Artist
Item response theory (IRT) is a useful and effective tool for item response measurement if used in the proper context. This paper discusses the sets of assumptions under which responses can be modeled while exploring the framework of the IRT models relative to response testing. The one parameter model, or one parameter logistic model, is perhaps…
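The one-, two-, and three-parameter logistic models that this primer covers share one item response function, sketched below. Setting the guessing parameter c to 0 gives the 2PL, and additionally fixing the discrimination a to a common value gives the 1PL. The function name and parameter values are illustrative, and the optional scaling constant D = 1.7 used in some texts is omitted here.

```python
import math


def p_3pl(theta, a, b, c=0.0):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: lower asymptote (pseudo-guessing). c = 0 reduces to the 2PL.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: moderate discrimination, average difficulty, some guessing.
p_mid = p_3pl(theta=0.0, a=1.0, b=0.0, c=0.2)
```

At theta = b the probability sits halfway between c and 1, and for very low ability it approaches c rather than 0.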
ERIC Educational Resources Information Center
Planinic, Maja; Boone, William J.; Krsnik, Rudolf; Beilfuss, Meredith L.
2006-01-01
Croatian 1st-year and 3rd-year high-school students (N = 170) completed a conceptual physics test. Students were evaluated with regard to two physics topics: Newtonian dynamics and simple DC circuits. Students answered test items and also indicated their confidence in each answer. Rasch analysis facilitated the calculation of three linear…
NASA Astrophysics Data System (ADS)
Schultz, Madeleine; Lawrie, Gwendolyn A.; Bailey, Chantal H.; Bedford, Simon B.; Dargaville, Tim R.; O'Brien, Glennys; Tasker, Roy; Thompson, Christopher D.; Williams, Mark; Wright, Anthony H.
2017-03-01
A multi-institution collaborative team of Australian chemistry education researchers, teaching a total of over 3000 first year chemistry students annually, has explored a tool for diagnosing students' prior conceptions as they enter tertiary chemistry courses. Five core topics were selected and clusters of diagnostic items were assembled linking related concepts in each topic together. An ordered multiple choice assessment strategy was adopted to enable provision of formative feedback to students through combination of the specific distractors that they chose. Concept items were either sourced from existing research instruments or developed by the project team. The outcome is a diagnostic tool consisting of five topic clusters of five concept items that has been delivered in large introductory chemistry classes at five Australian institutions. Statistical analysis of data has enabled exploration of the composition and validity of the instrument including a comparison between delivery of the complete 25 item instrument with subsets of five items, clustered by topic. This analysis revealed that most items retained their validity when delivered in small clusters. Tensions between the assembly, validation and delivery of diagnostic instruments for the purposes of acquiring robust psychometric research data versus their pragmatic use are considered in this study.
Pollard, Beth; Dixon, Diane; Dieppe, Paul; Johnston, Marie
2009-01-01
Background: The International Classification of Functioning, Disability and Health (ICF) proposes three main health outcomes, Impairment (I), Activity Limitation (A) and Participation Restriction (P), but good measures of these constructs are needed. The aim of this study was to use both Classical Test Theory (CTT) and Item Response Theory (IRT) methods to carry out an item analysis to improve measurement of these three components in patients having joint replacement surgery mainly for osteoarthritis (OA). Methods: A geographical cohort of patients about to undergo lower limb joint replacement was invited to participate. Five hundred and twenty four patients completed ICF items that had been previously identified as measuring only a single ICF construct in patients with osteoarthritis. There were 13 I, 26 A and 20 P items. The SF-36 was used to explore the construct validity of the resultant I, A and P measures. The CTT and IRT analyses were run separately to identify items for inclusion or exclusion in the measurement of each construct. The results from both analyses were compared and contrasted. Results: Overall, the item analysis resulted in the removal of 4 I items, 9 A items and 11 P items. CTT and IRT identified the same 14 items for removal, with CTT additionally excluding 3 items, and IRT a further 7 items. In a preliminary exploration of reliability and validity, the new measures appeared acceptable. Conclusion: New measures were developed that reflect the ICF components of Impairment, Activity Limitation and Participation Restriction for patients with advanced arthritis. The resulting Aberdeen IAP measures (Ab-IAP) comprising I (Ab-I, 9 items), A (Ab-A, 17 items), and P (Ab-P, 9 items) met the criteria of conventional psychometric (CTT) analyses and the additional criteria (information and discrimination) of IRT. The use of both methods was more informative than the use of only one of these methods. Thus combining CTT and IRT appears to be a valuable tool in the development of measures. PMID:19422677
2017-01-01
Background: The Center for Epidemiologic Studies Depression Scale (CES-D) is a measure of depressive symptomatology which is widely used internationally. Though previous attempts were made to shorten the CES-D scale, few have attempted to develop a Computerized Adaptive Test (CAT) version for the CES-D. Objective: The aim of this study was to provide evidence on the efficiency and accuracy of the CES-D when administered using CAT in an American sample group. Methods: We obtained a sample of 2060 responses to the CES-D from US participants using the myPersonality application. The average age of participants was 26 years (range 19-77). We randomly split the sample into two groups to evaluate and validate the psychometric models. We used evaluation group data (n=1018) to assess dimensionality with both confirmatory factor and Mokken analysis. We conducted further psychometric assessments using item response theory (IRT), including assessments of item and scale fit to Samejima’s graded response model (GRM), local dependency and differential item functioning. We subsequently conducted two CAT simulations to evaluate the CES-D CAT using the validation group (n=1042). Results: Initial CFA results indicated a poor fit to the model and Mokken analysis revealed 3 items which did not conform to the same dimension as the rest of the items. We removed the 3 items and fit the remaining 17 items to the GRM. We found no evidence of differential item functioning (DIF) between age and gender groups. Estimates of the level of CES-D trait score provided by the simulated CAT algorithm and the original CES-D trait score derived from the original scale were highly correlated. The second CAT simulation conducted using real participant data demonstrated higher precision at the higher levels of depression spectrum. Conclusions: Depression assessments using the CES-D CAT can be more accurate and efficient than those made using the fixed-length assessment. PMID:28931496
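Samejima's graded response model, to which the items above were fitted, derives category probabilities as differences between adjacent cumulative boundary curves. The sketch below illustrates that construction for one item; the function name, the discrimination, and the threshold values are hypothetical and are not estimates from the study.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the graded response model.

    theta: trait level; a: item discrimination;
    thresholds: ordered boundary locations b_1 < ... < b_M.
    Returns M + 1 probabilities for categories 0..M.
    """
    # Boundary curves P(X >= k): 1 for k = 0, logistic for k = 1..M, 0 beyond M.
    boundaries = [1.0]
    boundaries += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    boundaries.append(0.0)
    # Each category probability is the drop between adjacent boundary curves.
    return [boundaries[k] - boundaries[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical four-category item (scored 0-3).
probs = grm_category_probs(theta=0.3, a=1.4, thresholds=[-1.0, 0.0, 1.0])
```

A CAT built on such a model typically picks, at each step, the item whose information is largest at the current trait estimate; that selection step is not shown here.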
Hamilton, Clayon B; Chesworth, Bert M
2013-11-01
The original 20-item Upper Extremity Functional Index (UEFI) has not undergone Rasch validation. The purpose of this study was to determine whether Rasch analysis supports the UEFI as a measure of a single construct (ie, upper extremity function) and whether a Rasch-validated UEFI has adequate reproducibility for individual-level patient evaluation. This was a secondary analysis of data from a repeated-measures study designed to evaluate the measurement properties of the UEFI over a 3-week period. Patients (n=239) with musculoskeletal upper extremity disorders were recruited from 17 physical therapy clinics across 4 Canadian provinces. Rasch analysis of the UEFI measurement properties was performed. If the UEFI did not fit the Rasch model, misfitting patients were deleted, items with poor response structure were corrected, and misfitting items and redundant items were deleted. The impact of differential item functioning on the ability estimate of patients was investigated. A 15-item modified UEFI was derived to achieve fit to the Rasch model where the total score was supported as a measure of upper extremity function only. The resultant UEFI-15 interval-level scale (0-100, worst to best state) demonstrated excellent internal consistency (person separation index=0.94) and test-retest reliability (intraclass correlation coefficient [2,1]=.95). The minimal detectable change at the 90% confidence interval was 8.1. Patients who were ambidextrous or bilaterally affected were excluded to allow for the analysis of differential item functioning due to limb involvement and arm dominance. Rasch analysis did not support the validity of the 20-item UEFI. However, the UEFI-15 was a valid and reliable interval-level measure of a single dimension: upper extremity function. Rasch analysis supports using the UEFI-15 in physical therapist practice to quantify upper extremity function in patients with musculoskeletal disorders of the upper extremity.
Chesworth, Bert M.
2013-01-01
Background: The original 20-item Upper Extremity Functional Index (UEFI) has not undergone Rasch validation. Objective: The purpose of this study was to determine whether Rasch analysis supports the UEFI as a measure of a single construct (ie, upper extremity function) and whether a Rasch-validated UEFI has adequate reproducibility for individual-level patient evaluation. Design: This was a secondary analysis of data from a repeated-measures study designed to evaluate the measurement properties of the UEFI over a 3-week period. Methods: Patients (n=239) with musculoskeletal upper extremity disorders were recruited from 17 physical therapy clinics across 4 Canadian provinces. Rasch analysis of the UEFI measurement properties was performed. If the UEFI did not fit the Rasch model, misfitting patients were deleted, items with poor response structure were corrected, and misfitting items and redundant items were deleted. The impact of differential item functioning on the ability estimate of patients was investigated. Results: A 15-item modified UEFI was derived to achieve fit to the Rasch model where the total score was supported as a measure of upper extremity function only. The resultant UEFI-15 interval-level scale (0–100, worst to best state) demonstrated excellent internal consistency (person separation index=0.94) and test-retest reliability (intraclass correlation coefficient [2,1]=.95). The minimal detectable change at the 90% confidence interval was 8.1. Limitations: Patients who were ambidextrous or bilaterally affected were excluded to allow for the analysis of differential item functioning due to limb involvement and arm dominance. Conclusion: Rasch analysis did not support the validity of the 20-item UEFI. However, the UEFI-15 was a valid and reliable interval-level measure of a single dimension: upper extremity function. Rasch analysis supports using the UEFI-15 in physical therapist practice to quantify upper extremity function in patients with musculoskeletal disorders of the upper extremity. PMID:23813086
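The abstract reports a minimal detectable change (MDC90) of 8.1 but does not say how it was computed. A minimal sketch of the conventional calculation, assuming the common formulas SEM = SD × √(1 − ICC) and MDC90 = 1.645 × √2 × SEM, is given below; the function names and all input values are hypothetical, and the study may have used a different procedure.

```python
import math

Z_90 = 1.645  # two-sided 90% quantile of the standard normal distribution


def sem_from_reliability(sd, icc):
    """Standard error of measurement from a score SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - icc)


def mdc90_from_sem(sem):
    """Minimal detectable change at 90% confidence.

    The sqrt(2) factor reflects that a change score involves two
    measurements, each carrying its own measurement error.
    """
    return Z_90 * math.sqrt(2.0) * sem


# Illustrative only: an SEM of about 3.48 scale points would correspond
# to an MDC90 close to the reported 8.1 under these assumed formulas.
example_mdc = mdc90_from_sem(3.48)
```

Read this way, a patient's score would need to change by more than the MDC90 before the change could be distinguished from measurement error with 90% confidence.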
Generalized Full-Information Item Bifactor Analysis
Cai, Li; Yang, Ji Seung; Hansen, Mark
2011-01-01
Full-information item bifactor analysis is an important statistical method in psychological and educational measurement. Current methods are limited to single group analysis and inflexible in the types of item response models supported. We propose a flexible multiple-group item bifactor analysis framework that supports a variety of multidimensional item response theory models for an arbitrary mixing of dichotomous, ordinal, and nominal items. The extended item bifactor model also enables the estimation of latent variable means and variances when data from more than one group are present. Generalized user-defined parameter restrictions are permitted within or across groups. We derive an efficient full-information maximum marginal likelihood estimator. Our estimation method achieves substantial computational savings by extending Gibbons and Hedeker’s (1992) bifactor dimension reduction method so that the optimization of the marginal log-likelihood only requires two-dimensional integration regardless of the dimensionality of the latent variables. We use simulation studies to demonstrate the flexibility and accuracy of the proposed methods. We apply the model to study cross-country differences, including differential item functioning, using data from a large international education survey on mathematics literacy. PMID:21534682
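The dimension reduction mentioned in the abstract above can be sketched as follows. The notation is an assumption made for illustration, not taken from the paper: $\theta_0$ is the general factor, $\theta_s$ is the $s$-th of $S$ specific factors, $I_s$ is the set of items loading on specific factor $s$, and $\phi$ is the standard normal density with all factors taken as orthogonal. Because each item loads on the general factor and at most one specific factor, the marginal probability of a response pattern $\mathbf{y}$ factors as

```latex
P(\mathbf{y})
  = \int \left[ \prod_{s=1}^{S} \int \prod_{j \in I_s}
      P\!\left(y_j \mid \theta_0, \theta_s\right)
      \phi(\theta_s)\, d\theta_s \right]
    \phi(\theta_0)\, d\theta_0 .
```

The $(S+1)$-dimensional integral therefore reduces to nested integrals over $\theta_0$ and one $\theta_s$ at a time, so quadrature is only ever two-dimensional regardless of $S$.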