Sample records for detect differential item

  1. DIF Trees: Using Classification Trees to Detect Differential Item Functioning

    ERIC Educational Resources Information Center

    Vaughn, Brandon K.; Wang, Qiu

    2010-01-01

    A nonparametric tree classification procedure is used to detect differential item functioning for items that are dichotomously scored. Classification trees are shown to be an alternative procedure to detect differential item functioning other than the use of traditional Mantel-Haenszel and logistic regression analysis. A nonparametric…

  2. A Monte Carlo Study Investigating the Influence of Item Discrimination, Category Intersection Parameters, and Differential Item Functioning Patterns on the Detection of Differential Item Functioning in Polytomous Items

    ERIC Educational Resources Information Center

    Thurman, Carol

    2009-01-01

    The increased use of polytomous item formats has led assessment developers to pay greater attention to the detection of differential item functioning (DIF) in these items. DIF occurs when an item performs differently for two contrasting groups of respondents (e.g., males versus females) after controlling for differences in the abilities of the…

  3. Detection of Differential Item Functioning Using the Lasso Approach

    ERIC Educational Resources Information Center

    Magis, David; Tuerlinckx, Francis; De Boeck, Paul

    2015-01-01

    This article proposes a novel approach to detect differential item functioning (DIF) among dichotomously scored items. Unlike standard DIF methods that perform an item-by-item analysis, we propose the "LR lasso DIF method": a logistic regression (LR) model is formulated for all item responses. The model contains item-specific intercepts,…

  4. A Comparison of Two Area Measures for Detecting Differential Item Functioning.

    ERIC Educational Resources Information Center

    Kim, Seock-Ho; Cohen, Allan S.

    1991-01-01

    The exact and closed-interval area measures for detecting differential item functioning are compared for actual data from 1,000 African-American and 1,000 white college students taking a vocabulary test with items intentionally constructed to favor 1 set of examinees. No real differences in detection of biased items were found. (SLD)

  5. Detection of Differential Item Functioning with Nonlinear Regression: A Non-IRT Approach Accounting for Guessing

    ERIC Educational Resources Information Center

    Drabinová, Adéla; Martinková, Patrícia

    2017-01-01

    In this article we present a general approach not relying on item response theory models (non-IRT) to detect differential item functioning (DIF) in dichotomous items in the presence of guessing. The proposed nonlinear regression (NLR) procedure for DIF detection is an extension of the method based on logistic regression. As a non-IRT approach, NLR can…

  6. Decisions that Make a Difference in Detecting Differential Item Functioning

    ERIC Educational Resources Information Center

    Sireci, Stephen G.; Rios, Joseph A.

    2013-01-01

    There are numerous statistical procedures for detecting items that function differently across subgroups of examinees that take a test or survey. However, in endeavouring to detect items that may function differentially, selection of the statistical method is only one of many important decisions. In this article, we discuss the important decisions…

  7. Iterative Purification and Effect Size Use with Logistic Regression for Differential Item Functioning Detection

    ERIC Educational Resources Information Center

    French, Brian F.; Maller, Susan J.

    2007-01-01

    Two unresolved implementation issues with logistic regression (LR) for differential item functioning (DIF) detection include ability purification and effect size use. Purification is suggested to control inaccuracies in DIF detection as a result of DIF items in the ability estimate. Additionally, effect size use may be beneficial in controlling…

  8. Detection of Uniform and Nonuniform Differential Item Functioning by Item-Focused Trees

    ERIC Educational Resources Information Center

    Berger, Moritz; Tutz, Gerhard

    2016-01-01

    Detection of differential item functioning (DIF) by use of the logistic modeling approach has a long tradition. One big advantage of the approach is that it can be used to investigate nonuniform (NUDIF) as well as uniform DIF (UDIF). The classical approach allows one to detect DIF by distinguishing between multiple groups. We propose an…

  9. The Impact of Missing Data on the Detection of Nonuniform Differential Item Functioning

    ERIC Educational Resources Information Center

    Finch, W. Holmes

    2011-01-01

    Missing information is a ubiquitous aspect of data analysis, including responses to items on cognitive and affective instruments. Although the broader statistical literature describes missing data methods, relatively little work has focused on this issue in the context of differential item functioning (DIF) detection. Such prior research has…

  10. Differential Item Functioning Detection Using the Multiple Indicators, Multiple Causes Method with a Pure Short Anchor

    ERIC Educational Resources Information Center

    Shih, Ching-Lin; Wang, Wen-Chung

    2009-01-01

    The multiple indicators, multiple causes (MIMIC) method with a pure short anchor was proposed to detect differential item functioning (DIF). A simulation study showed that the MIMIC method with an anchor of 1, 2, 4, or 10 DIF-free items yielded a well-controlled Type I error rate even when such tests contained as many as 40% DIF items. In general,…

  11. Examining Differential Math Performance by Gender and Opportunity to Learn

    ERIC Educational Resources Information Center

    Albano, Anthony D.; Rodriguez, Michael C.

    2013-01-01

    Although a substantial amount of research has been conducted on differential item functioning in testing, studies have focused on detecting differential item functioning rather than on explaining how or why it may occur. Some recent work has explored sources of differential functioning using explanatory and multilevel item response models. This…

  12. Effects of Anchor Item Methods on the Detection of Differential Item Functioning within the Family of Rasch Models

    ERIC Educational Resources Information Center

    Wang, Wen-Chung

    2004-01-01

    Scale indeterminacy in analysis of differential item functioning (DIF) within the framework of item response theory can be resolved by imposing 3 anchor item methods: the equal-mean-difficulty method, the all-other anchor item method, and the constant anchor item method. In this article, applicability and limitations of these 3 methods are…

  13. A Bifactor Multidimensional Item Response Theory Model for Differential Item Functioning Analysis on Testlet-Based Items

    ERIC Educational Resources Information Center

    Fukuhara, Hirotaka; Kamata, Akihito

    2011-01-01

    A differential item functioning (DIF) detection method for testlet-based data was proposed and evaluated in this study. The proposed DIF model is an extension of a bifactor multidimensional item response theory (MIRT) model for testlets. Unlike traditional item response theory (IRT) DIF models, the proposed model takes testlet effects into…

  14. The MIMIC Model as a Tool for Differential Bundle Functioning Detection

    ERIC Educational Resources Information Center

    Finch, W. Holmes

    2012-01-01

    Increasingly, researchers interested in identifying potentially biased test items are encouraged to use a confirmatory, rather than exploratory, approach. One such method for confirmatory testing is rooted in differential bundle functioning (DBF), where hypotheses regarding potential differential item functioning (DIF) for sets of items (bundles)…

  15. Real and Artificial Differential Item Functioning in Polytomous Items

    ERIC Educational Resources Information Center

    Andrich, David; Hagquist, Curt

    2015-01-01

    Differential item functioning (DIF) for an item between two groups is present if, for the same person location on a variable, persons from different groups have different expected values for their responses. Applying only to dichotomously scored items in the popular Mantel-Haenszel (MH) method for detecting DIF in which persons are classified by…

  16. Item Purification Does Not Always Improve DIF Detection: A Counterexample with Angoff's Delta Plot

    ERIC Educational Resources Information Center

    Magis, David; Facon, Bruno

    2013-01-01

    Item purification is an iterative process that is often advocated as improving the identification of items affected by differential item functioning (DIF). With test-score-based DIF detection methods, item purification iteratively removes the items currently flagged as DIF from the test scores to get purified sets of items, unaffected by DIF. The…
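
    The iterative removal described in this abstract can be sketched as a loop that repeats until the flagged set stops changing. This is a minimal illustration, not the authors' procedure: `purify`, `flag_dif`, and `toy_flagger` are hypothetical names, and the toy flagger stands in for a real score-based DIF test. It simply pretends that item 5 is only detected once item 2 has been removed from the matching score.

```python
from typing import Callable, FrozenSet, Iterable, Sequence


def purify(
    items: Sequence[int],
    flag_dif: Callable[[Sequence[int]], Iterable[int]],
    max_iter: int = 10,
) -> FrozenSet[int]:
    """Repeat DIF flagging until the flagged set is stable.

    flag_dif(anchor) is a hypothetical callable: it tests every item for DIF
    using a matching score built from the anchor items only, and returns the
    items it flags.
    """
    flagged: FrozenSet[int] = frozenset()
    for _ in range(max_iter):
        # Matching score uses only items not currently flagged.
        anchor = [i for i in items if i not in flagged]
        new_flagged = frozenset(flag_dif(anchor))
        if new_flagged == flagged:
            break  # no change between iterations: stop
        flagged = new_flagged
    return flagged


def toy_flagger(anchor: Sequence[int]) -> Iterable[int]:
    """Made-up stand-in for a DIF test (assumption, for illustration only)."""
    flags = {2}              # item 2 is always flagged
    if 2 not in anchor:      # item 5 is flagged only once item 2 is excluded
        flags.add(5)
    return flags


result = purify(items=list(range(1, 11)), flag_dif=toy_flagger)
print(sorted(result))
```

    The `max_iter` cap is a design choice of this sketch: it guards against a flagged set that oscillates instead of settling.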

  17. Multidimensional Extension of Multiple Indicators Multiple Causes Models to Detect DIF

    ERIC Educational Resources Information Center

    Lee, Soo; Bulut, Okan; Suh, Youngsuk

    2017-01-01

    A number of studies have found multiple indicators multiple causes (MIMIC) models to be an effective tool in detecting uniform differential item functioning (DIF) for individual items and item bundles. A recently developed MIMIC-interaction model is capable of detecting both uniform and nonuniform DIF in the unidimensional item response theory…

  18. Binary Logistic Regression Analysis for Detecting Differential Item Functioning: Effectiveness of R² and Delta Log Odds Ratio Effect Size Measures

    ERIC Educational Resources Information Center

    Hidalgo, Mª Dolores; Gómez-Benito, Juana; Zumbo, Bruno D.

    2014-01-01

    The authors analyze the effectiveness of the R² and delta log odds ratio effect size measures when using logistic regression analysis to detect differential item functioning (DIF) in dichotomous items. A simulation study was carried out, and the Type I error rate and power estimates under conditions in which only statistical testing…

  19. Detection of Gender-Based Differential Item Functioning in a Mathematics Performance Assessment.

    ERIC Educational Resources Information Center

    Wang, Ning; Lane, Suzanne

    This study used three different differential item functioning (DIF) procedures to examine the extent to which items in a mathematics performance assessment functioned differently for matched gender groups. In addition to examining the appropriateness of individual items in terms of DIF with respect to gender, an attempt was made to identify…

  20. Assessment of Differential Item Functioning in Testlet-Based Items Using the Rasch Testlet Model

    ERIC Educational Resources Information Center

    Wang, Wen-Chung; Wilson, Mark

    2005-01-01

    This study presents a procedure for detecting differential item functioning (DIF) for dichotomous and polytomous items in testlet-based tests, whereby DIF is taken into account by adding DIF parameters into the Rasch testlet model. Simulations were conducted to assess recovery of the DIF and other parameters. Two independent variables, test type…

  1. Identifying Differential Item Functioning in Multi-Stage Computer Adaptive Testing

    ERIC Educational Resources Information Center

    Gierl, Mark J.; Lai, Hollis; Li, Johnson

    2013-01-01

    The purpose of this study is to evaluate the performance of CATSIB (Computer Adaptive Testing-Simultaneous Item Bias Test) for detecting differential item functioning (DIF) when items in the matching and studied subtest are administered adaptively in the context of a realistic multi-stage adaptive test (MST). MST was simulated using a 4-item…

  2. Mixture Item Response Theory-MIMIC Model: Simultaneous Estimation of Differential Item Functioning for Manifest Groups and Latent Classes

    ERIC Educational Resources Information Center

    Bilir, Mustafa Kuzey

    2009-01-01

    This study uses a new psychometric model (mixture item response theory-MIMIC model) that simultaneously estimates differential item functioning (DIF) across manifest groups and latent classes. Current DIF detection methods investigate DIF from only one side, either across manifest groups (e.g., gender, ethnicity, etc.), or across latent classes…

  3. Utility of the Mantel-Haenszel Procedure for Detecting Differential Item Functioning in Small Samples

    ERIC Educational Resources Information Center

    Fidalgo, Angel M.; Ferreres, Doris; Muniz, Jose

    2004-01-01

    Sample-size restrictions limit the contingency table approaches based on asymptotic distributions, such as the Mantel-Haenszel (MH) procedure, for detecting differential item functioning (DIF) in many practical applications. Within this framework, the present study investigated the power and Type I error performance of empirical and inferential…

  4. The MIMIC Method with Scale Purification for Detecting Differential Item Functioning

    ERIC Educational Resources Information Center

    Wang, Wen-Chung; Shih, Ching-Lin; Yang, Chih-Chien

    2009-01-01

    This study implements a scale purification procedure onto the standard MIMIC method for differential item functioning (DIF) detection and assesses its performance through a series of simulations. It is found that the MIMIC method with scale purification (denoted as M-SP) outperforms the standard MIMIC method (denoted as M-ST) in controlling…

  5. For Which Boys and Which Girls Are Reading Assessment Items Biased Against? Detection of Differential Item Functioning in Heterogeneous Gender Populations

    ERIC Educational Resources Information Center

    Grover, Raman K.; Ercikan, Kadriye

    2017-01-01

    In gender differential item functioning (DIF) research it is assumed that all members of a gender group have similar item response patterns and therefore generalizations from group level to subgroup and individual levels can be made accurately. However, DIF items do not necessarily disadvantage every member of a gender group to the same degree,…

  6. Identification of Differential Item Functioning in Multiple-Group Settings: A Multivariate Outlier Detection Approach

    ERIC Educational Resources Information Center

    Magis, David; De Boeck, Paul

    2011-01-01

    We focus on the identification of differential item functioning (DIF) when more than two groups of examinees are considered. We propose to consider items as elements of a multivariate space, where DIF items are outlying elements. Following this approach, the situation of multiple groups is a quite natural case. A robust statistics technique is…

  7. Effect of Multiple Testing Adjustment in Differential Item Functioning Detection

    ERIC Educational Resources Information Center

    Kim, Jihye; Oshima, T. C.

    2013-01-01

    In a typical differential item functioning (DIF) analysis, a significance test is conducted for each item. As a test consists of multiple items, such multiple testing may increase the possibility of making a Type I error at least once. The goal of this study was to investigate how to control a Type I error rate and power using adjustment…
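
    The inflation described here is easy to quantify under a simplifying assumption. The sketch below assumes the item-level tests are independent, uses a hypothetical 40-item test at a nominal 0.05 level, and applies a Bonferroni correction purely as a familiar example; the abstract does not say which adjustments the study examined.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """P(at least one Type I error) across n_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests


n_items = 40   # hypothetical test length (assumption)
alpha = 0.05   # nominal per-item significance level (assumption)

unadjusted = familywise_error_rate(alpha, n_items)
per_item = bonferroni_alpha(alpha, n_items)
adjusted = familywise_error_rate(per_item, n_items)

print(f"unadjusted familywise rate: {unadjusted:.3f}")
print(f"Bonferroni per-item alpha:  {per_item:.5f}")
print(f"adjusted familywise rate:   {adjusted:.3f}")
```

    Real item-level tests are usually correlated, so these figures are a rough guide rather than exact rates. A stricter per-item level also lowers power, which is the trade-off the abstract refers to.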

  8. An Empirical Comparison of DDF Detection Methods for Understanding the Causes of DIF in Multiple-Choice Items

    ERIC Educational Resources Information Center

    Suh, Youngsuk; Talley, Anna E.

    2015-01-01

    This study compared and illustrated four differential distractor functioning (DDF) detection methods for analyzing multiple-choice items. The log-linear approach, two item response theory-model-based approaches with likelihood ratio tests, and the odds ratio approach were compared to examine the congruence among the four DDF detection methods.…

  9. Detecting DIF in Polytomous Items Using MACS, IRT and Ordinal Logistic Regression

    ERIC Educational Resources Information Center

    Elosua, Paula; Wells, Craig

    2013-01-01

    The purpose of the present study was to compare the Type I error rate and power of two model-based procedures, the mean and covariance structure model (MACS) and the item response theory (IRT), and an observed-score based procedure, ordinal logistic regression, for detecting differential item functioning (DIF) in polytomous items. A simulation…

  10. Evaluation of Two Types of Differential Item Functioning in Factor Mixture Models with Binary Outcomes

    ERIC Educational Resources Information Center

    Lee, HwaYoung; Beretvas, S. Natasha

    2014-01-01

    Conventional differential item functioning (DIF) detection methods (e.g., the Mantel-Haenszel test) can be used to detect DIF only across observed groups, such as gender or ethnicity. However, research has found that DIF is not typically fully explained by an observed variable. True sources of DIF may include unobserved, latent variables, such as…

  11. Differential Item Functioning Detection with the Mantel-Haenszel Procedure: The Effects of Matching Types and Other Factors

    ERIC Educational Resources Information Center

    Socha, Alan; DeMars, Christine E.; Zilberberg, Anna; Phan, Ha

    2015-01-01

    The Mantel-Haenszel (MH) procedure is commonly used to detect items that function differentially for groups of examinees from various demographic and linguistic backgrounds--for example, in international assessments. As in some other DIF methods, the total score is used to match examinees on ability. In thin matching, each of the total score…

  12. Impact of Missing Data on the Detection of Differential Item Functioning: The Case of Mantel-Haenszel and Logistic Regression Analysis

    ERIC Educational Resources Information Center

    Robitzsch, Alexander; Rupp, Andre A.

    2009-01-01

    This article describes the results of a simulation study to investigate the impact of missing data on the detection of differential item functioning (DIF). Specifically, it investigates how four methods for dealing with missing data (listwise deletion, zero imputation, two-way imputation, response function imputation) interact with two methods of…

  13. Differential Item Functioning Detection across Two Methods of Defining Group Comparisons: Pairwise and Composite Group Comparisons

    ERIC Educational Resources Information Center

    Sari, Halil Ibrahim; Huggins, Anne Corinne

    2015-01-01

    This study compares two methods of defining groups for the detection of differential item functioning (DIF): (a) pairwise comparisons and (b) composite group comparisons. We aim to emphasize and empirically support the notion that the choice of pairwise versus composite group definitions in DIF is a reflection of how one defines fairness in DIF…

  14. The Value of the Studied Item in the Matching Criterion in Differential Item Functioning (DIF) Analysis. Research Report. ETS RR-10-13

    ERIC Educational Resources Information Center

    Tan, Xuan; Xiang, Bihua; Dorans, Neil J.; Qu, Yanxuan

    2010-01-01

    The nature of the matching criterion (usually the total score) in the study of differential item functioning (DIF) has been shown to impact the accuracy of different DIF detection procedures. One of the topics related to the nature of the matching criterion is whether the studied item should be included. Although many studies exist that suggest…

  15. Examining Differential Item Functioning: IRT-Based Detection in the Framework of Confirmatory Factor Analysis

    ERIC Educational Resources Information Center

    Dimitrov, Dimiter M.

    2017-01-01

    This article offers an approach to examining differential item functioning (DIF) under its item response theory (IRT) treatment in the framework of confirmatory factor analysis (CFA). The approach is based on integrating IRT- and CFA-based testing of DIF and using bias-corrected bootstrap confidence intervals with a syntax code in Mplus.

  16. Stepwise Analysis of Differential Item Functioning Based on Multiple-Group Partial Credit Model.

    ERIC Educational Resources Information Center

    Muraki, Eiji

    1999-01-01

    Extended an Item Response Theory (IRT) method for detection of differential item functioning to the partial credit model and applied the method to simulated data using a stepwise procedure. Then applied the stepwise DIF analysis based on the multiple-group partial credit model to writing trend data from the National Assessment of Educational…

  17. Item response theory detects differential item functioning between healthy and ill children in QoL measures

    PubMed Central

    Langer, Michelle M.; Hill, Cheryl D.; Thissen, David; Burwinkle, Tasha M.; Varni, James W.; DeWalt, Darren A.

    2008-01-01

    Objective: To demonstrate the value of item response theory (IRT) and differential item functioning (DIF) methods in examining a health-related quality of life (HRQOL) measure in children and adolescents. Study Design and Setting: This illustration uses data from 5,429 children using the four subscales of the PedsQL™ 4.0 Generic Core Scales. The IRT model-based likelihood ratio test was used to detect and evaluate DIF between healthy children and children with a chronic condition. Results: DIF was detected for a majority of items but cancelled out at the total test score level due to opposing directions of DIF. Post-hoc analysis indicated that this pattern of results may be due to multidimensionality. We discuss issues in detecting and handling DIF. Conclusion: This paper describes how to perform DIF analyses in validating a questionnaire to ensure that scores have equivalent meaning across subgroups. It offers insight into ways information gained through the analysis can be used to evaluate an existing scale. PMID:18226750

  18. A Generalized Logistic Regression Procedure to Detect Differential Item Functioning among Multiple Groups

    ERIC Educational Resources Information Center

    Magis, David; Raiche, Gilles; Beland, Sebastien; Gerard, Paul

    2011-01-01

    We present an extension of the logistic regression procedure to identify dichotomous differential item functioning (DIF) in the presence of more than two groups of respondents. Starting from the usual framework of a single focal group, we propose a general approach to estimate the item response functions in each group and to test for the presence…

  19. DIF Detection Using Multiple-Group Categorical CFA with Minimum Free Baseline Approach

    ERIC Educational Resources Information Center

    Chang, Yu-Wei; Huang, Wei-Kang; Tsai, Rung-Ching

    2015-01-01

    The aim of this study is to assess the efficiency of using the multiple-group categorical confirmatory factor analysis (MCCFA) and the robust chi-square difference test in differential item functioning (DIF) detection for polytomous items under the minimum free baseline strategy. While testing for DIF items, despite the strong assumption that all…

  20. Detecting a Gender-Related Differential Item Functioning Using Transformed Item Difficulty

    ERIC Educational Resources Information Center

    Abedalaziz, Nabeel; Leng, Chin Hai; Alahmadi, Ahlam

    2014-01-01

    The purpose of the study was to examine gender differences in performance on a multiple-choice mathematical ability test, administered within the context of a high school graduation test that was designed to match the eleventh grade curriculum. The transformed item difficulty (TID) was used to detect gender-related DIF. A random sample of 1400 eleventh…

  1. Detecting Differential Item Discrimination (DID) and the Consequences of Ignoring DID in Multilevel Item Response Models

    ERIC Educational Resources Information Center

    Lee, Woo-yeol; Cho, Sun-Joo

    2017-01-01

    Cross-level invariance in a multilevel item response model can be investigated by testing whether the within-level item discriminations are equal to the between-level item discriminations. Testing the cross-level invariance assumption is important to understand constructs in multilevel data. However, in most multilevel item response model…

  2. The Matching Criterion Purification for Differential Item Functioning Analyses in a Large-Scale Assessment

    ERIC Educational Resources Information Center

    Lee, HyeSun; Geisinger, Kurt F.

    2016-01-01

    The current study investigated the impact of matching criterion purification on the accuracy of differential item functioning (DIF) detection in large-scale assessments. The three matching approaches for DIF analyses (block-level matching, pooled booklet matching, and equated pooled booklet matching) were employed with the Mantel-Haenszel…

  3. Power and Sample Size Calculations for Logistic Regression Tests for Differential Item Functioning

    ERIC Educational Resources Information Center

    Li, Zhushan

    2014-01-01

    Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…

  4. Fighting bias with statistics: Detecting gender differences in responses to items on a preschool science assessment

    NASA Astrophysics Data System (ADS)

    Greenberg, Ariela Caren

    Differential item functioning (DIF) and differential distractor functioning (DDF) are methods used to screen for item bias (Camilli & Shepard, 1994; Penfield, 2008). Using an applied empirical example, this mixed-methods study examined the congruency and relationship of DIF and DDF methods in screening multiple-choice items. Data for Study I were drawn from item responses of 271 female and 236 male low-income children on a preschool science assessment. Item analyses employed a common statistical approach of the Mantel-Haenszel log-odds ratio (MH-LOR) to detect DIF in dichotomously scored items (Holland & Thayer, 1988), and extended the approach to identify DDF (Penfield, 2008). Findings demonstrated that using MH-LOR to detect DIF and DDF supported the theoretical relationship that the magnitude and form of DIF are dependent on the DDF effects, and demonstrated the advantages of studying DIF and DDF in multiple-choice items. A total of 4 items with DIF and DDF and 5 items with only DDF were detected. Study II incorporated an item content review, an important but often overlooked and under-published step of DIF and DDF studies (Camilli & Shepard). Interviews with 25 female and 22 male low-income preschool children and an expert review helped to interpret the DIF and DDF results and their comparison, and determined that a content review process of studied items can reveal reasons for potential item bias that are often congruent with the statistical results. Patterns emerged and are discussed in detail. The quantitative and qualitative analyses were conducted in an applied framework of examining the validity of the preschool science assessment scores for evaluating science programs serving low-income children; however, the techniques can be generalized for use with measures across various disciplines of research.
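
    For readers unfamiliar with the MH-LOR statistic named in this abstract, the following is a minimal sketch of the standard Mantel-Haenszel common log-odds ratio for one dichotomous item, computed from 2x2 tables stratified by matching score. The counts are invented for illustration and are not the study's data; `mh_log_odds_ratio` and `Stratum` are hypothetical names, and the DDF extension is not shown.

```python
import math
from typing import Iterable, Tuple

# One 2x2 table per matching-score stratum:
# (reference correct, reference incorrect, focal correct, focal incorrect)
Stratum = Tuple[int, int, int, int]


def mh_log_odds_ratio(strata: Iterable[Stratum]) -> float:
    """Mantel-Haenszel common log-odds ratio across score strata.

    Positive values mean the item favors the reference group after matching;
    zero means no uniform DIF is indicated.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty strata
        num += a * d / n
        den += b * c / n
    if num == 0.0 or den == 0.0:
        raise ValueError("odds ratio undefined: a cross-product sum is zero")
    return math.log(num / den)


# Invented counts for three score strata (assumption, illustration only).
toy_strata = [(10, 10, 5, 15), (20, 10, 12, 12), (25, 5, 18, 6)]
lor = mh_log_odds_ratio(toy_strata)
print(f"MH log-odds ratio: {lor:.3f}")
```

    Swapping the reference and focal columns flips the sign of the statistic, which is a quick sanity check on how the groups were coded.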

  5. Detecting Gender Bias Through Test Item Analysis

    NASA Astrophysics Data System (ADS)

    González-Espada, Wilson J.

    2009-03-01

    Many physical science and physics instructors might not be trained in pedagogically appropriate test construction methods. This could lead to test items that do not measure what they are intended to measure. A subgroup of these items might show bias against some groups of students. This paper describes how the author became aware of potentially biased items against females in his examinations, which led to the exploration of fundamental issues related to item validity, gender bias, and differential item functioning, or DIF. A brief discussion of DIF in the context of university courses, as well as practical suggestions to detect possible gender-biased items, follows.

  6. A Methodology for Zumbo's Third Generation DIF Analyses and the Ecology of Item Responding

    ERIC Educational Resources Information Center

    Zumbo, Bruno D.; Liu, Yan; Wu, Amery D.; Shear, Benjamin R.; Olvera Astivia, Oscar L.; Ark, Tavinder K.

    2015-01-01

    Methods for detecting differential item functioning (DIF) and item bias are typically used in the process of item analysis when developing new measures; adapting existing measures for different populations, languages, or cultures; or more generally validating test score inferences. In 2007 in "Language Assessment Quarterly," Zumbo…

  7. Effects of Average Signed Area Between Two Item Characteristic Curves and Test Purification Procedures on the DIF Detection via the Mantel-Haenszel Method

    ERIC Educational Resources Information Center

    Wang, Wen-Chung; Su, Ya-Hui

    2004-01-01

    In this study we investigated the effects of the average signed area (ASA) between the item characteristic curves of the reference and focal groups and three test purification procedures on the uniform differential item functioning (DIF) detection via the Mantel-Haenszel (M-H) method through Monte Carlo simulations. The results showed that ASA,…

  8. Assessment of Differential Item Functioning in Health-Related Outcomes: A Simulation and Empirical Analysis with Hierarchical Polytomous Data

    PubMed Central

    Sharafi, Zahra

    2017-01-01

    Background: The purpose of this study was to evaluate the effectiveness of two methods of detecting differential item functioning (DIF) in the presence of multilevel data and polytomously scored items. The assessment of DIF with multilevel data (e.g., patients nested within hospitals, hospitals nested within districts) from large-scale assessment programs has received considerable attention but very few studies evaluated the effect of hierarchical structure of data on DIF detection for polytomously scored items. Methods: The ordinal logistic regression (OLR) and hierarchical ordinal logistic regression (HOLR) were utilized to assess DIF in simulated and real multilevel polytomous data. Six factors (DIF magnitude, grouping variable, intraclass correlation coefficient, number of clusters, number of participants per cluster, and item discrimination parameter) with a fully crossed design were considered in the simulation study. Furthermore, data of Pediatric Quality of Life Inventory™ (PedsQL™) 4.0 collected from 576 healthy school children were analyzed. Results: Overall, results indicate that both methods performed equivalently in terms of controlling Type I error and detection power rates. Conclusions: The current study showed negligible difference between OLR and HOLR in detecting DIF with polytomously scored items in a hierarchical structure. Implications and considerations while analyzing real data were also discussed. PMID:29312463

  9. Assessment of Differential Item Functioning in Health-Related Outcomes: A Simulation and Empirical Analysis with Hierarchical Polytomous Data.

    PubMed

    Sharafi, Zahra; Mousavi, Amin; Ayatollahi, Seyyed Mohammad Taghi; Jafari, Peyman

    2017-01-01

    The purpose of this study was to evaluate the effectiveness of two methods of detecting differential item functioning (DIF) in the presence of multilevel data and polytomously scored items. The assessment of DIF with multilevel data (e.g., patients nested within hospitals, hospitals nested within districts) from large-scale assessment programs has received considerable attention but very few studies evaluated the effect of hierarchical structure of data on DIF detection for polytomously scored items. The ordinal logistic regression (OLR) and hierarchical ordinal logistic regression (HOLR) were utilized to assess DIF in simulated and real multilevel polytomous data. Six factors (DIF magnitude, grouping variable, intraclass correlation coefficient, number of clusters, number of participants per cluster, and item discrimination parameter) with a fully crossed design were considered in the simulation study. Furthermore, data of Pediatric Quality of Life Inventory™ (PedsQL™) 4.0 collected from 576 healthy school children were analyzed. Overall, results indicate that both methods performed equivalently in terms of controlling Type I error and detection power rates. The current study showed negligible difference between OLR and HOLR in detecting DIF with polytomously scored items in a hierarchical structure. Implications and considerations while analyzing real data were also discussed.

  10. Comparison of IRT Likelihood Ratio Test and Logistic Regression DIF Detection Procedures

    ERIC Educational Resources Information Center

    Atar, Burcu; Kamata, Akihito

    2011-01-01

    The Type I error rates and the power of IRT likelihood ratio test and cumulative logit ordinal logistic regression procedures in detecting differential item functioning (DIF) for polytomously scored items were investigated in this Monte Carlo simulation study. For this purpose, 54 simulation conditions (combinations of 3 sample sizes, 2 sample…

  11. A Monte Carlo Study of an Iterative Wald Test Procedure for DIF Analysis

    ERIC Educational Resources Information Center

    Cao, Mengyang; Tay, Louis; Liu, Yaowu

    2017-01-01

    This study examined the performance of a proposed iterative Wald approach for detecting differential item functioning (DIF) between two groups when preknowledge of anchor items is absent. The iterative approach utilizes the Wald-2 approach to identify anchor items and then iteratively tests for DIF items with the Wald-1 approach. Monte Carlo…

  12. Using a Multidimensional IRT Framework to Better Understand Differential Item Functioning (DIF): A Tale of Three DIF Detection Procedures

    ERIC Educational Resources Information Center

    Walker, Cindy M.; Gocer Sahin, Sakine

    2017-01-01

    The theoretical reason for the presence of differential item functioning (DIF) is that data are multidimensional and two groups of examinees differ in their underlying ability distribution for the secondary dimension(s). Therefore, the purpose of this study was to determine how much the secondary ability distributions must differ before DIF is…

  13. A Study on Detecting of Differential Item Functioning of PISA 2006 Science Literacy Items in Turkish and American Samples

    ERIC Educational Resources Information Center

    Çikirikçi Demirtasli, Nükhet; Ulutas, Seher

    2015-01-01

    Problem Statement: Item bias occurs when individuals from different groups (different gender, cultural background, etc.) have different probabilities of responding correctly to a test item despite having the same skill levels. It is important that tests or items do not have bias in order to ensure the accuracy of decisions taken according to test…

  14. Different Approaches to Covariate Inclusion in the Mixture Rasch Model

    ERIC Educational Resources Information Center

    Li, Tongyun; Jiao, Hong; Macready, George B.

    2016-01-01

    The present study investigates different approaches to adding covariates and the impact in fitting mixture item response theory models. Mixture item response theory models serve as an important methodology for tackling several psychometric issues in test development, including the detection of latent differential item functioning. A Monte Carlo…

  15. Item-focussed Trees for the Identification of Items in Differential Item Functioning.

    PubMed

    Tutz, Gerhard; Berger, Moritz

    2016-09-01

    A novel method for the identification of differential item functioning (DIF) by means of recursive partitioning techniques is proposed. We assume an extension of the Rasch model that allows for DIF being induced by an arbitrary number of covariates for each item. Recursive partitioning on the item level results in one tree for each item and leads to simultaneous selection of items and variables that induce DIF. For each item, it is possible to detect groups of subjects with different item difficulties, defined by combinations of characteristics that are not pre-specified. The way a DIF item is determined by covariates is visualized in a small tree and therefore easily accessible. An algorithm is proposed that is based on permutation tests. Various simulation studies, including the comparison with traditional approaches to identify items with DIF, show the applicability and the competitive performance of the method. Two applications illustrate the usefulness and the advantages of the new method.

  16. Using DIF Dissection Method to Assess Effects of Item Deletion. Research Report No. 2005-10. ETS RR-05-23

    ERIC Educational Resources Information Center

    Zhang, Yanling; Dorans, Neil J.; Matthews-López, Joy L.

    2005-01-01

    Statistical procedures for detecting differential item functioning (DIF) are often used as an initial step to screen items for construct irrelevant variance. This research applies a DIF dissection method and a two-way classification scheme to SAT Reasoning Test™ verbal section data and explores the effects of deleting sizable DIF items on reported…

  17. GMHDIF: A Computer Program for Detecting DIF in Dichotomous and Polytomous Items Using Generalized Mantel-Haenszel Statistics

    ERIC Educational Resources Information Center

    Fidalgo, Angel M.

    2011-01-01

    Mantel-Haenszel (MH) methods constitute one of the most popular nonparametric differential item functioning (DIF) detection procedures. GMHDIF has been developed to provide an easy-to-use program for conducting DIF analyses. Some of the advantages of this program are that (a) it performs two-stage DIF analyses in multiple groups simultaneously;…

  18. Analysis of Nonequivalent Assessments across Different Linguistic Groups Using a Mixed Methods Approach: Understanding the Causes of Differential Item Functioning by Cognitive Interviewing

    ERIC Educational Resources Information Center

    Benítez, Isabel; Padilla, José-Luis

    2014-01-01

    Differential item functioning (DIF) can undermine the validity of cross-lingual comparisons. While a lot of efficient statistics for detecting DIF are available, few general findings have been found to explain DIF results. The objective of the article was to study DIF sources by using a mixed method design. The design involves a quantitative phase…

  19. Scale Comparability between Nonaccommodated and Accommodated Forms of a Statewide High School Assessment: Assessment Using "l[subscript z]" Person-Fit

    ERIC Educational Resources Information Center

    Seo, Dong Gi; Hao, Shiqi

    2016-01-01

    Differential item/test functioning (DIF/DTF) are routine procedures to detect item/test unfairness as an explanation for group performance difference. However, unequal sample sizes and small sample sizes have an impact on the statistical power of the DIF/DTF detection procedures. Furthermore, DIF/DTF cannot be used for two test forms without…

  20. Psychometric Properties of the Quantitative Myasthenia Gravis Score and the Myasthenia Gravis Composite Scale.

    PubMed

    Barnett, Carolina; Merkies, Ingemar S J; Katzberg, Hans; Bril, Vera

    2015-09-02

    The Quantitative Myasthenia Gravis Score and the Myasthenia Gravis Composite are two commonly used outcome measures in Myasthenia Gravis. So far, their measurement properties have not been compared, so we aimed to study their psychometric properties using the Rasch model. 251 patients with stable myasthenia gravis were assessed with both scales, and 211 patients returned for a second assessment. We studied fit to the Rasch model at the first visit, and compared item fit, thresholds, differential item functioning, local dependence, person separation index, and tests for unidimensionality. We also assessed test-retest reliability and estimated the Minimal Detectable Change. Neither scale fit the Rasch model (χ² p < 0.05). The Myasthenia Gravis Composite had lower discrimination properties than the Quantitative Myasthenia Gravis Score (Person Separation Index: 0.14 and 0.7). There was local dependence in both scales, as well as differential item functioning for ocular and generalized disease. Disordered thresholds were found in 6 (60%) items of the Myasthenia Gravis Composite and in 4 (31%) of the Quantitative Myasthenia Gravis Score. Both tools had adequate test-retest reliability (ICCs > 0.8). The minimally detectable change was 4.9 points for the Myasthenia Gravis Composite and 4.3 points for the Quantitative Myasthenia Gravis Score. Neither scale fulfilled Rasch model expectations. The Quantitative Myasthenia Gravis Score has higher discrimination than the Myasthenia Gravis Composite. Both tools have items with disordered thresholds, differential item functioning and local dependency. There was evidence of multidimensionality in the QMGS. The minimal detectable change values are higher than previous studies on the minimal significant change. These findings might inform future modifications of these tools.

  1. Item Discrimination and Type I Error in the Detection of Differential Item Functioning

    ERIC Educational Resources Information Center

    Li, Yanju; Brooks, Gordon P.; Johanson, George A.

    2012-01-01

    In 2009, DeMars stated that when impact exists there will be Type I error inflation, especially with larger sample sizes and larger discrimination parameters for items. One purpose of this study is to present the patterns of Type I error rates using Mantel-Haenszel (MH) and logistic regression (LR) procedures when the mean ability between the…

  2. The Use of Multiple Imputation for Missing Data in Uniform DIF Analysis: Power and Type I Error Rates

    ERIC Educational Resources Information Center

    Finch, Holmes

    2011-01-01

    Methods of uniform differential item functioning (DIF) detection have been extensively studied in the complete data case. However, less work has been done examining the performance of these methods when missing item responses are present. Research that has been done in this regard appears to indicate that treating missing item responses as…

  3. Item Parameter Invariance of the Kaufman Adolescent and Adult Intelligence Test across Male and Female Samples

    ERIC Educational Resources Information Center

    Immekus, Jason C.; Maller, Susan J.

    2009-01-01

    The Kaufman Adolescent and Adult Intelligence Test (KAIT[TM]) is an individually administered test of intelligence for individuals ranging in age from 11 to 85+ years. The item response theory-likelihood ratio procedure, based on the two-parameter logistic model, was used to detect differential item functioning (DIF) in the KAIT across males and…

  4. The Act of Answering Questions Elicited Differentiated Responses in a Concealed Information Test.

    PubMed

    Otsuka, Takuro; Mizutani, Mitsuyoshi; Yagi, Akihiro; Katayama, Jun'ichi

    2018-04-17

    The concealed information test (CIT), a psychophysiological detection of deception test, compares physiological responses between crime-related and crime-unrelated items. In previous studies, whether the act of answering questions affected physiological responses was unclear. This study examined effects of both question-related and answer-related processes on physiological responses. Twenty participants received a modified CIT, in which the interval between presentation of questions and answering them was 27 s. Differentiated respiratory movements and cardiovascular responses between items were observed for both questions (items) and answers, while differentiated skin conductance response was observed only for questions. These results suggest that physiological responses to questions reflected orientation to a crime-related item, while physiological responses during answering reflected inhibition of psychological arousal caused by orienting. Regarding the CIT's accuracy, participants' perception of the questions themselves more strongly influenced physiological responses than answering them. © 2018 American Academy of Forensic Sciences.

  5. Evaluating linguistic equivalence of patient-reported outcomes in a cancer clinical trial.

    PubMed

    Hahn, Elizabeth A; Bode, Rita K; Du, Hongyan; Cella, David

    2006-01-01

    In order to make meaningful cross-cultural or cross-linguistic comparisons of health-related quality of life (HRQL) or to pool international research data, it is essential to create unbiased measures that can detect clinically important differences. When HRQL scores differ between cultural/linguistic groups, it is important to determine whether this reflects real group differences, or is the result of systematic measurement variability. The objectives were to investigate the linguistic measurement equivalence of a cancer-specific HRQL questionnaire, and to conduct a sensitivity analysis of treatment differences in HRQL in a clinical trial. Patients with newly diagnosed chronic myelogenous leukemia (n = 1049) completed serial HRQL assessments in an international Phase III trial. Two types of differential item functioning (uniform and non-uniform) were evaluated using item response theory and classical test theory approaches. A sensitivity analysis was conducted to compare HRQL between treatment arms using items without evidence of differential functioning. Among 27 items, nine (33%) did not exhibit any evidence of differential functioning in both linguistic comparisons (English versus French, English versus German). Although 18 items functioned differently, there was no evidence of systematic bias. In a sensitivity analysis, adjustment for differential functioning affected the magnitude, but not the direction or interpretation of clinical trial treatment arm differences. Sufficient sample sizes were available for only three of the eight language groups. Identification of differential functioning in two-thirds of the items suggests that current psychometric methods may be too sensitive. Enhanced methodologies are needed to differentiate trivial from substantive differential item functioning. Systematic variability in HRQL across different groups can be evaluated for its effect upon clinical trial results; a practice recommended when data are pooled across cultural or linguistic groups to make conclusions about treatment effects.

  6. A Rasch Analysis of the Junior Metacognitive Awareness Inventory with Singapore Students

    ERIC Educational Resources Information Center

    Ning, Hoi Kwan

    2018-01-01

    The psychometric properties of the 2 versions of the Junior Metacognitive Awareness Inventory were examined with Singapore student samples. Other than 2 misfitting items and an underutilized response scale, Rasch analysis demonstrated that the instruments have good measurement precision, and no differential item functioning was detected across…

  7. Differential item functioning analysis with ordinal logistic regression techniques. DIFdetect and difwithpar.

    PubMed

    Crane, Paul K; Gibbons, Laura E; Jolley, Lance; van Belle, Gerald

    2006-11-01

    We present an ordinal logistic regression model for identification of items with differential item functioning (DIF) and apply this model to a Mini-Mental State Examination (MMSE) dataset. We employ item response theory ability estimation in our models. Three nested ordinal logistic regression models are applied to each item. Model testing begins with examination of the statistical significance of the interaction term between ability and the group indicator, consistent with nonuniform DIF. Then we turn our attention to the coefficient of the ability term in models with and without the group term. If including the group term has a marked effect on that coefficient, we declare that it has uniform DIF. We examined DIF related to language of test administration in addition to self-reported race, Hispanic ethnicity, age, years of education, and sex. We used PARSCALE for IRT analyses and STATA for ordinal logistic regression approaches. We used an iterative technique for adjusting IRT ability estimates on the basis of DIF findings. Five items were found to have DIF related to language. These same items also had DIF related to other covariates. The ordinal logistic regression approach to DIF detection, when combined with IRT ability estimates, provides a reasonable alternative for DIF detection. There appear to be several items with significant DIF related to language of test administration in the MMSE. More attention needs to be paid to the specific criteria used to determine whether an item has DIF, not just the technique used to identify DIF.
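
    The two-step decision rule described in this abstract (test the ability-by-group interaction first, then compare the ability coefficient with and without the group term) can be sketched as follows. This is a minimal illustration, not the DIFdetect or difwithpar code: the function name, the 0.05 significance level, and the 10% relative-change cutoff are assumptions chosen for the example, and the coefficients are assumed to come from ordinal logistic regression models fitted elsewhere.

```python
def classify_item(p_interaction, beta_ability_without_group,
                  beta_ability_with_group, alpha=0.05, threshold=0.10):
    """Classify one item as 'nonuniform', 'uniform', or 'none'.

    p_interaction: p-value of the ability-by-group interaction term.
    beta_ability_*: ability coefficient from the models fitted without
        and with the group indicator.
    alpha, threshold: hypothetical cutoffs, not values from the abstract.
    """
    # Step 1: a significant interaction is consistent with nonuniform DIF.
    if p_interaction < alpha:
        return "nonuniform"
    if beta_ability_without_group == 0:
        raise ValueError("ability coefficient without group term is zero")
    # Step 2: a marked shift in the ability coefficient once the group
    # term is added is taken as evidence of uniform DIF.
    relative_change = abs(
        (beta_ability_with_group - beta_ability_without_group)
        / beta_ability_without_group
    )
    return "uniform" if relative_change > threshold else "none"
```

    As the abstract's closing sentence stresses, the cutoff itself drives which items are flagged, so both thresholds are exposed as parameters rather than fixed.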

  8. Complex versus Simple Modeling for DIF Detection: When the Intraclass Correlation Coefficient (ρ) of the Studied Item Is Less Than the ρ of the Total Score

    ERIC Educational Resources Information Center

    Jin, Ying; Myers, Nicholas D.; Ahn, Soyeon

    2014-01-01

    Previous research has demonstrated that differential item functioning (DIF) methods that do not account for multilevel data structure could result in too frequent rejection of the null hypothesis (i.e., no DIF) when the intraclass correlation coefficient (ρ) of the studied item was the same as the ρ of the total score. The current study extended…

  9. Evaluation of measurement equivalence of the Family Satisfaction with the End-of-Life Care in an ethnically diverse cohort: Tests of differential item functioning

    PubMed Central

    Teresi, Jeanne A; Ocepek-Welikson, Katja; Ramirez, Mildred; Kleinman, Marjorie; Ornstein, Katherine; Siu, Albert

    2016-01-01

    Background: The Family Satisfaction with End-of-Life Care is an internationally used measure of satisfaction with cancer care. However, the Family Satisfaction with End-of-Life Care has not been studied for equivalence of item endorsement across different socio-demographic groups using differential item functioning. Aims: The aims of this secondary data analysis were (1) to examine potential differential item functioning in the family satisfaction item set with respect to type of caregiver, race, and patient age, gender, and education and (2) to provide parameters and documentation of differential item functioning for an item bank. Design: A mixed qualitative and quantitative analysis was conducted. A priori hypotheses regarding potential group differences in item response were established. Item response theory and Wald tests were used for the analyses of differential item functioning, accompanied by magnitude and impact measures. Results: Very little significant differential item functioning was observed for patient's age and gender. For race, 13 items showed differential item functioning after multiple comparison adjustment, 10 with non-uniform differential item functioning. No items evidenced differential item functioning of high magnitude, and the impact was negligible. For education, 5 items evidenced uniform differential item functioning after adjustment, none of high magnitude. Differential item functioning impact was trivial. One item evidenced differential item functioning for the caregiver relationship variable. Conclusion: Differential item functioning was observed primarily for race and education. No differential item functioning of high magnitude was observed for any item, and the overall impact of differential item functioning was negligible. One item, satisfaction with “the patient's pain relief,” might be singled out for further study, given that this item was both hypothesized and observed to show differential item functioning for race and education. PMID:25160692

  10. Detecting Differential Person Functioning in Emotional Intelligence

    ERIC Educational Resources Information Center

    Alsmadi, Yahia M.; Alsmadi, Abdalla A.

    2009-01-01

    Differential Item Functioning (DIF) is a widely used term in test development literature. It is very important to analyze test data for DIF because DIF is a serious threat to validity. If the same data matrix is transposed, a similar analysis can be carried out for Differential Person Functioning (DPF). The purpose of this paper is to introduce and…

  11. Food Catches the Eye but Not for Everyone: A BMI–Contingent Attentional Bias in Rapid Detection of Nutriments

    PubMed Central

    Nummenmaa, Lauri; Hietanen, Jari K.; Calvo, Manuel G.; Hyönä, Jukka

    2011-01-01

    An organism's survival depends crucially on its ability to detect and acquire nutriment. Attention circuits interact with cognitive and motivational systems to facilitate detection of salient sensory events in the environment. Here we show that the human attentional system is tuned to detect food targets among nonfood items. In two visual search experiments participants searched for discrepant food targets embedded in an array of nonfood distracters or vice versa. Detection times were faster when targets were food rather than nonfood items, and the detection advantage for food items showed a significant negative correlation with Body Mass Index (BMI). Also, eye tracking during searching within arrays of visually homogenous food and nonfood targets demonstrated that the BMI-contingent attentional bias was due to rapid capturing of the eyes by food items in individuals with low BMI. However, BMI was not associated with decision times after the discrepant food item was fixated. The results suggest that visual attention is biased towards foods, and that individual differences in energy consumption - as indexed by BMI - are associated with differential attentional effects related to foods. We speculate that such differences may constitute an important risk factor for gaining weight. PMID:21603657

  12. Fairness in Computerized Testing: Detecting Item Bias Using CATSIB with Impact Present

    ERIC Educational Resources Information Center

    Chu, Man-Wai; Lai, Hollis

    2013-01-01

    In educational assessment, there is an increasing demand for tailoring assessments to individual examinees through computer adaptive tests (CAT). As such, it is particularly important to investigate the fairness of these adaptive testing processes, which require the investigation of differential item functioning (DIF) to yield information about item…

  13. A Comparison of Lord's Chi Square and Raju's Area Measures in Detection of DIF.

    ERIC Educational Resources Information Center

    Cohen, Allan S.; Kim, Seock-Ho

    1993-01-01

    The effectiveness of two statistical tests of the area between item response functions (exact signed area and exact unsigned area) estimated in different samples, a measure of differential item functioning (DIF), was compared with Lord's chi square. Lord's chi square was found the most effective in determining DIF. (SLD)
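
    The signed and unsigned areas between two item response functions, which this abstract compares with Lord's chi square, can be approximated numerically. The sketch below is an illustration only: it assumes two-parameter logistic curves with scaling constant D = 1.7 and uses a plain trapezoid rule on a finite interval rather than the exact closed-form area measures, and all function and parameter names are hypothetical.

```python
import math


def irf_2pl(theta, a, b, D=1.7):
    """Two-parameter logistic item response function."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def area_between_irfs(a1, b1, a2, b2, lo=-20.0, hi=20.0, steps=40000):
    """Return (signed, unsigned) area between two 2PL curves.

    The integration limits and step count are assumptions; they only
    need to be wide and fine enough for the tails to be negligible.
    """
    h = (hi - lo) / steps
    signed = 0.0
    unsigned = 0.0
    for i in range(steps + 1):
        theta = lo + i * h
        diff = irf_2pl(theta, a1, b1) - irf_2pl(theta, a2, b2)
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end points
        signed += weight * diff * h
        unsigned += weight * abs(diff) * h
    return signed, unsigned
```

    With equal discriminations the curves never cross, so both areas reduce to the difference in difficulties; when only the discriminations differ, the signed area cancels to about zero while the unsigned area stays positive, which is why the two measures can disagree about nonuniform DIF.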

  14. Are cross-cultural comparisons of personality profiles meaningful? Differential item and facet functioning in the Revised NEO Personality Inventory.

    PubMed

    Church, A Timothy; Alvarez, Juan M; Mai, Nhu T Q; French, Brian F; Katigbak, Marcia S; Ortiz, Fernando A

    2011-11-01

    Measurement invariance is a prerequisite for confident cross-cultural comparisons of personality profiles. Multigroup confirmatory factor analysis was used to detect differential item functioning (DIF) in factor loadings and intercepts for the Revised NEO Personality Inventory (P. T. Costa, Jr., & R. R. McCrae, 1992) in comparisons of college students in the United States (N = 261), Philippines (N = 268), and Mexico (N = 775). About 40%-50% of the items exhibited some form of DIF and item-level noninvariance often carried forward to the facet level at which scores are compared. After excluding DIF items, some facet scales were too short or unreliable for cross-cultural comparisons, and for some other facets, cultural mean differences were reduced or eliminated. The results indicate that considerable caution is warranted in cross-cultural comparisons of personality profiles.

  15. Differential Item Functioning Detection Across Two Methods of Defining Group Comparisons

    PubMed Central

    Sari, Halil Ibrahim

    2014-01-01

    This study compares two methods of defining groups for the detection of differential item functioning (DIF): (a) pairwise comparisons and (b) composite group comparisons. We aim to emphasize and empirically support the notion that the choice of pairwise versus composite group definitions in DIF is a reflection of how one defines fairness in DIF studies. In this study, a simulation was conducted based on data from a 60-item ACT Mathematics test (ACT; Hanson & Béguin). The unsigned area measure method (Raju) was used as the DIF detection method. An application to operational data was also completed in the study, as well as a comparison of observed Type I error rates and false discovery rates across the two methods of defining groups. Results indicate that the amount of flagged DIF or interpretations about DIF in all conditions were not the same across the two methods, and there may be some benefits to using composite group approaches. The results are discussed in connection to differing definitions of fairness. Recommendations for practice are made. PMID:29795837

  16. Vegetable parenting practices scale. Item response modeling analyses

    PubMed Central

    Chen, Tzu-An; O’Connor, Teresia; Hughes, Sheryl; Beltran, Alicia; Baranowski, Janice; Diep, Cassandra; Baranowski, Tom

    2015-01-01

    Objective: To evaluate the psychometric properties of a vegetable parenting practices scale using multidimensional polytomous item response modeling which enables assessing item fit to latent variables and the distributional characteristics of the items in comparison to the respondents. We also tested for differences in the ways items function (called differential item functioning) across child’s gender, ethnicity, age, and household income groups. Method: Parents of 3–5 year old children completed a self-reported vegetable parenting practices scale online. Vegetable parenting practices consisted of 14 effective vegetable parenting practices and 12 ineffective vegetable parenting practices items, each with three subscales (responsiveness, structure, and control). Multidimensional polytomous item response modeling was conducted separately on effective vegetable parenting practices and ineffective vegetable parenting practices. Results: One effective vegetable parenting practice item did not fit the model well in the full sample or across demographic groups, and another was a misfit in differential item functioning analyses across child’s gender. Significant differential item functioning was detected across children’s age and ethnicity groups, and more among effective vegetable parenting practices than ineffective vegetable parenting practices items. Wright maps showed items only covered parts of the latent trait distribution. The harder- and easier-to-respond ends of the construct were not covered by items for effective vegetable parenting practices and ineffective vegetable parenting practices, respectively. Conclusions: Several effective vegetable parenting practices and ineffective vegetable parenting practices scale items functioned differently on the basis of child’s demographic characteristics; therefore, researchers should use these vegetable parenting practices scales with caution. Item response modeling should be incorporated in analyses of parenting practice questionnaires to better assess differences across demographic characteristics. PMID:25895694

  17. Understanding Rasch Measurement: Rasch Techniques for Detecting Bias in Performance Assessments: An Example Comparing the Performance of Native and Non-native Speakers on a Test of Academic English.

    ERIC Educational Resources Information Center

    Elder, Catherine; McNamara, Tim; Congdon, Peter

    2003-01-01

    Used Rasch analytic procedures to study item bias or differential item functioning in both dichotomous and scalar items on a test of English for academic purposes. Results for 139 college students on a pilot English language test model the approach and illustrate the measurement challenges posed by a diagnostic instrument to measure English…

  18. Generalized Mantel-Haenszel Methods for Differential Item Functioning Detection

    ERIC Educational Resources Information Center

    Fidalgo, Angel M.; Madeira, Jaqueline M.

    2008-01-01

    Mantel-Haenszel methods comprise a highly flexible methodology for assessing the degree of association between two categorical variables, whether they are nominal or ordinal, while controlling for other variables. The versatility of Mantel-Haenszel analytical approaches has made them very popular in the assessment of the differential functioning…
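
    For the simplest case covered by this family of methods (one dichotomous item, two groups, examinees stratified by matching score), the Mantel-Haenszel common odds ratio can be sketched as below. This is not the generalized procedure the article discusses; the function name and the tuple layout of each stratum are assumptions made for the example.

```python
def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across 2x2 tables.

    strata: iterable of (a, b, c, d) counts per score level, where
        a = reference group correct,  b = reference group incorrect,
        c = focal group correct,      d = focal group incorrect.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level carries no information
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio undefined: no discordant cells")
    return numerator / denominator
```

    A value near 1 indicates that, after matching on score, the two groups have similar odds of answering the item correctly; values above 1 favor the reference group under this cell layout.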

  19. A Comparison of Uniform DIF Effect Size Estimators under the MIMIC and Rasch Models

    ERIC Educational Resources Information Center

    Jin, Ying; Myers, Nicholas D.; Ahn, Soyeon; Penfield, Randall D.

    2013-01-01

    The Rasch model, a member of a larger group of models within item response theory, is widely used in empirical studies. Detection of uniform differential item functioning (DIF) within the Rasch model typically employs null hypothesis testing with a concomitant consideration of effect size (e.g., signed area [SA]). Parametric equivalence between…

  20. Lord's Wald Test for Detecting DIF in Multidimensional IRT Models: A Comparison of Two Estimation Approaches

    ERIC Educational Resources Information Center

    Lee, Soo; Suh, Youngsuk

    2018-01-01

    Lord's Wald test for differential item functioning (DIF) has not been studied extensively in the context of the multidimensional item response theory (MIRT) framework. In this article, Lord's Wald test was implemented using two estimation approaches, marginal maximum likelihood estimation and Bayesian Markov chain Monte Carlo estimation, to detect…

  1. Strategies for Testing Statistical and Practical Significance in Detecting DIF with Logistic Regression Models

    ERIC Educational Resources Information Center

    Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza

    2014-01-01

    This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…

  2. Use of multilevel logistic regression to identify the causes of differential item functioning.

    PubMed

    Balluerka, Nekane; Gorostiaga, Arantxa; Gómez-Benito, Juana; Hidalgo, María Dolores

    2010-11-01

    Given that a key function of tests is to serve as evaluation instruments and for decision making in the fields of psychology and education, the possibility that some of their items may show differential behaviour is a major concern for psychometricians. In recent decades, important progress has been made as regards the efficacy of techniques designed to detect this differential item functioning (DIF). However, the findings are scant when it comes to explaining its causes. The present study addresses this problem from the perspective of multilevel analysis. Starting from a case study in the area of transcultural comparisons, multilevel logistic regression is used: 1) to identify the item characteristics associated with the presence of DIF; 2) to estimate the proportion of variation in the DIF coefficients that is explained by these characteristics; and 3) to evaluate alternative explanations of the DIF by comparing the explanatory power or fit of different sequential models. The comparison of these models confirmed one of the two alternatives (familiarity with the stimulus) and rejected the other (the topic area) as being a cause of differential functioning with respect to the compared groups.

  3. Odds Ratio, Delta, ETS Classification, and Standardization Measures of DIF Magnitude for Binary Logistic Regression

    ERIC Educational Resources Information Center

    Monahan, Patrick O.; McHorney, Colleen A.; Stump, Timothy E.; Perkins, Anthony J.

    2007-01-01

    Previous methodological and applied studies that used binary logistic regression (LR) for detection of differential item functioning (DIF) in dichotomously scored items either did not report an effect size or did not employ several useful measures of DIF magnitude derived from the LR model. Equations are provided for these effect size indices.…
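
    The abstract does not reproduce the article's equations, so the sketch below only illustrates the general idea of turning a logistic regression group coefficient into DIF magnitude measures. It assumes the usual delta transformation used with the Mantel-Haenszel odds ratio (delta = -2.35 × ln(odds ratio)) applied to the LR coefficient, and a simplified A/B/C rule based on absolute size alone; the operational ETS rules also involve significance tests, which are omitted here. Function names and the sign convention for the group coefficient are assumptions.

```python
import math


def lr_dif_effect_sizes(beta_group):
    """Return (odds_ratio, delta) for a logistic regression group coefficient.

    beta_group: log odds ratio for the group indicator; its sign depends
        on how the groups are coded, which is an assumption of the caller.
    """
    odds_ratio = math.exp(beta_group)
    delta = -2.35 * beta_group  # same as -2.35 * ln(odds_ratio)
    return odds_ratio, delta


def ets_category(delta):
    """Simplified ETS-style label based only on |delta| (no significance test)."""
    size = abs(delta)
    if size < 1.0:
        return "A"  # negligible
    if size < 1.5:
        return "B"  # moderate
    return "C"      # large
```

    For instance, a group coefficient of 0.5 corresponds to an odds ratio of about 1.65 and a delta of -1.175, which this simplified rule labels "B".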

  4. Identification of measurement differences between English and Spanish language versions of the Mini-Mental State Examination. Detecting differential item functioning using MIMIC modeling.

    PubMed

    Jones, Richard N

    2006-11-01

    Knowledge of the extent to which measurement of adult cognitive functioning differs between Spanish and English language administrations of the Mini-Mental State Examination (MMSE) is critical for inclusive, representative, and valid research of older adults in the United States. We sought to demonstrate the use of an item response theory (IRT) based structural equation model, that is, the MIMIC model (multiple indicators, multiple causes), to evaluate MMSE responses for evidence of differential item functioning (DIF) attributable to language of administration. We studied participants in a dementia case registry study (n = 1546), 42% of whom were examined with the Spanish language MMSE. Twelve of 21 items were identified as having significant uniform DIF. The 4 most discrepant included orientation to season, orientation to state, repeat phrase, and follow command. DIF accounted for two-thirds of the observed difference in underlying level of cognitive functioning between Spanish- and English-language administration groups. Failing to account for measurement differences may lead to spurious inferences regarding language group differences in underlying level of cognitive functioning. The MIMIC model can be used to detect and adjust for such measurement differences in substantive research.

  5. Effects of Linking Methods on Detection of DIF.

    ERIC Educational Resources Information Center

    Kim, Seock-Ho; Cohen, Allan S.

    1992-01-01

    Effects of the following methods for linking metrics on detection of differential item functioning (DIF) were compared: (1) test characteristic curve method (TCC); (2) weighted mean and sigma method; and (3) minimum chi-square method. With large samples, results were essentially the same. With small samples, TCC was most accurate. (SLD)

  6. Unexpected Direction of Differential Item Functioning

    ERIC Educational Resources Information Center

    Park, Sangwook

    2011-01-01

    Many studies have been conducted to evaluate the performance of DIF detection methods, when two groups have different ability distributions. Such studies typically have demonstrated factors that are associated with inflation of Type I error rates in DIF detection, such as mean ability differences. However, no study has examined how the direction…

  7. Gender-, age-, and race/ethnicity-based differential item functioning analysis of the movement disorder society-sponsored revision of the Unified Parkinson's disease rating scale.

    PubMed

    Goetz, Christopher G; Liu, Yuanyuan; Stebbins, Glenn T; Wang, Lu; Tilley, Barbara C; Teresi, Jeanne A; Merkitch, Douglas; Luo, Sheng

    2016-12-01

    Assess MDS-UPDRS items for gender-, age-, and race/ethnicity-based differential item functioning. Assessing differential item functioning is a core rating scale validation step. For the MDS-UPDRS, differential item functioning occurs if item-score probability among people with similar levels of parkinsonism differs according to selected covariates (gender, age, race/ethnicity). If the magnitude of differential item functioning is clinically relevant, item-score interpretation must consider influences by these covariates. Differential item functioning can be nonuniform (covariate variably influences an item-score across different levels of parkinsonism) or uniform (covariate influences an item-score consistently over all levels of parkinsonism). Using the MDS-UPDRS translation database of more than 5,000 PD patients from 14 languages, we tested gender-, age-, and race/ethnicity-based differential item functioning. To designate an item as having clinically relevant differential item functioning, we required statistical confirmation by 2 independent methods, along with a McFadden pseudo-R² magnitude statistic greater than "negligible." Most items showed no gender-, age- or race/ethnicity-based differential item functioning. When differential item functioning was identified, the magnitude statistic was always in the "negligible" range, and the scale-level impact was minimal. The absence of clinically relevant differential item functioning across all items and all parts of the MDS-UPDRS is strong evidence that the scale can be used confidently. As studies of Parkinson's disease increasingly involve multinational efforts and the MDS-UPDRS has several validated non-English translations, the findings support the scale's broad applicability in populations with varying gender, age, and race/ethnicity distributions. © 2016 International Parkinson and Movement Disorder Society.

  8. Latent Class Analysis of Differential Item Functioning on the Peabody Picture Vocabulary Test-III

    ERIC Educational Resources Information Center

    Webb, Mi-young Lee; Cohen, Allan S.; Schwanenflugel, Paula J.

    2008-01-01

    This study investigated the use of latent class analysis for the detection of differences in item functioning on the Peabody Picture Vocabulary Test-Third Edition (PPVT-III). A two-class solution for a latent class model appeared to be defined in part by ability because Class 1 was lower in ability than Class 2 on both the PPVT-III and the…

  9. Measurement properties of painDETECT: Rasch analysis of responses from community-dwelling adults with neuropathic pain.

    PubMed

    Packham, Tara L; Cappelleri, Joseph C; Sadosky, Alesia; MacDermid, Joy C; Brunner, Florian

    2017-03-04

    painDETECT (PD-Q) is a self-reported assessment of pain qualities developed as a screening tool for pain of neuropathic origin. Rasch analysis is a strategy for examining the measurement characteristics of a scale using a form of item response theory. We conducted a Rasch analysis to consider if the scoring and measurement properties of PD-Q would support its use as an outcome measure. Rasch analysis was conducted on PD-Q scores drawn from a cross-sectional study of the burden and costs of NeP. The analysis followed an iterative process based on recommendations in the literature, including examination of sequential scoring categories, unidimensionality, reliability and differential item function. Data from 624 persons with a diagnosis of painful diabetic polyneuropathy, small fibre neuropathy, and neuropathic pain associated with chronic low back pain, spinal cord injury, HIV-related pain, or chronic post-surgical pain was used for this analysis. PD-Q demonstrated fit to the Rasch model after adjustments of scoring categories for four items, and omission of the time course and radiating questions. The resulting seven-item scale of pain qualities demonstrated good reliability with a person-separation index of 0.79. No scoring bias (differential item functioning) was found for this version. Rasch modelling suggests the seven pain-qualities items from PD-Q may be used as an outcome measure. Further research is required to confirm validity and responsiveness in a clinical setting.

  10. Transforming SIBTEST to Account for Multilevel Data Structures

    ERIC Educational Resources Information Center

    French, Brian F.; Finch, W. Holmes

    2015-01-01

    SIBTEST is a differential item functioning (DIF) detection method that is accurate and effective with small samples, in the presence of group mean differences, and for assessment of both uniform and nonuniform DIF. The presence of multilevel data with DIF detection has received increased attention. Ignoring such structure can inflate Type I error.…

  11. Differential Item Functioning Analysis Using a Mixture 3-Parameter Logistic Model with a Covariate on the TIMSS 2007 Mathematics Test

    ERIC Educational Resources Information Center

    Choi, Youn-Jeng; Alexeev, Natalia; Cohen, Allan S.

    2015-01-01

    The purpose of this study was to explore what may be contributing to differences in performance in mathematics on the Trends in International Mathematics and Science Study 2007. This was done by using a mixture item response theory modeling approach to first detect latent classes in the data and then to examine differences in performance on items…

  12. Two Simple Approaches to Overcome a Problem with the Mantel-Haenszel Statistic: Comments on Wang, Bradlow, Wainer, and Muller (2008)

    ERIC Educational Resources Information Center

    Sinharay, Sandip; Dorans, Neil J.

    2010-01-01

    The Mantel-Haenszel (MH) procedure (Mantel and Haenszel) is a popular method for estimating and testing a common two-factor association parameter in a 2 x 2 x K table. Holland and Holland and Thayer described how to use the procedure to detect differential item functioning (DIF) for tests with dichotomously scored items. Wang, Bradlow, Wainer, and…
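
    To make the 2 x 2 x K setup concrete, the following is a minimal sketch of the Mantel-Haenszel common odds ratio estimate across K score strata. It is an illustration only, not the approach proposed in the commentary: the function name and the cell layout (reference correct, reference incorrect, focal correct, focal incorrect) are assumptions chosen for this example, and the counts are made up.

```python
def mantel_haenszel_odds_ratio(strata):
    """Estimate the MH common odds ratio from K 2 x 2 tables.

    Each stratum is a tuple (a, b, c, d), assumed here to mean:
      a = reference group correct,  b = reference group incorrect,
      c = focal group correct,      d = focal group incorrect.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("common odds ratio is undefined for these tables")
    return numerator / denominator


# Hypothetical counts for two matched score levels (not from any study).
no_dif = mantel_haenszel_odds_ratio([(10, 10, 10, 10), (20, 5, 20, 5)])
with_dif = mantel_haenszel_odds_ratio([(30, 10, 20, 20)])
```

    A value near 1 suggests similar odds of success for matched examinees in both groups; the second call yields 3.0 for its single made-up table.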

  13. A Concealed Information Test with multimodal measurement.

    PubMed

    Ambach, Wolfgang; Bursch, Stephanie; Stark, Rudolf; Vaitl, Dieter

    2010-03-01

    A Concealed Information Test (CIT) investigates differential physiological responses to deed-related (probe) vs. irrelevant items. The present study focused on the detection of concealed information using simultaneous recordings of autonomic and brain electrical measures. As a secondary issue, verbal and pictorial presentations were compared with respect to their influence on the recorded measures. Thirty-one participants underwent a mock-crime scenario with a combined verbal and pictorial presentation of nine items. The subsequent CIT, designed with respect to event-related potential (ERP) measurement, used a 3-3.5s interstimulus interval. The item presentation modality, i.e. pictures or written words, was varied between subjects; no response was required from the participants. In addition to electroencephalogram (EEG), electrodermal activity (EDA), electrocardiogram (ECG), respiratory activity, and finger plethysmogram were recorded. A significant probe-vs.-irrelevant effect was found for each of the measures. Compared to sole ERP measurement, the combination of ERP and EDA yielded incremental information for detecting concealed information. However, EDA per se did not reach the predictive value known from studies primarily designed for peripheral physiological measurement. Presentation modality neither influenced the detection accuracy for autonomic measures nor EEG measures; this underpins the equivalence of verbal and pictorial item presentation in a CIT, regardless of the physiological measures recorded. Future studies should further clarify whether the incremental validity observed in the present study reflects a differential sensitivity of ERP and EDA to different sub-processes in a CIT. Copyright 2009 Elsevier B.V. All rights reserved.

  14. An Effect Size Measure for Raju's Differential Functioning for Items and Tests

    ERIC Educational Resources Information Center

    Wright, Keith D.; Oshima, T. C.

    2015-01-01

    This study established an effect size measure for differential functioning for items and tests' noncompensatory differential item functioning (NCDIF). The Mantel-Haenszel parameter served as the benchmark for developing NCDIF's effect size measure for reporting moderate and large differential item functioning in test items. The effect size of…

  15. An Odds Ratio Approach for Detecting DDF under the Nested Logit Modeling Framework

    ERIC Educational Resources Information Center

    Terzi, Ragip; Suh, Youngsuk

    2015-01-01

    An odds ratio approach (ORA) under the framework of a nested logit model was proposed for evaluating differential distractor functioning (DDF) in multiple-choice items and was compared with an existing ORA developed under the nominal response model. The performances of the two ORAs for detecting DDF were investigated through an extensive…

  16. Assessing whether parents and children perceive the meaning of the items in the PedsQL™ 4.0 quality of life instrument consistently: a differential item functioning analysis.

    PubMed

    Jafari, Peyman; Bagheri, Zahra; Hashemi, Seyyedeh Zahra; Shalileh, Keivan

    2013-06-06

    Limited studies have examined the effect of differential item functioning (DIF) on comparing health related quality of life (HRQoL) scores across child self-reports and parent proxy-reports. This study aims to determine whether parents and children respond differently to the items in the Persian version of the PedsQL™ 4.0 measure. The PedsQL™ 4.0 Generic Core Scales was completed by 938 child-parent dyads. The graded response model (GRM) was used to detect DIF between parents and children. The IRT analyses were conducted using IRTPRO 2.1. On the whole, our findings showed that 50% (4 out of 8) of the items in the physical subscale and 40% (2 out of 5) in both emotional and school subscales were flagged with DIF. Among the DIF items, 62.5% (5 out of 8) were uniform and the remaining 37.5% (3 out of 8) were non-uniform. Parents and children interpret certain items of the PedsQL™ 4.0 in different ways, except for the social subscale. Hence, we should be cautious about using parent proxy-report as a substitute for a child's ratings.

  17. Sensitivity and specificity of a briefer version of the Cambridge Cognitive Examination (CAMCog-Short) in the detection of cognitive decline in the elderly: An exploratory study.

    PubMed

    Radanovic, Marcia; Facco, Giuliana; Forlenza, Orestes V

    2018-05-01

    To create a reduced and briefer version of the widely used Cambridge Cognitive Examination (CAMCog) battery as a concise cognitive test to be used in primary and secondary levels of health care to detect cognitive decline. Our aim was to reduce the administration time of the original test while maintaining its diagnostic accuracy. On the basis of the analysis of 835 CAMCog tests performed by 429 subjects (107 controls, 192 mild cognitive impairment [MCI], and 130 dementia patients), we extracted items that most contributed to intergroup differentiation, according to 2 educational levels (≤8 and >8 y of formal schooling). The final 33-item "low education" and 24-item "high education" CAMCog-Short correspond to 48.5% and 35% of the original version and yielded similar rates of accuracy: area under ROC curves (AUC) > 0.9 in the differentiation between controls × dementia and MCI × dementia (sensitivities > 75%; specificities > 90%); AUC > 0.7 for the differentiation between controls and MCI (sensitivities > 65%; specificities > 75%). The CAMCog-Short emerges as a promising tool for a brief, yet sufficiently accurate, screening tool for use in clinical settings. Further prospective studies designed to validate its diagnostic accuracy are needed. Copyright © 2018 John Wiley & Sons, Ltd.

  18. A signal detection-item response theory model for evaluating neuropsychological measures.

    PubMed

    Thomas, Michael L; Brown, Gregory G; Gur, Ruben C; Moore, Tyler M; Patt, Virginie M; Risbrough, Victoria B; Baker, Dewleen G

    2018-02-05

    Models from signal detection theory are commonly used to score neuropsychological test data, especially tests of recognition memory. Here we show that certain item response theory models can be formulated as signal detection theory models, thus linking two complementary but distinct methodologies. We then use the approach to evaluate the validity (construct representation) of commonly used research measures, demonstrate the impact of conditional error on neuropsychological outcomes, and evaluate measurement bias. Signal detection-item response theory (SD-IRT) models were fitted to recognition memory data for words, faces, and objects. The sample consisted of U.S. Infantry Marines and Navy Corpsmen participating in the Marine Resiliency Study. Data comprised item responses to the Penn Face Memory Test (PFMT; N = 1,338), Penn Word Memory Test (PWMT; N = 1,331), and Visual Object Learning Test (VOLT; N = 1,249), and self-report of past head injury with loss of consciousness. SD-IRT models adequately fitted recognition memory item data across all modalities. Error varied systematically with ability estimates, and distributions of residuals from the regression of memory discrimination onto self-report of past head injury were positively skewed towards regions of larger measurement error. Analyses of differential item functioning revealed little evidence of systematic bias by level of education. SD-IRT models benefit from the measurement rigor of item response theory (which permits the modeling of item difficulty and examinee ability) and from signal detection theory (which provides an interpretive framework encompassing the experimentally validated constructs of memory discrimination and response bias). We used this approach to validate the construct representation of commonly used research measures and to demonstrate how nonoptimized item parameters can lead to erroneous conclusions when interpreting neuropsychological test data. Future work might include the development of computerized adaptive tests and integration with mixture and random-effects models.

  19. 17 CFR 260.7a-16 - Inclusion of items, differentiation between items and answers, omission of instructions.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 17 Commodity and Securities Exchanges 3 2012-04-01 2012-04-01 false Inclusion of items, differentiation between items and answers, omission of instructions. 260.7a-16 Section 260.7a-16 Commodity and... INDENTURE ACT OF 1939 Formal Requirements § 260.7a-16 Inclusion of items, differentiation between items and...

  20. 17 CFR 260.7a-16 - Inclusion of items, differentiation between items and answers, omission of instructions.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 17 Commodity and Securities Exchanges 4 2014-04-01 2014-04-01 false Inclusion of items, differentiation between items and answers, omission of instructions. 260.7a-16 Section 260.7a-16 Commodity and... INDENTURE ACT OF 1939 Formal Requirements § 260.7a-16 Inclusion of items, differentiation between items and...

  1. 17 CFR 260.7a-16 - Inclusion of items, differentiation between items and answers, omission of instructions.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 17 Commodity and Securities Exchanges 3 2013-04-01 2013-04-01 false Inclusion of items, differentiation between items and answers, omission of instructions. 260.7a-16 Section 260.7a-16 Commodity and... INDENTURE ACT OF 1939 Formal Requirements § 260.7a-16 Inclusion of items, differentiation between items and...

  2. 17 CFR 260.7a-16 - Inclusion of items, differentiation between items and answers, omission of instructions.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 17 Commodity and Securities Exchanges 3 2010-04-01 2010-04-01 false Inclusion of items, differentiation between items and answers, omission of instructions. 260.7a-16 Section 260.7a-16 Commodity and... INDENTURE ACT OF 1939 Formal Requirements § 260.7a-16 Inclusion of items, differentiation between items and...

  3. 17 CFR 260.7a-16 - Inclusion of items, differentiation between items and answers, omission of instructions.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 17 Commodity and Securities Exchanges 3 2011-04-01 2011-04-01 false Inclusion of items, differentiation between items and answers, omission of instructions. 260.7a-16 Section 260.7a-16 Commodity and... INDENTURE ACT OF 1939 Formal Requirements § 260.7a-16 Inclusion of items, differentiation between items and...

  4. Examining the validity and reliability of the Taita symptom checklist using Rasch analysis.

    PubMed

    Chen, Yun-Ling; Pan, Ay-Woan; Chung, LyInn; Chen, Tsyr-Jang

    2015-03-01

    The Taita symptom checklist (TSCL) is a standardized self-rating psychiatric symptom scale for outpatients with mental illness in Taiwan. This study aimed to examine the validity and reliability of the TSCL using Rasch analysis. The TSCL was given to 583 healthy people and 479 people with mental illness. Rasch analysis was used to examine the appropriateness of the rating scale, the unidimensionality of the scale, the differential item functioning across sex and diagnosis, and the Rasch cut-off score of the scale. Rasch analysis confirmed that the revised 37 items with a three-point rating scale of the TSCL demonstrated good internal consistency and met criteria for unidimensionality. The person and item reliability indices were high. The TSCL could reliably measure healthy participants and patients with mental illness. Differential item functioning due to sex or psychiatric diagnosis was evident for three items. A Rasch cut-off score for TSCL was produced for detecting participants' psychiatric symptoms based on an eight-level classification. The TSCL is a reliable and valid assessment to evaluate the participants' perceived disturbance of psychiatric symptoms based on Rasch analysis. Copyright © 2013. Published by Elsevier B.V.

  5. A cross-cultural study to assess measurement invariance of the KIDSCREEN-27 questionnaire across Serbian and Iranian children and adolescents.

    PubMed

    Stevanovic, Dejan; Jafari, Peyman

    2015-01-01

    The KIDSCREEN questionnaire for health-related quality of life (HRQOL) assessments in children and adolescents was simultaneously developed across 13 European countries, and it was subsequently translated and culturally adapted to over 30 different languages across the world. The aim of this study was to evaluate the measurement equivalence of the KIDSCREEN-27 across Serbian and Iranian children and adolescents. The items in the KIDSCREEN-27 were analyzed for differential item functioning (DIF) across Iranian and Serbian populations using ordinal logistic regression with three different criteria. The sample included 330 Iranian and 329 Serbian children and adolescents and 330 and 314 of their parents, respectively. Across the two samples, DIF was detected in 16 (59 %) of 27 items in the child self-reports and in 20 (74 %) of 27 items in the parent/proxy report. However, using alternative criteria based on magnitude detected for DIF, only three items in the parent/proxy report showed significant DIF. Our study provided more evidence that the KIDSCREEN-27 possesses DIF items across different cultures, but their impact is probably small, and the questionnaire could be used for cross-cultural HRQOL comparisons.

  6. Item response modeling: a psychometric assessment of the children's fruit, vegetable, water, and physical activity self-efficacy scales among Chinese children.

    PubMed

    Wang, Jing-Jing; Chen, Tzu-An; Baranowski, Tom; Lau, Patrick W C

    2017-09-16

    This study aimed to evaluate the psychometric properties of four self-efficacy scales (i.e., self-efficacy for fruit (FSE), vegetable (VSE), and water (WSE) intakes, and physical activity (PASE)) and to investigate their differences in item functioning across sex, age, and body weight status groups using item response modeling (IRM) and differential item functioning (DIF). Four self-efficacy scales were administered to 763 Hong Kong Chinese children (55.2% boys) aged 8-13 years. Classical test theory (CTT) was used to examine the reliability and factorial validity of scales. IRM was conducted and DIF analyses were performed to assess the characteristics of item parameter estimates on the basis of children's sex, age and body weight status. All self-efficacy scales demonstrated adequate to excellent internal consistency reliability (Cronbach's α: 0.79-0.91). One FSE misfit item and one PASE misfit item were detected. Small DIF was found for all the scale items across children's age groups. Items with medium to large DIF were detected in different sex and body weight status groups, which will require modification. A Wright map revealed that items covered the range of the distribution of participants' self-efficacy for each scale except VSE. Several self-efficacy scales' items functioned differently by children's sex and body weight status. Additional research is required to modify the four self-efficacy scales to minimize these moderating influences for application.

  7. A Comparison of the Logistic Regression and Contingency Table Methods for Simultaneous Detection of Uniform and Nonuniform DIF

    ERIC Educational Resources Information Center

    Guler, Nese; Penfield, Randall D.

    2009-01-01

    In this study, we investigate the logistic regression (LR), Mantel-Haenszel (MH), and Breslow-Day (BD) procedures for the simultaneous detection of both uniform and nonuniform differential item functioning (DIF). A simulation study was used to assess and compare the Type I error rate and power of a combined decision rule (CDR), which assesses DIF…

  8. Do people with and without medical conditions respond similarly to the short health anxiety inventory? An assessment of differential item functioning using item response theory.

    PubMed

    LeBouthillier, Daniel M; Thibodeau, Michel A; Alberts, Nicole M; Hadjistavropoulos, Heather D; Asmundson, Gordon J G

    2015-04-01

    Individuals with medical conditions are likely to have elevated health anxiety; however, research has not demonstrated how medical status impacts response patterns on health anxiety measures. Measurement bias can undermine the validity of a questionnaire by overestimating or underestimating scores in groups of individuals. We investigated whether the Short Health Anxiety Inventory (SHAI), a widely-used measure of health anxiety, exhibits medical condition-based bias on item and subscale levels, and whether the SHAI subscales adequately assess the health anxiety continuum. Data were from 963 individuals with diabetes, breast cancer, or multiple sclerosis, and 372 healthy individuals. Mantel-Haenszel tests and item characteristic curves were used to classify the severity of item-level differential item functioning in all three medical groups compared to the healthy group. Test characteristic curves were used to assess scale-level differential item functioning and whether the SHAI subscales adequately assess the health anxiety continuum. Nine out of 14 items exhibited differential item functioning. Two items exhibited differential item functioning in all medical groups compared to the healthy group. In both Thought Intrusion and Fear of Illness subscales, differential item functioning was associated with mildly deflated scores in medical groups with very high levels of the latent traits. Fear of Illness items poorly discriminated between individuals with low and very low levels of the latent trait. While individuals with medical conditions may respond differentially to some items, clinicians and researchers can confidently use the SHAI with a variety of medical populations without concern of significant bias. Copyright © 2015 Elsevier Inc. All rights reserved.

  9. Gender Differential Item Functioning on a National Field-Specific Test: The Case of PhD Entrance Exam of TEFL in Iran

    ERIC Educational Resources Information Center

    Ahmadi, Alireza; Bazvand, Ali Darabi

    2016-01-01

    Differential Item Functioning (DIF) exists when examinees of equal ability from different groups have different probabilities of successful performance in a certain item. This study examined gender differential item functioning across the PhD Entrance Exam of TEFL (PEET) in Iran, using both logistic regression (LR) and one-parameter item response…

  10. Measurement equivalence of the KINDL questionnaire across child self-reports and parent proxy-reports: a comparison between item response theory and ordinal logistic regression.

    PubMed

    Jafari, Peyman; Sharafi, Zahra; Bagheri, Zahra; Shalileh, Sara

    2014-06-01

    Measurement equivalence is a necessary assumption for meaningful comparison of pediatric quality of life rated by children and parents. In this study, differential item functioning (DIF) analysis is used to examine whether children and their parents respond consistently to the items in the KINDer Lebensqualitätsfragebogen (KINDL; in German, Children Quality of Life Questionnaire). Two DIF detection methods, graded response model (GRM) and ordinal logistic regression (OLR), were applied for comparability. The KINDL was completed by 1,086 school children and 1,061 of their parents. While the GRM revealed that 12 out of the 24 items were flagged with DIF, the OLR identified 14 out of the 24 items with DIF. Seven items with DIF and five items without DIF were common across the two methods, yielding a total agreement rate of 50 %. This study revealed that parent proxy-reports cannot be used as a substitute for a child's ratings in the KINDL.
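
    The 50 % total agreement rate reported above can be reproduced from the counts in the abstract. The short sketch below is only an arithmetic check; the function and variable names are hypothetical, and the split into method-specific flags is derived here from the reported totals rather than stated in the record.

```python
def total_agreement_rate(both_flagged, neither_flagged, n_items):
    """Share of items on which two DIF methods reach the same decision."""
    return (both_flagged + neither_flagged) / n_items


n_items = 24
grm_flagged = 12      # items flagged by the graded response model
olr_flagged = 14      # items flagged by ordinal logistic regression
both_flagged = 7      # flagged by both methods
neither_flagged = 5   # flagged by neither method

# Derived counts: items flagged by only one of the two methods.
grm_only = grm_flagged - both_flagged   # 5
olr_only = olr_flagged - both_flagged   # 7

# The four cells should account for every item on the questionnaire.
accounted_for = both_flagged + neither_flagged + grm_only + olr_only

rate = total_agreement_rate(both_flagged, neither_flagged, n_items)
```

    The four cells sum to 24 items, and (7 + 5) / 24 gives the reported rate of 0.5.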

  11. Differential Item Functioning of the Boston Naming Test in Cognitively Normal African American and Caucasian Older Adults

    PubMed Central

    Pedraza, Otto; Graff-Radford, Neill R.; Smith, Glenn E.; Ivnik, Robert J.; Willis, Floyd B.; Petersen, Ronald C.; Lucas, John A.

    2010-01-01

    Scores on the Boston Naming Test (BNT) are frequently lower for African American when compared to Caucasian adults. Although demographically-based norms can mitigate the impact of this discrepancy on the likelihood of erroneous diagnostic impressions, a growing consensus suggests that group norms do not sufficiently address or advance our understanding of the underlying psychometric and sociocultural factors that lead to between-group score discrepancies. Using item response theory and methods to detect differential item functioning (DIF), the current investigation moves beyond comparisons of the summed total score to examine whether the conditional probability of responding correctly to individual BNT items differs between African American and Caucasian adults. Participants included 670 adults age 52 and older who took part in Mayo's Older Americans and Older African Americans Normative Studies. Under a 2-parameter logistic IRT framework and after correction for the false discovery rate, 12 items were shown to demonstrate DIF. Six of these 12 items (“dominoes,” “escalator,” “muzzle,” “latch,” “tripod,” and “palette”) were also identified in additional analyses using hierarchical logistic regression models and represent the strongest evidence for race/ethnicity-based DIF. These findings afford a finer characterization of the psychometric properties of the BNT and expand our understanding of between-group performance. PMID:19570311

  12. The Egocentric Reference for Visual Exploration and Orientation

    ERIC Educational Resources Information Center

    Nico, Daniele; Daprati, Elena

    2009-01-01

    Clinical signs of damage to the egocentric reference system range from the inability to detect stimuli in the real environment to a defect in recovering items from an internal representation. Despite clinical dissociations, current interpretations consider all symptoms as due to a single perturbation, differentially expressed according to the…

  13. Measuring pregnancy planning: An assessment of the London Measure of Unplanned Pregnancy among urban, south Indian women

    PubMed Central

    Rocca, Corinne H.; Krishnan, Suneeta; Barrett, Geraldine; Wilson, Mark

    2010-01-01

    We evaluated the psychometric properties of the London Measure of Unplanned Pregnancy among Indian women using classical methods and Item Response Modeling. The scale exhibited good internal consistency and internal structure, with overall scores correlating well with each item’s response categories. Items performed similarly for pregnant and non-pregnant women, and scores decreased with increasing parity, providing evidence for validity. Analyses also detected limitations, including infrequent selection of middle response categories and some evidence of differential item functioning by parity. We conclude that the LMUP represents an improvement over existing measures but recommend steps for enhancing scale performance for this cultural context. PMID:21170147

  14. Examining Differential Item Functions of Different Item Ordered Test Forms According to Item Difficulty Levels

    ERIC Educational Resources Information Center

    Çokluk, Ömay; Gül, Emrah; Dogan-Gül, Çilem

    2016-01-01

    The study aims to examine whether differential item function is displayed in three different test forms that have item orders of random and sequential versions (easy-to-hard and hard-to-easy), based on Classical Test Theory (CTT) and Item Response Theory (IRT) methods and bearing item difficulty levels in mind. In the correlational research, the…

  15. Using a Taxonomy of Differential Step Functioning to Improve the Interpretation of DIF in Polytomous Items: An Illustration

    ERIC Educational Resources Information Center

    Penfield, Randall D.; Alvarez, Karina; Lee, Okhee

    2009-01-01

    The assessment of differential item functioning (DIF) in polytomous items addresses between-group differences in measurement properties at the item level, but typically does not inform which score levels may be involved in the DIF effect. The framework of differential step functioning (DSF) addresses this issue by examining between-group…

  16. Using Differential Item Functioning Procedures to Explore Sources of Item Difficulty and Group Performance Characteristics.

    ERIC Educational Resources Information Center

    Scheuneman, Janice Dowd; Gerritz, Kalle

    1990-01-01

    Differential item functioning (DIF) methodology for revealing sources of item difficulty and performance characteristics of different groups was explored. A total of 150 Scholastic Aptitude Test items and 132 Graduate Record Examination general test items were analyzed. DIF was evaluated for males and females and Blacks and Whites. (SLD)

  17. Assessing the Item Response Theory with Covariate (IRT-C) Procedure for Ascertaining Differential Item Functioning

    ERIC Educational Resources Information Center

    Tay, Louis; Vermunt, Jeroen K.; Wang, Chun

    2013-01-01

    We evaluate the item response theory with covariates (IRT-C) procedure for assessing differential item functioning (DIF) without preknowledge of anchor items (Tay, Newman, & Vermunt, 2011). This procedure begins with a fully constrained baseline model, and candidate items are tested for uniform and/or nonuniform DIF using the Wald statistic.…

  18. Item Purification in Differential Item Functioning Using Generalized Linear Mixed Models

    ERIC Educational Resources Information Center

    Liu, Qian

    2011-01-01

    For this dissertation, four item purification procedures were implemented onto the generalized linear mixed model for differential item functioning (DIF) analysis, and the performance of these item purification procedures was investigated through a series of simulations. Among the four procedures, forward and generalized linear mixed model (GLMM)…

  19. A Note on Three Statistical Tests in the Logistic Regression DIF Procedure

    ERIC Educational Resources Information Center

    Paek, Insu

    2012-01-01

    Although logistic regression became one of the well-known methods in detecting differential item functioning (DIF), its three statistical tests, the Wald, likelihood ratio (LR), and score tests, which are readily available under the maximum likelihood, do not seem to be consistently distinguished in DIF literature. This paper provides a clarifying…
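
For orientation, a commonly used form of the logistic regression DIF model is written out below. The symbols are notation chosen here for illustration, not taken from the abstract: X is the matching score, G the group indicator, and the β terms are regression coefficients.

```latex
\operatorname{logit} P(Y = 1 \mid X, G)
  = \beta_0 + \beta_1 X + \beta_2 G + \beta_3 (X \cdot G)

% Uniform DIF:     H_0 : \beta_2 = 0
% Nonuniform DIF:  H_0 : \beta_3 = 0

% Likelihood ratio statistic comparing a constrained model (L_0)
% with the augmented model (L_1):
G^2 = -2 \left[ \ln L_0 - \ln L_1 \right]
```

Under the null hypothesis, G² is referred to a chi-square distribution with degrees of freedom equal to the number of constrained coefficients. The Wald and score tests address the same hypotheses but are computed from the augmented and the constrained fit, respectively.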

  20. Ordinal Logistic Regression to Detect Differential Item Functioning for Gender in the Institutional Integration Scale

    ERIC Educational Resources Information Center

    Breidenbach, Daniel H.; French, Brian F.

    2011-01-01

    Many factors can influence a student's decision to withdraw from college. Intervention programs aimed at retention can benefit from understanding the factors related to such decisions, especially in underrepresented groups. The Institutional Integration Scale (IIS) has been suggested as a predictor of student persistence. Accurate prediction of…

  1. The Effect of Missing Data Treatment on Mantel-Haenszel DIF Detection

    ERIC Educational Resources Information Center

    Emenogu, Barnabas C.; Falenchuk, Olesya; Childs, Ruth A.

    2010-01-01

    Most implementations of the Mantel-Haenszel differential item functioning procedure delete records with missing responses or replace missing responses with scores of 0. These treatments of missing data make strong assumptions about the causes of the missing data. Such assumptions may be particularly problematic when groups differ in their patterns…
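
To make the procedure concrete, here is a minimal sketch of the Mantel-Haenszel common odds ratio and its transformation to the ETS delta scale. The table layout, function names, and counts are assumptions made for illustration. How missing responses are treated (record deleted versus scored 0) changes the cell counts that feed this calculation.

```python
import math

def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across matched score levels.

    strata: iterable of (a, b, c, d) per score level, where
      a = reference correct, b = reference incorrect,
      c = focal correct,     d = focal incorrect.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # empty score level contributes nothing
        num += a * d / n
        den += b * c / n
    return num / den  # raises ZeroDivisionError if no stratum is informative

def mh_d_dif(alpha):
    """ETS delta-scale index; negative values favor the reference group."""
    return -2.35 * math.log(alpha)

# Hypothetical counts for two score levels.
tables = [(30, 10, 20, 20), (40, 5, 35, 10)]
alpha = mantel_haenszel_odds_ratio(tables)
print(round(alpha, 3), round(mh_d_dif(alpha), 3))
```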

  2. Using Loss Functions for DIF Detection: An Empirical Bayes Approach.

    ERIC Educational Resources Information Center

    Zwick, Rebecca; Thayer, Dorothy; Lewis, Charles

    2000-01-01

    Studied a method for flagging differential item functioning (DIF) based on loss functions. Builds on earlier research that led to the development of an empirical Bayes enhancement to the Mantel-Haenszel DIF analysis. Tested the method through simulation and found its performance better than some commonly used DIF classification systems. (SLD)

  3. The Communicative Participation Item Bank (CPIB): Item bank calibration and development of a disorder-generic short form

    PubMed Central

    Baylor, Carolyn; Yorkston, Kathryn; Eadie, Tanya; Kim, Jiseon; Chung, Hyewon; Amtmann, Dagmar

    2015-01-01

    Purpose: The purpose of this study was to calibrate the items for the Communicative Participation Item Bank (CPIB) using Item Response Theory (IRT). One overriding objective was to examine if the IRT item parameters would be consistent across different diagnostic groups, thereby allowing creation of a disorder-generic instrument. The intended outcomes were the final item bank and a short form ready for clinical and research applications. Methods: Self-report data were collected from 701 individuals representing four diagnoses: multiple sclerosis, Parkinson’s disease, amyotrophic lateral sclerosis and head and neck cancer. Participants completed the CPIB and additional self-report questionnaires. CPIB data were analyzed using the IRT Graded Response Model (GRM). Results: The initial set of 94 candidate CPIB items was reduced to an item bank of 46 items demonstrating unidimensionality, local independence, good item fit, and good measurement precision. Differential item function (DIF) analyses detected no meaningful differences across diagnostic groups. A 10-item, disorder-generic short form was generated. Conclusions: The CPIB provides speech-language pathologists with a unidimensional, self-report outcomes measurement instrument dedicated to the construct of communicative participation. This instrument may be useful to clinicians and researchers wanting to implement measures of communicative participation in their work. PMID:23816661

  4. Differential Item Functioning in Primary Healthcare Evaluation Instruments by French/English Version, Educational Level and Urban/Rural Location

    PubMed Central

    Haggerty, Jeannie L.; Bouharaoui, Fatima; Santor, Darcy A.

    2011-01-01

    Evaluating the extent to which groups or subgroups of individuals differ with respect to primary healthcare experience depends on first ruling out the possibility of bias. Objective: To determine whether item or subscale performance differs systematically between French/English, high/low education subgroups and urban/rural residency. Method: A sample of 645 adult users balanced by French/English language (in Quebec and Nova Scotia, respectively), high/low education and urban/rural residency responded to six validated instruments: the Primary Care Assessment Survey (PCAS); the Primary Care Assessment Tool – Short Form (PCAT-S); the Components of Primary Care Index (CPCI); the first version of the EUROPEP (EUROPEP-I); the Interpersonal Processes of Care Survey, version II (IPC-II); and part of the Veterans Affairs National Outpatient Customer Satisfaction Survey (VANOCSS). We normalized subscale scores to a 0-to-10 scale and tested for between-group differences using ANOVA tests. We used a parametric item response model to test for differences between subgroups in item discriminability and item difficulty. We re-examined group differences after removing items with differential item functioning. Results: Experience of care was assessed more positively in the English-speaking (Nova Scotia) than in the French-speaking (Quebec) respondents. We found differential English/French item functioning in 48% of the 153 items: discriminability in 20% and differential difficulty in 28%. English items were more discriminating generally than the French. Removing problematic items did not change the differences in French/English assessments. Differential item functioning by high/low education status affected 27% of items, with items being generally more discriminating in high-education groups. Between-group comparisons were unchanged. In contrast, only 9% of items showed differential item functioning by geography, affecting principally the accessibility attribute. Removing problematic items reversed a previously non-significant finding, revealing poorer first-contact access in rural than in urban areas. Conclusion: Differential item functioning does not bias or invalidate French/English comparisons on subscales, but additional development is required to make French and English items equivalent. These instruments are relatively robust by educational status and geography, but results suggest potential differences in the underlying construct in low-education and rural respondents. PMID:23205035

  5. The Mediated MIMIC Model for Understanding the Underlying Mechanism of DIF.

    PubMed

    Cheng, Ying; Shao, Can; Lathrop, Quinn N

    2016-02-01

    Due to its flexibility, the multiple-indicator, multiple-causes (MIMIC) model has become an increasingly popular method for the detection of differential item functioning (DIF). In this article, we propose the mediated MIMIC model method to uncover the underlying mechanism of DIF. This method extends the usual MIMIC model by including one variable or multiple variables that may completely or partially mediate the DIF effect. If complete mediation effect is found, the DIF effect is fully accounted for. Through our simulation study, we find that the mediated MIMIC model is very successful in detecting the mediation effect that completely or partially accounts for DIF, while keeping the Type I error rate well controlled for both balanced and unbalanced sample sizes between focal and reference groups. Because it is successful in detecting such mediation effects, the mediated MIMIC model may help explain DIF and give guidance in the revision of a DIF item.

  6. The Mediated MIMIC Model for Understanding the Underlying Mechanism of DIF

    PubMed Central

    Cheng, Ying; Shao, Can; Lathrop, Quinn N.

    2015-01-01

    Due to its flexibility, the multiple-indicator, multiple-causes (MIMIC) model has become an increasingly popular method for the detection of differential item functioning (DIF). In this article, we propose the mediated MIMIC model method to uncover the underlying mechanism of DIF. This method extends the usual MIMIC model by including one variable or multiple variables that may completely or partially mediate the DIF effect. If complete mediation effect is found, the DIF effect is fully accounted for. Through our simulation study, we find that the mediated MIMIC model is very successful in detecting the mediation effect that completely or partially accounts for DIF, while keeping the Type I error rate well controlled for both balanced and unbalanced sample sizes between focal and reference groups. Because it is successful in detecting such mediation effects, the mediated MIMIC model may help explain DIF and give guidance in the revision of a DIF item.

  7. Gender-Based Differential Item Performance in Mathematics Achievement Items.

    ERIC Educational Resources Information Center

    Doolittle, Allen E.; Cleary, T. Anne

    1987-01-01

    Eight randomly equivalent samples of high school seniors were each given a unique form of the ACT Assessment Mathematics Usage Test (ACTM). Signed measures of differential item performance (DIP) were obtained for each item in the eight ACTM forms. DIP estimates were analyzed and a significant item category effect was found. (Author/LMO)

  8. Gender and Ethnicity Differences on the Abridged Big Five Circumplex (AB5C) of Personality Traits: A Differential Item Functioning Analysis

    ERIC Educational Resources Information Center

    Mitchelson, Jacqueline K.; Wicher, Eliza W.; LeBreton, James M.; Craig, S. Bartholomew

    2009-01-01

    The current study evaluates the measurement precision of the Abridged Big Five Circumplex (AB5C) of personality traits by identifying those items that demonstrate differential item functioning by gender and ethnicity. Differential item functioning is found in 33 of 45 (73%) of the AB5C scales, across gender and ethnic groups (Caucasian vs. African…

  9. A symptom profile of depression among Asian Americans: is there evidence for differential item functioning of depressive symptoms?

    PubMed

    Kalibatseva, Z; Leong, F T L; Ham, E H

    2014-09-01

    Theoretical and clinical publications suggest the existence of cultural differences in the expression and experience of depression. Measurement non-equivalence remains a potential methodological explanation for the lower prevalence of depression among Asian Americans compared to European Americans. This study compared DSM-IV depressive symptoms among Asian Americans and European Americans using secondary data analysis of the Collaborative Psychiatric Epidemiology Surveys (CPES). The Composite International Diagnostic Interview (CIDI) was used for the assessment of depressive symptoms. Of the entire sample, 310 Asian Americans and 1974 European Americans reported depressive symptoms and were included in the analyses. Measurement variance was examined with an item response theory differential item functioning (IRT DIF) analysis. χ² analyses indicated that, compared to Asian Americans, European American participants more frequently endorsed affective symptoms such as 'feeling depressed', 'feeling discouraged' and 'cried more often'. The IRT analysis detected DIF for four out of the 15 depression symptom items. At equal levels of depression, Asian Americans endorsed feeling worthless and appetite changes more easily than European Americans, and European Americans endorsed feeling nervous and crying more often than Asian Americans. Asian Americans did not seem to over-report somatic symptoms; however, European Americans seemed to report more affective symptoms than Asian Americans. The results suggest that there was measurement variance in a few of the depression items.

  10. Recent advances in analysis of differential item functioning in health research using the Rasch model.

    PubMed

    Hagquist, Curt; Andrich, David

    2017-09-19

    Rasch analysis with a focus on Differential Item Functioning (DIF) is increasingly used for examination of psychometric properties of health outcome measures. To take account of DIF in order to retain precision of measurement, split of DIF-items into separate sample specific items has become a frequently used technique. The purpose of the paper is to present and summarise recent advances of analysis of DIF in a unified methodology. In particular, the paper focuses on the use of analysis of variance (ANOVA) as a method to simultaneously detect uniform and non-uniform DIF, the need to distinguish between real and artificial DIF and the trade-off between reliability and validity. An illustrative example from health research is used to demonstrate how DIF, in this case between genders, can be identified, quantified and under specific circumstances accounted for using the Rasch model. Rasch analyses of DIF were conducted of a composite measure of psychosomatic problems using Swedish data from the Health Behaviour in School-aged Children study for grade 9 students collected during the 1985-2014 time periods. The procedures demonstrate how DIF can be identified efficiently by ANOVA of residuals, and how the magnitude of DIF can be quantified and potentially accounted for by resolving items according to identifiable groups and using principles of test equating on the resolved items. The results of the analysis also show that the real DIF in some items does affect person measurement estimates. Firstly, in order to distinguish between real and artificial DIF, the items showing DIF initially should not be resolved simultaneously but sequentially. Secondly, while resolving instead of deleting a DIF item may retain reliability, both options may affect the content validity negatively. Resolving items with DIF is not justified if the source of the DIF is relevant for the content of the variable; then resolving DIF may deteriorate the validity of the instrument. Generally, decisions on resolving items to deal with DIF should also rely on external information.

  11. Using Multidimensional Rasch Analysis to Validate the Chinese Version of the Motivated Strategies for Learning Questionnaire (MSLQ-CV)

    ERIC Educational Resources Information Center

    Lee, John Chi-Kin; Zhang, Zhonghua; Yin, Hongbiao

    2010-01-01

    This article used the multidimensional random coefficients multinomial logit model to examine the construct validity and detect the substantial differential item functioning (DIF) of the Chinese version of motivated strategies for learning questionnaire (MSLQ-CV). A total of 1,354 Hong Kong junior high school students were administered the…

  12. The Mediated MIMIC Model for Understanding the Underlying Mechanism of DIF

    ERIC Educational Resources Information Center

    Cheng, Ying; Shao, Can; Lathrop, Quinn N.

    2016-01-01

    Due to its flexibility, the multiple-indicator, multiple-causes (MIMIC) model has become an increasingly popular method for the detection of differential item functioning (DIF). In this article, we propose the mediated MIMIC model method to uncover the underlying mechanism of DIF. This method extends the usual MIMIC model by including one variable…

  13. Rasch Mixture Models for DIF Detection: A Comparison of Old and New Score Specifications

    ERIC Educational Resources Information Center

    Frick, Hannah; Strobl, Carolin; Zeileis, Achim

    2015-01-01

    Rasch mixture models can be a useful tool when checking the assumption of measurement invariance for a single Rasch model. They provide advantages compared to manifest differential item functioning (DIF) tests when the DIF groups are only weakly correlated with the manifest covariates available. Unlike in single Rasch models, estimation of Rasch…

  14. Type I Error Inflation for Detecting DIF in the Presence of Impact

    ERIC Educational Resources Information Center

    DeMars, Christine E.

    2010-01-01

    In this brief explication, two challenges for using differential item functioning (DIF) measures when there are large group differences in true proficiency are illustrated. Each of these difficulties may lead to inflated Type I error rates, for very different reasons. One problem is that groups matched on observed score are not necessarily well…

  15. Item and scale differential functioning of the Mini-Mental State Exam assessed using the Differential Item and Test Functioning (DFIT) Framework.

    PubMed

    Morales, Leo S; Flowers, Claudia; Gutierrez, Peter; Kleinman, Marjorie; Teresi, Jeanne A

    2006-11-01

    This study illustrates the application of the Differential Item and Test Functioning (DFIT) method using English and Spanish versions of the Mini-Mental State Examination (MMSE). Study participants were 65 years of age or older and lived in North Manhattan, New York. Of the 1578 study participants who were administered the MMSE, 665 completed it in Spanish. The MMSE contains 20 items that measure the degree of cognitive impairment in the areas of orientation, attention and calculation, registration, recall and language, as well as the ability to follow verbal and written commands. After assessing the dimensionality of the MMSE scale, item response theory person and item parameters were estimated separately for the English and Spanish sample using Samejima's 2-parameter graded response model. Then the DFIT framework was used to assess differential item functioning (DIF) and differential test functioning (DTF). Nine items were found to show DIF; these were items that ask the respondent to name the correct season, day of the month, city, state, and 2 nearby streets, recall 3 objects, repeat the phrase "no ifs, no ands, no buts," follow the command, "close your eyes," and the command, "take the paper in your right hand, fold the paper in half with both hands, and put the paper down in your lap." At the scale level, however, the MMSE did not show differential functioning. Respondents to the English and Spanish versions of the MMSE are comparable on the basis of scale scores. However, assessments based on individual MMSE items may be misleading.

  16. Improving measurement of injection drug risk behavior using item response theory.

    PubMed

    Janulis, Patrick

    2014-03-01

    Recent research highlights the multiple steps to preparing and injecting drugs and the resultant viral threats faced by drug users. This research suggests that more sensitive measurement of injection drug HIV risk behavior is required. In addition, growing evidence suggests there are gender differences in injection risk behavior. However, the potential for differential item functioning between genders has not been explored. This study explores item response theory as an improved measurement modeling technique that provides empirically justified scaling of injection risk behavior and examines potential gender-based differential item functioning. Data come from three studies in the National Institute on Drug Abuse's Criminal Justice Drug Abuse Treatment Studies. A two-parameter item response theory model was used to scale injection risk behavior and logistic regression was used to test for differential item functioning. Item fit statistics suggest that item response theory can be used to scale injection risk behavior and these models can provide more sensitive estimates of risk behavior. Additionally, gender-based differential item functioning is present in the current data. Improved measurement of injection risk behavior using item response theory should be encouraged as these models provide increased congruence between construct measurement and the complexity of injection-related HIV risk. Suggestions are made to further improve injection risk behavior measurement. Furthermore, results suggest direct comparisons of composite scores between males and females may be misleading and future work should account for differential item functioning before comparing levels of injection risk behavior.

  17. Item Response Theory with Covariates (IRT-C): Assessing Item Recovery and Differential Item Functioning for the Three-Parameter Logistic Model

    ERIC Educational Resources Information Center

    Tay, Louis; Huang, Qiming; Vermunt, Jeroen K.

    2016-01-01

    In large-scale testing, the use of multigroup approaches is limited for assessing differential item functioning (DIF) across multiple variables as DIF is examined for each variable separately. In contrast, the item response theory with covariate (IRT-C) procedure can be used to examine DIF across multiple variables (covariates) simultaneously. To…

  18. Applying a Mixed Methods Framework to Differential Item Function Analyses

    ERIC Educational Resources Information Center

    Hitchcock, John H.; Johanson, George A.

    2015-01-01

    Understanding the reason(s) for Differential Item Functioning (DIF) in the context of measurement is difficult. Although identifying potential DIF items is typically a statistical endeavor, understanding the reasons for DIF (and item repair or replacement) might require investigations that can be informed by qualitative work. Such work is…

  19. Effect of Differential Item Functioning on Test Equating

    ERIC Educational Resources Information Center

    Kabasakal, Kübra Atalay; Kelecioglu, Hülya

    2015-01-01

    This study examines the effect of differential item functioning (DIF) items on test equating through multilevel item response models (MIRMs) and traditional IRMs. The performances of three different equating models were investigated under 24 different simulation conditions, and the variables whose effects were examined included sample size, test…

  20. Ramsay-Curve Differential Item Functioning

    ERIC Educational Resources Information Center

    Woods, Carol M.

    2011-01-01

    Differential item functioning (DIF) occurs when an item on a test, questionnaire, or interview has different measurement properties for one group of people versus another, irrespective of true group-mean differences on the constructs being measured. This article is focused on item response theory based likelihood ratio testing for DIF (IRT-LR or…

  1. Differential Item Functioning Analysis Using Rasch Item Information Functions

    ERIC Educational Resources Information Center

    Wyse, Adam E.; Mapuranga, Raymond

    2009-01-01

    Differential item functioning (DIF) analysis is a statistical technique used for ensuring the equity and fairness of educational assessments. This study formulates a new DIF analysis method using the information similarity index (ISI). ISI compares item information functions when data fits the Rasch model. Through simulations and an international…

  2. A Generalized DIF Effect Variance Estimator for Measuring Unsigned Differential Test Functioning in Mixed Format Tests

    ERIC Educational Resources Information Center

    Penfield, Randall D.; Algina, James

    2006-01-01

    One approach to measuring unsigned differential test functioning is to estimate the variance of the differential item functioning (DIF) effect across the items of the test. This article proposes two estimators of the DIF effect variance for tests containing dichotomous and polytomous items. The proposed estimators are direct extensions of the…

  3. The utility of a classificatory decision tree approach to assist clinical differentiation of melancholic and non-melancholic depression.

    PubMed

    Parker, G; McCraw, S; Hadzi-Pavlovic, D

    2015-07-15

    Studies suggest that differentiating melancholic from non-melancholic depressive disorders is advanced by use of illness course as well as symptom variables but, in practice, potentially differentiating variables are generally positioned as having equal value. Judging that differentiating features are more likely to vary in their signal intensity, we sought to determine the number of features required to effect differentiation and their hierarchical order. The 24-item clinician-rated Sydney Melancholia Prototype Index (SMPI-CR) was completed for 364 unipolar depressed patients. The sample was divided into two cohorts according to the recruitment period. An RPART classification tree analysis identified the most discriminating SMPI items in the development sample of 197 patients, and examined the sensitivity and specificity of the diagnostic decisions, then sought to replicate findings in a validation sample of 169 patients. Independent analyses of putative SMPI items identified only seven items as required to discriminate those with clinically-diagnosed melancholic or non-melancholic depression when the conditions were examined separately. An RPART analysis considering differentiation of melancholic and non-melancholic depression in the total samples retained five of those items in the classification tree, three of which were non-symptom items, and with 92% sensitivity and 80% specificity in the development sample. This reduced item set showed 93% sensitivity and 82% specificity in the validation sample. Our clinical judgment of melancholic or non-melancholic depression may not correspond with the clinical logic employed by other clinicians. Only five SMPI items were required to derive a succinct and efficient decision tree, comprising high sensitivity and specificity in differentiating melancholic and non-melancholic depression. Current study findings provide an empirical model that could enrich clinicians' approach to differentiating melancholic and non-melancholic depression.
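
The sensitivity and specificity figures quoted above follow the usual confusion-matrix definitions. The short sketch below shows the arithmetic; the function name and the counts are hypothetical, chosen only so the ratios are easy to check, and are not the study's data.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp/fn: cases with the condition classified correctly/incorrectly.
    tn/fp: cases without the condition classified correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Hypothetical counts for illustration only.
sens, spec = sensitivity_specificity(tp=46, fn=4, tn=40, fp=10)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```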

  4. Screening Test Items for Differential Item Functioning

    ERIC Educational Resources Information Center

    Longford, Nicholas T.

    2014-01-01

    A method for medical screening is adapted to differential item functioning (DIF). Its essential elements are explicit declarations of the level of DIF that is acceptable and of the loss function that quantifies the consequences of the two kinds of inappropriate classification of an item. Instead of a single level and a single function, sets of…

  5. Differential Item Functioning: Its Consequences. Research Report. ETS RR-10-01

    ERIC Educational Resources Information Center

    Lee, Yi-Hsuan; Zhang, Jinming

    2010-01-01

    This report examines the consequences of differential item functioning (DIF) using simulated data. Its impact on total score, item response theory (IRT) ability estimate, and test reliability was evaluated in various testing scenarios created by manipulating the following four factors: test length, percentage of DIF items per form, sample sizes of…

  6. Real and Artificial Differential Item Functioning

    ERIC Educational Resources Information Center

    Andrich, David; Hagquist, Curt

    2012-01-01

    The literature in modern test theory on procedures for identifying items with differential item functioning (DIF) among two groups of persons includes the Mantel-Haenszel (MH) procedure. Generally, it is not recognized explicitly that if there is real DIF in some items which favor one group, then as an artifact of this procedure, artificial DIF…

  7. Seeking missing pieces in science concept assessments: Reevaluating the Brief Electricity and Magnetism Assessment through Rasch analysis

    NASA Astrophysics Data System (ADS)

    Ding, Lin

    2014-02-01

    Discipline-based science concept assessments are powerful tools to measure learners' disciplinary core ideas. Among many such assessments, the Brief Electricity and Magnetism Assessment (BEMA) has been broadly used to gauge student conceptions of key electricity and magnetism (E&M) topics in college-level introductory physics courses. Differing from typical concept inventories that focus only on one topic of a subject area, BEMA covers a broad range of topics in the electromagnetism domain. In spite of this fact, prior studies exclusively used a single aggregate score to represent individual students' overall understanding of E&M without explicating the construct of this assessment. Additionally, BEMA has been used to compare traditional physics courses with a reformed course entitled Matter and Interactions (M&I). While prior findings were in favor of M&I, no empirical evidence was sought to rule out possible differential functioning of BEMA that may have inadvertently advantaged M&I students. In this study, we used Rasch analysis to seek two missing pieces regarding the construct and differential functioning of BEMA. Results suggest that although BEMA items generally can function together to measure the same construct of application and analysis of E&M concepts, several items may need further revision. Additionally, items that demonstrate differential functioning for the two courses are detected. Issues such as item contextual features and student familiarity with question settings may underlie these findings. This study highlights often overlooked threats in science concept assessments and provides an exemplar for using evidence-based reasoning to make valid inferences and arguments.

  8. Concreteness effects in short-term memory: a test of the item-order hypothesis.

    PubMed

    Roche, Jaclynn; Tolan, G Anne; Tehan, Gerald

    2011-12-01

    The following experiments explore word length and concreteness effects in short-term memory within an item-order processing framework. This framework asserts order memory is better for those items that are relatively easy to process at the item level. However, words that are difficult to process benefit at the item level from the increased attention/resources being applied. The prediction of the model is that differential item and order processing can be detected in episodic tasks that differ in the degree to which item or order memory are required by the task. The item-order account has been applied to the word length effect such that there is a short word advantage in serial recall but a long word advantage in item recognition. The current experiment considered the possibility that concreteness effects might be explained within the same framework. In two experiments, word length (Experiment 1) and concreteness (Experiment 2) are examined using forward serial recall, backward serial recall, and item recognition. These results for word length replicate previous studies showing the dissociation in item and order tasks. The same was not true for the concreteness effect. In all three tasks concrete words were better remembered than abstract words. The concreteness effect cannot be explained in terms of an item-order trade-off. PsycINFO Database Record (c) 2011 APA, all rights reserved.

  9. The Effects of Testlets on Reliability and Differential Item Functioning

    ERIC Educational Resources Information Center

    Teker, Gulsen Tasdelen; Dogan, Nuri

    2015-01-01

    Reliability and differential item functioning (DIF) analyses were conducted on testlets displaying local item dependence in this study. The data set employed in the research was obtained from the answers given by 1,500 students to the 20 items included in six testlets given in English Proficiency Exam by the School of Foreign Languages of a state…

  10. MIMIC Methods for Assessing Differential Item Functioning in Polytomous Items

    ERIC Educational Resources Information Center

    Wang, Wen-Chung; Shih, Ching-Lin

    2010-01-01

    Three multiple indicators-multiple causes (MIMIC) methods, namely, the standard MIMIC method (M-ST), the MIMIC method with scale purification (M-SP), and the MIMIC method with a pure anchor (M-PA), were developed to assess differential item functioning (DIF) in polytomous items. In a series of simulations, it appeared that all three methods…

  11. Identifying Differential Item Functioning of Rating Scale Items with the Rasch Model: An Introduction and an Application

    ERIC Educational Resources Information Center

    Myers, Nicholas D.; Wolfe, Edward W.; Feltz, Deborah L.; Penfield, Randall D.

    2006-01-01

    This study (a) provided a conceptual introduction to differential item functioning (DIF), (b) introduced the multifaceted Rasch rating scale model (MRSM) and an associated statistical procedure for identifying DIF in rating scale items, and (c) applied this procedure to previously collected data from American coaches who responded to the coaching…

  12. Differential Item Functioning Analysis of the 2003-04 NHANES Physical Activity Questionnaire

    ERIC Educational Resources Information Center

    Gao, Yong; Zhu, Weimo

    2011-01-01

    Using differential item functioning (DIF) analyses, this study examined whether there were any DIF items in the National Health and Nutrition Examination Survey (NHANES) physical activity (PA) questionnaire. A subset of adult data from the 2003-04 NHANES study (n = 3,083) was used. PA items related to respondents' occupational, transportation,…

  13. The importance of considering differential item functioning in investigating the impact of chronic conditions on health-related quality of life in a multi-ethnic Asian population.

    PubMed

    Abdin, Edimansyah; Subramaniam, Mythily; Picco, Louisa; Pang, Shirlene; Vaingankar, Janhavi Ajit; Shahwan, Shazana; Sagayadevan, Vathsala; Zhang, Yunjue; Chong, Siow Ann

    2017-04-01

    The present study aims to examine the impact of chronic conditions after adjusting for differential item functioning (DIF) on the various aspects of health-related quality of life (HRQoL) in a multi-ethnic Asian population in Singapore. Data on 3006 participants from a nation-wide cross-sectional survey of mental health literacy conducted in Singapore were used. Multiple Indicators Multiple Causes model was used to investigate the effects of chronic medical conditions on various HRQoL dimensions assessed with the 36-item Medical Outcomes Study Short Form Health Survey (SF-36) after adjusting for DIF. Twenty out of 36 items were detected with DIF for chronic conditions including high blood pressure, cardiovascular disorders, diabetes, cancer, neurological disorders and ulcer as well as for a few demographic factors such as age, gender and marital status. Twenty significant associations between chronic conditions and SF-36 domains were observed. After controlling for all chronic conditions, socio-demographic and DIF items, a significant association emerged between cardiovascular disorders and physical functioning, while the association between diabetes and ulcer and general health became nonsignificant. All other associations remained statistically significant. Our findings provide useful information and important implications of DIF on the impact of chronic conditions on HRQoL. We found the impact of DIF with respect to the impact of chronic conditions on HRQoL to be minimal after accounting for measurement bias in this multiracial Asian population.

  14. Differential Item Functioning by Gender on a Large-Scale Science Performance Assessment: A Comparison across Grade Levels.

    ERIC Educational Resources Information Center

    Holweger, Nancy; Taylor, Grace

    The fifth-grade and eighth-grade science items on a state performance assessment were compared for differential item functioning (DIF) due to gender. The grade 5 sample consisted of 8,539 females and 8,029 males and the grade 8 sample consisted of 7,477 females and 7,891 males. A total of 30 fifth grade items and 26 eighth grade items were…

  15. Exploring differential item functioning (DIF) with the Rasch model: a comparison of gender differences on eighth grade science items in the United States and Spain.

    PubMed

    Babiar, Tasha Calvert

    2011-01-01

    Traditionally, women and minorities have not been fully represented in science and engineering. Numerous studies have attributed these differences to gaps in science achievement as measured by various standardized tests. Rather than describe mean group differences in science achievement across multiple cultures, this study focused on an in-depth item-level analysis across two countries: Spain and the United States. This study investigated eighth-grade gender differences on science items across the two countries. A secondary purpose of the study was to explore the nature of gender differences using the many-faceted Rasch Model as a way to estimate gender DIF. A secondary analysis of data from the Third International Mathematics and Science Study (TIMSS) was used to address three questions: 1) Does gender DIF in science achievement exist? 2) Is there a relationship between gender DIF and characteristics of the science items? 3) Do the relationships between item characteristics and gender DIF in science items replicate across countries? Participants included 7,087 eighth-grade students from the United States and 3,855 students from Spain who participated in TIMSS. The Facets program (Linacre and Wright, 1992) was used to estimate gender DIF. The results of the analysis indicate that the content of the item seemed to be related to gender DIF. The analysis also suggests that there is a relationship between gender DIF and item format. No pattern of gender DIF related to cognitive demand was found. The general pattern of gender DIF was similar across the two countries used in the analysis. The strength of item-level analysis as opposed to group mean difference analysis is that gender differences can be detected at the item level, even when no mean differences can be detected at the group level.

  16. Assessing Unidimensionality and Differential Item Functioning in Qualifying Examination for Senior Secondary School Students, Osun State, Nigeria

    ERIC Educational Resources Information Center

    Ajeigbe, Taiwo Oluwafemi; Afolabi, Eyitayo Rufus Ifedayo

    2017-01-01

    This study assessed unidimensionality and occurrence of Differential Item Functioning (DIF) in Mathematics and English Language items of Osun State Qualifying Examination. The study made use of secondary data. The results showed that OSQ Mathematics (-0.094 ≤ r ≤ 0.236) and English Language items (-0.095 ≤ r ≤ 0.228) were unidimensional. Also,…

  17. Ability or Access-Ability: Differential Item Functioning of Items on Alternate Performance-Based Assessment Tests for Students with Visual Impairments

    ERIC Educational Resources Information Center

    Zebehazy, Kim T.; Zigmond, Naomi; Zimmerman, George J.

    2012-01-01

    Introduction: This study investigated differential item functioning (DIF) of test items on Pennsylvania's Alternate System of Assessment (PASA) for students with visual impairments and severe cognitive disabilities and what the reasons for the differences may be. Methods: The Wilcoxon signed ranks test was used to analyze differences in the scores…

  18. A Rasch Differential Item Functioning Analysis of the Massachusetts Youth Screening Instrument: Identifying Race and Gender Differential Item Functioning among Juvenile Offenders

    ERIC Educational Resources Information Center

    Cauffman, Elizabeth; MacIntosh, Randall

    2006-01-01

    The juvenile justice system needs a tool that can identify and assess mental health problems among youths quickly with validity and reliability. The goal of this article is to evaluate the racial/ethnic and gender differential item functioning (DIF) of the Massachusetts Youth Screening Instrument-Second Version (MAYSI-2) using the Rasch Model.…

  19. Group-Specific Effects of Matching Subtest Contamination on the Identification of Differential Item Functioning

    ERIC Educational Resources Information Center

    Keiffer, Elizabeth Ann

    2011-01-01

    A differential item functioning (DIF) simulation study was conducted to explore the type and level of impact that contamination had on type I error and power rates in DIF analyses when the suspect item favored the same or opposite group as the DIF items in the matching subtest. Type I error and power rates were displayed separately for the…

  20. Responding to Claims of Misrepresentation

    ERIC Educational Resources Information Center

    Santelices, Maria Veronica; Wilson, Mark

    2010-01-01

    In their paper "Unfair Treatment? The Case of Freedle, the SAT, and the Standardization Approach to Differential Item Functioning" (Santelices & Wilson, 2010), the authors studied claims of differential effects of the SAT on Latinos and African Americans through the methodology of differential item functioning (DIF). Previous…

  1. A Rasch-validated version of the upper extremity functional index for interval-level measurement of upper extremity function.

    PubMed

    Hamilton, Clayon B; Chesworth, Bert M

    2013-11-01

    The original 20-item Upper Extremity Functional Index (UEFI) has not undergone Rasch validation. The purpose of this study was to determine whether Rasch analysis supports the UEFI as a measure of a single construct (ie, upper extremity function) and whether a Rasch-validated UEFI has adequate reproducibility for individual-level patient evaluation. This was a secondary analysis of data from a repeated-measures study designed to evaluate the measurement properties of the UEFI over a 3-week period. Patients (n=239) with musculoskeletal upper extremity disorders were recruited from 17 physical therapy clinics across 4 Canadian provinces. Rasch analysis of the UEFI measurement properties was performed. If the UEFI did not fit the Rasch model, misfitting patients were deleted, items with poor response structure were corrected, and misfitting items and redundant items were deleted. The impact of differential item functioning on the ability estimate of patients was investigated. A 15-item modified UEFI was derived to achieve fit to the Rasch model where the total score was supported as a measure of upper extremity function only. The resultant UEFI-15 interval-level scale (0-100, worst to best state) demonstrated excellent internal consistency (person separation index=0.94) and test-retest reliability (intraclass correlation coefficient [2,1]=.95). The minimal detectable change at the 90% confidence interval was 8.1. Patients who were ambidextrous or bilaterally affected were excluded to allow for the analysis of differential item functioning due to limb involvement and arm dominance. Rasch analysis did not support the validity of the 20-item UEFI. However, the UEFI-15 was a valid and reliable interval-level measure of a single dimension: upper extremity function. Rasch analysis supports using the UEFI-15 in physical therapist practice to quantify upper extremity function in patients with musculoskeletal disorders of the upper extremity.

  2. A Rasch-Validated Version of the Upper Extremity Functional Index for Interval-Level Measurement of Upper Extremity Function

    PubMed Central

    Chesworth, Bert M.

    2013-01-01

    Background: The original 20-item Upper Extremity Functional Index (UEFI) has not undergone Rasch validation. Objective: The purpose of this study was to determine whether Rasch analysis supports the UEFI as a measure of a single construct (ie, upper extremity function) and whether a Rasch-validated UEFI has adequate reproducibility for individual-level patient evaluation. Design: This was a secondary analysis of data from a repeated-measures study designed to evaluate the measurement properties of the UEFI over a 3-week period. Methods: Patients (n=239) with musculoskeletal upper extremity disorders were recruited from 17 physical therapy clinics across 4 Canadian provinces. Rasch analysis of the UEFI measurement properties was performed. If the UEFI did not fit the Rasch model, misfitting patients were deleted, items with poor response structure were corrected, and misfitting items and redundant items were deleted. The impact of differential item functioning on the ability estimate of patients was investigated. Results: A 15-item modified UEFI was derived to achieve fit to the Rasch model where the total score was supported as a measure of upper extremity function only. The resultant UEFI-15 interval-level scale (0–100, worst to best state) demonstrated excellent internal consistency (person separation index=0.94) and test-retest reliability (intraclass correlation coefficient [2,1]=.95). The minimal detectable change at the 90% confidence interval was 8.1. Limitations: Patients who were ambidextrous or bilaterally affected were excluded to allow for the analysis of differential item functioning due to limb involvement and arm dominance. Conclusion: Rasch analysis did not support the validity of the 20-item UEFI. However, the UEFI-15 was a valid and reliable interval-level measure of a single dimension: upper extremity function. Rasch analysis supports using the UEFI-15 in physical therapist practice to quantify upper extremity function in patients with musculoskeletal disorders of the upper extremity. PMID:23813086

  3. The Usefulness of Differential Item Functioning Methodology in Longitudinal Intervention Studies

    USDA-ARS's Scientific Manuscript database

    Perceived self-efficacy (SE) for engaging in physical activity (PA) is a key variable mediating PA change in interventions. The purpose of this study is to demonstrate the usefulness of item response modeling-based (IRM) differential item functioning (DIF) in the investigation of group differences ...

  4. DIFAS: Differential Item Functioning Analysis System. Computer Program Exchange

    ERIC Educational Resources Information Center

    Penfield, Randall D.

    2005-01-01

    Differential item functioning (DIF) is an important consideration in assessing the validity of test scores (Camilli & Shepard, 1994). A variety of statistical procedures have been developed to assess DIF in tests of dichotomous (Hills, 1989; Millsap & Everson, 1993) and polytomous (Penfield & Lam, 2000; Potenza & Dorans, 1995) items. Some of these…

  5. Does Gender-Specific Differential Item Functioning Affect the Structure in Vocational Interest Inventories?

    ERIC Educational Resources Information Center

    Beinicke, Andrea; Pässler, Katja; Hell, Benedikt

    2014-01-01

    The study investigates consequences of eliminating items showing gender-specific differential item functioning (DIF) on the psychometric structure of a standard RIASEC interest inventory. Holland's hexagonal model was tested for structural invariance using a confirmatory methodological approach (confirmatory factor analysis and randomization…

  6. Testing parent dyad interchangeability in the parent proxy-report of PedsQL™ 4.0: a differential item functioning analysis.

    PubMed

    Doostfatemeh, Marziyeh; Ayatollahi, Seyyed Mohammad Taghi; Jafari, Peyman

    2015-08-01

    In child-parent agreement studies in the field of paediatric health-related quality of life (HRQoL), little attention has been paid to the effect of gender in parental proxy rating of children's HRQoL. This study aims to test the potential interchangeability of parent dyads in reporting children's HRQoL on both item and scale levels of the PedsQL™ 4.0 instrument, using the approach of differential item functioning (DIF). The PedsQL™ 4.0 Generic Core Scales were completed by 576 father-and-mother dyads. A polytomous item response theory model, graded response model, was used to detect DIF across fathers and mothers. Assessment at item level showed that fathers and mothers perceived the meaning of items of the PedsQL™ 4.0 consistently. Regarding the scale level, a moderate to high level of agreement was observed between mothers' and fathers' reports on all similar subscales. Although the significant mean score differences in total, physical and emotional functioning indicated that fathers gave higher scores to their children, the small effect size implied that this difference may not be practically meaningful. Our findings revealed that discrepancy in parent dyads in rating children's HRQoL is a "real" difference and not an artefact due to measurement non-invariance. Fathers were seen to have slightly different insights into their children, especially for emotional functioning, but overall the results were not all that different. This suggests that paternal proxy-reports can be included in studies along with maternal proxy-reports, and the two may be combined when looking at parent-child agreement. Parent-child agreement studies in Iran are not affected by parents' gender, and therefore, researchers may rely on the assumption of the interchangeability of fathers and mothers in these studies.

  7. Cross-cultural differences in knee functional status outcomes in a polyglot society represented true disparities not biased by differential item functioning.

    PubMed

    Deutscher, Daniel; Hart, Dennis L; Crane, Paul K; Dickstein, Ruth

    2010-12-01

    Comparative effectiveness research across cultures requires unbiased measures that accurately detect clinical differences between patient groups. The purpose of this study was to assess the presence and impact of differential item functioning (DIF) in knee functional status (FS) items administered using computerized adaptive testing (CAT) as a possible cause for observed differences in outcomes between 2 cultural patient groups in a polyglot society. This study was a secondary analysis of prospectively collected data. We evaluated data from 9,134 patients with knee impairments from outpatient physical therapy clinics in Israel. Items were analyzed for DIF related to sex, age, symptom acuity, surgical history, exercise history, and language used to complete the functional survey (Hebrew versus Russian). Several items exhibited DIF, but unadjusted FS estimates and FS estimates that accounted for DIF were essentially equal (intraclass correlation coefficient [2,1]>.999). No individual patient had a difference between unadjusted and adjusted FS estimates as large as the median standard error of the unadjusted estimates. Differences between groups defined by any of the covariates considered were essentially unchanged when using adjusted instead of unadjusted FS estimates. The greatest group-level impact was <0.3% of 1 standard deviation of the unadjusted FS estimates. Complete data where patients answered all items in the scale would have been preferred for DIF analysis, but only CAT data were available. Differences in FS outcomes between groups of patients with knee impairments who answered the knee CAT in Hebrew or Russian in Israel most likely reflected true differences that may reflect societal disparities in this health outcome.

  8. Gender differences in national assessment of educational progress science items: What does "I don't know" really mean?

    NASA Astrophysics Data System (ADS)

    Linn, Marcia C.; de Benedictis, Tina; Delucchi, Kevin; Harris, Abigail; Stage, Elizabeth

    The National Assessment of Educational Progress Science Assessment has consistently revealed small gender differences on science content items but not on science inquiry items. This assessment differs from others in that respondents can choose "I don't know" rather than guessing. This paper examines explanations for the gender differences including (a) differential prior instruction, (b) differential response to uncertainty and use of the "I don't know" response, (c) differential response to figurally presented items, and (d) different attitudes towards science. Of these possible explanations, the first two received support. Females are more likely to use the "I don't know" response, especially for items with physical science content or masculine themes such as football. To ameliorate this situation we need more effective science instruction and more gender-neutral assessment items.

  9. Temporal and Spatial Predictability of an Irrelevant Event Differently Affect Detection and Memory of Items in a Visual Sequence

    PubMed Central

    Ohyama, Junji; Watanabe, Katsumi

    2016-01-01

    We examined how the temporal and spatial predictability of a task-irrelevant visual event affects the detection and memory of a visual item embedded in a continuously changing sequence. Participants observed 11 sequentially presented letters, during which a task-irrelevant visual event was either present or absent. Predictabilities of spatial location and temporal position of the event were controlled in 2 × 2 conditions. In the spatially predictable conditions, the event occurred at the same location within the stimulus sequence or at another location, while, in the spatially unpredictable conditions, it occurred at random locations. In the temporally predictable conditions, the event timing was fixed relative to the order of the letters, while in the temporally unpredictable conditions, it could not be predicted from the letter order. Participants performed a working memory task and a target detection reaction time (RT) task. Memory accuracy was higher for a letter simultaneously presented at the same location as the event in the temporally unpredictable conditions, irrespective of the spatial predictability of the event. On the other hand, the detection RTs were only faster for a letter simultaneously presented at the same location as the event when the event was both temporally and spatially predictable. Thus, to facilitate ongoing detection processes, an event must be predictable both in space and time, while memory processes are enhanced by temporally unpredictable (i.e., surprising) events. Evidently, temporal predictability has differential effects on detection and memory of a visual item embedded in a sequence of images. PMID:26869966

  10. Temporal and Spatial Predictability of an Irrelevant Event Differently Affect Detection and Memory of Items in a Visual Sequence.

    PubMed

    Ohyama, Junji; Watanabe, Katsumi

    2016-01-01

    We examined how the temporal and spatial predictability of a task-irrelevant visual event affects the detection and memory of a visual item embedded in a continuously changing sequence. Participants observed 11 sequentially presented letters, during which a task-irrelevant visual event was either present or absent. Predictabilities of spatial location and temporal position of the event were controlled in 2 × 2 conditions. In the spatially predictable conditions, the event occurred at the same location within the stimulus sequence or at another location, while, in the spatially unpredictable conditions, it occurred at random locations. In the temporally predictable conditions, the event timing was fixed relative to the order of the letters, while in the temporally unpredictable conditions, it could not be predicted from the letter order. Participants performed a working memory task and a target detection reaction time (RT) task. Memory accuracy was higher for a letter simultaneously presented at the same location as the event in the temporally unpredictable conditions, irrespective of the spatial predictability of the event. On the other hand, the detection RTs were only faster for a letter simultaneously presented at the same location as the event when the event was both temporally and spatially predictable. Thus, to facilitate ongoing detection processes, an event must be predictable both in space and time, while memory processes are enhanced by temporally unpredictable (i.e., surprising) events. Evidently, temporal predictability has differential effects on detection and memory of a visual item embedded in a sequence of images.

  11. Examining Power and Type 1 Error for Step and Item Level Tests of Invariance: Investigating the Effect of the Number of Item Score Levels

    ERIC Educational Resources Information Center

    Ayodele, Alicia Nicole

    2017-01-01

    Within polytomous items, differential item functioning (DIF) can take on various forms due to the number of response categories. The lack of invariance at this level is referred to as differential step functioning (DSF). The most common DSF methods in the literature are the adjacent category log odds ratio (AC-LOR) estimator and cumulative…

  12. Using the Cumulative Common Log-Odds Ratio to Identify Differential Item Functioning of Rating Scale Items in the Exercise and Sport Sciences

    ERIC Educational Resources Information Center

    Penfield, Randall D.; Giacobbi, Peter R., Jr.; Myers, Nicholas D.

    2007-01-01

    One aspect of construct validity is the extent to which the measurement properties of a rating scale are invariant across the groups being compared. An increasingly used method for assessing between-group differences in the measurement properties of items of a scale is the framework of differential item functioning (DIF). In this paper we…

  13. Differential Item Functioning Analysis of the "Preschool Language Scale-4" between English-Speaking Hispanic and European American Children from Low-Income Families

    ERIC Educational Resources Information Center

    Qi, Cathy Huaqing; Marley, Scott C.

    2009-01-01

    The study examined whether item bias is present in the "Preschool Language Scale-4" (PLS-4). Participants were 440 children (3-5 years old; 86% English-speaking Hispanic and 14% European American) who were enrolled in Head Start programs. The PLS-4 items were analyzed for differential item functioning (DIF) using logistic regression and…

  14. Testing item response theory invariance of the standardized Quality-of-life Disease Impact Scale (QDIS®) in acute coronary syndrome patients: differential functioning of items and test.

    PubMed

    Deng, Nina; Anatchkova, Milena D; Waring, Molly E; Han, Kyung T; Ware, John E

    2015-08-01

    The Quality-of-life (QOL) Disease Impact Scale (QDIS®) standardizes the content and scoring of QOL impact attributed to different diseases using item response theory (IRT). This study examined the IRT invariance of the QDIS-standardized IRT parameters in an independent sample. The differential functioning of items and test (DFIT) of a static short-form (QDIS-7) was examined across two independent sources: patients hospitalized for acute coronary syndrome (ACS) in the TRACE-CORE study (N = 1,544) and chronically ill US adults in the QDIS standardization sample. "ACS-specific" IRT item parameters were calibrated and linearly transformed to compare to "standardized" IRT item parameters. Differences in IRT model-expected item, scale and theta scores were examined. The DFIT results were also compared in a standard logistic regression differential item functioning analysis. Item parameters estimated in the ACS sample showed lower discrimination parameters than the standardized discrimination parameters, but only small differences were found for threshold parameters. In DFIT, results on the non-compensatory differential item functioning index (range 0.005-0.074) were all below the threshold of 0.096. Item differences were further canceled out at the scale level. IRT-based theta scores for ACS patients using standardized and ACS-specific item parameters were highly correlated (r = 0.995, root-mean-square difference = 0.09). Using standardized item parameters, ACS patients scored one-half standard deviation higher (indicating greater QOL impact) compared to chronically ill adults in the standardization sample. The study showed sufficient IRT invariance to warrant the use of standardized IRT scoring of QDIS-7 for studies comparing the QOL impact attributed to acute coronary disease and other chronic conditions.

  15. Developmentally sensitive markers of personality functioning in adolescents: Age-specific and age-neutral expressions.

    PubMed

    Debast, Inge; Rossi, Gina; Feenstra, Dineke; Hutsebaut, Joost

    2017-04-01

    Criterion D of the Diagnostic and Statistical Manual of Mental Disorders (5th ed.; DSM-5; American Psychiatric Association [APA], 2013) refers to a possible onset of personality disorders (PDs) in adolescence and in Section II the development/course in adolescence is described by some typical characteristics for several PDs. Yet, age-specific expressions of PDs are lacking in Section III. We urgently need a developmentally sensitive assessment instrument that differentiates developmental and contextual changes on the one hand from expressions of personality pathology on the other hand. Therefore we investigated which items of the Severity Indices for Personality Problems-118 (SIPP-118) were developmentally sensitive throughout adolescence and adulthood and which could be considered more age-specific markers requiring other content or thresholds over age groups. Applying item response theory (IRT) we detected differential item functioning (DIF) in 36% of the items in matched samples of 639 adolescents versus 639 adults. The DIF across age groups mainly reflected a different degree of symptom expressions for the same underlying level of functioning. The threshold for exhibiting symptoms given a certain degree of personality dysfunction was lower in adolescence for areas of personality functioning related to the Self and Interpersonal domains. Some items also measured a latent construct of personality functioning differently across adolescents and adults. This suggests that several facets of the SIPP-118 do not solely measure aspects of personality pathology in adolescents, but likely include more developmental issues. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  16. Effect Size Measures for Differential Item Functioning in a Multidimensional IRT Model

    ERIC Educational Resources Information Center

    Suh, Youngsuk

    2016-01-01

    This study adapted an effect size measure used for studying differential item functioning (DIF) in unidimensional tests and extended the measure to multidimensional tests. Two effect size measures were considered in a multidimensional item response theory model: signed weighted P-difference and unsigned weighted P-difference. The performance of…

  17. Explaining Crossing DIF in Polytomous Items Using Differential Step Functioning Effects

    ERIC Educational Resources Information Center

    Penfield, Randall D.

    2010-01-01

    Crossing, or intersecting, differential item functioning (DIF) is a form of nonuniform DIF that exists when the sign of the between-group difference in expected item performance changes across the latent trait continuum. The presence of crossing DIF presents a problem for many statistics developed for evaluating DIF because positive and negative…
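
    The definition of crossing DIF given above can be made concrete with a small sketch. It assumes a dichotomous 2PL item for simplicity (the abstract concerns polytomous items), and the function names and parameter values are hypothetical. Two groups share the same difficulty but differ in discrimination, so the between-group difference in expected item performance is negative below the crossing point and positive above it.

```python
import math


def icc(theta, a, b):
    """Expected score on a dichotomous 2PL item (assumed model for illustration)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def group_difference(theta, reference, focal):
    """Reference-minus-focal expected item score at a given trait level."""
    return icc(theta, *reference) - icc(theta, *focal)


# Hypothetical (discrimination, difficulty) pairs: equal difficulty,
# unequal discrimination, so the curves intersect at theta = 0.
reference = (1.5, 0.0)
focal = (0.7, 0.0)

grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
differences = [group_difference(t, reference, focal) for t in grid]

# Averaging the signed differences over this symmetric grid gives roughly zero,
# even though the item clearly behaves differently in the two groups.
mean_signed = sum(differences) / len(differences)
mean_unsigned = sum(abs(d) for d in differences) / len(differences)
```

    In this constructed case a signed summary is close to zero while the unsigned summary is not, which illustrates how the sign change across the trait continuum can hide a group difference from statistics that average signed effects.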

  18. Testing for Differential Item Functioning with Measures of Partial Association

    ERIC Educational Resources Information Center

    Woods, Carol M.

    2009-01-01

    Differential item functioning (DIF) occurs when an item on a test or questionnaire has different measurement properties for one group of people versus another, irrespective of mean differences on the construct. There are many methods available for DIF assessment. The present article is focused on indices of partial association. A family of average…

  19. Testing for Nonuniform Differential Item Functioning with Multiple Indicator Multiple Cause Models

    ERIC Educational Resources Information Center

    Woods, Carol M.; Grimm, Kevin J.

    2011-01-01

    In extant literature, multiple indicator multiple cause (MIMIC) models have been presented for identifying items that display uniform differential item functioning (DIF) only, not nonuniform DIF. This article addresses, for apparently the first time, the use of MIMIC models for testing both uniform and nonuniform DIF with categorical indicators. A…

  20. Staff Differentiation. An Annotated Bibliography.

    ERIC Educational Resources Information Center

    Marin County Superintendent of Schools, Corte Madera, CA.

    This annotated bibliography reviews selected literature focusing on the concept of staff differentiation. Included are 62 items (dated 1966-1970), along with a list of mailing addresses where copies of individual items can be obtained. Also a list of 31 staff differentiation projects receiving financial assistance from the U.S. Office of Education…

  1. Evaluating and Refining the Construct of Sexual Quality With Item Response Theory: Development of the Quality of Sex Inventory.

    PubMed

    Shaw, Amanda M; Rogge, Ronald D

    2016-02-01

    This study took a critical look at the construct of sexual quality. The 65 items of four well-validated self-report measures of sexual satisfaction (the Index of Sexual Satisfaction [ISS], Hudson, Harrison, & Crosscup, 1981; the Global Measure of Sexual Satisfaction [GMSEX], Lawrance & Byers, 1995; the Pinney Sexual Satisfaction Inventory [PSSI], Pinney, Gerrard, & Denney, 1987; the Young Sexual Satisfaction Scale [YSSS], Young, Denny, Luquis, & Young, 1998) and an additional 74 potential sexual quality items were given to 3060 online participants. Using Item Response Theory (IRT), we demonstrated that the ISS, YSSS, and PSSI scales provided suboptimal levels of precision in assessing sexual quality, particularly given the length of those scales. Exploratory factor analyses, IRT, differential item functioning analyses, and longitudinal responsiveness analyses were used to develop and evaluate the Quality of Sex Inventory. Results suggested that, in comparison to existing scales, the QSI (1) offers investigators and clinicians more theoretically focused scales, (2) distinguishes sexual satisfaction from sexual dissatisfaction, and (3) offers greater precision and power for detecting differences with (4) comparably high levels of responsiveness for detecting change over time despite being notably shorter than most of the existing scales. The QSI-satisfaction subscales demonstrated strong convergent validity with other measures of sexual satisfaction and excellent construct validity with anchor scales from the nomological net surrounding that construct, suggesting that they continue to assess the same theoretical construct as prior scales. Implications for research are discussed.

  2. Parent Ratings of ADHD Symptoms: Generalized Partial Credit Model Analysis of Differential Item Functioning across Gender

    ERIC Educational Resources Information Center

    Gomez, Rapson

    2012-01-01

    Objective: Generalized partial credit model, which is based on item response theory (IRT), was used to test differential item functioning (DIF) for the "Diagnostic and Statistical Manual of Mental Disorders" (4th ed.), inattention (IA), and hyperactivity/impulsivity (HI) symptoms across boys and girls. Method: To accomplish this, parents completed…

  3. Effects of Differential Item Functioning on Examinees' Test Performance and Reliability of Test

    ERIC Educational Resources Information Center

    Lee, Yi-Hsuan; Zhang, Jinming

    2017-01-01

    Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The…

  4. Assessment of Differential Item Functioning under Cognitive Diagnosis Models: The DINA Model Example

    ERIC Educational Resources Information Center

    Li, Xiaomin; Wang, Wen-Chung

    2015-01-01

    The assessment of differential item functioning (DIF) is routinely conducted to ensure test fairness and validity. Although many DIF assessment methods have been developed in the context of classical test theory and item response theory, they are not applicable for cognitive diagnosis models (CDMs), as the underlying latent attributes of CDMs are…

  5. Standard Errors for National Trends in International Large-Scale Assessments in the Case of Cross-National Differential Item Functioning

    ERIC Educational Resources Information Center

    Sachse, Karoline A.; Haag, Nicole

    2017-01-01

    Standard errors computed according to the operational practices of international large-scale assessment studies such as the Programme for International Student Assessment's (PISA) or the Trends in International Mathematics and Science Study (TIMSS) may be biased when cross-national differential item functioning (DIF) and item parameter drift are…

  6. Checking Equity: Why Differential Item Functioning Analysis Should Be a Routine Part of Developing Conceptual Assessments

    ERIC Educational Resources Information Center

    Martinková, Patricia; Drabinová, Adéla; Liaw, Yuan-Ling; Sanders, Elizabeth A.; McFarland, Jenny L.; Price, Rebecca M.

    2017-01-01

    We provide a tutorial on differential item functioning (DIF) analysis, an analytic method useful for identifying potentially biased items in assessments. After explaining a number of methodological approaches, we test for gender bias in two scenarios that demonstrate why DIF analysis is crucial for developing assessments, particularly because…

  7. Understanding Differential Item Performance as a Consequence of Gender Differences in Academic Background.

    ERIC Educational Resources Information Center

    Doolittle, Allen E.

    Differential item performance (DIP) is discussed as a concept that does not necessarily imply item bias or unfairness to subgroups of examinees. With curriculum-based achievement tests, DIP is presented as a valid reflection of group differences in requisite skills and instruction. Using data from a national testing of the ACT Assessment, this…

  8. Accuracy of self-report in detecting taste dysfunction.

    PubMed

    Soter, Ana; Kim, John; Jackman, Alexis; Tourbier, Isabelle; Kaul, Arti; Doty, Richard L

    2008-04-01

    To determine the sensitivity, specificity, and positive and negative predictive value of responses to the following questionnaire statements in detecting taste loss: "I can detect salt in chips, pretzels, or salted nuts," "I can detect sourness in vinegar, pickles, or lemon," "I can detect sweetness in soda, cookies, or ice cream," and "I can detect bitterness in coffee, beer, or tonic water." Responses to an additional item, "I can detect chocolate in cocoa, cake or candy," were examined to determine whether patients clearly differentiate between taste loss and flavor loss secondary to olfactory dysfunction. A total of 469 patients (207 men, mean age = 54 years, standard deviation = 15 years; and 262 women, mean age = 54 years, standard deviation = 14 years) were administered a questionnaire containing these questions with the response categories of "easily," "somewhat," and "not at all," followed by a comprehensive taste and smell test battery. The questionnaire items poorly detected bona fide taste problems. However, they were sensitive in detecting persons without such problems (i.e., they exhibited low positive but high negative predictive value). Dysfunction categories of the University of Pennsylvania Smell Identification Test (UPSIT) were not meaningfully related to subjects' responses to the questionnaire statements. Both sex and age influenced performance on most of the taste tests, with older persons performing more poorly than younger ones and women typically outperforming men. Although it is commonly assumed that straightforward questions concerning taste may be useful in detecting taste disorders, this study suggests this is not the case. However, patients who specifically report having no problems with taste perception usually do not exhibit taste dysfunction. The difficulty in detecting true taste problems by focused questionnaire items likely reflects a combination of factors. These include the relatively low prevalence of taste deficits in the general population and the tendency of patients to confuse loss of olfaction-related flavor sensations with taste-bud mediated deficits.

  9. Exploring Differential Bundle Functioning in Mathematics by Gender: The Effect of Hierarchical Modelling

    ERIC Educational Resources Information Center

    Ong, Yoke Mooi; Williams, Julian; Lamprianou, Iasonas

    2013-01-01

    Researchers interested in exploring substantive group differences are increasingly attending to bundles of items (or testlets): the aim is to understand how gender differences, for instance, are explained by differential performances on different types or bundles of items, hence differential bundle functioning (DBF). Some previous work has…

  10. Rasch Mixture Models for DIF Detection

    PubMed Central

    Strobl, Carolin; Zeileis, Achim

    2014-01-01

    Rasch mixture models can be a useful tool when checking the assumption of measurement invariance for a single Rasch model. They provide advantages compared to manifest differential item functioning (DIF) tests when the DIF groups are only weakly correlated with the manifest covariates available. Unlike in single Rasch models, estimation of Rasch mixture models is sensitive to the specification of the ability distribution even when the conditional maximum likelihood approach is used. It is demonstrated in a simulation study how differences in ability can influence the latent classes of a Rasch mixture model. If the aim is only DIF detection, it is not of interest to uncover such ability differences as one is only interested in a latent group structure regarding the item difficulties. To avoid any confounding effect of ability differences (or impact), a new score distribution for the Rasch mixture model is introduced here. It ensures the estimation of the Rasch mixture model to be independent of the ability distribution and thus restricts the mixture to be sensitive to latent structure in the item difficulties only. Its usefulness is demonstrated in a simulation study, and its application is illustrated in a study of verbal aggression. PMID:29795819

  11. A Monte Carlo Study Investigating Missing Data, Differential Item Functioning, and Effect Size

    ERIC Educational Resources Information Center

    Garrett, Phyllis

    2009-01-01

    The use of polytomous items in assessments has increased over the years, and as a result, the validity of these assessments has been a concern. Differential item functioning (DIF) and missing data are two factors that may adversely affect assessment validity. Both factors have been studied separately, but DIF and missing data are likely to occur…

  12. Differential Item Functioning Analysis of the Mental, Emotional, and Bodily Toughness Inventory

    ERIC Educational Resources Information Center

    Gao, Yong; Mack, Mick G.; Ragan, Moira A.; Ragan, Brian

    2012-01-01

    In this study the authors used differential item functioning analysis to examine if there were items in the Mental, Emotional, and Bodily Toughness Inventory functioning differently across gender and athletic membership. A total of 444 male (56.3%) and female (43.7%) participants (30.9% athletes and 69.1% non-athletes) responded to the Mental,…

  13. Investigating Linguistic Sources of Differential Item Functioning Using Expert Think-Aloud Protocols in Science Achievement Tests

    NASA Astrophysics Data System (ADS)

    Roth, Wolff-Michael; Oliveri, Maria Elena; Dallie Sandilands, Debra; Lyons-Thomas, Juliette; Ercikan, Kadriye

    2013-03-01

    Even if national and international assessments are designed to be comparable, subsequent psychometric analyses often reveal differential item functioning (DIF). Central to achieving comparability is to examine the presence of DIF, and if DIF is found, to investigate its sources to ensure that differentially functioning items do not lead to bias. In this study, sources of DIF were examined using think-aloud protocols. The think-aloud protocols of expert reviewers were conducted for comparing the English and French versions of 40 items previously identified as DIF (N = 20) and non-DIF (N = 20). Three highly trained and experienced experts in verifying and accepting/rejecting multi-lingual versions of curriculum and testing materials for government purposes participated in this study. Although there is a considerable amount of agreement in the identification of differentially functioning items, experts do not consistently identify and distinguish DIF and non-DIF items. Our analyses of the think-aloud protocols identified particular linguistic, general pedagogical, content-related, and cognitive factors related to sources of DIF. Implications are provided for the process of arriving at the identification of DIF, prior to the actual administration of tests at national and international levels.

  14. Differential item functioning magnitude and impact measures from item response theory models.

    PubMed

    Kleinman, Marjorie; Teresi, Jeanne A

    2016-01-01

    Measures of magnitude and impact of differential item functioning (DIF) at the item and scale level, respectively, are presented and reviewed in this paper. Most measures are based on item response theory models. Magnitude refers to item level effect sizes, whereas impact refers to differences between groups at the scale score level. Reviewed are magnitude measures based on group differences in the expected item scores and impact measures based on differences in the expected scale scores. The similarities among these indices are demonstrated. Various software packages are described that provide magnitude and impact measures, and new software is presented that computes all of the available statistics conveniently in one program with explanations of their relationships to one another.

  15. Adjusting for cross-cultural differences in computer-adaptive tests of quality of life.

    PubMed

    Gibbons, C J; Skevington, S M

    2018-04-01

    Previous studies using the WHOQOL measures have demonstrated that the relationship between individual items and the underlying quality of life (QoL) construct may differ between cultures. If unaccounted for, these differing relationships can lead to measurement bias which, in turn, can undermine the reliability of results. We used item response theory (IRT) to assess differential item functioning (DIF) in WHOQOL data from diverse language versions collected in UK, Zimbabwe, Russia, and India (total N = 1332). Data were fitted to the partial credit 'Rasch' model. We used four item banks previously derived from the WHOQOL-100 measure, which provided excellent measurement for physical, psychological, social, and environmental quality of life domains (40 items overall). Cross-cultural differential item functioning was assessed using analysis of variance for item residuals and post hoc Tukey tests. Simulated computer-adaptive tests (CATs) were conducted to assess the efficiency and precision of the four item banks. Splitting item parameters by DIF results in four linked item banks without DIF or other breaches of IRT model assumptions. Simulated CATs were more precise and efficient than longer paper-based alternatives. Assessing differential item functioning using item response theory can identify measurement invariance between cultures which, if uncontrolled, may undermine accurate comparisons in computer-adaptive testing assessments of QoL. We demonstrate how compensating for DIF using item anchoring allowed data from all four countries to be compared on a common metric, thus facilitating assessments which were both sensitive to cultural nuance and comparable between countries.

  16. Differential item functioning of the patient-reported outcomes information system (PROMIS®) pain interference item bank by language (Spanish versus English).

    PubMed

    Paz, Sylvia H; Spritzer, Karen L; Reise, Steven P; Hays, Ron D

    2017-06-01

    About 70% of Latinos, 5 years old or older, in the United States speak Spanish at home. Measurement equivalence of the PROMIS® pain interference (PI) item bank by language of administration (English versus Spanish) has not been evaluated. A sample of 527 adult Spanish-speaking Latinos completed the Spanish version of the 41-item PROMIS® pain interference item bank. We evaluate dimensionality, monotonicity and local independence of the Spanish-language items. Then we evaluate differential item functioning (DIF) using ordinal logistic regression with item response theory scores estimated from DIF-free "anchor" items. One of the 41 items in the Spanish version of the PROMIS® PI item bank was identified as having significant uniform DIF. English- and Spanish-speaking subjects with the same level of pain interference responded differently to 1 of the 41 items in the PROMIS® PI item bank. This item was not retained due to proprietary issues. The original English language item parameters can be used when estimating PROMIS® PI scores.

  17. Differential item functioning in the Cambridge Mental Disorders in the Elderly (CAMDEX) Depression Scale across middle age and late life.

    PubMed

    Estabrook, Ryne; Sadler, Michael E; McGue, Matt

    2015-12-01

    A long-standing and critical problem in the study of aging and depression is the comparability of measurement across age groups. While psychological measures of depression typically show increased incidence of symptoms with increasing age, rates of depression diagnosis do not show the same age trend. This analysis presents tests of differential item functioning on the depression section of the CAMDEX interview schedule, using factor analysis-derived affective and somatic subscales (McGue & Christensen, 1997). Results for the affective subscale show significant differences in item functioning in the majority of the affective items as a function of age (items "Happy Life," "Lonely," "Nervous," "Worthless," and "Future": χ²(6) = [30.193, 255.971] across items, all p < .0001). Analyses for the somatic subscale show differential item functioning is limited to a single item relating to coping (χ²(6) = 180.754, p < .0001). These results indicate that differences in depression symptoms across age groups are not entirely consistent with a unidimensional depression trait, and that the measurement structure of depression varies over the life span. (c) 2015 APA, all rights reserved.

  18. The English version of the four-dimensional symptom questionnaire (4DSQ) measures the same as the original Dutch questionnaire: a validation study.

    PubMed

    Terluin, Berend; Smits, Niels; Miedema, Baukje

    2014-12-01

    Translations of questionnaires need to be carefully validated to assure that the translation measures the same construct(s) as the original questionnaire. The four-dimensional symptom questionnaire (4DSQ) is a Dutch self-report questionnaire measuring distress, depression, anxiety and somatization. To evaluate the equivalence of the English version of the 4DSQ. 4DSQ data of English and Dutch speaking general practice attendees were analysed and compared. The English speaking group consisted of 205 attendees, aged 18-64 years, in general practice, in Canada whereas the Dutch group consisted of 302 general practice attendees in the Netherlands. Differential item functioning (DIF) analysis was conducted using the Mantel-Haenszel method and ordinal logistic regression. Differential test functioning (DTF; i.e., the scale impact of DIF) was evaluated using linear regression analysis. DIF was detected in 2/16 distress items, 2/6 depression items, 2/12 anxiety items, and 1/16 somatization items. With respect to mean scale scores, the impact of DIF on the scale level was negligible for all scales. On the anxiety scale DIF caused the English speaking patients with moderate to severe anxiety to score about one point lower than Dutch patients with the same anxiety level. The English 4DSQ measures the same constructs as the original Dutch 4DSQ. The distress, depression and somatization scales can employ the same cut-off points as the corresponding Dutch scales. However, cut-off points of the English 4DSQ anxiety scale should be lowered by one point to retain the same meaning as the Dutch anxiety cut-off points.
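
    For readers unfamiliar with the Mantel-Haenszel method named in this abstract, the following is a minimal sketch of the common odds ratio for a dichotomously scored item. It is not the study's analysis: the counts are invented, the function name is hypothetical, and it is an assumption here that each item response is reduced to endorsed / not endorsed. Respondents are stratified by matching score, and each stratum contributes a 2x2 table of group by response.

```python
def mh_common_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across score strata.

    Each stratum is a tuple of counts:
    (reference endorsed, reference not endorsed, focal endorsed, focal not endorsed).
    """
    numerator = 0.0
    denominator = 0.0
    for ref_yes, ref_no, foc_yes, foc_no in strata:
        total = ref_yes + ref_no + foc_yes + foc_no
        if total == 0:
            continue  # an empty stratum carries no information
        numerator += ref_yes * foc_no / total
        denominator += ref_no * foc_yes / total
    if denominator == 0.0:
        raise ValueError("odds ratio undefined: no discordant counts in any stratum")
    return numerator / denominator


# Invented counts. In the first set both groups have equal odds within each stratum.
no_dif = [(10, 10, 5, 5), (20, 10, 10, 5)]
# In the second set the reference group endorses the item more often at matched scores.
with_dif = [(30, 10, 20, 20), (10, 10, 5, 5)]

ratio_no_dif = mh_common_odds_ratio(no_dif)
ratio_with_dif = mh_common_odds_ratio(with_dif)
```

    A ratio near 1 indicates that matched respondents from the two groups have similar odds of endorsing the item, while a ratio well above or below 1 suggests uniform DIF. A significance test and an effect-size classification would normally accompany the ratio; they are omitted here.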

  19. Analysis of Bilingual Children’s Performance on the English and Spanish Versions of the Woodcock-Muñoz Language Survey-R (WMLS-R)

    PubMed Central

    Sandilos, Lia E.; Lewis, Kandia; Komaroff, Eugene; Hammer, Carol Scheffner; Scarpino, Shelley E.; Lopez, Lisa; Rodriguez, Barbara; Goldstein, Brian

    2015-01-01

    The purpose of this study was to investigate the way in which items on the Woodcock-Muñoz Language Survey Revised (WMLS-R) Spanish and English versions function for bilingual children from different ethnic subgroups who speak different dialects of Spanish. Using data from a sample of 324 bilingual Hispanic families and their children living on the United States mainland, differential item functioning (DIF) was conducted to determine if test items in English and Spanish functioned differently for Mexican, Cuban, and Puerto Rican bilingual children. Data on child and parent language characteristics and children’s scores on Picture Vocabulary and Story Recall subtests in English and Spanish were collected. DIF was not detected for items on the Spanish subtests. Results revealed that some items on English subtests displayed statistically and practically significant DIF. The findings indicate that there are differences in the difficulty level of WMLS-R English-form test items depending on the examinees’ ethnic subgroup membership. This outcome suggests that test developers need to be mindful of potential differences in performance based on ethnic subgroup and dialect when developing standardized language assessments that may be administered to bilingual students. PMID:26705400

  20. Application of Think Aloud Protocols for Examining and Confirming Sources of Differential Item Functioning Identified by Expert Reviews

    ERIC Educational Resources Information Center

    Ercikan, Kadriye; Arim, Rubab; Law, Danielle; Domene, Jose; Gagnon, France; Lacroix, Serge

    2010-01-01

    This paper demonstrates and discusses the use of think aloud protocols (TAPs) as an approach for examining and confirming sources of differential item functioning (DIF). The TAPs are used to investigate to what extent surface characteristics of the items that are identified by expert reviews as sources of DIF are supported by empirical evidence…

  1. Do Items that Measure Self-Perceived Physical Appearance Function Differentially across Gender Groups? An Application of the MACS Model

    ERIC Educational Resources Information Center

    Gonzalez-Roma, Vicente; Tomas, Ines; Ferreres, Doris; Hernandez, Ana

    2005-01-01

    The aims of this study were to investigate whether the 6 items of the Physical Appearance Scale (Marsh, Richards, Johnson, Roche, & Tremayne, 1994) show differential item functioning (DIF) across gender groups of adolescents, and to show how this can be done using the multigroup mean and covariance structure (MG-MACS) analysis model. Two samples…

  2. A Comparison of Methods for Estimating Conditional Item Score Differences in Differential Item Functioning (DIF) Assessments. Research Report. ETS RR-10-15

    ERIC Educational Resources Information Center

    Moses, Tim; Miao, Jing; Dorans, Neil

    2010-01-01

    This study compared the accuracies of four differential item functioning (DIF) estimation methods, where each method makes use of only one of the following: raw data, logistic regression, loglinear models, or kernel smoothing. The major focus was on the estimation strategies' potential for estimating score-level, conditional DIF. A secondary focus…

  3. Age-related Differential Item Functioning for the Patient-Reported Outcomes Information System (PROMIS®) Physical Functioning Items.

    PubMed

    Paz, Sylvia H; Spritzer, Karen L; Morales, Leo S; Hays, Ron D

    2013-03-29

    To evaluate the equivalence of the PROMIS® wave 1 physical functioning item bank, by age (50 years or older versus 18-49). A total of 114 physical functioning items with 5 response choices were administered to English- (n=1504) and Spanish-language (n=640) adults. Item frequencies, means and standard deviations, item-scale correlations, and internal consistency reliability were estimated. Differential Item Functioning (DIF) by age was evaluated. Thirty of the 114 items were flagged for DIF based on an R-squared of 0.02 or above criterion. The expected total score was higher for those respondents who were 18-49 than those who were 50 or older. Those who were 50 years or older versus 18-49 years old with the same level of physical functioning responded differently to 30 of the 114 items in the PROMIS® physical functioning item bank. This study yields essential information about the equivalence of the physical functioning items in older versus younger individuals.

  4. Rasch analysis of the carers quality of life questionnaire for parkinsonism.

    PubMed

    Pillas, Marios; Selai, Caroline; Schrag, Anette

    2017-03-01

    To assess the psychometric properties of the Carers Quality of Life Questionnaire for Parkinsonism using a Rasch modeling approach and determine the optimal cut-off score. We performed a Rasch analysis of the survey answers of 430 carers of patients with atypical parkinsonism. All of the scale items demonstrated acceptable goodness of fit to the Rasch model. The scale was unidimensional and no notable differential item functioning was detected in the items regarding age and disease type. Rating categories were functioning adequately in all scale items. The scale had high reliability (.95) and construct validity and a high degree of precision, distinguishing between 5 distinct groups of carers with different levels of quality of life. A cut-off score of 62 was found to have the optimal screening accuracy based on Hospital Anxiety and Depression Scale subscores. The results suggest that the Carers Quality of Life Questionnaire for Parkinsonism is a useful scale to assess carers' quality of life and allows analyses requiring interval scaling of variables. © 2016 International Parkinson and Movement Disorder Society.

  5. Thyroid-specific questions on work ability showed known-groups validity among Danes with thyroid diseases.

    PubMed

    Nexo, Mette Andersen; Watt, Torquil; Bonnema, Steen Joop; Hegedüs, Laszlo; Rasmussen, Åse Krogh; Feldt-Rasmussen, Ulla; Bjorner, Jakob Bue

    2015-07-01

    We aimed to identify the best approach to work ability assessment in patients with thyroid disease by evaluating the factor structure, measurement equivalence, known-groups validity, and predictive validity of a broad set of work ability items. Based on the literature and interviews with thyroid patients, 24 work ability items were selected from previous questionnaires, revised, or developed anew. Items were tested among 632 patients with thyroid disease (non-toxic goiter, toxic nodular goiter, Graves' disease (with or without orbitopathy), autoimmune hypothyroidism, and other thyroid diseases), 391 of whom had participated in a study 5 years previously. Responses to select items were compared to general population data. We used confirmatory factor analyses for categorical data, logistic regression analyses and tests of differential item function, and head-to-head comparisons of relative validity in distinguishing known groups. Although all work ability items loaded on a common factor, the optimal factor solution included five factors: role physical, role emotional, thyroid-specific limitations, work limitations (without disease attribution), and work performance. The scale on thyroid-specific limitations showed the most power in distinguishing clinical groups and time since diagnosis. A global single item proved useful for comparisons with the general population, and a thyroid-specific item predicted labor market exclusion within the next 5 years (OR 5.0, 95% CI 2.7-9.1). Items on work limitations with attribution to thyroid disease were most effective in detecting impact on work ability and showed good predictive validity. Generic work ability items remain useful for general population comparisons.

  6. Neural Differentiation of Incorrectly Predicted Memories.

    PubMed

    Kim, Ghootae; Norman, Kenneth A; Turk-Browne, Nicholas B

    2017-02-22

    When an item is predicted in a particular context but the prediction is violated, memory for that item is weakened (Kim et al., 2014). Here, we explore what happens when such previously mispredicted items are later reencountered. According to prior neural network simulations, this sequence of events (misprediction and subsequent restudy) should lead to differentiation of the item's neural representation from the previous context (on which the misprediction was based). Specifically, misprediction weakens connections in the representation to features shared with the previous context and restudy allows new features to be incorporated into the representation that are not shared with the previous context. This cycle of misprediction and restudy should have the net effect of moving the item's neural representation away from the neural representation of the previous context. We tested this hypothesis using human fMRI by tracking changes in item-specific BOLD activity patterns in the hippocampus, a key structure for representing memories and generating predictions. In left CA2/3/DG, we found greater neural differentiation for items that were repeatedly mispredicted and restudied compared with items from a control condition that was identical except without misprediction. We also measured prediction strength in a trial-by-trial fashion and found that greater misprediction for an item led to more differentiation, further supporting our hypothesis. Therefore, the consequences of prediction error go beyond memory weakening. If the mispredicted item is restudied, the brain adaptively differentiates its memory representation to improve the accuracy of subsequent predictions and to shield it from further weakening. SIGNIFICANCE STATEMENT Competition between overlapping memories leads to weakening of nontarget memories over time, making it easier to access target memories. However, a nontarget memory in one context might become a target memory in another context. How do such memories get restrengthened without increasing competition again? Computational models suggest that the brain handles this by reducing neural connections to the previous context and adding connections to new features that were not part of the previous context. The result is neural differentiation away from the previous context. Here, we provide support for this theory, using fMRI to track neural representations of individual memories in the hippocampus and how they change based on learning. Copyright © 2017 the authors.

  7. Evaluating construct validity of the second version of the Copenhagen Psychosocial Questionnaire through analysis of differential item functioning and differential item effect.

    PubMed

    Bjorner, Jakob Bue; Pejtersen, Jan Hyld

    2010-02-01

    To evaluate the construct validity of the Copenhagen Psychosocial Questionnaire II (COPSOQ II) by means of tests for differential item functioning (DIF) and differential item effect (DIE). We used a Danish general population postal survey (n = 4,732 with 3,517 wage earners) with a one-year register-based follow-up for long-term sickness absence. DIF was evaluated against age, gender, education, social class, public/private sector employment, and job type using ordinal logistic regression. DIE was evaluated against job satisfaction and self-rated health (using ordinal logistic regression), against depressive symptoms, burnout, and stress (using multiple linear regression), and against long-term sick leave (using a proportional hazards model). We used a cross-validation approach to counter the risk of significant results due to multiple testing. Out of 1,052 tests, we found 599 significant instances of DIF/DIE, 69 of which showed both practical and statistical significance across two independent samples. Most DIF occurred for job type (in 20 cases), while we found little DIF for age, gender, education, social class and sector. DIE seemed to pertain to particular items, which showed DIE in the same direction for several outcome variables. The results allowed a preliminary identification of items that have a positive impact on construct validity and items that have a negative impact on construct validity. These results can be used to develop better short-form measures and to improve the conceptual framework, items and scales of the COPSOQ II. We conclude that tests of DIF and DIE are useful for evaluating construct validity.
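
    The replication idea in this abstract (count a DIF/DIE result only when it is both statistically and practically significant in two independent samples) can be sketched roughly as follows. This is an illustration only, not the authors' procedure: the function name `replicated_flags`, the thresholds `alpha` and `min_effect`, the same-direction requirement, and the test identifiers are all hypothetical assumptions, since the abstract does not state the criteria used.

```python
def replicated_flags(sample_a, sample_b, alpha=0.05, min_effect=0.02):
    """Return test ids flagged in BOTH independent samples.

    Each sample maps a test id to (p_value, effect), where effect is a
    signed effect-size estimate. Thresholds are hypothetical placeholders.
    """
    flagged = []
    # Only tests that were run in both samples can replicate.
    for test_id in sorted(sample_a.keys() & sample_b.keys()):
        p_a, e_a = sample_a[test_id]
        p_b, e_b = sample_b[test_id]
        statistically = p_a < alpha and p_b < alpha
        practically = abs(e_a) >= min_effect and abs(e_b) >= min_effect
        same_direction = (e_a > 0) == (e_b > 0)
        if statistically and practically and same_direction:
            flagged.append(test_id)
    return flagged


# Made-up results for two halves of a data set (not from the study).
half_1 = {"item1|job_type": (0.001, 0.05), "item2|age": (0.030, 0.01)}
half_2 = {"item1|job_type": (0.004, 0.04), "item2|age": (0.020, 0.03)}
print(replicated_flags(half_1, half_2))
```

    Requiring agreement across independent samples trades some power for protection against chance findings when many tests are run.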

  8. It Might Not Make a Big DIF: Improved Differential Test Functioning Statistics That Account for Sampling Variability

    ERIC Educational Resources Information Center

    Chalmers, R. Philip; Counsell, Alyssa; Flora, David B.

    2016-01-01

    Differential test functioning, or DTF, occurs when one or more items in a test demonstrate differential item functioning (DIF) and the aggregate of these effects is witnessed at the test level. In many applications, DTF can be more important than DIF when the overall effects of DIF at the test level can be quantified. However, optimal statistical…

  9. Why Consumers Misattribute Sponsorships to Non-Sponsor Brands: Differential Roles of Item and Relational Communications.

    PubMed

    Weeks, Clinton S; Humphreys, Michael S; Cornwell, T Bettina

    2018-02-01

    Brands engaged in sponsorship of events commonly have objectives that depend on consumer memory for the sponsor-event relationship (e.g., sponsorship awareness). Consumers, however, often misattribute sponsorships to nonsponsor competitor brands, indicating erroneous memory for these relationships. The current research uses an item and relational memory framework to reveal that sponsor brands may inadvertently foster this misattribution when they communicate relational linkages to events. Effects can be explained via differential roles of communicating item information (information that supports processing item distinctiveness) versus relational information (information that supports processing relationships among items) in contributing to memory outcomes. Experiment 1 uses event-cued brand recall to show that correct memory retrieval is best supported by communicating relational information when sponsorship relationships are not obvious (low congruence). In contrast, correct retrieval is best supported by communicating item information when relationships are obvious (high congruence). Experiment 2 uses brand-cued event recall to show that, against conventional marketing recommendations, relational information increases misattribution, whereas item information guards against misattribution. Results suggest sponsor brands must distinguish between item and relational communications to enhance correct retrieval and limit misattribution. Methodologically, the work shows that choice of cueing direction is critical in differentially revealing patterns of correct and incorrect retrieval with pair relationships. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  10. Rasch validation of the Arabic version of the lower extremity functional scale.

    PubMed

    Alnahdi, Ali H

    2018-02-01

    The purpose of this study was to examine the internal construct validity of the Arabic version of the Lower Extremity Functional Scale (20-item Arabic LEFS) using Rasch analysis. Patients (n = 170) with lower extremity musculoskeletal dysfunction were recruited. Rasch analysis of the 20-item Arabic LEFS was performed. Once the initial Rasch analysis indicated that the 20-item Arabic LEFS did not fit the Rasch model, follow-up analyses were conducted to improve the fit of the scale to the Rasch measurement model. These modifications included removing misfitting individuals, changing item scoring structure, removing misfitting items, and addressing bias caused by response dependency between items and differential item functioning (DIF). Initial analysis indicated deviation of the 20-item Arabic LEFS from the Rasch model. Disordered thresholds in eight items and response dependency between six items were detected, and the scale as a whole did not meet the requirement of unidimensionality. Refinements led to a 15-item Arabic LEFS that demonstrated excellent internal consistency (person separation index [PSI] = 0.92) and satisfied all the requirements of the Rasch model. Rasch analysis did not support the 20-item Arabic LEFS as a unidimensional measure of lower extremity function. The refined 15-item Arabic LEFS met all the requirements of the Rasch model and hence is a valid objective measure of lower extremity function. The Rasch-validated 15-item Arabic LEFS needs to be further tested in an independent sample to confirm its fit to the Rasch measurement model. Implications for Rehabilitation: The validity of the 20-item Arabic Lower Extremity Functional Scale to measure lower extremity function is not supported. The 15-item Arabic version of the LEFS is a valid measure of lower extremity function and can be used to quantify lower extremity function in patients with lower extremity musculoskeletal disorders.

  11. A Multilevel Assessment of Differential Item Functioning.

    ERIC Educational Resources Information Center

    Shen, Linjun

    A multilevel approach was proposed for the assessment of differential item functioning and compared with the traditional logistic regression approach. Data from the Comprehensive Osteopathic Medical Licensing Examination for 2,300 freshman osteopathic medical students were analyzed. The multilevel approach used three-level hierarchical generalized…

  12. A Review of ETS Differential Item Functioning Assessment Procedures: Flagging Rules, Minimum Sample Size Requirements, and Criterion Refinement. Research Report. ETS RR-12-08

    ERIC Educational Resources Information Center

    Zwick, Rebecca

    2012-01-01

    Differential item functioning (DIF) analysis is a key component in the evaluation of the fairness and validity of educational tests. The goal of this project was to review the status of ETS DIF analysis procedures, focusing on three aspects: (a) the nature and stringency of the statistical rules used to flag items, (b) the minimum sample size…

  13. Which Statistic Should Be Used to Detect Item Preknowledge When the Set of Compromised Items Is Known?

    PubMed

    Sinharay, Sandip

    2017-09-01

    Benefiting from item preknowledge is a major type of fraudulent behavior during educational assessments. Belov suggested the posterior shift statistic for detection of item preknowledge and showed its performance to be better on average than that of seven other statistics for detection of item preknowledge for a known set of compromised items. Sinharay suggested a statistic based on the likelihood ratio test for detection of item preknowledge; the advantage of the statistic is that its null distribution is known. Results from simulated and real data and adaptive and nonadaptive tests are used to demonstrate that the Type I error rate and power of the statistic based on the likelihood ratio test are very similar to those of the posterior shift statistic. Thus, the statistic based on the likelihood ratio test appears promising in detecting item preknowledge when the set of compromised items is known.

  14. An NCME Instructional Module on Latent DIF Analysis Using Mixture Item Response Models

    ERIC Educational Resources Information Center

    Cho, Sun-Joo; Suh, Youngsuk; Lee, Woo-yeol

    2016-01-01

    The purpose of this ITEMS module is to provide an introduction to differential item functioning (DIF) analysis using mixture item response models. The mixture item response models for DIF analysis involve comparing item profiles across latent groups, instead of manifest groups. First, an overview of DIF analysis based on latent groups, called…

  15. Incidental learning of probability information is differentially affected by the type of visual working memory representation.

    PubMed

    van Lamsweerde, Amanda E; Beck, Melissa R

    2015-12-01

    In this study, we investigated whether the ability to learn probability information is affected by the type of representation held in visual working memory. Across 4 experiments, participants detected changes to displays of coloured shapes. While participants detected changes in 1 dimension (e.g., colour), a feature from a second, nonchanging dimension (e.g., shape) predicted which object was most likely to change. In Experiments 1 and 3, items could be grouped by similarity in the changing dimension across items (e.g., colours and shapes were repeated in the display), while in Experiments 2 and 4 items could not be grouped by similarity (all features were unique). Probability information from the predictive dimension was learned and used to increase performance, but only when all of the features within a display were unique (Experiments 2 and 4). When it was possible to group by feature similarity in the changing dimension (e.g., 2 blue objects appeared within an array), participants were unable to learn probability information and use it to improve performance (Experiments 1 and 3). The results suggest that probability information can be learned in a dimension that is not explicitly task-relevant, but only when the probability information is represented with the changing dimension in visual working memory. (c) 2015 APA, all rights reserved.

  16. Gender-Related Differential Item Functioning on a Middle-School Mathematics Performance Assessment.

    ERIC Educational Resources Information Center

    Lane, Suzanne; And Others

    This study examined gender-related differential item functioning (DIF) using a mathematics performance assessment, the QUASAR Cognitive Assessment Instrument (QCAI), administered to middle school students. The QCAI was developed for the Quantitative Understanding: Amplifying Student Achievement and Reading (QUASAR) project, which focuses on…

  17. Using Mixed Methods to Interpret Differential Item Functioning

    ERIC Educational Resources Information Center

    Benítez, Isabel; Padilla, José-Luis; Hidalgo Montesinos, María Dolores; Sireci, Stephen G.

    2016-01-01

    Analysis of differential item functioning (DIF) is often used to determine if cross-lingual assessments are equivalent across languages. However, evidence on the causes of cross-lingual DIF is still evasive. Expert appraisal is a qualitative method useful for obtaining detailed information about problematic elements in the different linguistic…

  18. Item Response Theory Using Hierarchical Generalized Linear Models

    ERIC Educational Resources Information Center

    Ravand, Hamdollah

    2015-01-01

    Multilevel models (MLMs) are flexible in that they can be employed to obtain item and person parameters, test for differential item functioning (DIF) and capture both local item and person dependence. Papers on the MLM analysis of item response data have focused mostly on theoretical issues where applications have been add-ons to simulation…

  19. A psychometric evaluation of the four-item version of the Control Attitudes Scale for patients with cardiac disease and their partners.

    PubMed

    Årestedt, Kristofer; Ågren, Susanna; Flemme, Inger; Moser, Debra K; Strömberg, Anna

    2015-08-01

    The four-item Control Attitudes Scale (CAS) was developed to measure control perceived by patients with cardiac disease and their family members, but extensive psychometric evaluation has not been performed. The aim was to translate, culturally adapt and psychometrically evaluate the CAS in a Swedish sample of implantable cardioverter defibrillator (ICD) recipients, heart failure (HF) patients and their partners. A sample (n=391) of ICD recipients, HF patients and partners was used. Descriptive statistics, item-total and inter-item correlations, exploratory factor analysis, ordinal regression modelling and Cronbach's alpha were used to validate the CAS. The findings from the factor analyses revealed that the CAS is a multidimensional scale including two factors, Control and Helplessness. The internal consistency was satisfactory for all scales (α=0.74-0.85), except the family version total scale (α=0.62). No differential item functioning was detected, which implies that the CAS can be used to make invariant comparisons between groups of different age and sex. The psychometric properties, together with the simple and short format of the CAS, make it a useful tool for measuring perceived control among patients with cardiac diseases and their family members. When using the CAS, subscale scores should be preferred. © The European Society of Cardiology 2014.
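
    The internal-consistency figures in this abstract are Cronbach's alpha values. As a minimal sketch (not the authors' analysis code), alpha for k items can be computed from a respondents-by-items score matrix as k/(k-1) times one minus the ratio of the summed item variances to the variance of the total scores. The function name `cronbach_alpha` and the toy scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != k for row in scores):
        raise ValueError("every respondent must answer every item")
    # Variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Variance of the summed (total) score across respondents.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up responses to a four-item scale (not data from the study).
toy_scores = [
    [4, 3, 4, 3],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
print(round(cronbach_alpha(toy_scores), 2))
```

    Because the same variance estimator is used in the numerator and the denominator, the choice between sample and population variance does not change the result.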

  20. Item response theory analyses of the Delis-Kaplan Executive Function System card sorting subtest.

    PubMed

    Spencer, Mercedes; Cho, Sun-Joo; Cutting, Laurie E

    2018-02-02

    In the current study, we examined the dimensionality of the 16-item Card Sorting subtest of the Delis-Kaplan Executive Functioning System assessment in a sample of 264 native English-speaking children between the ages of 9 and 15 years. We also tested for measurement invariance for these items across age and gender groups using item response theory (IRT). Results of the exploratory factor analysis indicated that a two-factor model that distinguished between verbal and perceptual items provided the best fit to the data. Although the items demonstrated measurement invariance across age groups, measurement invariance was violated for gender groups, with two items demonstrating differential item functioning for males and females. Multigroup analysis using all 16 items indicated that the items were more effective for individuals whose IRT scale scores were relatively high. A single-group explanatory IRT model using 14 non-differential item functioning items showed that for perceptual ability, females scored higher than males and that scores increased with age for both males and females; for verbal ability, the observed increase in scores across age differed for males and females. The implications of these findings are discussed.

  1. Rasch analysis of the hospital anxiety and depression scale among Chinese cataract patients.

    PubMed

    Lin, Xianchai; Chen, Ziyan; Jin, Ling; Gao, Wuyou; Qu, Bo; Zuo, Yajing; Liu, Rongjiao; Yu, Minbin

    2017-01-01

    To analyze the validity of the Hospital Anxiety and Depression Scale (HADS) among a Chinese cataract population. A total of 275 participants with unilateral or bilateral cataract were recruited to complete the Chinese version of HADS. The patients' demographic and ophthalmic characteristics were documented. Rasch analysis was conducted to examine the model fit statistics, the threshold ordering of the polytomous items, targeting, person separation index and reliability, local dependency, unidimensionality, differential item functioning (DIF) and construct validity of the HADS individual and summary measures. Rasch analysis was performed on the anxiety and depression subscales as well as the HADS-Total score, respectively. The items of the original HADS-Anxiety, HADS-Depression and HADS-Total demonstrated evidence of misfit to the Rasch model. Removing item A7 from the anxiety subscale and rescoring item D14 of the depression subscale significantly improved Rasch model fit. A 12-item higher order total scale with further removal of D12 was found to fit the Rasch model. The modified items had ordered response thresholds. No uniform DIF was detected, whereas notable non-uniform DIF in the high-ability group was found. The revised cut-off points were given for the modified anxiety and depression subscales. The modified version of HADS with HADS-A and HADS-D as subscales and HADS-T as a higher-order measure is a reliable and valid instrument that may be useful for assessing anxiety and depression states in the Chinese cataract population.

  2. Development and validation of brief scales to measure emotional and behavioural problems among Chinese adolescents

    PubMed Central

    Shen, Minxue; Hu, Ming; Sun, Zhenqiu

    2017-01-01

    Objectives: To develop and validate brief scales to measure common emotional and behavioural problems among adolescents in the examination-oriented education system and collectivistic culture of China. Setting: Middle schools in Hunan province. Participants: 5442 middle school students aged 11–19 years were sampled. 4727 valid questionnaires were collected and used for validation of the scales. The final sample included 2408 boys and 2319 girls. Primary and secondary outcome measures: The tools were assessed by the item response theory, classical test theory (reliability and construct validity) and differential item functioning. Results: Four scales to measure anxiety, depression, study problem and sociality problem were established. Exploratory factor analysis showed that each scale had two solutions. Confirmatory factor analysis showed acceptable to good model fit for each scale. Internal consistency and test–retest reliability of all scales were above 0.7. Item response theory showed that all items had acceptable discrimination parameters and most items had appropriate difficulty parameters. 10 items demonstrated differential item functioning with respect to gender. Conclusions: Four brief scales were developed and validated among adolescents in middle schools of China. The scales have good psychometric properties with minor differential item functioning. They can be used in middle school settings, and will help school officials to assess the students’ emotional/behavioural problems. PMID:28062469

  3. Differential item functioning by sex and race in the Hogan Personality Inventory.

    PubMed

    Sheppard, Richard; Han, Kyunghee; Colarelli, Stephen M; Dai, Guangdong; King, Daniel W

    2006-12-01

    The authors examined measurement bias in the Hogan Personality Inventory by investigating differential item functioning (DIF) across sex and two racial groups (Caucasian and Black). The sample consisted of 1,579 Caucasians (1,023 men, 556 women) and 523 Blacks (321 men, 202 women) who were applying for entry-level, unskilled jobs in factories. Although the group mean differences were trivial, more than a third of the items showed DIF by sex (38.4%) and by race (37.3%). A content analysis of potentially biased items indicated that the themes of items displaying DIF were slightly more cohesive for sex than for race. The authors discuss possible explanations for differing clustering tendencies of items displaying DIF and some practical and theoretical implications of DIF in the development and interpretation of personality inventories.

  4. Factor structure and gender stability in the multidimensional condom attitudes scale.

    PubMed

    Starosta, Amy J; Berghoff, Christopher R; Earleywine, Mitch

    2015-06-01

    Sexually transmitted infections continue to trouble the United States and can be attenuated through increased condom use. Attitudes about condoms are an important multidimensional factor that can affect sexual health choices and have been successfully measured using the Multidimensional Condom Attitudes Scale (MCAS). Such attitudes have the potential to vary between men and women, yet little work has been undertaken to identify if the MCAS accurately captures attitudes without being influenced by underlying gender biases. We examined the factor structure and gender invariance on the MCAS using confirmatory factor analysis and item response theory, within-subscale differential item functioning analyses. More than 770 participants provided data via the Internet. Results of differential item functioning analyses identified three items as differentially functioning between the genders, and removal of these items is recommended. Findings confirmed the previously hypothesized multidimensional nature of condom attitudes and the five-factor structure of the MCAS even after the removal of the three problematic items. In general, comparisons across genders using the MCAS seem reasonable from a methodological standpoint. Results are discussed in terms of improving sexual health research and interventions. © The Author(s) 2014.

  5. Recursive Partitioning to Identify Potential Causes of Differential Item Functioning in Cross-National Data

    ERIC Educational Resources Information Center

    Finch, W. Holmes; Hernández Finch, Maria E.; French, Brian F.

    2016-01-01

    Differential item functioning (DIF) assessment is key in score validation. When DIF is present scores may not accurately reflect the construct of interest for some groups of examinees, leading to incorrect conclusions from the scores. Given rising immigration, and the increased reliance of educational policymakers on cross-national assessments…

  6. Using Multiple-Variable Matching to Identify Cultural Sources of Differential Item Functioning

    ERIC Educational Resources Information Center

    Wu, Amery D.; Ercikan, Kadriye

    2006-01-01

    Identifying the sources of differential item functioning (DIF) in international assessments is very challenging, because such sources are often nebulous and intertwined. Even though researchers frequently focus on test translation and content area, few actually go beyond these factors to investigate other cultural sources of DIF. This article…

  7. A Stepwise Test Characteristic Curve Method to Detect Item Parameter Drift

    ERIC Educational Resources Information Center

    Guo, Rui; Zheng, Yi; Chang, Hua-Hua

    2015-01-01

    An important assumption of item response theory is item parameter invariance. Sometimes, however, item parameters are not invariant across different test administrations due to factors other than sampling error; this phenomenon is termed item parameter drift. Several methods have been developed to detect drifted items. However, most of the…

  8. 41 CFR 101-30.101-2 - Item of supply.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ....101-2 Section 101-30.101-2 Public Contracts and Property Management Federal Property Management Regulations System FEDERAL PROPERTY MANAGEMENT REGULATIONS SUPPLY AND PROCUREMENT 30-FEDERAL CATALOG SYSTEM 30... differentiates one item from another item in the Federal Catalog System. Each item of supply is expressed in and...

  9. 41 CFR 101-30.101-2 - Item of supply.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ....101-2 Section 101-30.101-2 Public Contracts and Property Management Federal Property Management Regulations System FEDERAL PROPERTY MANAGEMENT REGULATIONS SUPPLY AND PROCUREMENT 30-FEDERAL CATALOG SYSTEM 30... differentiates one item from another item in the Federal Catalog System. Each item of supply is expressed in and...

  10. 41 CFR 101-30.101-2 - Item of supply.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ....101-2 Section 101-30.101-2 Public Contracts and Property Management Federal Property Management Regulations System FEDERAL PROPERTY MANAGEMENT REGULATIONS SUPPLY AND PROCUREMENT 30-FEDERAL CATALOG SYSTEM 30... differentiates one item from another item in the Federal Catalog System. Each item of supply is expressed in and...

  11. 41 CFR 101-30.101-2 - Item of supply.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ....101-2 Section 101-30.101-2 Public Contracts and Property Management Federal Property Management Regulations System FEDERAL PROPERTY MANAGEMENT REGULATIONS SUPPLY AND PROCUREMENT 30-FEDERAL CATALOG SYSTEM 30... differentiates one item from another item in the Federal Catalog System. Each item of supply is expressed in and...

  12. 41 CFR 101-30.101-2 - Item of supply.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ....101-2 Section 101-30.101-2 Public Contracts and Property Management Federal Property Management Regulations System FEDERAL PROPERTY MANAGEMENT REGULATIONS SUPPLY AND PROCUREMENT 30-FEDERAL CATALOG SYSTEM 30... differentiates one item from another item in the Federal Catalog System. Each item of supply is expressed in and...

  13. Psychometric properties of the painDETECT questionnaire in rheumatoid arthritis, psoriatic arthritis and spondyloarthritis: Rasch analysis and test-retest reliability.

    PubMed

    Rifbjerg-Madsen, Signe; Wæhrens, Eva Ejlersen; Danneskiold-Samsøe, Bente; Amris, Kirstine

    2017-05-22

    Pain is inherent in rheumatoid arthritis (RA), psoriatic arthritis (PsA) and spondyloarthritis (SpA) and traditionally considered to be of nociceptive origin. Emerging data suggest a potential role of augmented central pain mechanisms in subsets of patients; thus, valid instruments that can identify underlying pain mechanisms are needed. The painDETECT questionnaire (PDQ) was originally designed to differentiate between pain phenotypes. The objectives were to evaluate the psychometric properties of the PDQ in patients with inflammatory arthritis by applying Rasch analysis and to explore the reliability of pain classification by test-retest. For the Rasch analysis 900 questionnaires from patients with RA, PsA and SpA (300 per diagnosis) were extracted from 'the DANBIO painDETECT study'. The analysis was directed at the seven items assessing somatosensory symptoms and included: 1) the performance of the six-category Likert scale; 2) whether a unidimensional construct was defined; 3) the reliability and precision of estimates. Another group of 30 patients diagnosed with RA, PsA or SpA participated in a test-retest study. Intraclass Correlation Coefficients (ICC) and classification consistency were calculated. The Rasch analysis revealed: (1) Acceptable psychometric rating scale properties; the frequency distribution peaked in category 0 except for item 5, threshold calibration >10 observations per category, no disorder in the category measures for all items, scale category outfit Mnsq <2.0, small distances (<1.4 logits) between thresholds for category 1, 2 and 3 for all items. (2) The principal component analysis supported unidimensionality; the standardized residuals showed that 53.7% of total variance was explained by the measure and the magnitude of first contrast had an eigenvalue of 1.5, no misfitting items, clinically insignificant differences in item hierarchies across diagnoses (DIF < 0.5 logits). (3) A targeted item-person map, and person and item separation indices of 1.88 (reliability = 0.78) and 13.04 (reliability = 0.99), respectively. The test-retest revealed: ICC: RA 0.86 (0.56-0.96), PsA 0.96 (0.74-0.99), SpA 0.93 (0.76-98), overall 0.94 (0.84-0.98). Classification consistency was: RA 70%, PsA 80%, SpA 90%, overall 80%. The results support that the PDQ can be used as a classification instrument and assist identification of underlying pain mechanisms in patients suffering from inflammatory arthritis.
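
    The separation indices and reliabilities reported in this abstract are consistent with the relation commonly used in Rasch software, reliability = G^2 / (1 + G^2), where G is the separation index. The sketch below checks that arithmetic; it assumes the study followed this convention, which the abstract does not state, and the function names are hypothetical.

```python
import math


def separation_to_reliability(separation):
    """Convert a Rasch separation index G to reliability G^2 / (1 + G^2)."""
    if separation < 0:
        raise ValueError("separation must be non-negative")
    return separation ** 2 / (1 + separation ** 2)


def reliability_to_separation(reliability):
    """Inverse relation: G = sqrt(R / (1 - R)) for 0 <= R < 1."""
    if not 0 <= reliability < 1:
        raise ValueError("reliability must lie in [0, 1)")
    return math.sqrt(reliability / (1 - reliability))


# Person separation 1.88 and item separation 13.04, as quoted in the abstract.
print(round(separation_to_reliability(1.88), 2))   # 0.78
print(round(separation_to_reliability(13.04), 2))  # 0.99
```

    Under this relation a reliability of 0.80 corresponds to a separation of 2.0, which is why separation is often read as the number of statistically distinct strata the measure can resolve.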

  14. Numerical Differentiation Methods for Computing Error Covariance Matrices in Item Response Theory Modeling: An Evaluation and a New Proposal

    ERIC Educational Resources Information Center

    Tian, Wei; Cai, Li; Thissen, David; Xin, Tao

    2013-01-01

    In item response theory (IRT) modeling, the item parameter error covariance matrix plays a critical role in statistical inference procedures. When item parameters are estimated using the EM algorithm, the parameter error covariance matrix is not an automatic by-product of item calibration. Cai proposed the use of Supplemented EM algorithm for…

  15. Effects of Ignoring Item Interaction on Item Parameter Estimation and Detection of Interacting Items

    ERIC Educational Resources Information Center

    Chen, Cheng-Te; Wang, Wen-Chung

    2007-01-01

    This study explores the effects of ignoring item interaction on item parameter estimation and the efficiency of using the local dependence index Q[subscript 3] and the SAS NLMIXED procedure to detect item interaction under the three-parameter logistic model and the generalized partial credit model. Through simulations, it was found that ignoring…

  16. Rasch Analysis of the Power as Knowing Participation in Change Tool--the Brazilian version.

    PubMed

    Guedes, Erika de Souza; Orozco-Vargas, Luiz Carlos; Turrini, Ruth Natália Teresa; de Sousa, Regina Márcia Cardoso; dos Santos, Mariana Alvina; da Cruz, Diná de Almeida Lopes Monteiro

    2013-01-01

    The objective of this study was to evaluate the items contained in the Brazilian version of the Power as Knowing Participation in Change Tool (PKPCT). The psychometric properties of the questionnaire were investigated through Rasch analysis. The data from 952 nursing assistants and 627 baccalaureate nurses were analyzed (average age 44.1 (SD=9.5); 13.0% men). The subscales Choices, Awareness, Freedom and Involvement were tested separately and presented unidimensionality; the response categories of the items were collapsed from 7 to 3 levels and the items fit the model well, except for the following/leading item, in which the infit and outfit values were above 1.4; this item also presented Differential Item Functioning (DIF) according to the participant's role. The reliability of the items was 0.99 and the reliability of the participants ranged from 0.80 to 0.84 in the subscales. Items with extremely high levels of difficulty were not identified. The PKPCT should not be viewed as unidimensional, items with extremely high levels of difficulty in the scale need to be created and the differential functioning of some items has to be further investigated.

  17. Item response theory analysis of the life orientation test-revised: age and gender differential item functioning analyses.

    PubMed

    Steca, Patrizia; Monzani, Dario; Greco, Andrea; Chiesi, Francesca; Primi, Caterina

    2015-06-01

    This study is aimed at testing the measurement properties of the Life Orientation Test-Revised (LOT-R) for the assessment of dispositional optimism by employing item response theory (IRT) analyses. The LOT-R was administered to a large sample of 2,862 Italian adults. First, confirmatory factor analyses demonstrated the theoretical conceptualization of the construct measured by the LOT-R as a single bipolar dimension. Subsequently, IRT analyses for polytomous, ordered response category data were applied to investigate the items' properties. The equivalence of the items across gender and age was assessed by analyzing differential item functioning. Discrimination and severity parameters indicated that all items were able to distinguish people with different levels of optimism and adequately covered the spectrum of the latent trait. Additionally, the LOT-R appears to be gender invariant and, with minor exceptions, age invariant. Results provided evidence that the LOT-R is a reliable and valid measure of dispositional optimism. © The Author(s) 2014.

  18. Acculturation and the Center For Epidemiological Studies-Depression Scale for Hispanic women.

    PubMed

    McCabe, Brian E; Vermeesch, Amber L; Hall, Rosemary F; Peragallo, Nilda P; Mitrani, Victoria B

    2011-01-01

    Culturally valid measures of depression for Spanish-speaking Hispanic women are important for developing and implementing effective interventions to reduce health disparities. The Center for Epidemiological Studies-Depression Scale (CES-D) is a widely used measure of depression. Differential item functioning has been studied using language preference as a proxy for acculturation, but it is unknown if the results were due to acculturation or the language of administration. The aim of this study was to evaluate the relationship of acculturation, defined with a dimensional measure, to Spanish CES-D item responses. Spanish-speaking Hispanic women (n = 504) were recruited for a randomized controlled trial of Salud, Educación, Prevención y Autocuidado (Health, Education, Prevention, and Self-Care). Acculturation, an important dimension of variation within the diverse U.S. Hispanic community, was defined by high or low scores on the Americanism subscale of the Bidimensional Acculturation Scale. Differential item functioning for each of the 20 CES-D items between more acculturated and less acculturated women was tested using ordinal logistic regression. No items on the Depressed Affect, Somatic Activity, or Positive Affect subscales showed meaningful differential item functioning, but 1 item ("People were unfriendly") on the Interpersonal subscale had small results (R = 1.1%). The majority of CES-D items performed similarly for Spanish-speaking Hispanic women with high and low acculturation. Less acculturated women responded more positively to "People were unfriendly," despite having an equivalent level of depression, than did more acculturated women. Possibilities for improving this item are proposed.

  19. Comparison of Objective and Subjective Methods on Determination of Differential Item Functioning

    ERIC Educational Resources Information Center

    Sahin, Melek Gülsah

    2017-01-01

    The research objective is to compare the objective methods often used in the literature for determining differential item functioning (DIF) with the subjective method based on expert opinion, which is used less often in the literature. Mantel-Haenszel (MH), Logistic Regression (LR) and SIBTEST are chosen as objective methods. While the data…

  20. Differential Item Functioning Analysis of High-Stakes Test in Terms of Gender: A Rasch Model Approach

    ERIC Educational Resources Information Center

    Alavi, Seyed Mohammad; Bordbar, Soodeh

    2017-01-01

    Differential Item Functioning (DIF) analysis is a key element in evaluating educational test fairness and validity. One frequently cited source of construct-irrelevant variance is gender, which plays an important role in the university entrance exam; it therefore causes bias and consequently undermines test validity. The present study aims…

  1. Exploring Crossing Differential Item Functioning by Gender in Mathematics Assessment

    ERIC Educational Resources Information Center

    Ong, Yoke Mooi; Williams, Julian; Lamprianou, Iasonas

    2015-01-01

    The purpose of this article is to explore crossing differential item functioning (DIF) in a test drawn from a national examination of mathematics for 11-year-old pupils in England. An empirical dataset was analyzed to explore DIF by gender in a mathematics assessment. A two-step process involving the logistic regression (LR) procedure for…

  2. An Introduction to Missing Data in the Context of Differential Item Functioning

    ERIC Educational Resources Information Center

    Banks, Kathleen

    2015-01-01

    This article introduces practitioners and researchers to the topic of missing data in the context of differential item functioning (DIF), reviews the current literature on the issue, discusses implications of the review, and offers suggestions for future research. A total of nine studies were reviewed. All of these studies determined what effect…

  3. Differential Item Functioning By Sex and Race in The Hogan Personality Inventory

    ERIC Educational Resources Information Center

    Sheppard, Richard; Han, Kyunghee; Colarelli, Stephen M.; Dai, Guangdong; King, Daniel W.

    2006-01-01

    The authors examined measurement bias in the Hogan Personality Inventory by investigating differential item functioning (DIF) across sex and two racial groups (Caucasian and Black). The sample consisted of 1,579 Caucasians (1,023 men, 556 women) and 523 Blacks (321 men, 202 women) who were applying for entry-level, unskilled jobs in factories.…

  4. Symptom endorsement in men versus women with a diagnosis of depression: A differential item functioning approach.

    PubMed

    Cavanagh, Anna; Wilson, Coralie J; Caputi, Peter; Kavanagh, David J

    2016-09-01

    There is some evidence that, in contrast to depressed women, depressed men tend to report alternative symptoms that are not listed as standard diagnostic criteria. This may possibly lead to an under- or misdiagnosis of depression in men. This study aims to clarify whether depressed men and women report different symptoms. This study used data from the 2007 Australian National Survey of Mental Health and Wellbeing that was collected using the World Health Organization's Composite International Diagnostic Interview. Participants with a diagnosis of a depressive disorder with 12-month symptoms (n = 663) were identified and included in this study. Differential item functioning (DIF) was used to test whether depressed men and women endorse different features associated with their condition. Gender-related DIF was present for three symptoms associated with depression. Depressed women were more likely to report 'appetite/weight disturbance', whereas depressed men were more likely to report 'alcohol misuse' and 'substance misuse'. While the results may reflect a greater risk of co-occurring alcohol and substance misuse in men, inclusion of these features in assessments may improve the detection of depression in men, especially if standard depressive symptoms are under-reported. © The Author(s) 2016.

  5. A Differential Item Functional Analysis by Age of Perceived Interpersonal Discrimination in a Multi-racial/ethnic Sample of Adults.

    PubMed

    Owens, Sherry; Kristjansson, Alfgeir L; Hunte, Haslyn E R

    2015-11-05

    We investigated whether individual items on the nine item William's Perceived Everyday Discrimination Scale (EDS) functioned differently by age (<45 vs ≥ 45) within five racial groups in the United States: Asians (n=2,017); Hispanics (n=2,688); Black Caribbeans (n=1,377); African Americans (n=3,434); and Whites (n=854). We used data from the 2001-2003 National Survey of American Lives and the 2001-2003 National Latino and Asian Studies. Multiple-indicator, multiple-cause models (MIMIC) were used to examine differential item functioning (DIF) on the EDS by age within each racial/ethnic group. Overall, Asian and Hispanic respondents reported less discrimination than Whites; on the other hand, African Americans and Black Caribbeans reported more discrimination than Whites. Regardless of race/ethnicity, the younger respondents (aged <45 years) reported less discrimination than the older respondents (aged ≥ 45 years). In terms of age by race/ethnicity, the results were mixed for 19 out of 45 tests of DIF (40%). No differences in item function were observed among Black Caribbeans. "Being called names or insulted" and others acting as "if they are afraid" of the respondents were the only two items that did not exhibit differential item functioning by age across all racial/ethnic groups. Overall, our findings suggest that the EDS scale should be used with caution in multi-age multi-racial/ethnic samples.

  6. Development and psychometric evaluation of the PROMIS Pediatric Life Satisfaction item banks, child-report, and parent-proxy editions.

    PubMed

    Forrest, Christopher B; Devine, Janine; Bevans, Katherine B; Becker, Brandon D; Carle, Adam C; Teneralli, Rachel E; Moon, JeanHee; Tucker, Carole A; Ravens-Sieberer, Ulrike

    2018-01-01

    To describe the psychometric evaluation and item response theory calibration of the PROMIS Pediatric Life Satisfaction item banks, child-report, and parent-proxy editions. A pool of 55 life satisfaction items was administered to 1992 children 8-17 years old and 964 parents of children 5-17 years old. Analyses included descriptive statistics, reliability, factor analysis, differential item functioning, and assessment of construct validity. Thirteen items were deleted because of poor psychometric performance. An 8-item short form was administered to a national sample of 996 children 8-17 years old, and 1294 parents of children 5-17 years old. The combined sample (2988 children and 2258 parents) was used in item response theory (IRT) calibration analyses. The final item banks were unidimensional, the items were locally independent, and the items were free from impactful differential item functioning. The 8-item and 4-item short form scales showed excellent reliability, convergent validity, and discriminant validity. Life satisfaction decreased with declining socio-economic status, presence of a special health care need, and increasing age for girls, but not boys. After IRT calibration, we found that 4- and 8-item short forms had a high degree of precision (reliability) across a wide range (>4 SD units) of the latent variable. The PROMIS Pediatric Life Satisfaction item banks and their short forms provide efficient, precise, and valid assessments of life satisfaction in children and youth.

  7. Development and Evaluation of the PROMIS® Pediatric Positive Affect Item Bank, Child-Report and Parent-Proxy Editions.

    PubMed

    Forrest, Christopher B; Ravens-Sieberer, Ulrike; Devine, Janine; Becker, Brandon D; Teneralli, Rachel; Moon, JeanHee; Carle, Adam; Tucker, Carole A; Bevans, Katherine B

    2018-03-01

    The purpose of this study is to describe the psychometric evaluation and item response theory calibration of the PROMIS Pediatric Positive Affect item bank, child-report and parent-proxy editions. The initial item pool comprising 53 items, previously developed using qualitative methods, was administered to 1,874 children 8-17 years old and 909 parents of children 5-17 years old. Analyses included descriptive statistics, reliability, factor analysis, differential item functioning, and construct validity. A total of 14 items were deleted, because of poor psychometric performance, and an 8-item short form constructed from the remaining 39 items was administered to a national sample of 1,004 children 8-17 years old, and 1,306 parents of children 5-17 years old. The combined sample was used in item response theory (IRT) calibration analyses. The final item bank appeared unidimensional, the items appeared locally independent, and the items were free from differential item functioning. The scales showed excellent reliability and convergent and discriminant validity. Positive affect decreased with children's age and was lower for those with a special health care need. After IRT calibration, we found that 4 and 8 item short forms had a high degree of precision (reliability) across a wide range of the latent trait (>4 SD units). The PROMIS Pediatric Positive Affect item bank and its short forms provide an efficient, precise, and valid assessment of positive affect in children and youth.

  8. Aggregating Polytomous DIF Results over Multiple Test Administrations

    ERIC Educational Resources Information Center

    Zwick, Rebecca; Ye, Lei; Isham, Steven

    2018-01-01

    In typical differential item functioning (DIF) assessments, an item's DIF status is not influenced by its status in previous test administrations. An item that has shown DIF at multiple administrations may be treated the same way as an item that has shown DIF in only the most recent administration. Therefore, much useful information about the…

  9. A Comparison of Linking and Concurrent Calibration under the Graded Response Model.

    ERIC Educational Resources Information Center

    Kim, Seock-Ho; Cohen, Allan S.

    Applications of item response theory to practical testing problems including equating, differential item functioning, and computerized adaptive testing, require that item parameter estimates be placed onto a common metric. In this study, two methods for developing a common metric for the graded response model under item response theory were…

  10. Calibrating well-being, quality of life and common mental disorder items: psychometric epidemiology in public mental health research.

    PubMed

    Böhnke, Jan R; Croudace, Tim J

    2016-08-01

    The assessment of 'general health and well-being' in public mental health research stimulates debates around relative merits of questionnaire instruments and their items. Little evidence regarding alignment or differential advantages of instruments or items has appeared to date. Population-based psychometric study of items employed in public mental health narratives. Multidimensional item response theory was applied to General Health Questionnaire (GHQ-12), Warwick-Edinburgh Mental Well-being Scale (WEMWBS) and EQ-5D items (Health Survey for England, 2010-2012; n = 19 290). A bifactor model provided the best account of the data and showed that the GHQ-12 and WEMWBS items assess mainly the same construct. Only one item of the EQ-5D showed relevant overlap with this dimension (anxiety/depression). Findings were corroborated by comparisons with alternative models and cross-validation analyses. The consequences of this lack of differentiation (GHQ-12 v. WEMWBS) for mental health and well-being narratives deserve discussion to enrich debates on priorities in public mental health and its assessment. © The Royal College of Psychiatrists 2015.

  11. Detecting Migraine in Patients with Mild Traumatic Brain Injury Using Three Different Headache Measures

    PubMed Central

    Anderson, Kirsten; Tinawi, Simon; de Guise, Elaine

    2015-01-01

    Posttraumatic migraine may represent an important subtype of headache among the traumatic brain injury (TBI) population and is associated with increased recovery times. However, it is underdiagnosed in patients with mild traumatic brain injury (mTBI). This study examined the effectiveness of the self-administered Nine-Item Screener (Nine-Item Screener-SA), the Headache Impact Test-6 (HIT-6), the 3-Item Migraine Screener, and the Rivermead Post-Concussion Questionnaire (RPQ) at discriminating between mTBI patients with (n = 23) and without (n = 20) migraines. The Nine-Item Screener demonstrated significant differences between patients with and without migraine on nearly every question, especially on Question 9 (disability), sensitivity: 0.95 and specificity: 0.65 (95% CI, 0.64–0.90). The HIT-6 demonstrated significant differences between migraine and no-migraine patients on disability and pain severity, with disability having a sensitivity of 0.70 and specificity of 0.75 (95% CI, 0.54–0.83). Only Question 3 of the 3-Item ID Migraine Screener (photosensitivity) showed significant differences between migraine and no-migraine patients, sensitivity: 0.84 and specificity: 0.55 (CI, 0.52–0.82). The RPQ did not reveal greater symptoms in migraine patients compared with those without. Among headache measures, the Nine-Item Screener-SA best differentiated between mTBI patients with and without migraine. Disability may best identify migraine sufferers among the TBI population. PMID:26106255

  12. Laser Raman detection for oral cancer based on a Gaussian process classification method

    NASA Astrophysics Data System (ADS)

    Du, Zhanwei; Yang, Yongjian; Bai, Yuan; Wang, Lijun; Zhang, Chijun; Chen, He; Luo, Yusheng; Su, Le; Chen, Yong; Li, Xianchang; Zhou, Xiaodong; Jia, Jun; Shen, Aiguo; Hu, Jiming

    2013-06-01

    Oral squamous cell carcinoma is the most common neoplasm of the oral cavity. The incidence rate accounts for 80% of total oral cancer and has shown an upward trend in recent years. It has a high degree of malignancy and is difficult to detect in terms of differential diagnosis, as a consequence of which the timing of treatment is always delayed. In this work, Raman spectroscopy was adopted to differentially diagnose oral squamous cell carcinoma and oral gland carcinoma. In total, 852 entries of raw spectral data which consisted of 631 items from 36 oral squamous cell carcinoma patients, 87 items from four oral gland carcinoma patients and 134 items from five normal people were collected by utilizing an optical method on oral tissues. The probability distribution of the datasets corresponding to the spectral peaks of the oral squamous cell carcinoma tissue was analyzed and the experimental result showed that the data obeyed a normal distribution. Moreover, the distribution characteristic of the noise was also in compliance with a Gaussian distribution. A Gaussian process (GP) classification method was utilized to distinguish the normal people and the oral gland carcinoma patients from the oral squamous cell carcinoma patients. The experimental results showed that all the normal people could be recognized. 83.33% of the oral squamous cell carcinoma patients could be correctly diagnosed and the remaining ones would be diagnosed as having oral gland carcinoma. For the classification process of oral gland carcinoma and oral squamous cell carcinoma, the correct ratio was 66.67% and the erroneously diagnosed percentage was 33.33%. The total sensitivity was 80% and the specificity was 100%, with a Matthews correlation coefficient (MCC) of 0.447213595. Considering the numerical results above, the application prospects and clinical value of this technique are significantly impressive.
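
    The sensitivity, specificity, and Matthews correlation coefficient quoted in the abstract above are all functions of a 2x2 confusion table. The following is a minimal sketch of those standard definitions in Python. The function name and the example counts are assumptions made for illustration; the counts are not taken from the study, although they were chosen so that the three summary values come out the same as the ones quoted.

```python
import math


def diagnostic_summary(tp, fn, fp, tn):
    """Return (sensitivity, specificity, MCC) for a 2x2 confusion table.

    tp/fn: diseased cases classified correctly/incorrectly.
    fp/tn: non-diseased cases classified incorrectly/correctly.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts, NOT reported in the abstract.
sens, spec, mcc = diagnostic_summary(tp=12, fn=3, fp=0, tn=1)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} MCC={mcc:.9f}")
```

    With these invented counts the MCC is 12 / sqrt(720) = 1 / sqrt(5), which illustrates that a perfect specificity can coexist with a modest MCC when one class is very small.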

  13. Assessing the Utility of Item Response Theory Models: Differential Item Functioning.

    ERIC Educational Resources Information Center

    Scheuneman, Janice Dowd

    The current status of item response theory (IRT) is discussed. Several IRT methods exist for assessing whether an item is biased. Focus is on methods proposed by L. M. Rudner (1975), F. M. Lord (1977), D. Thissen et al. (1988) and R. L. Linn and D. Harnisch (1981). Rudner suggested a measure of the area lying between the two item characteristic…
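
    The area-between-curves idea mentioned in the abstract above can be sketched numerically. The Python below is an illustration only, assuming a two-parameter logistic item characteristic curve and trapezoidal integration over a bounded ability interval; the function names, the parameter values, and the integration limits are assumptions, not details taken from the record.

```python
import math


def icc_2pl(theta, a, b):
    """Two-parameter logistic item characteristic curve."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def unsigned_area(ref, focal, lo=-20.0, hi=20.0, steps=4000):
    """Trapezoidal estimate of the unsigned area between two ICCs.

    ref and focal are (a, b) tuples estimated separately in the reference
    and focal groups (assumed to be on a common metric already).
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        theta = lo + i * h
        gap = abs(icc_2pl(theta, *ref) - icc_2pl(theta, *focal))
        total += gap * (0.5 if i in (0, steps) else 1.0)
    return total * h


# Hypothetical item: same discrimination, difficulty shifted by 0.5.
print(round(unsigned_area((1.0, 0.0), (1.0, 0.5)), 3))
```

    When the two curves share the same discrimination and have no guessing parameter, the area over the whole ability axis equals the absolute difference in difficulty, which gives a quick sanity check for the numerical routine.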

  14. What's in a Topic? Exploring the Interaction between Test-Taker Age and Item Content in High-Stakes Testing

    ERIC Educational Resources Information Center

    Banerjee, Jayanti; Papageorgiou, Spiros

    2016-01-01

    The research reported in this article investigates differential item functioning (DIF) in a listening comprehension test. The study explores the relationship between test-taker age and the items' language domains across multiple test forms. The data comprise test-taker responses (N = 2,861) to a total of 133 unique items, 46 items of which were…

  15. The Impact of Non-attempted and Dually-Attempted Items on Person Abilities Using Item Response Theory

    PubMed Central

    Sideridis, Georgios D.; Tsaousis, Ioannis; Al Harbi, Khaleel

    2016-01-01

    The purpose of the present study was to relate response strategy with person ability estimates. Two behavioral strategies were examined: (a) the strategy to skip items in order to save time on timed tests, and, (b) the strategy to select two responses on an item, with the hope that one of them may be considered correct. Participants were 4,422 individuals who were administered a standardized achievement measure related to math, biology, chemistry, and physics. In the present evaluation, only the physics subscale was employed. Two analyses were conducted: (a) a person-based one to identify differences between groups and potential correlates of those differences, and, (b) a measure-based analysis in order to identify the parts of the measure that were responsible for potential group differentiation. For (a) person abilities the 2-PL model was employed and later the 3-PL and 4-PL models in order to estimate upper and lower asymptotes of person abilities. For (b) differential item functioning, differential test functioning, and differential distractor functioning were investigated. Results indicated that there were significant differences between groups with completers having the highest ability compared to both non-attempters and dual responders. There were no significant differences between non-attempters and dual responders. The present findings have implications for response strategy efficacy and measure evaluation, revision, and construction. PMID:27790174
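
    For readers unfamiliar with the 2-PL, 3-PL, and 4-PL models named in the abstract above, the sketch below shows the usual four-parameter logistic response function, of which the other two are special cases. It is a generic textbook form written in Python, not the authors' estimation code; the function name and the parameter values are illustrative assumptions.

```python
import math


def irf_4pl(theta, a, b, c=0.0, d=1.0):
    """Probability of a correct response under the 4-PL model.

    a: discrimination, b: difficulty,
    c: lower asymptote (fixed at 0 in the 2-PL),
    d: upper asymptote (fixed at 1 in the 2-PL and 3-PL).
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical parameter values for one item.
for label, params in [("2-PL", dict(a=1.2, b=0.0)),
                      ("3-PL", dict(a=1.2, b=0.0, c=0.2)),
                      ("4-PL", dict(a=1.2, b=0.0, c=0.2, d=0.95))]:
    print(label, round(irf_4pl(0.0, **params), 3))
```

    At theta equal to the difficulty b, the curve sits halfway between the two asymptotes, so freeing c or d shifts the probability at that point away from 0.5.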

  16. The Impact of Non-attempted and Dually-Attempted Items on Person Abilities Using Item Response Theory.

    PubMed

    Sideridis, Georgios D; Tsaousis, Ioannis; Al Harbi, Khaleel

    2016-01-01

    The purpose of the present study was to relate response strategy with person ability estimates. Two behavioral strategies were examined: (a) the strategy to skip items in order to save time on timed tests, and, (b) the strategy to select two responses on an item, with the hope that one of them may be considered correct. Participants were 4,422 individuals who were administered a standardized achievement measure related to math, biology, chemistry, and physics. In the present evaluation, only the physics subscale was employed. Two analyses were conducted: (a) a person-based one to identify differences between groups and potential correlates of those differences, and, (b) a measure-based analysis in order to identify the parts of the measure that were responsible for potential group differentiation. For (a) person abilities the 2-PL model was employed and later the 3-PL and 4-PL models in order to estimate upper and lower asymptotes of person abilities. For (b) differential item functioning, differential test functioning, and differential distractor functioning were investigated. Results indicated that there were significant differences between groups with completers having the highest ability compared to both non-attempters and dual responders. There were no significant differences between non-attempters and dual responders. The present findings have implications for response strategy efficacy and measure evaluation, revision, and construction.

  17. Uncertainty in BRCA1 cancer susceptibility testing.

    PubMed

    Baty, Bonnie J; Dudley, William N; Musters, Adrian; Kinney, Anita Y

    2006-11-15

    This study investigated uncertainty in individuals undergoing genetic counseling/testing for breast/ovarian cancer susceptibility. Sixty-three individuals from a single kindred with a known BRCA1 mutation rated uncertainty about 12 items on a five-point Likert scale before and 1 month after genetic counseling/testing. Factor analysis identified a five-item total uncertainty scale that was sensitive to changes before and after testing. The items in the scale were related to uncertainty about obtaining health care, positive changes after testing, and coping well with results. The majority of participants (76%) rated reducing uncertainty as an important reason for genetic testing. The importance of reducing uncertainty was stable across time and unrelated to anxiety or demographics. Yet, at baseline, total uncertainty was low and decreased after genetic counseling/testing (P = 0.004). Analysis of individual items showed that after genetic counseling/testing, there was less uncertainty about the participant detecting cancer early (P = 0.005) and coping well with their result (P < 0.001). Our findings support the importance to clients of genetic counseling/testing as a means of reducing uncertainty. Testing may help clients to reduce the uncertainty about items they can control, and it may be important to differentiate the sources of uncertainty that are more or less controllable. Genetic counselors can help clients by providing anticipatory guidance about the role of uncertainty in genetic testing. (c) 2006 Wiley-Liss, Inc.

  18. Solving the measurement invariance anchor item problem in item response theory.

    PubMed

    Meade, Adam W; Wright, Natalie A

    2012-09-01

    The efficacy of tests of differential item functioning (measurement invariance) has been well established. It is clear that when properly implemented, these tests can successfully identify differentially functioning (DF) items when they exist. However, an assumption of these analyses is that the metric for different groups is linked using anchor items that are invariant. In practice, however, it is impossible to be certain which items are DF and which are invariant. This problem of anchor items, or referent indicators, has long plagued invariance research, and a multitude of suggested approaches have been put forth. Unfortunately, the relative efficacy of these approaches has not been tested. This study compares 11 variations on 5 qualitatively different approaches from recent literature for selecting optimal anchor items. A large-scale simulation study indicates that for nearly all conditions, an easily implemented 2-stage procedure recently put forth by Lopez Rivas, Stark, and Chernyshenko (2009) provided optimal power while maintaining nominal Type I error. With this approach, appropriate anchor items can be easily and quickly located, resulting in more efficacious invariance tests. Recommendations for invariance testing are illustrated using a pedagogical example of employee responses to an organizational culture measure.

  19. Differential Item Functioning in While-Listening Performance Tests: The Case of the International English Language Testing System (IELTS) Listening Module

    ERIC Educational Resources Information Center

    Aryadoust, Vahid

    2012-01-01

    This article investigates a version of the International English Language Testing System (IELTS) listening test for evidence of differential item functioning (DIF) based on gender, nationality, age, and degree of previous exposure to the test. Overall, the listening construct was found to be underrepresented, which is probably an important cause…

  20. Differential Item Functioning for Accommodated Students with Disabilities: Effect of Differences in Proficiency Distributions

    ERIC Educational Resources Information Center

    Quesen, Sarah

    2016-01-01

    When studying differential item functioning (DIF) with students with disabilities (SWD), focal groups typically suffer from small sample size, whereas the reference group population is usually large. This makes it possible for a researcher to select a sample from the reference population to be similar to the focal group on the ability scale. Doing…

  1. A Robust Outlier Approach to Prevent Type I Error Inflation in Differential Item Functioning

    ERIC Educational Resources Information Center

    Magis, David; De Boeck, Paul

    2012-01-01

    The identification of differential item functioning (DIF) is often performed by means of statistical approaches that consider the raw scores as proxies for the ability trait level. One of the most popular approaches, the Mantel-Haenszel (MH) method, belongs to this category. However, replacing the ability level by the simple raw score is a source…
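
    As background to the abstract above, the Mantel-Haenszel (MH) approach matches examinees on raw total score and pools a 2x2 table (group by correct/incorrect) across score levels. The Python below is a minimal sketch of the MH common odds ratio and its conversion to the ETS delta scale. It is not the robust outlier method the article proposes; the function names, the record layout, and the toy data are assumptions made for illustration.

```python
import math
from collections import defaultdict


def mantel_haenszel_odds_ratio(records):
    """MH common odds ratio for one studied item.

    records: iterable of (group, total_score, correct), where group is
    "ref" or "focal" and correct is 0 or 1. The raw total score serves
    as the matching (stratifying) variable.
    """
    # score level -> [ref correct, ref incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, correct in records:
        idx = (0 if group == "ref" else 2) + (0 if correct else 1)
        tables[score][idx] += 1
    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ValueError("odds ratio undefined for these data")
    return num / den


def mh_d_dif(alpha):
    """ETS delta metric; negative values favor the reference group."""
    return -2.35 * math.log(alpha)


# Toy data (hypothetical): two score levels, 20 examinees per group in each.
toy = ([("ref", 1, 1)] * 10 + [("ref", 1, 0)] * 10 +
       [("focal", 1, 1)] * 5 + [("focal", 1, 0)] * 15 +
       [("ref", 2, 1)] * 15 + [("ref", 2, 0)] * 5 +
       [("focal", 2, 1)] * 10 + [("focal", 2, 0)] * 10)
alpha = mantel_haenszel_odds_ratio(toy)
print(alpha, mh_d_dif(alpha))
```

    In the toy data the pooled odds ratio is 3.0, meaning that reference-group examinees have three times the odds of answering correctly at the same raw score. Because the raw score includes any DIF items, the matching variable itself can be contaminated, which is the kind of concern the abstract raises about using raw scores as a proxy for ability.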

  2. Differential Item Functioning Comparisons on a Performance-Based Alternate Assessment for Students with Severe Cognitive Impairments, Autism and Orthopedic Impairments

    ERIC Educational Resources Information Center

    Laitusis, Cara Cahalan; Maneckshana, Behroz; Monfils, Lora; Ahlgrim-Delzell, Lynn

    2009-01-01

    The purpose of this study was to examine Differential Item Functioning (DIF) by disability groups on an on-demand performance assessment for students with severe cognitive impairments. Researchers examined the presence of DIF for two comparisons. One comparison involved students with severe cognitive impairments who served as the reference group…

  3. An Examination of Differential Item Functioning on the Vanderbilt Assessment of Leadership in Education

    ERIC Educational Resources Information Center

    Polikoff, Morgan S.; May, Henry; Porter, Andrew C.; Elliott, Stephen N.; Goldring, Ellen; Murphy, Joseph

    2009-01-01

    The Vanderbilt Assessment of Leadership in Education is a 360-degree assessment of the effectiveness of principals' learning-centered leadership behaviors. In this report, we present results from a differential item functioning (DIF) study of the assessment. Using data from a national field trial, we searched for evidence of DIF on school level,…

  4. Using a Mixture IRT Model to Understand English Learner Performance on Large-Scale Assessments

    ERIC Educational Resources Information Center

    Shea, Christine A.

    2013-01-01

    The purpose of this study was to determine whether an eighth grade state-level math assessment contained items that function differentially (DIF) for English Learner students (EL) as compared to English Only students (EO) and if so, what factors might have caused DIF. To determine this, Differential Item Functioning (DIF) analysis was employed.…

  5. The Generalized Anxiety Disorder 7-item scale in adolescents with generalized anxiety disorder: Signal detection and validation.

    PubMed

    Mossman, Sarah A; Luft, Marissa J; Schroeder, Heidi K; Varney, Sara T; Fleck, David E; Barzman, Drew H; Gilman, Richard; DelBello, Melissa P; Strawn, Jeffrey R

    2017-11-01

    In pediatric patients with anxiety disorders, existing symptom inventories are either not freely available or require extensive time and effort to administer. We sought to evaluate a brief self-report scale, the Generalized Anxiety Disorder 7-item scale (GAD-7), in adolescents with generalized anxiety disorder (GAD). The Pediatric Anxiety Rating Scale (PARS) and the GAD-7 were administered to youth with GAD (confirmed by structured interview). Relationships between the measures were assessed, and sensitivity and specificity were determined with regard to a global symptom severity measure (Clinical Global Impression-Severity). In adolescents with GAD (N = 40; mean age, 14.8 ± 2.8), PARS and GAD-7 scores strongly correlated (R = 0.65, P ≤ .001) and a main effect for symptom severity was observed (P ≤ .001). GAD-7 scores ≥11 and ≥17 represented the optimum specificity and sensitivity for detecting moderate and severe anxiety, respectively. The PARS and GAD-7 similarly reflect symptom severity. The GAD-7 is associated with acceptable specificity and sensitivity for detecting clinically significant anxiety symptoms. GAD-7 scores may be used to assess anxiety symptoms and to differentiate between mild and moderate GAD in adolescents, and may be more efficient than the PARS.

  6. Cross-Group Equivalence of Interest and Motivation Items in PISA 2012 Turkey Sample

    ERIC Educational Resources Information Center

    Ardic, Elif Ozlem; Gelbal, Selahattin

    2017-01-01

    Purpose: The aim of this study was to examine measurement invariance of the interest and motivation related items contained in the PISA 2012 student survey with regard to gender, school type, and statistical regions and to identify the items that show differential item functioning (DIF) across groups. Research Methods: Multiple-group confirmatory…

  7. Application of a Method of Estimating DIF for Polytomous Test Items.

    ERIC Educational Resources Information Center

    Camilli, Gregory; Congdon, Peter

    1999-01-01

    Demonstrates a method for studying differential item functioning (DIF) that can be used with dichotomous or polytomous items and that is valid for data that follow a partial credit Item Response Theory model. A simulation study shows that positively biased Type I error rates are in accord with results from previous studies. (SLD)

  8. Item Parameter Drift as an Indication of Differential Opportunity to Learn: An Exploration of Item Flagging Methods & Accurate Classification of Examinees

    ERIC Educational Resources Information Center

    Sukin, Tia M.

    2010-01-01

    The presence of outlying anchor items is an issue faced by many testing agencies. The decision to retain or remove an item is a difficult one, especially when the content representation of the anchor set becomes questionable by item removal decisions. Additionally, the reason for the aberrancy is not always clear, and if the performance of the…

  9. Assessment of Differential Item Functioning in the Experiences of Discrimination Index

    PubMed Central

    Cunningham, Timothy J.; Berkman, Lisa F.; Gortmaker, Steven L.; Kiefe, Catarina I.; Jacobs, David R.; Seeman, Teresa E.; Kawachi, Ichiro

    2011-01-01

    The psychometric properties of instruments used to measure self-reported experiences of discrimination in epidemiologic studies are rarely assessed, especially regarding construct validity. The authors used 2000–2001 data from the Coronary Artery Risk Development in Young Adults (CARDIA) Study to examine differential item functioning (DIF) in 2 versions of the Experiences of Discrimination (EOD) Index, an index measuring self-reported experiences of racial/ethnic and gender discrimination. DIF may confound interpretation of subgroup differences. Large DIF was observed for 2 of 7 racial/ethnic discrimination items: White participants reported more racial/ethnic discrimination for the “at school” item, and black participants reported more racial/ethnic discrimination for the “getting housing” item. The large DIF by race/ethnicity in the index for racial/ethnic discrimination probably reflects item impact and is the result of valid group differences between blacks and whites regarding their respective experiences of discrimination. The authors also observed large DIF by race/ethnicity for 3 of 7 gender discrimination items. This is more likely to have been due to item bias. Users of the EOD Index must consider the advantages and disadvantages of DIF adjustment (omitting items, constructing separate measures, and retaining items). The EOD Index has substantial usefulness as an instrument that can assess self-reported experiences of discrimination. PMID:22038104

  10. Development and psychometric characteristics of the SCI-QOL Ability to Participate and Satisfaction with Social Roles and Activities item banks and short forms.

    PubMed

    Heinemann, Allen W; Kisala, Pamela A; Hahn, Elizabeth A; Tulsky, David S

    2015-05-01

    To develop a spinal cord injury (SCI)-focused version of PROMIS and Neuro-QOL social domain item banks; evaluate the psychometric properties of items developed for adults with SCI; and report information to facilitate clinical and research use. We used a mixed-methods design to develop and evaluate Ability to Participate in Social Roles and Activities and Satisfaction with Social Roles and Activities items. Focus groups helped define the constructs; cognitive interviews helped revise items; and confirmatory factor analysis and item response theory methods helped calibrate item banks and evaluate differential item functioning related to demographic and injury characteristics. The study was conducted at five SCI Model System sites and one Veterans Administration medical center. The calibration sample consisted of 641 individuals; a reliability sample consisted of 245 individuals residing in the community. A subset of 27 Ability to Participate and 35 Satisfaction items demonstrated good measurement properties and negligible differential item functioning related to demographic and injury characteristics. The SCI-specific measures correlate strongly with the PROMIS and Neuro-QOL versions. Ten-item short forms correlate >0.96 with the full banks. Variable-length CATs with a minimum of 4 items, variable-length CATs with a minimum of 8 items, fixed-length CATs of 10 items, and the 10-item short forms demonstrate construct coverage and measurement error that is comparable to the full item bank. The Ability to Participate and Satisfaction with Social Roles and Activities CATs and short forms demonstrate excellent psychometric properties and are suitable for clinical and research applications.

  11. Older and younger adults differently judge the similarity between negative affect terms.

    PubMed

    Ready, Rebecca E; Santorelli, Gennarina D; Mather, Molly A

    2018-01-02

    Theoretical models of aging suggest changes across the adult lifespan in the capacity to differentiate emotions. Greater emotion differentiation is associated with advantages in terms of emotion regulation and emotion resiliency. This study utilized a novel method that directly measures judgments of affect differentiation and does not confound affective experience with knowledge about affect terms. Theoretical predictions that older adults would distinguish more between affect terms than younger persons were tested. Older (n = 27; aged 60-92) and younger (n = 56; aged 18-32) adults rated the difference versus similarity of 16 affect terms from the Kessler and Staudinger (2009) scales; each of the 16 items was paired with every other item for a total of 120 ratings. Participants provided self-reports of trait emotions, alexithymia, and depressive symptoms. Older adults significantly differentiated more between low arousal and high arousal negative affect (NA) items than younger persons. Depressive symptoms were associated with similarity ratings across and within valence and arousal. Findings offer partial support for theoretical predictions that older adults differentiate more between affect terms than younger persons. To the extent that differentiating between negative affects can aid in emotion regulation, older adults may have an advantage over younger persons. Future research should investigate mechanisms that underlie age group differences in emotion differentiation.

  12. Evaluation of the Psychometric Properties of the Asian Adolescent Depression Scale and Construction of a Short Form: An Item Response Theory Analysis.

    PubMed

    Lo, Barbara Chuen Yee; Zhao, Yue; Kwok, Alice Wai Yee; Chan, Wai; Chan, Calais Kin Yuen

    2017-07-01

    The present study applied item response theory to examine the psychometric properties of the Asian Adolescent Depression Scale and to construct a short form among 1,084 teenagers recruited from secondary schools in Hong Kong. Findings suggested that some items of the full form reflected higher levels of severity and were more discriminating than others, and the Asian Adolescent Depression Scale was useful in measuring a broad range of depressive severity in community youths. Differential item functioning emerged in several items where females reported higher depressive severity than males. In the short form construction, preliminary validation suggested that, relative to the 20-item full form, our derived short form offered significantly greater diagnostic performance and stronger discriminatory ability in differentiating depressed and nondepressed groups, and simultaneously maintained adequate measurement precision with a reduced response burden in assessing depression in Asian adolescents. Cultural variance in depressive symptomatology and clinical implications are discussed.

  13. Gender-based Differential Item Functioning in the Application of the Theory of Planned Behavior for the Study of Entrepreneurial Intentions

    PubMed Central

    Zampetakis, Leonidas A.; Bakatsaki, Maria; Litos, Charalambos; Kafetsios, Konstantinos G.; Moustakis, Vassilis

    2017-01-01

    Over the past years the percentage of female entrepreneurs has increased, yet it is still far below that for males. Although various attempts have been made to explain differences in men’s and women’s entrepreneurial attitudes and intentions, the extent to which those differences are due to self-report biases has not yet been considered. The present study utilized Differential Item Functioning (DIF) to compare men’s and women’s reporting on entrepreneurial intentions. DIF occurs in situations where members of different groups show differing probabilities of endorsing an item despite possessing the same level of the ability that the item is intended to measure. Drawing on the theory of planned behavior (TPB), the present study investigated whether constructs such as entrepreneurial attitudes, perceived behavioral control, subjective norms and intention would show gender differences and whether these gender differences could be explained by DIF. DIF methods applied to a dataset of 1800 Greek participants (50.4% female) indicated that differences at the item level are almost non-existent. Moreover, the differential test functioning (DTF) analysis, which allows assessing the overall impact of DIF effects with all items being taken into account simultaneously, suggested that the effect of DIF across all the items for each scale was negligible. Future research should consider that measurement invariance can be assumed when using TPB constructs for the study of entrepreneurial motivation independent of gender. PMID:28386244

  14. Gender-based Differential Item Functioning in the Application of the Theory of Planned Behavior for the Study of Entrepreneurial Intentions.

    PubMed

    Zampetakis, Leonidas A; Bakatsaki, Maria; Litos, Charalambos; Kafetsios, Konstantinos G; Moustakis, Vassilis

    2017-01-01

    Over the past years the percentage of female entrepreneurs has increased, yet it is still far below that for males. Although various attempts have been made to explain differences in men's and women's entrepreneurial attitudes and intentions, the extent to which those differences are due to self-report biases has not yet been considered. The present study utilized Differential Item Functioning (DIF) to compare men's and women's reporting on entrepreneurial intentions. DIF occurs in situations where members of different groups show differing probabilities of endorsing an item despite possessing the same level of the ability that the item is intended to measure. Drawing on the theory of planned behavior (TPB), the present study investigated whether constructs such as entrepreneurial attitudes, perceived behavioral control, subjective norms and intention would show gender differences and whether these gender differences could be explained by DIF. DIF methods applied to a dataset of 1800 Greek participants (50.4% female) indicated that differences at the item level are almost non-existent. Moreover, the differential test functioning (DTF) analysis, which allows assessing the overall impact of DIF effects with all items being taken into account simultaneously, suggested that the effect of DIF across all the items for each scale was negligible. Future research should consider that measurement invariance can be assumed when using TPB constructs for the study of entrepreneurial motivation independent of gender.

  15. Differential Item Functioning Assessment in Cognitive Diagnostic Modeling: Application of the Wald Test to Investigate DIF in the DINA Model

    ERIC Educational Resources Information Center

    Hou, Likun; de la Torre, Jimmy; Nandakumar, Ratna

    2014-01-01

    Analyzing examinees' responses using cognitive diagnostic models (CDMs) has the advantage of providing diagnostic information. To ensure the validity of the results from these models, differential item functioning (DIF) in CDMs needs to be investigated. In this article, the Wald test is proposed to examine DIF in the context of CDMs. This study…

  16. Formulating the Rasch Differential Item Functioning Model under the Marginal Maximum Likelihood Estimation Context and Its Comparison with Mantel-Haenszel Procedure in Short Test and Small Sample Conditions

    ERIC Educational Resources Information Center

    Paek, Insu; Wilson, Mark

    2011-01-01

    This study elaborates the Rasch differential item functioning (DIF) model formulation under the marginal maximum likelihood estimation context. Also, the Rasch DIF model performance was examined and compared with the Mantel-Haenszel (MH) procedure in small sample and short test length conditions through simulations. The theoretically known…

  17. Fitting a Mixture Rasch Model to English as a Foreign Language Listening Tests: The Role of Cognitive and Background Variables in Explaining Latent Differential Item Functioning

    ERIC Educational Resources Information Center

    Aryadoust, Vahid

    2015-01-01

    The present study uses a mixture Rasch model to examine latent differential item functioning in English as a foreign language listening tests. Participants (n = 250) took a listening and lexico-grammatical test and completed the metacognitive awareness listening questionnaire comprising problem solving (PS), planning and evaluation (PE), mental…

  18. An analysis of the DuPage County Regional Office of Education physics exam

    NASA Astrophysics Data System (ADS)

    Muehsler, Hans

    In 2009, the DuPage County Regional Office of Education (ROE) tasked volunteer physics teachers with creating a basic skills physics exam reflecting what the participants valued and shared in common across curricula. Mechanics, electricity & magnetism (E&M), and wave phenomena emerged as the primary constructs. The resulting exam was intended for first-exposure physics students. The most recently completed version was psychometrically assessed for unidimensionality within the constructs using a robust WLS structural equation model and for reliability. An item analysis using a 3-PL IRT model was performed on the mechanics items and a 2-PL IRT model was performed on the E&M and waves items; a distractor analysis was also performed on all items. Lastly, differential item functioning (DIF) and differential test functioning (DTF) analyses, using the Mantel-Haenszel procedure, were performed using gender, ethnicity, year in school, ELL, physics level, and math level as groupings.
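
    The Mantel-Haenszel procedure named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: examinees are stratified by matched total score, each stratum is summarized as a hypothetical tuple (reference correct, reference incorrect, focal correct, focal incorrect), and the function names are invented here. The abstract gives no counts or implementation details, so none are reproduced.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across score strata for one studied item.

    strata: iterable of (ref_correct, ref_incorrect, focal_correct, focal_incorrect)
            counts, one tuple per matched-score level (hypothetical layout).
    A value near 1 suggests no uniform DIF; values above 1 favor the reference group.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, focal_right, focal_wrong in strata:
        total = ref_right + ref_wrong + focal_right + focal_wrong
        if total == 0:
            continue  # skip empty score levels
        numerator += ref_right * focal_wrong / total
        denominator += ref_wrong * focal_right / total
    return numerator / denominator


def mh_delta(odds_ratio):
    """Re-express the odds ratio on the ETS delta scale: -2.35 * ln(alpha)."""
    return -2.35 * math.log(odds_ratio)
```

    A grouping variable with more than two levels, such as year in school, would need either pairwise comparisons or a generalized version of the statistic, which this sketch does not cover.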

  19. Differential item functional analysis on pedagogic and content knowledge (PCK) questionnaire for Indonesian teachers using RASCH model

    NASA Astrophysics Data System (ADS)

    Rahmani, B. D.

    2018-01-01

    The purpose of this paper is to evaluate Indonesian senior high school teachers’ pedagogical content knowledge and their perceptions of curriculum change in West Java, Indonesia. The data used in this study were derived from a questionnaire survey conducted among teachers in Bandung, West Java. A total of 61 usable responses were collected. Differential Item Functioning (DIF) analysis was used to examine whether the items functioned differently by gender, educational background, and school location. The results showed no significant difference in item responses by gender or school location, but a significant difference by educational background. In conclusion, teachers’ educational background influences their responses to the questionnaire. Future questionnaires should therefore be constructed with items that account for differences among participants, particularly in educational background.

  20. Validation of a mobility item bank for older patients in primary care.

    PubMed

    Cabrero-García, Julio; Ramos-Pichardo, Juan Diego; Muñoz-Mendoza, Carmen Luz; Cabañero-Martínez, María José; González-Llopis, Lorena; Reig-Ferrer, Abilio

    2012-12-05

    To develop and validate an item bank to measure mobility in older people in primary care and to analyse differential item functioning (DIF) and differential bundle functioning (DBF) by sex. A pool of 48 mobility items was administered by interview to 593 older people attending primary health care practices. The pool contained four domains based on the International Classification of Functioning: changing and maintaining body position; carrying, lifting and pushing; walking; and going up and down stairs. The Late Life Mobility item bank consisted of 35 items, and measured with a reliability of 0.90 or more across the full spectrum of mobility, except at the higher end of better functioning. No evidence was found of non-uniform DIF but uniform DIF was observed, mainly for items in the changing and maintaining body position and carrying, lifting and pushing domains. The walking domain did not display DBF, but the other three domains did, principally the carrying, lifting and pushing items. During the design and validation of an item bank to measure mobility in older people, we found that strength (carrying, lifting and pushing) items formed a secondary dimension that produced DBF. More research is needed to determine how best to include strength items in a mobility measure, or whether it would be more appropriate to design separate measures for each construct.

  1. Item Anomaly Detection Based on Dynamic Partition for Time Series in Recommender Systems

    PubMed Central

    Gao, Min; Tian, Renli; Wen, Junhao; Xiong, Qingyu; Ling, Bin; Yang, Linda

    2015-01-01

    In recent years, recommender systems have become an effective method to process information overload. However, recommendation technology still suffers from many problems. One of the problems is shilling attacks, in which attackers inject spam user profiles to disturb the list of recommendation items. There are two characteristics of all types of shilling attacks: 1) Item abnormality: The rating of target items is always maximum or minimum; and 2) Attack promptness: It takes only a very short period of time to inject attack profiles. Some papers have proposed item anomaly detection methods based on these two characteristics, but their detection rate, false alarm rate, and universality need to be further improved. To solve these problems, this paper proposes an item anomaly detection method based on dynamic partitioning for time series. This method first dynamically partitions item-rating time series based on important points. Then, we use the chi-square distribution (χ²) to detect abnormal intervals. The experimental results on MovieLens 100K and 1M indicate that this approach has a high detection rate and a low false alarm rate and is stable toward different attack models and filler sizes. PMID:26267477
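
    The chi-square step described above can be illustrated with a rough Python sketch. It is not the authors' algorithm. The abstract does not specify how "important points" are chosen, so fixed-size windows stand in for the dynamic partition. The baseline rating distribution, the function names, and the 0.05 significance level are also assumptions made for illustration.

```python
from collections import Counter

# Critical value of the chi-square distribution, 4 degrees of freedom, alpha = 0.05
# (five rating levels 1..5 give 5 - 1 = 4 degrees of freedom).
CHI2_CRITICAL_DF4 = 9.488


def chi_square_stat(window, baseline_probs):
    """Pearson chi-square statistic of a window's rating counts against a baseline.

    baseline_probs: dict mapping each rating level to its expected probability
                    (all probabilities must be positive).
    """
    counts = Counter(window)
    size = len(window)
    stat = 0.0
    for level, prob in baseline_probs.items():
        expected = size * prob
        stat += (counts.get(level, 0) - expected) ** 2 / expected
    return stat


def flag_abnormal_windows(ratings, window_size, baseline_probs,
                          critical=CHI2_CRITICAL_DF4):
    """Return start indices of fixed-size windows whose statistic exceeds the cutoff.

    Fixed windows are a stand-in for the paper's dynamic partitioning.
    """
    flagged = []
    for start in range(0, len(ratings) - window_size + 1, window_size):
        window = ratings[start:start + window_size]
        if chi_square_stat(window, baseline_probs) > critical:
            flagged.append(start)
    return flagged
```

    A burst of maximum ratings injected in a short interval, the pattern the abstract attributes to shilling attacks, would push the statistic for that window well above the cutoff under a roughly uniform baseline.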

  2. Item Anomaly Detection Based on Dynamic Partition for Time Series in Recommender Systems.

    PubMed

    Gao, Min; Tian, Renli; Wen, Junhao; Xiong, Qingyu; Ling, Bin; Yang, Linda

    2015-01-01

    In recent years, recommender systems have become an effective method to process information overload. However, recommendation technology still suffers from many problems. One of the problems is shilling attacks, in which attackers inject spam user profiles to disturb the list of recommendation items. There are two characteristics of all types of shilling attacks: 1) Item abnormality: The rating of target items is always maximum or minimum; and 2) Attack promptness: It takes only a very short period of time to inject attack profiles. Some papers have proposed item anomaly detection methods based on these two characteristics, but their detection rate, false alarm rate, and universality need to be further improved. To solve these problems, this paper proposes an item anomaly detection method based on dynamic partitioning for time series. This method first dynamically partitions item-rating time series based on important points. Then, we use the chi-square distribution (χ²) to detect abnormal intervals. The experimental results on MovieLens 100K and 1M indicate that this approach has a high detection rate and a low false alarm rate and is stable toward different attack models and filler sizes.

  3. Anchor Selection Strategies for DIF Analysis: Review, Assessment, and New Approaches

    ERIC Educational Resources Information Center

    Kopf, Julia; Zeileis, Achim; Strobl, Carolin

    2015-01-01

    Differential item functioning (DIF) indicates the violation of the invariance assumption, for instance, in models based on item response theory (IRT). For item-wise DIF analysis using IRT, a common metric for the item parameters of the groups that are to be compared (e.g., for the reference and the focal group) is necessary. In the Rasch model,…

  4. A general theoretical framework for interpreting patient-reported outcomes estimated from ordinally scaled item responses.

    PubMed

    Massof, Robert W

    2014-10-01

    A simple theoretical framework explains patient responses to items in rating scale questionnaires. Fixed latent variables position each patient and each item on the same linear scale. Item responses are governed by a set of fixed category thresholds, one for each ordinal response category. A patient's item responses are magnitude estimates of the difference between the patient variable and the patient's estimate of the item variable, relative to his/her personally defined response category thresholds. Differences between patients in their personal estimates of the item variable and in their personal choices of category thresholds are represented by random variables added to the corresponding fixed variables. Effects of intervention correspond to changes in the patient variable, the patient's response bias, and/or latent item variables for a subset of items. Intervention effects on patients' item responses were simulated by assuming the random variables are normally distributed with a constant scalar covariance matrix. Rasch analysis was used to estimate latent variables from the simulated responses. The simulations demonstrate that changes in the patient variable and changes in response bias produce indistinguishable effects on item responses and manifest as changes only in the estimated patient variable. Changes in a subset of item variables manifest as intervention-specific differential item functioning and as changes in the estimated person variable that equals the average of changes in the item variables. Simulations demonstrate that intervention-specific differential item functioning produces inefficiencies and inaccuracies in computer adaptive testing.

  5. Differential item functioning analysis of the Vanderbilt Expertise Test for cars.

    PubMed

    Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W; Van Gulick, Ana Beth; Gauthier, Isabel

    2015-01-01

    The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge.

  6. Occupation-differential construct validity of the Job Content Questionnaire (JCQ) psychological job demands scale with physical job demands items: a mixed methods research.

    PubMed

    Choi, Bongkyoo; Kurowski, Alicia; Bond, Meg; Baker, Dean; Clays, Els; De Bacquer, Dirk; Punnett, Laura

    2012-01-01

    The construct validity of the Job Content Questionnaire (JCQ) psychological demands scale in relationship to physical demands has been inconsistent. This study aims to test quantitatively and qualitatively whether the scale validity differs by occupation. Hierarchical clustering analyses of 10 JCQ psychological and physical demands items were conducted in 61 occupations from two datasets: one of non-faculty workers at a university in the United States (6 occupations with 208 total workers) and the other of a Belgian working population (55 occupations with 13,039 total workers). The psychological and physical demands items overlapped in 13 of 61 occupation-stratified clustering analyses. Most of the overlaps occurred in physically-demanding occupations and involved the two psychological demands items, 'work fast' and 'work hard'. Generally, the scale reliability was low in such occupations. Additionally, interviews with eight university workers revealed that workers interpreted the two psychological demands items differently by the nature of their tasks. The scale validity was occupation-differential. The JCQ psychological job demands scale as a job demand measure has been used worldwide in many studies. This study indicates that the 'work fast' and 'work hard' items of the scale need to be reworded to differentiate mental from physical job demands, so that they measure 'psychological' demands as intended.

  7. Modulation of the electrophysiological correlates of retrieval cue processing by the specificity of task demands.

    PubMed

    Johnson, Jeffrey D; Rugg, Michael D

    2006-02-03

    Retrieval orientation refers to the differential processing of retrieval cues according to the type of information sought from memory (e.g., words vs. pictures). In the present study, event-related potentials (ERPs) were employed to investigate whether the neural correlates of differential retrieval orientations are sensitive to the specificity of the retrieval demands of the test task. In separate study-test phases, subjects encoded lists of intermixed words and pictures, and then undertook one of two retrieval tests, in both of which the retrieval cues were exclusively words. In the recognition test, subjects performed 'old/new' discriminations on the test items, and old items corresponded to only one class of studied material (words or pictures). In the exclusion test, old items corresponded to both classes of study material, and subjects were required to respond 'old' only to test items corresponding to a designated class of material. Thus, demands for retrieval specificity were greater in the exclusion test than during recognition. ERPs elicited by correctly classified new items in the two types of test were contrasted according to whether words or pictures were the sought-for material. Material-dependent ERP effects were evident in both tests, but the effects onset earlier and offset later in the exclusion test. The findings suggest that differential processing of retrieval cues, and hence the adoption of differential retrieval orientations, varies according to the specificity of the retrieval goal.

  8. Validation of a new measure of availability and accommodation of health care that is valid for rural and urban contexts.

    PubMed

    Haggerty, Jeannie L; Levesque, Jean-Frédéric

    2017-04-01

    Patients are the most valid source for evaluating the accessibility of services, but a previous study observed differential psychometric performance of instruments in rural and urban respondents. To validate a measure of organizational accessibility free of differential rural-urban performance that predicts consequences of difficult access for patient-initiated care. Sequential qualitative-quantitative study. Qualitative findings used to adapt or develop evaluative and reporting items. Quantitative validation study. Primary data by telephone from 750 urban, rural and remote respondents in Quebec, Canada; follow-up mailed questionnaire to a subset of 316. Items were developed for barriers along the care trajectory. We used common factor and confirmatory factor analysis to identify constructs and compare models. We used item response theory analysis to test for differential rural-urban performance; examine individual item performance; adjust response options; and exclude redundant or non-discriminatory items. We used logistic regression to examine predictive validity of the subscale on access difficulty (outcome). Initial factor resolution suggested geographic and organizational dimensions, plus consequences of access difficulty. After second administration, organizational accommodation and geographic indicators were integrated into a 6-item subscale of Effective Availability and Accommodation, which demonstrates good variability and internal consistency (α = 0.84) and no differential functioning by geographic area. Each unit increase predicts decreased likelihood of consequences of access difficulties (unmet need and problem aggravation). The new subscale is a practical, valid and reliable measure for patients to evaluate first-contact health services accessibility, yielding valid comparisons between urban and rural contexts.

  9. Electronic Quality of Life Assessment Using Computer-Adaptive Testing

    PubMed Central

    2016-01-01

    Background: Quality of life (QoL) questionnaires are desirable for clinical practice but can be time-consuming to administer and interpret, making their widespread adoption difficult. Objective: Our aim was to assess the performance of the World Health Organization Quality of Life (WHOQOL)-100 questionnaire as four item banks to facilitate adaptive testing using simulated computer adaptive tests (CATs) for physical, psychological, social, and environmental QoL. Methods: We used data from the UK WHOQOL-100 questionnaire (N=320) to calibrate item banks using item response theory, which included psychometric assessments of differential item functioning, local dependency, unidimensionality, and reliability. We simulated CATs to assess the number of items administered before prespecified levels of reliability were met. Results: The item banks (40 items) all displayed good model fit (P>.01) and were unidimensional (fewer than 5% of t tests significant), reliable (Person Separation Index>.70), and free from differential item functioning (no significant analysis of variance interaction) or local dependency (residual correlations < +.20). When matched for reliability, the item banks were between 45% and 75% shorter than paper-based WHOQOL measures. Across the four domains, a high standard of reliability (alpha>.90) could be gained with a median of 9 items. Conclusions: Using CAT, simulated assessments were as reliable as paper-based forms of the WHOQOL with a fraction of the number of items. These properties suggest that these item banks are suitable for computerized adaptive assessment. These item banks have the potential for international development using existing alternative language versions of the WHOQOL items. PMID:27694100
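
    The idea of administering items until a prespecified reliability is reached can be sketched in Python. This is a deliberately simplified illustration, not the study's simulation. It assumes dichotomous Rasch items (the WHOQOL items are polytomous), a fixed trait value instead of simulated responses, hypothetical item difficulties, and the common approximation reliability = 1 - SE^2 for a latent trait with unit variance. All function names are invented here.

```python
import math


def rasch_prob(theta, difficulty):
    """Probability of endorsing a dichotomous Rasch item at trait level theta."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def item_information(theta, difficulty):
    """Fisher information of a Rasch item at theta: p * (1 - p)."""
    p = rasch_prob(theta, difficulty)
    return p * (1.0 - p)


def select_next_item(theta, difficulties, administered):
    """Pick the unused item with maximum information at the current theta."""
    candidates = [i for i in range(len(difficulties)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, difficulties[i]))


def items_needed(theta, difficulties, target_reliability):
    """Administer items until SE <= sqrt(1 - target reliability) or the bank is used up.

    Holds theta fixed for simplicity; a real CAT re-estimates theta after each response.
    Returns the list of administered item indices in order.
    """
    target_se = math.sqrt(1.0 - target_reliability)
    administered = []
    total_information = 0.0
    while len(administered) < len(difficulties):
        nxt = select_next_item(theta, difficulties, administered)
        administered.append(nxt)
        total_information += item_information(theta, difficulties[nxt])
        if 1.0 / math.sqrt(total_information) <= target_se:
            break
    return administered
```

    Because a dichotomous Rasch item contributes at most 0.25 units of information, this sketch needs many more items than the polytomous banks in the abstract to reach the same reliability. The item counts it produces should not be compared with the reported median of 9 items.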

  10. Factor Structure and Reliability of Test Items for Saudi Teacher Licence Assessment

    ERIC Educational Resources Information Center

    Alsadaawi, Abdullah Saleh

    2017-01-01

    The Saudi National Assessment Centre administers the Computer Science Teacher Test for teacher certification. The aim of this study is to explore gender differences in candidates' scores, and investigate dimensionality, reliability, and differential item functioning using confirmatory factor analysis and item response theory. The confirmatory…

  11. Likelihood-Ratio DIF Testing: Effects of Nonnormality

    ERIC Educational Resources Information Center

    Woods, Carol M.

    2008-01-01

    Differential item functioning (DIF) occurs when an item has different measurement properties for members of one group versus another. Likelihood-ratio (LR) tests for DIF based on item response theory (IRT) involve statistically comparing IRT models that vary with respect to their constraints. A simulation study evaluated how violation of the…

  12. Assessment of Preference for Edible and Leisure Items in Individuals with Dementia

    ERIC Educational Resources Information Center

    Ortega, Javier Virues; Iwata, Brian A.; Nogales-Gonzalez, Celia; Frades, Belen

    2012-01-01

    We conducted 2 studies on reinforcer preference in patients with dementia. Results of preference assessments yielded differential selections by 14 participants. Unlike prior studies with individuals with intellectual disabilities, all participants showed a noticeable preference for leisure items over edible items. Results of a subsequent analysis…

  13. Using Rasch Analysis to Examine the Dimensionality Structure and Differential Item Functioning of the Arabic Version of the Perceived Physical Ability Scale for Children

    ERIC Educational Resources Information Center

    Abd-El-Fattah, Sabry M.; AL-Sinani, Yousra; El Shourbagi, Sahar; Fakhroo, Hessa A.

    2014-01-01

    This study uses the Rasch model technique to examine the dimensionality structure and differential item functioning of the Arabic version of the Perceived Physical Ability Scale for Children (PPASC). A sample of 220 Omani fourth graders (120 males and 100 females) responded to an Arabic translated version of the PPASC. Data on students'…

  14. Measurement invariance across educational levels and gender in 12-item Zarit Burden Interview (ZBI) on caregivers of people with dementia.

    PubMed

    Lin, Chung-Ying; Ku, Li-Jung Elizabeth; Pakpour, Amir H

    2017-11-01

    The Zarit Burden Interview (ZBI) is a commonly used self-report to assess caregiver burden. A 12-item short form of the ZBI has been developed; however, its measurement invariance has not been examined across some different demographics. It is unclear whether different genders and educational levels of a population interpret the ZBI items similarly. Therefore, this study aimed to examine the measurement invariance of the 12-item ZBI across gender and educational levels in a Taiwanese sample. Caregivers who had a family member with dementia (n = 270) completed the ZBI through telephone interviews. Three confirmatory factor analysis (CFA) models were conducted: Model 1 was the configural model, Model 2 constrained all factor loadings, Model 3 constrained all factor loadings and item intercepts. Multiple group CFAs and the differential item functioning (DIF) contrast under Rasch analyses were used to detect measurement invariance across males (n = 100) and females (n = 170) and across educational levels of junior high schools and below (n = 86) and senior high schools and above (n = 183). The fit index differences between models supported the measurement invariance across gender and across educational levels (∆ comparative fit index (CFI) = -0.010 and 0.003; ∆ root mean square error of approximation (RMSEA) = -0.006 to 0.004). No substantial DIF contrast was found across gender and educational levels (value = -0.36 to 0.29). The ZBI is appropriate for combined use and for comparisons in caregivers across gender and different educational levels in Taiwan.

  15. The Val30Met familial amyloid polyneuropathy specific Rasch-built overall disability scale (FAP-RODS(©)).

    PubMed

    Pruppers, Mariëlle H J; Merkies, Ingemar S J; Faber, Catharina G; Da Silva, Ana M; Costa, Vanessa; Coelho, Teresa

    2015-09-01

    Familial amyloid polyneuropathy (FAP) is a chronic debilitating multi-organic disorder, mainly assessed using ordinal-based impairment measures. To date, no outcome measure at the activity and participation level has been constructed in FAP. The current study aimed to design an interval activity/participation scale for FAP through Rasch methodology. A preliminary FAP Rasch-built overall disability scale (pre-FAP-RODS) containing 146 activity/participation items was assessed twice (interval: 2-4 weeks; test-retest reliability) in 248 patients with Val30Met FAP examined in Porto, Portugal, of whom 65.7% had received liver transplantation. An ordinal-based 24-item FAP-symptoms inventory questionnaire (FAP-SIQ) was also assessed (validity purposes). The pre-FAP-RODS and FAP-SIQ data were subjected to Rasch analyses. The pre-FAP-RODS did not meet the model's expectations. On the basis of requirements such as misfit statistics, differential item functioning, and local dependency, items were systematically removed until a final 34-item FAP-RODS(©) was constructed fulfilling all Rasch requirements. Acceptable reliability/validity scores were demonstrated. In conclusion, the 34-item FAP-RODS(©) is a disease-specific interval measure suitable for detecting activity and participation restrictions in patients with FAP. The use of the FAP-RODS(©) is recommended for future international clinical trials in patients with Val30Met FAP determining its responsiveness and its cross-cultural validation. Its expansion to other forms of FAP should also be a focus of future clinical studies. © 2015 Peripheral Nerve Society.

  16. Sex Differences in Item Functioning in the Comprehensive Inventory of Basic Skills-II Vocabulary Assessments

    ERIC Educational Resources Information Center

    French, Brian F.; Gotch, Chad M.

    2013-01-01

    The Brigance Comprehensive Inventory of Basic Skills-II (CIBS-II) is a diagnostic battery intended for children in grades 1st through 6th. The aim of this study was to test for item invariance, or differential item functioning (DIF), of the CIBS-II across sex in the standardization sample through the use of item response theory DIF detection…

  17. Brief Report: Checklist for Autism Spectrum Disorder--Most Discriminating Items for Diagnosing Autism

    ERIC Educational Resources Information Center

    Mayes, Susan D.

    2018-01-01

    The smallest subset of items from the 30-item Checklist for Autism Spectrum Disorder (CASD) that differentiated 607 referred children (3-17 years) with and without autism with 100% accuracy was identified. This 6-item subset (CASD-Short Form) was cross-validated on an independent sample of 397 referred children (1-18 years) with and without autism…

  18. Cross-Cultural Validation of the Quality of Life in Hand Eczema Questionnaire (QOLHEQ).

    PubMed

    Ofenloch, Robert F; Oosterhaven, Jart A F; Susitaival, Päivikki; Svensson, Åke; Weisshaar, Elke; Minamoto, Keiko; Onder, Meltem; Schuttelaar, Marie Louise A; Bulbul Baskan, Emel; Diepgen, Thomas L; Apfelbacher, Christian

    2017-07-01

    The Quality of Life in Hand Eczema Questionnaire (QOLHEQ) is the only instrument assessing disease-specific health-related quality of life in patients with hand eczema. It is available in eight language versions. In this study we assessed if the items of different language versions of the QOLHEQ yield comparable values across countries. An international multicenter study was conducted with participating centers in Finland, Germany, Japan, The Netherlands, Sweden, and Turkey. Methods of item response theory were applied to each subscale to assess differential item functioning for items among countries. Overall, 662 hand eczema patients were recruited into the study. Single items were removed or split according to the item response theory model by country to resolve differential item functioning. After this adjustment, none of the four subscales of the QOLHEQ showed significant misfit to the item response theory model (P < 0.01), and a Person Separation Index of greater than 0.7 showed good internal consistency for each subscale. By adapting the scoring of the QOLHEQ using the methods of item response theory, it was possible to obtain QOLHEQ values that are comparable across countries. Cross-cultural variations in the interpretation of single items were resolved. The QOLHEQ is now ready to be used in international studies assessing the health-related quality of life impact of hand eczema. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  19. Language-related differential item functioning between English and German PROMIS Depression items is negligible.

    PubMed

    Fischer, H Felix; Wahl, Inka; Nolte, Sandra; Liegl, Gregor; Brähler, Elmar; Löwe, Bernd; Rose, Matthias

    2017-12-01

    To investigate differential item functioning (DIF) of PROMIS Depression items between US and German samples we compared data from the US PROMIS calibration sample (n = 780), a German general population survey (n = 2,500) and a German clinical sample (n = 621). DIF was assessed in an ordinal logistic regression framework, with 0.02 as criterion for R²-change and 0.096 for Raju's non-compensatory DIF. Item parameters were initially fixed to the PROMIS Depression metric; we used plausible values to account for uncertainty in depression estimates. Only four items showed DIF. Accounting for DIF led to negligible effects for the full item bank as well as a post hoc simulated computer-adaptive test (< 0.1 point on the PROMIS metric [mean = 50, standard deviation = 10]), while the effect on the short forms was small (< 1 point). The mean depression severity (43.6) in the German general population sample was considerably lower compared to the US reference value of 50. Overall, we found little evidence for language DIF between US and German samples, which could be addressed by either replacing the DIF items by items not showing DIF or by scoring the short form in German samples with the corrected item parameters reported. Copyright © 2016 John Wiley & Sons, Ltd.

  20. [Screening interview for early detection of high-functioning autism spectrum disorders].

    PubMed

    Hoffmann, Wiebke; Heinzel-Gutenbrunner, Monika; Becker, Katja; Kamp-Becker, Inge

    2015-05-01

    Various different questionnaires are available for the screening of autism spectrum disorders (ASD). These screening instruments show high sensitivity and are able to identify a large number of individuals with ASD, but they lack the specificity to differentiate individuals with ASD from those children and adolescents with other complex neurobehavioural disorders (such as attention-deficit/hyperactivity disorder, emotional disorders, and others), especially for those without intellectual disabilities. The present study evaluates the data of 309 individuals (153 with high-functioning ASD, 156 with other psychiatric disorders, IQ > 70) to find out whether selected items of the ADI-R can be used for an economic and sensitive screening of high-functioning ASD. The results show that 8 items of the ADI-R can be used to discriminate high-functioning ASD and other psychiatric disorders. A cutoff of 5 led to a sensitivity of 0.93 and a cutoff of 6 to a specificity of 0.74. The combination of early onset, serious abnormalities in social contact with stereotyped or compulsive-ritualized behaviour or interests can be detected with few interview questions for screening of ASD. Nevertheless, a more detailed and specific assessment in an expert setting should follow the screening process.
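
    The trade-off between cutoff, sensitivity, and specificity described in this record can be illustrated with a minimal sketch. The scores below are invented for illustration and are not the study's data; the rule that a score at or above the cutoff counts as screen-positive is also an assumption, since the abstract does not state the direction of the cutoff.

```python
def screen_metrics(scores, has_condition, cutoff):
    """Return (sensitivity, specificity) for a screening rule.

    Assumption: a score >= cutoff is treated as screen-positive.
    """
    tp = fn = tn = fp = 0
    for score, truth in zip(scores, has_condition):
        positive = score >= cutoff
        if truth and positive:
            tp += 1          # condition present, correctly flagged
        elif truth:
            fn += 1          # condition present, missed
        elif positive:
            fp += 1          # condition absent, wrongly flagged
        else:
            tn += 1          # condition absent, correctly passed
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical screening scores for ten individuals (not real data).
scores = [7, 6, 5, 8, 4, 3, 5, 2, 6, 1]
has_condition = [True, True, True, True, True,
                 False, False, False, False, False]

for cutoff in (5, 6):
    sens, spec = screen_metrics(scores, has_condition, cutoff)
    print(f"cutoff={cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

    Raising the cutoff in this toy data lowers sensitivity and raises specificity, which is the general pattern a screening study has to balance when choosing a threshold.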

  1. Evaluation of the Fecal Incontinence Quality of Life Scale (FIQL) using item response theory reveals limitations and suggests revisions.

    PubMed

    Peterson, Alexander C; Sutherland, Jason M; Liu, Guiping; Crump, R Trafford; Karimuddin, Ahmer A

    2018-06-01

    The Fecal Incontinence Quality of Life Scale (FIQL) is a commonly used patient-reported outcome measure for fecal incontinence, often used in clinical trials, yet has not been validated in English since its initial development. This study uses modern methods to thoroughly evaluate the psychometric characteristics of the FIQL and its potential for differential functioning by gender. This study analyzed prospectively collected patient-reported outcome data from a sample of patients prior to colorectal surgery. Patients were recruited from 14 general and colorectal surgeons in Vancouver Coastal Health hospitals in Vancouver, Canada. Confirmatory factor analysis was used to assess construct validity. Item response theory was used to evaluate test reliability, describe item-level characteristics, identify local item dependence, and test for differential functioning by gender. 236 patients were included for analysis, with mean age 58 and approximately half female. Factor analysis failed to identify the lifestyle, coping, depression, and embarrassment domains, suggesting lack of construct validity. Items demonstrated low difficulty, indicating that the test has the highest reliability among individuals who have low quality of life. Five items are suggested for removal or replacement. Differential test functioning was minimal. This study has identified specific improvements that can be made to each domain of the Fecal Incontinence Quality of Life Scale and to the instrument overall. Formatting, scoring, and instructions may be simplified, and items with higher difficulty developed. The lifestyle domain can be used as is. The embarrassment domain should be significantly revised before use.

  2. A travel time forecasting model based on change-point detection method

    NASA Astrophysics Data System (ADS)

    LI, Shupeng; GUANG, Xiaoping; QIAN, Yongsheng; ZENG, Junwei

    2017-06-01

    Travel time parameters obtained from road traffic sensors data play an important role in traffic management practice. A travel time forecasting model is proposed for urban road traffic sensors data based on the method of change-point detection in this paper. The first-order differential operation is used for preprocessing over the actual loop data; a change-point detection algorithm is designed to classify the sequence of a large number of travel time data items into several patterns; then a travel time forecasting model is established based on the autoregressive integrated moving average (ARIMA) model. By computer simulation, different control parameters are chosen for adaptive change-point search for the travel time series, which is divided into several sections of similar state. Then a linear weight function is used to fit the travel time sequence and to forecast travel time. The results show that the model has high accuracy in travel time forecasting.
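
    The preprocessing and segmentation steps named in this record (first-order differencing followed by change-point detection) can be sketched as follows. The abstract does not specify the detection algorithm, so the threshold rule on the absolute first difference used here is a simple stand-in rather than the authors' method; the function names, the threshold value, and the travel times are all hypothetical.

```python
def first_difference(series):
    """First-order difference: d[i] = series[i + 1] - series[i]."""
    return [b - a for a, b in zip(series, series[1:])]


def change_points(series, threshold):
    """Indices where the series jumps by more than `threshold`.

    Stand-in rule (an assumption): index i is a change point when
    |series[i] - series[i - 1]| > threshold.
    """
    diffs = first_difference(series)
    return [i + 1 for i, d in enumerate(diffs) if abs(d) > threshold]


def split_segments(series, points):
    """Cut the series into consecutive segments at the change points."""
    bounds = [0] + points + [len(series)]
    return [series[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical travel times in seconds for successive intervals.
travel_times = [60, 62, 61, 63, 95, 97, 96, 94, 58, 60]
points = change_points(travel_times, threshold=20)
segments = split_segments(travel_times, points)
print(points)
print(segments)
```

    Each resulting segment groups observations of similar state; a forecasting model such as ARIMA would then be fitted on those segments, a step this sketch leaves out.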

  3. Screening for elevated levels of fear-avoidance beliefs regarding work or physical activities in people receiving outpatient therapy.

    PubMed

    Hart, Dennis L; Werneke, Mark W; George, Steven Z; Matheson, James W; Wang, Ying-Chih; Cook, Karon F; Mioduski, Jerome E; Choi, Seung W

    2009-08-01

    Screening people for elevated levels of fear-avoidance beliefs is uncommon, but elevated levels of fear could worsen outcomes. Developing short screening tools might reduce the data collection burden and facilitate screening, which could prompt further testing or management strategy modifications to improve outcomes. The purpose of this study was to develop efficient yet accurate screening methods for identifying elevated levels of fear-avoidance beliefs regarding work or physical activities in people receiving outpatient rehabilitation. A secondary analysis of data collected prospectively from people with a variety of common neuromusculoskeletal diagnoses was conducted. Intake Fear-Avoidance Beliefs Questionnaire (FABQ) data were collected from 17,804 people who had common neuromusculoskeletal conditions and were receiving outpatient rehabilitation in 121 clinics in 26 states (in the United States). Item response theory (IRT) methods were used to analyze the FABQ data, with particular emphasis on differential item functioning among clinically logical groups of subjects, and to identify screening items. The accuracy of screening items for identifying subjects with elevated levels of fear was assessed with receiver operating characteristic analyses. Three items for fear of physical activities and 10 items for fear of work activities represented unidimensional scales with adequate IRT model fit. Differential item functioning was negligible for variables known to affect functional status outcomes: sex, age, symptom acuity, surgical history, pain intensity, condition severity, and impairment. Items that provided maximum information at the median for the FABQ scales were selected as screening items to dichotomize subjects by high versus low levels of fear. The accuracy of the screening items was supported for both scales. This study represents a retrospective analysis, which should be replicated using prospective designs. Future prospective studies should assess the reliability and validity of using one FABQ item to screen people for high levels of fear-avoidance beliefs. The lack of differential item functioning in the FABQ scales in the sample tested in this study suggested that FABQ screening could be useful in routine clinical practice and allowed the development of single-item screening for fear-avoidance beliefs that accurately identified subjects with elevated levels of fear. Because screening was accurate and efficient, single IRT-based FABQ screening items are recommended to facilitate improved evaluation and care of heterogeneous populations of people receiving outpatient rehabilitation.

  4. Internal construct validity of the stress-energy questionnaire in a working population, a cohort study.

    PubMed

    Hadzibajramovic, Emina; Ahlborg, Gunnar; Grimby-Ekman, Anna; Lundgren-Nilsson, Åsa

    2015-02-25

    Psychosocial stress at work has been recognised as one of the most important factors behind the increase in sick leave due to stress-related mental disorders. It is therefore important to be able to measure perceived work stress in a way that is both valid and reliable. It has been suggested that the Stress-Energy Questionnaire (SEQ) could be a useful tool for measuring mood (stress and energy) at work and it has been used in many Scandinavian studies. The aim of the study is to examine the internal construct validity of the SEQ in a working population and to address measurement issues, such as the ordering of response categories and potential differences in how women and men use the scale - what is termed differential item functioning (DIF). The data used in the present study is baseline data from a longitudinal cohort study aimed at evaluating psychosocial working conditions, stress, health and well-being among employees in two human service organisations in Western Sweden. A modern psychometric approach for scale validations, the Rasch model, was used. Stress items showed a satisfactory fit to the model. Problems related to unidimensionality and local dependence were found when the six stress items were fitted to the model, but these could be resolved by using two testlets. As regards the energy scale, although the final analysis showed an acceptable fit to the model some scale problems were identified. The item dull had disordered thresholds and DIF for gender was detected for the item passive. The items were not well targeted to the persons, with skewness towards high energy. This might explain the scale problems that were detected but these problems need to be investigated in a group where the level of energy is spread across the trait, measured by the SEQ. The stress scale of the SEQ has good psychometric properties and provides a useful tool for assessing work-related stress, on both group and individual levels. However, the limitations of the energy scale make it suitable for group evaluations only. The energy scale needs to be evaluated further in different settings and populations.

  5. The aftermath of memory retrieval for recycling visual working memory representations.

    PubMed

    Park, Hyung-Bum; Zhang, Weiwei; Hyun, Joo-Seok

    2017-07-01

    We examined the aftermath of accessing and retrieving a subset of information stored in visual working memory (VWM), namely, whether detection of a mismatch between memory and perception can impair the original memory of an item while triggering recognition-induced forgetting for the remaining, untested items. For this purpose, we devised a consecutive-change detection task wherein two successive testing probes were displayed after a single set of memory items. Across two experiments utilizing different memory-testing methods (whole vs. single probe), we observed a reliable pattern of poor performance in change detection for the second test when the first test had exhibited a color change. The impairment after a color change was evident even when the same memory item was repeatedly probed; this suggests that an attention-driven, salient visual change made it difficult to reinstate the previously remembered item. The second change detection, for memory items untested during the first change detection, was also found to be inaccurate, indicating that recognition-induced forgetting had occurred for the unprobed items in VWM. In a third experiment, we conducted a task that involved change detection plus continuous recall, wherein a memory recall task was presented after the change detection task. The analyses of the distributions of recall errors with a probabilistic mixture model revealed that the memory impairments from both visual changes and recognition-induced forgetting are explained better by the stochastic loss of memory items than by their degraded resolution. These results indicate that attention-driven visual change and recognition-induced forgetting jointly influence the "recycling" of VWM representations.

  6. Checking Equity: Why Differential Item Functioning Analysis Should Be a Routine Part of Developing Conceptual Assessments

    PubMed Central

    Martinková, Patrícia; Drabinová, Adéla; Liaw, Yuan-Ling; Sanders, Elizabeth A.; McFarland, Jenny L.; Price, Rebecca M.

    2017-01-01

    We provide a tutorial on differential item functioning (DIF) analysis, an analytic method useful for identifying potentially biased items in assessments. After explaining a number of methodological approaches, we test for gender bias in two scenarios that demonstrate why DIF analysis is crucial for developing assessments, particularly because simply comparing two groups’ total scores can lead to incorrect conclusions about test fairness. First, a significant difference between groups on total scores can exist even when items are not biased, as we illustrate with data collected during the validation of the Homeostasis Concept Inventory. Second, item bias can exist even when the two groups have exactly the same distribution of total scores, as we illustrate with a simulated data set. We also present a brief overview of how DIF analysis has been used in the biology education literature to illustrate the way DIF items need to be reevaluated by content experts to determine whether they should be revised or removed from the assessment. Finally, we conclude by arguing that DIF analysis should be used routinely to evaluate items in developing conceptual assessments. These steps will ensure more equitable—and therefore more valid—scores from conceptual assessments. PMID:28572182
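
    One widely used DIF statistic, the Mantel-Haenszel common odds ratio, can be sketched in a few lines. This is an illustration of that general technique under stated assumptions, not necessarily the procedure used in the tutorial above: examinees are matched on a score, a 2x2 table (group by correct/incorrect) is formed in each score stratum, and the tables are pooled. The record layout, the function names, and the counts are hypothetical.

```python
import math
from collections import defaultdict


def mantel_haenszel_odds_ratio(records):
    """Pooled odds ratio for one item across matching-score strata.

    records: iterable of (group, matching_score, correct), where group is
    "ref" or "focal" and correct is 1 or 0 (hypothetical layout).
    Raises ZeroDivisionError if the pooled denominator is zero.
    """
    # Per stratum: [n_correct, n_incorrect] for each group.
    strata = defaultdict(lambda: {"ref": [0, 0], "focal": [0, 0]})
    for group, score, correct in records:
        strata[score][group][0 if correct else 1] += 1

    num = den = 0.0
    for cells in strata.values():
        a, b = cells["ref"]      # reference: correct, incorrect
        c, d = cells["focal"]    # focal: correct, incorrect
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den


def expand(group, score, n_correct, n_incorrect):
    """Build individual records from cell counts (helper for the example)."""
    return [(group, score, 1)] * n_correct + [(group, score, 0)] * n_incorrect


# Invented counts for two score strata (not real data).
records = (expand("ref", 1, 6, 4) + expand("focal", 1, 3, 7)
           + expand("ref", 2, 8, 2) + expand("focal", 2, 5, 5))

alpha = mantel_haenszel_odds_ratio(records)
delta = -2.35 * math.log(alpha)   # ETS delta scale (MH D-DIF)
print(f"alpha_MH = {alpha:.3f}, MH D-DIF = {delta:.2f}")
```

    An odds ratio near 1 (delta near 0) suggests the item behaves similarly for matched examinees in both groups; values far from 1 flag the item for review by content experts, in line with the tutorial's point that total-score comparisons alone can mislead.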

  7. Use of Automated Scoring Features to Generate Hypotheses Regarding Language-Based DIF

    ERIC Educational Resources Information Center

    Shermis, Mark D.; Mao, Liyang; Mulholland, Matthew; Kieftenbeld, Vincent

    2017-01-01

    This study uses the feature sets employed by two automated scoring engines to determine if a "linguistic profile" could be formulated that would help identify items that are likely to exhibit differential item functioning (DIF) based on linguistic features. Sixteen items were administered to 1200 students where demographic information…

  8. Differential Item Functioning Amplification and Cancellation in a Reading Test

    ERIC Educational Resources Information Center

    Bao, Han; Dayton, C. Mitchell; Hendrickson, Amy B.

    2009-01-01

    When testlet effects and item idiosyncratic features are both considered to be the reasons for DIF in educational tests using testlets (Wainer & Kiely, 1987) or item bundles (Rosenbaum, 1988), it is interesting to investigate the phenomena of DIF amplification and cancellation due to the interactive effects of these two factors. This research…

  9. A Proposed System of "Project Management" for Study Items.

    ERIC Educational Resources Information Center

    Worcester Public Schools, MA.

    The purposes of the proposed system are to provide a standard operating procedure for a systematic and effective handling of project-type study items as differentiated from informational-type items; to assign definite singular responsibility for projects; to suggest specific sequential steps to be taken in the preparation of the project report;…

  10. Gender Invariance of the Gambling Behavior Scale for Adolescents (GBS-A): An Analysis of Differential Item Functioning Using Item Response Theory.

    PubMed

    Donati, Maria Anna; Chiesi, Francesca; Izzo, Viola A; Primi, Caterina

    2017-01-01

    As there is a lack of evidence attesting the equivalent item functioning across genders for the most employed instruments used to measure pathological gambling in adolescence, the present study aimed to test the gender invariance of the Gambling Behavior Scale for Adolescents (GBS-A), a new measurement tool to assess the severity of Gambling Disorder (GD) in adolescents. The equivalence of the items across genders was assessed by analyzing Differential Item Functioning within an Item Response Theory framework. The GBS-A was administered to 1,723 adolescents, and the graded response model was employed. The results attested the measurement equivalence of the GBS-A when administered to male and female adolescent gamblers. Overall, findings provided evidence that the GBS-A is an effective measurement tool of the severity of GD in male and female adolescents and that the scale was unbiased and able to reveal true gender differences. As such, the GBS-A can be profitably used in educational interventions and clinical treatments with young people.

  11. Evaluating HIV Knowledge Questionnaires Among Men Who Have Sex with Men: A Multi-Study Item Response Theory Analysis.

    PubMed

    Janulis, Patrick; Newcomb, Michael E; Sullivan, Patrick; Mustanski, Brian

    2018-01-01

    Knowledge about the transmission, prevention, and treatment of HIV remains a critical element in psychosocial models of HIV risk behavior and is commonly used as an outcome in HIV prevention interventions. However, most HIV knowledge questions have not undergone rigorous psychometric testing such as using item response theory. The current study used data from six studies of men who have sex with men (MSM; n = 3565) to (1) examine the item properties of HIV knowledge questions, (2) test for differential item functioning on commonly studied characteristics (i.e., age, race/ethnicity, and HIV risk behavior), (3) select items with the optimal item characteristics, and (4) leverage this combined dataset to examine the potential moderating effect of age on the relationship between condomless anal sex (CAS) and HIV knowledge. Findings indicated that existing questions tend to poorly differentiate those with higher levels of HIV knowledge, but items were relatively robust across diverse individuals. Furthermore, age moderated the relationship between CAS and HIV knowledge with older MSM having the strongest association. These findings suggest that additional items are required in order to capture a more nuanced understanding of HIV knowledge and that the association between CAS and HIV knowledge may vary by age.

  12. Differential item functioning analysis of the Vanderbilt Expertise Test for cars

    PubMed Central

    Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W.; Van Gulick, Ana Beth; Gauthier, Isabel

    2015-01-01

    The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge. PMID:26418499

  13. Identifying Country-Specific Cultures of Physics Education: A differential item functioning approach

    NASA Astrophysics Data System (ADS)

    Mesic, Vanes

    2012-11-01

    In international large-scale assessments of educational outcomes, student achievement is often represented by unidimensional constructs. This approach allows for drawing general conclusions about country rankings with respect to the given achievement measure, but it typically does not provide specific diagnostic information which is necessary for systematic comparisons and improvements of educational systems. Useful information could be obtained by exploring the differences in national profiles of student achievement between low-achieving and high-achieving countries. In this study, we aimed to identify the relative weaknesses and strengths of eighth graders' physics achievement in Bosnia and Herzegovina in comparison to the achievement of their peers from Slovenia. For this purpose, we ran a secondary analysis of Trends in International Mathematics and Science Study (TIMSS) 2007 data. The student sample consisted of 4,220 students from Bosnia and Herzegovina and 4,043 students from Slovenia. After analysing the cognitive demands of TIMSS 2007 physics items, the corresponding differential item functioning (DIF)/differential group functioning contrasts were estimated. Approximately 40% of items exhibited large DIF contrasts, indicating significant differences between cultures of physics education in Bosnia and Herzegovina and Slovenia. The relative strength of students from Bosnia and Herzegovina was shown to be mainly associated with the topic area 'Electricity and magnetism'. Classes of items which required the knowledge of experimental method, counterintuitive thinking, proportional reasoning and/or the use of complex knowledge structures proved to be differentially easier for students from Slovenia. In the light of the presented results, the common practice of ranking countries with respect to universally established cognitive categories seems to be potentially misleading.

  14. Negative Symptom Dimensions of the Positive and Negative Syndrome Scale Across Geographical Regions

    PubMed Central

    Liharska, Lora; Harvey, Philip D.; Atkins, Alexandra; Ulshen, Daniel; Keefe, Richard S.E.

    2017-01-01

    Objective: Recognizing the discrete dimensions that underlie negative symptoms in schizophrenia and how these dimensions are understood across localities might result in better understanding and treatment of these symptoms. To this end, the objectives of this study were to 1) identify the Positive and Negative Syndrome Scale negative symptom dimensions of expressive deficits and experiential deficits and 2) analyze performance on these dimensions over 15 geographical regions to determine whether the items defining them manifest similar reliability across these regions. Design: Data were obtained for the baseline Positive and Negative Syndrome Scale visits of 6,889 subjects across 15 geographical regions. Using confirmatory factor analysis, we examined whether a two-factor negative symptom structure that is found in schizophrenia (experiential deficits and expressive deficits) would be replicated in our sample, and using differential item functioning, we tested the degree to which specific items from each negative symptom subfactor performed across geographical regions in comparison with the United States. Results: The two-factor negative symptom solution was replicated in this sample. Most geographical regions showed moderate-to-large differential item functioning for Positive and Negative Syndrome Scale expressive deficit items, especially N3 Poor Rapport, as compared with Positive and Negative Syndrome Scale experiential deficit items, showing that these items might be interpreted or scored differently in different regions. Across countries, except for India, the differential item functioning values did not favor raters in the United States. Conclusion: These results suggest that the Positive and Negative Syndrome Scale negative symptom factor can be better represented by a two-factor model than by a single-factor model. Additionally, the results show significant differences in responses to items representing the Positive and Negative Syndrome Scale expressive factors, but not the experiential factors, across regions. This could be due to a lack of equivalence between the original and translated versions, cultural differences with the interpretation of items, dissimilarities in rater training, or diversity in the understanding of scoring anchors. Knowing which items are challenging for raters across regions can help to guide Positive and Negative Syndrome Scale training and improve the results of international clinical trials aimed at negative symptoms. PMID:29410935

  15. Item Response Theory Applied to Factors Affecting the Patient Journey Towards Hearing Rehabilitation

    PubMed Central

    Chenault, Michelene; Berger, Martijn; Kremer, Bernd; Anteunis, Lucien

    2016-01-01

    To develop a tool for use in hearing screening and to evaluate the patient journey towards hearing rehabilitation, responses to three hearing aid rehabilitation questionnaire scales (aid stigma, pressure, and aid unwanted, addressing respectively hearing aid stigma, experienced pressure from others, and perceived hearing aid benefit) were evaluated with item response theory. The sample comprised 212 persons aged 55 years or more: 63 hearing aid users, 64 persons with hearing impairment, and 85 persons without hearing impairment according to guidelines for hearing aid reimbursement in the Netherlands. Bias was investigated relative to hearing aid use and hearing impairment within the differential test functioning framework. Items compromising model fit or demonstrating differential item functioning were dropped. The aid stigma scale was reduced from 6 to 4, the pressure scale from 7 to 4, and the aid unwanted scale from 5 to 4 items. This procedure resulted in bias-free scales ready for screening purposes and application to further understand the help-seeking process of the hearing impaired. PMID:28028428

  16. The representation of conceptual knowledge: visual, auditory, and olfactory imagery compared with semantic processing.

    PubMed

    Palmiero, Massimiliano; Di Matteo, Rosalia; Belardinelli, Marta Olivetti

    2014-05-01

    Two experiments comparing imaginative processing in different modalities and semantic processing were carried out to investigate whether conceptual knowledge can be represented in different formats. Participants were asked to judge the similarity between visual images, auditory images, and olfactory images in the imaginative block, and to judge whether two items belonged to the same category in the semantic block. Items were verbally cued in both experiments. The degree of similarity between the imaginative and semantic items was changed across experiments. Experiment 1 showed that the semantic processing was faster than the visual and the auditory imaginative processing, whereas no differentiation was possible between the semantic processing and the olfactory imaginative processing. Experiment 2 revealed that only the visual imaginative processing could be differentiated from the semantic processing in terms of accuracy. These results showed that the visual and auditory imaginative processing can be differentiated from the semantic processing, although both visual and auditory images strongly rely on semantic representations. On the contrary, no differentiation is possible within the olfactory domain. Results are discussed in the frame of the imagery debate.

  17. Incidental histopathological findings in hearts of control beagle dogs in toxicity studies.

    PubMed

    Bodié, Karen; Decker, Joshua H

    2014-08-01

    In preclinical studies of pharmaceutical agents, the beagle dog is a commonly used model for the detection of cardiotoxicity. Incidental findings, postmortem changes, and artifacts must be distinguished histopathologically from test item-related findings in the heart. In this retrospective analysis, cardiac sections from 88 control beagles (41 male, 47 female; ages 5-18 months) in preclinical studies were examined histopathologically. The most common finding was thickening of the tunica media of intramural coronary arteries, most likely a postmortem change. The second most common finding was the presence of vacuoles within Purkinje fibers. Dilated lymphatic and blood vessels at the insertion of chordae tendineae were noted more commonly in males than in females and were considered a normal anatomic feature. Mesothelial-lined papillary fronds along the epicardial surface of the atria were present in several dogs, as were small infiltrates of inflammatory cells usually within the myocardium. In summary, control beagles' hearts frequently have incidental findings that must be differentiated from test item-related pathologic changes. Historical control data can be useful for the interpretation of incidental and test item-related findings in the beagle heart. © 2013 by The Author(s).

  18. The “Good Cop, Bad Cop” Effect in the RT-Based Concealed Information Test: Exploring the Effect of Emotional Expressions Displayed by a Virtual Investigator

    PubMed Central

    Varga, Mihai; Visu-Petra, George; Miclea, Mircea; Visu-Petra, Laura

    2015-01-01

    Concealing the possession of relevant information represents a complex cognitive process, shaped by contextual demands and individual differences in cognitive and socio-emotional functioning. The Reaction Time-based Concealed Information Test (RT-CIT) is used to detect concealed knowledge based on the difference in RTs between denying recognition of critical (probes) and newly encountered (irrelevant) information. Several research questions were addressed in this scenario implemented after a mock crime. First, we were interested in whether the introduction of a social stimulus (facial identity) simulating a virtual investigator would facilitate the process of deception detection. Next, we explored whether his emotional displays (friendly, hostile or neutral) would have a differential impact on speed of responses to probe versus irrelevant items. We also compared the impact of introducing similar stimuli in a working memory (WM) updating context without requirements to conceal information. Finally, we explored the association between deceptive behavior and individual differences in WM updating proficiency or in internalizing problems (state / trait anxiety and depression). Results indicated that the mere presence of a neutral virtual investigator slowed down participants' responses, but not the appended lie-specific time (difference between probes and irrelevants). Emotional expression was shown to differentially affect speed of responses to critical items, with positive displays from the virtual examiner enhancing lie-specific time, compared to negative facial expressions, which had an opposite impact. This valence-specific effect was not visible in the WM updating context. Higher levels of trait / state anxiety were related to faster responses to probes in the negative condition (hostile facial expression) of the RT-CIT. These preliminary findings further emphasize the need to take into account motivational and emotional factors when considering the transfer of deception detection techniques from the laboratory to real-life settings. PMID:25699516

  19. Some Memories are Odder than Others: Judgments of Episodic Oddity Violate Known Decision Rules

    PubMed Central

    O’Connor, Akira R.; Guhl, Emily N.; Cox, Justin C.; Dobbins, Ian G.

    2011-01-01

    Current decision models of recognition memory are based almost entirely on one paradigm, single item old/new judgments accompanied by confidence ratings. This task results in receiver operating characteristics (ROCs) that are well fit by both signal-detection and dual-process models. Here we examine an entirely new recognition task, the judgment of episodic oddity, whereby participants select the mnemonically odd members of triplets (e.g., a new item hidden among two studied items). Using the only two known signal-detection rules of oddity judgment derived from the sensory perception literature, the unequal variance signal-detection model predicted that an old item among two new items would be easier to discover than a new item among two old items. In contrast, four separate empirical studies demonstrated the reverse pattern: triplets with two old items were the easiest to resolve. This finding was anticipated by the dual-process approach as the presence of two old items affords the greatest opportunity for recollection. Furthermore, a bootstrap-fed Monte Carlo procedure using two independent datasets demonstrated that the dual-process parameters typically observed during single item recognition correctly predict the current oddity findings, whereas unequal variance signal-detection parameters do not. Episodic oddity judgments represent a case where dual- and single-process predictions qualitatively diverge and the findings demonstrate that novelty is “odder” than familiarity. PMID:22833695

  20. Multispectral imaging system for contaminant detection

    NASA Technical Reports Server (NTRS)

    Poole, Gavin H. (Inventor)

    2003-01-01

    An automated inspection system for detecting digestive contaminants on food items as they are being processed for consumption includes a conveyor for transporting the food items, a light sealed enclosure which surrounds a portion of the conveyor, with a light source and a multispectral or hyperspectral digital imaging camera disposed within the enclosure. Operation of the conveyor, light source and camera is controlled by a central computer unit. Light reflected by the food items within the enclosure is detected in predetermined wavelength bands, and detected intensity values are analyzed to detect the presence of digestive contamination.

  1. Psychometric properties of the Triarchic Psychopathy Measure: An item response theory approach.

    PubMed

    Shou, Yiyun; Sellbom, Martin; Xu, Jing

    2018-05-01

    There is cumulative evidence for the cross-cultural validity of the Triarchic Psychopathy Measure (TriPM; Patrick, 2010) among non-Western populations. Recent studies using correlational and regression analyses show promising construct validity of the TriPM in Chinese samples. However, little is known about the efficiency of items in TriPM in assessing the proposed latent traits. The current study evaluated the psychometric properties of the Chinese TriPM at the item level using item response theory analyses. It also examined the measurement invariance of the TriPM between the Chinese and the U.S. student samples by applying differential item functioning analyses under the item response theory framework. The results supported the unidimensional nature of the Disinhibition and Meanness scales. Both scales had a greater level of precision in the respective underlying constructs at the positive ends. The two scales, however, had several items that were weakly associated with their respective latent traits in the Chinese student sample. Boldness, on the other hand, was found to be multidimensional, and reflected a more normally distributed range of variation. The examination of measurement bias via differential item functioning analyses revealed that a number of items of the TriPM were not equivalent across the Chinese and the U.S. samples. Some modification and adaptation of items might be considered for improving the precision of the TriPM for Chinese participants. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  2. Depression symptoms across cultures: an IRT analysis of standard depression symptoms using data from eight countries.

    PubMed

    Haroz, E E; Bolton, P; Gross, A; Chan, K S; Michalopoulos, L; Bass, J

    2016-07-01

    Prevalence estimates of depression vary between countries, possibly due to differential functioning of items between settings. This study compared the performance of the widely used Hopkins symptom checklist 15-item depression scale (HSCL-15) across multiple settings using item response theory analyses. Data came from adult populations in the low and middle income countries (LMIC) of Colombia, Indonesia, Kurdistan Iraq, Rwanda, Iraq, Thailand (Burmese refugees), and Uganda (N = 4732). Item parameters based on a graded response model were compared across LMIC settings. Differential item functioning (DIF) by setting was evaluated using multiple indicators multiple causes (MIMIC) models. Most items performed well across settings except items related to suicidal ideation and "loss of sexual interest or pleasure," which had low discrimination parameters (suicide: a = 0.31 in Thailand to a = 2.49 in Indonesia; sexual interest: a = 0.74 in Rwanda to a = 1.26 in one region of Kurdistan). Most items showed some degree of DIF, but DIF only impacted aggregate scale-level scores in Indonesia. Thirteen of the 15 HSCL depression items performed well across diverse settings, with most items showing a strong relationship to the underlying trait of depression. The results support the cross-cultural applicability of most of these depression symptoms across LMIC settings. DIF impacted aggregate depression scores in one setting illustrating a possible source of measurement invariance in prevalence estimates.

  3. Using Response-Time Constraints in Item Selection To Control for Differential Speededness in Computerized Adaptive Testing. LSAC Research Report Series.

    ERIC Educational Resources Information Center

    van der Linden, Wim J.; Scrams, David J.; Schnipke, Deborah L.

    This paper proposes an item selection algorithm that can be used to neutralize the effect of time limits in computer adaptive testing. The method is based on a statistical model for the response-time distributions of the test takers on the items in the pool that is updated each time a new item has been administered. Predictions from the model are…

  4. A Comparison of Measurement Equivalence Methods Based on Confirmatory Factor Analysis and Item Response Theory.

    ERIC Educational Resources Information Center

    Flowers, Claudia P.; Raju, Nambury S.; Oshima, T. C.

    Current interest in the assessment of measurement equivalence emphasizes two methods of analysis: linear and nonlinear procedures. This study simulated data using the graded response model to examine the performance of linear (confirmatory factor analysis or CFA) and nonlinear (item-response-theory-based differential item function or IRT-Based…

  5. IRT-LR-DIF with Estimation of the Focal-Group Density as an Empirical Histogram

    ERIC Educational Resources Information Center

    Woods, Carol M.

    2008-01-01

    Item response theory-likelihood ratio-differential item functioning (IRT-LR-DIF) is used to evaluate the degree to which items on a test or questionnaire have different measurement properties for one group of people versus another, irrespective of group-mean differences on the construct. Usually, the latent distribution is presumed normal for both…

  6. Item Analysis and Differential Item Functioning of a Brief Conduct Problem Screen

    ERIC Educational Resources Information Center

    Wu, Johnny; King, Kevin M.; Witkiewitz, Katie; Racz, Sarah Jensen; McMahon, Robert J.

    2012-01-01

    Research has shown that boys display higher levels of childhood conduct problems than girls, and Black children display higher levels than White children, but few studies have tested for scalar equivalence of conduct problems across gender and race. The authors conducted a 2-parameter item response theory (IRT) model to examine item…

  7. Examining the Effectiveness of Test Accommodation Using DIF and a Mixture IRT Model

    ERIC Educational Resources Information Center

    Cho, Hyun-Jeong; Lee, Jaehoon; Kingston, Neal

    2012-01-01

    This study examined the validity of test accommodation in third-eighth graders using differential item functioning (DIF) and mixture IRT models. Two data sets were used for these analyses. With the first data set (N = 51,591) we examined whether item type (i.e., story, explanation, straightforward) or item features were associated with item…

  8. Using Cochran's Z Statistic to Test the Kernel-Smoothed Item Response Function Differences between Focal and Reference Groups

    ERIC Educational Resources Information Center

    Zheng, Yinggan; Gierl, Mark J.; Cui, Ying

    2010-01-01

    This study combined the kernel smoothing procedure and a nonparametric differential item functioning statistic--Cochran's Z--to statistically test the difference between the kernel-smoothed item response functions for reference and focal groups. Simulation studies were conducted to investigate the Type I error and power of the proposed…

  9. Estimating the Reliability of a Test Battery Composite or a Test Score Based on Weighted Item Scoring

    ERIC Educational Resources Information Center

    Feldt, Leonard S.

    2004-01-01

    In some settings, the validity of a battery composite or a test score is enhanced by weighting some parts or items more heavily than others in the total score. This article describes methods of estimating the total score reliability coefficient when differential weights are used with items or parts.
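
    The abstract does not spell out the estimation methods, so the following is only a minimal sketch of the classical test theory formula for the reliability of a weighted composite, assuming measurement errors are uncorrelated across parts. The function name, argument layout, and example numbers are illustrative assumptions and are not taken from the article.

```python
def weighted_composite_reliability(weights, covariance, reliabilities):
    """Reliability of a weighted sum of parts under classical test theory.

    weights:        weight applied to each part score
    covariance:     observed covariance matrix of the part scores
    reliabilities:  reliability coefficient of each part
    Assumes errors of measurement are uncorrelated across parts.
    """
    k = len(weights)
    # Variance of the weighted composite: sum of w_i * w_j * cov_ij.
    composite_var = sum(
        weights[i] * weights[j] * covariance[i][j]
        for i in range(k)
        for j in range(k)
    )
    # Error variance of the composite: sum of w_i^2 * var_i * (1 - rel_i).
    error_var = sum(
        weights[i] ** 2 * covariance[i][i] * (1.0 - reliabilities[i])
        for i in range(k)
    )
    return 1.0 - error_var / composite_var


# Hypothetical two-part battery, first part weighted twice as heavily.
example = weighted_composite_reliability(
    weights=[2.0, 1.0],
    covariance=[[1.0, 0.5], [0.5, 1.0]],
    reliabilities=[0.8, 0.8],
)
print(round(example, 4))
```

    Changing the weights changes both the composite variance and the share of error variance each part contributes, which is why differential weighting can raise or lower the composite reliability.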

  10. Interest Inventory Items as Reinforcing Stimuli: A Test of the A-R-D Theory.

    ERIC Educational Resources Information Center

    Staats, Arthur W.; And Others

    An experiment was conducted to test the hypothesis that interest inventory items would function as reinforcing stimuli in a visual discrimination task. When previously rated liked and disliked items from the Strong Vocational Interest Blank were differentially presented following one of two responses, subjects learned to respond to the stimulus…

  11. Gender Differences in Figural Matrices: The Moderating Role of Item Design Features

    ERIC Educational Resources Information Center

    Arendasy, Martin E.; Sommer, Markus

    2012-01-01

    There is a heated debate on whether observed gender differences in some figural matrices in adults can be attributed to gender differences in inductive reasoning/G[subscript f] or differential item functioning and/or test bias. Based on previous studies we hypothesized that three specific item design features moderate the effect size of the gender…

  12. Methodologies for Investigating Item- and Test-Level Measurement Equivalence in International Large-Scale Assessments

    ERIC Educational Resources Information Center

    Oliveri, Maria Elena; Olson, Brent F.; Ercikan, Kadriye; Zumbo, Bruno D.

    2012-01-01

    In this study, the Canadian English and French versions of the Problem-Solving Measure of the Programme for International Student Assessment 2003 were examined to investigate their degree of measurement comparability at the item- and test-levels. Three methods of differential item functioning (DIF) were compared: parametric and nonparametric item…

  13. Validity of personality measurement in adults with anxiety disorders: psychometric properties of the Spanish NEO-FFI-R using Rasch analyses

    PubMed Central

    Inchausti, Felix; Mole, Joe; Fonseca-Pedrero, Eduardo; Ortuño-Sierra, Javier

    2015-01-01

    The aim of this study was to analyse the psychometric properties of the Spanish NEO Five Factor Inventory–Revised (NEO-FFI-R) using Rasch analyses, in order to test its rating scale functioning, the reliability of scores, internal structure, and differential item functioning (DIF) by gender in a psychiatric sample. The NEO-FFI-R responses of 433 Spanish adults (154 males) with an anxiety disorder as primary diagnosis were analysed using the Rasch model for rating scales. Two intermediate categories of response (‘neutral’ and ‘agree’) malfunctioned in the Neuroticism and Conscientiousness scales. In addition, model reliabilities were lower than expected in Agreeableness and Neuroticism, and the item fit values indicated each scale had items that did not achieve moderate to high discrimination on its dimension, particularly in the Agreeableness scale. Concerning unidimensionality, the five NEO-FFI-R scales showed large first components of unexplained variance. Finally, DIF by gender was detected in many items. The results suggest that the scores of the Spanish NEO-FFI-R are unreliable in psychiatric samples and cannot be generalized between males and females, especially in the Openness, Conscientiousness, and Agreeableness scales. Future directions for testing and refinement should be developed before the NEO-FFI-R can be used reliably in clinical samples. PMID:25954224

  14. Screening for depression in arthritis populations: an assessment of differential item functioning in three self-reported questionnaires.

    PubMed

    Hu, Jinxiang; Ward, Michael M

    2017-09-01

    To determine if persons with arthritis differ systematically from persons without arthritis in how they respond to questions on three depression questionnaires, which include somatic items such as fatigue and sleep disturbance. We extracted data on the Centers for Epidemiological Studies Depression (CES-D) scale, the Patient Health Questionnaire-9 (PHQ-9), and the Kessler-6 (K-6) scale from three large population-based national surveys. We assessed items on these questionnaires for differential item functioning (DIF) between persons with and without self-reported physician-diagnosed arthritis using multiple indicator multiple cause models, which controlled for the underlying level of depression and important confounders. We also examined if DIF by arthritis status was similar between women and men. Although five items of the CES-D, one item of the PHQ-9, and five items of the K-6 scale had evidence of DIF based on statistical comparisons, the magnitude of each difference was less than the threshold of a small effect. The statistical differences were a function of the very large sample sizes in the surveys. Effect sizes for DIF were similar between women and men except for two items on the Patient Health Questionnaire-9. For each questionnaire, DIF accounted for 8% or less of the arthritis-depression association, and excluding items with DIF did not reduce the difference in depression scores between those with and without arthritis. Persons with arthritis respond to items on the CES-D, PHQ-9, and K-6 depression scales similarly to persons without arthritis, despite the inclusion of somatic items in these scales.

  15. Initial validation of a scale to measure purposelessness, understimulation, and boredom in cancer patients: toward a redefinition of depression in advanced disease.

    PubMed

    Passik, Steven D; Inman, Alice; Kirsh, Kenneth; Theobald, Dale; Dickerson, Pamela

    2003-03-01

    The problem of boredom in people with cancer has received little research attention, and yet clinical experience suggests that it has the potential to profoundly affect quality of life in those patients. We were interested in developing a Purposelessness, Understimulation, and Boredom (PUB) Scale to identify this problem and to begin to differentiate it from depression. Cancer patients and professionals were interviewed using a semi-structured format to elicit their perceptions of the incidence, causes, scope, and consequences of boredom. From their responses, 45 questions were developed, edited for clarity, and piloted. A total of 100 cancer patients were recruited to participate in the study. Preliminary validation of the PUB using a cross-sectional survey of the measure was conducted. Other instruments used for purposes of convergent and divergent validity included the Functional Assessment of Cancer Therapy Scale-Anemia, Zung Self-Rating Depression Scale, Boredom Proneness Scale, Leisure Boredom Scale, Cancer Behavior Inventory, Systems of Belief Inventory, and the Eastern Cooperative Oncology Group Performance Status Scale. The sample had an average age of 62.37 years (SD = 13.43) and comprised 60 women (60.00%) and 40 men (40.00%). The results of a factor analysis on the 45 initial items (selected on the basis of professional and patient interviews) created a two-factor scale. The eight items from the strongest factor (items 1, 2, 3, 4, 5, 6, 9, 10) seemed to best tap the construct that could be deemed as overt boredom whereas the six items of the second factor (items 36, 38, 39, 42, 44, 45) seemed to tap the construct of boredom related to meaning and spirituality. Total scale internal consistency, when all 14 items were included in the analysis, yielded a coefficient alpha of 0.84 and good test-retest reliability at 2 weeks (r = .80, p < .001). The novel 14-item PUB Scale was significantly correlated to other measures of boredom: the Boredom Proneness Scale (r = -.588, p < .001) and the Leisure Boredom Scale (r = .576, p < .001). The PUB Scale was found to be a statistically viable tool with the ability to detect boredom and differentiate it from depression. In many respects this work is in concert with much of the current research and clinical effort going on in psycho-oncology that defines components of distress that, in sum, redefine depression in advanced cancer.

  16. Using the Patient Health Questionnaire-9 to measure depression among racially and ethnically diverse primary care patients.

    PubMed

    Huang, Frederick Y; Chung, Henry; Kroenke, Kurt; Delucchi, Kevin L; Spitzer, Robert L

    2006-06-01

    The Patient Health Questionnaire depression scale (PHQ-9) is a well-validated, Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) criterion-based measure for diagnosing depression, assessing severity and monitoring treatment response. The performance of most depression scales including the PHQ-9, however, has not been rigorously evaluated in different racial/ethnic populations. Therefore, we compared the factor structure of the PHQ-9 between different racial/ethnic groups as well as the rates of endorsement and differential item functioning (DIF) of the 9 items of the PHQ-9. The presence of DIF would indicate that responses to an individual item differ significantly between groups, controlling for the level of depression. A combined dataset from 2 separate studies of 5,053 primary care patients including non-Hispanic white (n=2,520), African American (n=598), Chinese American (n=941), and Latino (n=974) patients was used for our analysis. Exploratory principal components factor analysis was used to derive the factor structure of the PHQ-9 in each of the 4 racial/ethnic groups. A generalized Mantel-Haenszel statistic was used to test for DIF. One main factor that included all PHQ-9 items was found in each racial/ethnic group with alpha coefficients ranging from 0.79 to 0.89. Although endorsement rates of individual items were generally similar among the 4 groups, evidence of DIF was found for some items. Our analyses indicate that in African American, Chinese American, Latino, and non-Hispanic white patient groups the PHQ-9 measures a common concept of depression and can be effective for the detection and monitoring of depression in these diverse populations.
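
    To illustrate the idea of testing for DIF while controlling for the level of depression, the following is a minimal sketch of the basic two-group Mantel-Haenszel common odds ratio for a single dichotomous item. It is a simpler relative of the generalized statistic the study reports, not the authors' procedure; the function names, the group labels, and the example counts are hypothetical.

```python
import math
from collections import defaultdict


def mantel_haenszel_odds_ratio(records):
    """Mantel-Haenszel common odds ratio for one dichotomous item.

    records: iterable of (group, stratum, endorsed), where group is
    "reference" or "focal", stratum is the matching variable (for
    example a total-score band), and endorsed is True or False.
    """
    # Per-stratum 2x2 counts: [ref yes, ref no, focal yes, focal no].
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, stratum, endorsed in records:
        row = 0 if group == "reference" else 2
        col = 0 if endorsed else 1
        tables[stratum][row + col] += 1

    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator if denominator else float("nan")


def mh_delta(odds_ratio):
    """ETS delta-scale transform; values near 0 indicate little DIF."""
    return -2.35 * math.log(odds_ratio)


def expand(group, stratum, yes, no):
    """Build individual records from hypothetical cell counts."""
    return [(group, stratum, True)] * yes + [(group, stratum, False)] * no


# Hypothetical counts: endorsement odds match within each score band.
records = (
    expand("reference", "low", 2, 2) + expand("focal", "low", 1, 1)
    + expand("reference", "high", 3, 1) + expand("focal", "high", 6, 2)
)
alpha = mantel_haenszel_odds_ratio(records)
print(alpha, mh_delta(alpha))
```

    An odds ratio near 1 means the two groups endorse the item at similar rates once matched on the stratifying score, which is the sense in which DIF analysis controls for the underlying level of the trait.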

  17. Articulatory rehearsal in verbal working memory: a possible neurocognitive endophenotype that differentiates between schizophrenia and schizoaffective disorder.

    PubMed

    Gruber, Oliver; Gruber, Eva; Falkai, Peter

    2006-09-11

    Recent fMRI studies have identified brain systems underlying different components of working memory in healthy individuals. The aim of this study was to compare the functional integrity of these neural networks in terms of behavioural performance in patients with schizophrenia, schizoaffective disorder and healthy controls. In order to detect specific working memory deficits based on dysfunctions of underlying brain circuits we used the same verbal and visuospatial Sternberg item-recognition tasks as in previous neuroimaging studies. Clinical and performance data from matched groups consisting of 14 subjects each were statistically analyzed. Schizophrenic patients exhibited pronounced impairments of both verbal and visuospatial working memory, whereas verbal working memory performance was preserved in schizoaffective patients. The findings provide first evidence that dysfunction of a brain system subserving articulatory rehearsal could represent a biological marker which differentiates between schizophrenia and schizoaffective disorder.

  18. Acid phosphatase test on Phadebas® sheets - An optimized method for presumptive saliva and semen detection.

    PubMed

    Herman, Yael; Feine, Ilan; Gafny, Ron

    2018-04-30

    The precise and efficient detection of semen and saliva in sexual assault case-work items is a critical step in the forensic pipeline. The outcome of this stage may have a profound impact on identifying perpetrators as well as on the investigation process and the final outcome in court. Semen detection is usually based on the activity of acid phosphatase (AP), an enzyme found in high concentration in the seminal plasma. Amylase, an enzyme catalyzing starch hydrolysis, is found in high concentrations in saliva and therefore is a useful target for its detection. To screen case-work items, both presumptive tests require transfer of biological material from the item to paper in a moisturized environment. Since semen and saliva may appear in the same item, it is required in some cases to perform the tests one after the other. This may reduce the chances of identifying all stains on the item and obtaining a DNA profile. In the present study, we applied the AP biochemical test on a Phadebas® sheet, a commercial starch-containing paper used to detect saliva. This approach was found to be sensitive enough to detect diluted semen (1:50) after performing the Phadebas® press test. In addition, it enabled detection of adjacent saliva and semen stains and stains containing a semen-saliva mixture. Finally, a DNA profile was successfully obtained from the Phadebas® sheets after semen detection, a useful feature if the original item is lost or damaged. Taken together, this method provides a practical, reliable and convenient tool for screening sexual assault items of evidence. Copyright © 2018 Elsevier B.V. All rights reserved.

  19. Investigating diagnostic bias in autism spectrum conditions: An item response theory analysis of sex bias in the AQ-10.

    PubMed

    Murray, Aja Louise; Allison, Carrie; Smith, Paula L; Baron-Cohen, Simon; Booth, Tom; Auyeung, Bonnie

    2017-05-01

    Diagnostic bias is a concern in autism spectrum conditions (ASC) where prevalence and presentation differ by sex. To ensure that females with ASC are not under-identified, it is important that ASC screening tools do not systematically underestimate autistic traits in females relative to males. We evaluated whether the AQ-10, a brief screen for ASC recommended by the National Institute of Clinical Excellence in cases of suspected ASC, exhibits such a bias. Using an item response theory approach, we evaluated differential item functioning and differential test functioning. We found that although individual items showed some sex bias, these biases at times favored males and at other times favored females. Thus, at the level of test scores the item-level biases cancelled out to give an unbiased overall score. Results support the continued use of the AQ-10 sum score in its current form; however, they suggest that caution should be exercised when interpreting responses to individual items. The nature of the item level biases could serve as a guide for future research into how ASC affects males and females differently. Autism Res 2017, 10: 790-800. © 2016 International Society for Autism Research, Wiley Periodicals, Inc.

  20. Multiple determinants of lifespan memory differences.

    PubMed

    Henson, Richard N; Campbell, Karen L; Davis, Simon W; Taylor, Jason R; Emery, Tina; Erzinclioglu, Sharon; Kievit, Rogier A

    2016-09-07

    Memory problems are among the most common complaints as people grow older. Using structural equation modeling of commensurate scores of anterograde memory from a large (N = 315), population-derived sample (www.cam-can.org), we provide evidence for three memory factors that are supported by distinct brain regions and show differential sensitivity to age. Associative memory and item memory are dramatically affected by age, even after adjusting for education level and fluid intelligence, whereas visual priming is not. Associative memory and item memory are differentially affected by emotional valence, and the age-related decline in associative memory is faster for negative than for positive or neutral stimuli. Gray-matter volume in the hippocampus, parahippocampus and fusiform cortex, and a white-matter index for the fornix, uncinate fasciculus and inferior longitudinal fasciculus, show differential contributions to the three memory factors. Together, these data demonstrate the extent to which differential ageing of the brain leads to differential patterns of memory loss.

  1. Armed Services Vocational Aptitude Battery: Differential Item Functioning on the High School Form.

    DTIC Science & Technology

    1988-04-01

    AD-RI93 693. Armed Services Vocational Aptitude Battery: Differential Item Functioning on the High School Form (U). Robert L. Linn, C. Nicholas Hastings, Pei-Hua Gillian Hu, Katherine E. Ryan. Universal Energy Systems, Inc., Dayton, OH. Period: October 1985 to 1987. Approved for public release; distribution is unlimited. Air Force Systems Command.

  2. Visual attention to food cues is differentially modulated by gustatory-hedonic and post-ingestive attributes.

    PubMed

    Garcia-Burgos, David; Lao, Junpeng; Munsch, Simone; Caldara, Roberto

    2017-07-01

    Although attentional biases towards food cues may play a critical role in food choices and eating behaviours, it remains largely unexplored which specific food attribute governs visual attentional deployment. The allocation of visual attention might be modulated by anticipatory postingestive consequences, by taste sensations derived from eating itself, or both. Therefore, in order to obtain a comprehensive understanding of the attentional mechanisms involved in the processing of food-related cues, we recorded the eye movements to five categories of well-standardised pictures: neutral non-food, high-calorie, good taste, distaste and dangerous food. In particular, forty-four healthy adults of both sexes were assessed with an antisaccade paradigm (which requires the generation of a voluntary saccade and the suppression of a reflex one) and a free viewing paradigm (which implies the free visual exploration of two images). The results showed that observers directed their initial fixations more often and faster on items with high survival relevance such as nutrients and possible dangers, although an increase in antisaccade error rates was only detected for high-calorie items. We also found longer prosaccade fixation duration and initial fixation duration bias score related to maintained attention towards high-calorie, good taste and danger categories, while shorter reaction times to correct an incorrect prosaccade were related to fewer difficulties in inhibiting distasteful images. Altogether, these findings suggest that visual attention is differentially modulated by both the accepted and rejected food attributes, but also that normal-weight, non-eating disordered individuals exhibit enhanced approach to food's postingestive effects and avoidance of distasteful items (such as bitter vegetables or pungent products). Copyright © 2017 Elsevier Ltd. All rights reserved.

  3. A Bayesian Method for the Detection of Item Preknowledge in CAT. Law School Admission Council Computerized Testing Report. LSAC Research Report Series.

    ERIC Educational Resources Information Center

    McLeod, Lori D.; Lewis, Charles; Thissen, David.

    With the increased use of computerized adaptive testing, which allows for continuous testing, new concerns about test security have evolved, one being the assurance that items in an item pool are safeguarded from theft. In this paper, the risk of score inflation and procedures to detect test takers using item preknowledge are explored. When test…

  4. Detecting Item Drift in Large-Scale Testing

    ERIC Educational Resources Information Center

    Guo, Hongwen; Robin, Frederic; Dorans, Neil

    2017-01-01

    The early detection of item drift is an important issue for frequently administered testing programs because items are reused over time. Unfortunately, operational data tend to be very sparse and do not lend themselves to frequent monitoring analyses, particularly for on-demand testing. Building on existing residual analyses, the authors propose…

  5. Psychometric Examination of an Inventory of Self-Efficacy for the Holland Vocational Themes Using Item Response Theory

    ERIC Educational Resources Information Center

    Turner, Brandon M.; Betz, Nancy E.; Edwards, Michael C.; Borgen, Fred H.

    2010-01-01

    The psychometric properties of measures of self-efficacy for the six themes of Holland's theory were examined using item response theory. Item and scale quality were compared across levels of the trait continuum; all the scales were highly reliable but differentiated better at some levels of the continuum than others. Applications for adaptive…

  6. Do Self Concept Tests Test Self Concept? An Evaluation of the Validity of Items on the Piers Harris and Coopersmith Measures.

    ERIC Educational Resources Information Center

    Lynch, Mervin D.; Chaves, John

    Items from Piers-Harris and Coopersmith self-concept tests were evaluated against independent measures on three self-constructs, idealized, empathic, and worth. Construct measurements were obtained with the semantic differential and D statistic. Ratings were obtained from 381 children, grades 4-6. For each test, item ratings and construct measures…

  7. RhinAsthma patient perspective: A Rasch validation study.

    PubMed

    Molinengo, Giorgia; Baiardini, Ilaria; Braido, Fulvio; Loera, Barbara

    2018-02-01

    In daily practice, Health-Related Quality of Life (HRQoL) tools are useful for supplementing clinical data with the patient's perspective. To encourage their use by clinicians, the availability of tools that can quickly provide valid results is crucial. A new HRQoL tool has been proposed for patients with asthma and rhinitis: the RhinAsthma Patient Perspective-RAPP. The aim of this study was to evaluate the psychometric robustness of the RAPP using the Item Response Theory (IRT) approach, to evaluate the scalability of items and test whether or not patients use the items response scale correctly. 155 patients (53.5% women, mean age 39.1, range 16-76) were recruited during a multicenter study. RAPP metric properties were investigated using IRT models. Differential item functioning (DIF) was used for gender, age, and asthma control test (ACT). The RAPP adequately fitted the Rating Scale model, demonstrating the equality of the rating scale structure for all items. All statistics on items were satisfactory. The RAPP had adequate internal reliability and showed good ability to discriminate among different groups of participants. DIF analysis indicated that there were no differential item functioning issues for gender. One item showed a DIF by age and four items by ACT. The psychometric evaluation performed using IRT models demonstrated that the RAPP met all the criteria to be considered a reliable and valid method of measurement. From a clinical perspective, this will allow physicians to confidently interpret scores as good indicators of Quality of Life of patients with asthma.

  8. Few items in the thyroid-related quality of life instrument ThyPRO exhibited differential item functioning.

    PubMed

    Watt, Torquil; Groenvold, Mogens; Hegedüs, Laszlo; Bonnema, Steen Joop; Rasmussen, Åse Krogh; Feldt-Rasmussen, Ulla; Bjorner, Jakob Bue

    2014-02-01

    To evaluate the extent of differential item functioning (DIF) within the thyroid-specific quality of life patient-reported outcome measure, ThyPRO, according to sex, age, education and thyroid diagnosis. A total of 838 patients with benign thyroid diseases completed the ThyPRO questionnaire (84 five-point items, 13 scales). Uniform and nonuniform DIF were investigated using ordinal logistic regression, testing for both statistical significance and magnitude (ΔR² > 0.02). Scale level was estimated by the sum score, after purification. Twenty instances of DIF in 17 of the 84 items were found. Eight according to diagnosis, where the goiter scale was the one most affected, possibly due to differing perceptions in patients with auto-immune thyroid diseases compared to patients with simple goiter. Eight DIFs according to age were found, of which 5 were in positively worded items, which younger patients were more likely to endorse; one according to gender: women were more likely to report crying, and three according to educational level. The vast majority of DIF had only minor influence on the scale scores (0.1-2.3 points on the 0-100 scales), but two DIF corresponded to a difference of 4.6 and 9.8, respectively. Ordinal logistic regression identified DIF in 17 of 84 items. The potential impact of this on the present scales was low, but items displaying DIF could be avoided when developing abbreviated scales, where the potential impact of DIF (due to fewer items) will be larger.

  9. Using Explanatory Item Response Models to Evaluate Complex Scientific Tasks Designed for the Next Generation Science Standards

    NASA Astrophysics Data System (ADS)

    Chiu, Tina

    This dissertation includes three studies that analyze a new set of assessment tasks developed by the Learning Progressions in Middle School Science (LPS) Project. These assessment tasks were designed to measure science content knowledge on the structure of matter domain and scientific argumentation, while following the goals from the Next Generation Science Standards (NGSS). The three studies focus on the evidence available for the success of this design and its implementation, generally labelled as "validity" evidence. I use explanatory item response models (EIRMs) as the overarching framework to investigate these assessment tasks. These models can be useful when gathering validity evidence for assessments as they can help explain student learning and group differences. In the first study, I explore the dimensionality of the LPS assessment by comparing the fit of unidimensional, between-item multidimensional, and Rasch testlet models to see which is most appropriate for this data. By applying multidimensional item response models, multiple relationships can be investigated, and in turn, allow for a more substantive look into the assessment tasks. The second study focuses on person predictors through latent regression and differential item functioning (DIF) models. Latent regression models show the influence of certain person characteristics on item responses, while DIF models test whether one group is differentially affected by specific assessment items, after conditioning on latent ability. Finally, the last study applies the linear logistic test model (LLTM) to investigate whether item features can help explain differences in item difficulties.

  10. Assessing DSM-IV symptoms of panic attack in the general population: an item response analysis.

    PubMed

    Sunderland, Matthew; Hobbs, Megan J; Andrews, Gavin; Craske, Michelle G

    2012-12-20

    Unexpected panic attacks may represent a non-specific risk factor for future depression and anxiety disorders. The examination of panic symptoms and associated latent severity levels may lead to improvements in the identification, prevention, and treatment of panic attacks and subsequent psychopathology for 'at risk' individuals in the general population. The current study utilised item response theory to assess the DSM-IV symptoms of panic in relation to the latent severity level of the panic attack construct in a sample of 5913 respondents from the National Epidemiologic Survey on Alcohol and Related conditions. Additionally, differential item functioning (DIF) was assessed to determine if each symptom of panic targets the same level of latent severity between different sociodemographic groups (male/female, young/old). Symptoms indexing 'choking', 'fear of dying', and 'tingling/numbness' are some of the more severe symptoms of panic whilst 'heart racing', 'short of breath', 'tremble/shake', 'dizzy/faint', and 'perspire' are some of the least severe symptoms. Significant levels of DIF were detected in the 'perspire' symptom between males and females and the 'fear of dying' symptom between young and old respondents. The current study was limited to examining cross-sectional data from respondents who had experienced at least one panic attack across their lifetime. The findings of the current study provide additional information regarding panic symptoms in the general population that may enable researchers and clinicians to further refine the detection of 'at-risk' individuals who experience threshold and sub-threshold levels of panic. Copyright © 2012 Elsevier B.V. All rights reserved.

  11. Measuring everyday functional competence using the Rasch assessment of everyday activity limitations (REAL) item bank.

    PubMed

    Oude Voshaar, Martijn A H; Ten Klooster, Peter M; Vonkeman, Harald E; van de Laar, Mart A F J

    2017-11-01

    Traditional patient-reported physical function instruments often poorly differentiate patients with mild-to-moderate disability. We describe the development and psychometric evaluation of a generic item bank for measuring everyday activity limitations in outpatient populations. Seventy-two items generated from patient interviews and mapped to the International Classification of Functioning, Disability and Health (ICF) domestic life chapter were administered to 1128 adults representative of the Dutch population. The partial credit model was fitted to the item responses and evaluated with respect to its assumptions, model fit, and differential item functioning (DIF). Measurement performance of a computerized adaptive testing (CAT) algorithm was compared with the SF-36 physical functioning scale (PF-10). A final bank of 41 items was developed. All items demonstrated acceptable fit to the partial credit model and measurement invariance across age, sex, and educational level. Five- and ten-item CAT simulations were shown to have high measurement precision, which exceeded that of SF-36 physical functioning scale across the physical function continuum. Floor effects were absent for a 10-item empirical CAT simulation, and ceiling effects were low (13.5%) compared with SF-36 physical functioning (38.1%). CAT also discriminated better than SF-36 physical functioning between age groups, number of chronic conditions, and respondents with or without rheumatic conditions. The Rasch assessment of everyday activity limitations (REAL) item bank will hopefully prove a useful instrument for assessing everyday activity limitations. T-scores obtained using derived measures can be used to benchmark physical function outcomes against the general Dutch adult population.
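    The partial credit model named in this abstract can be illustrated with a short sketch. This is not the authors' code: the function name `pcm_category_probs` and the threshold values are hypothetical, chosen only to show how category probabilities follow from a person location (theta) and an item's step thresholds under the standard partial credit formulation.

```python
import math


def pcm_category_probs(theta, thresholds):
    """Category probabilities for one item under the partial credit model.

    theta: person location on the latent scale.
    thresholds: step parameters delta_1..delta_m (hypothetical values here).
    Returns a list of m + 1 probabilities for categories 0..m.
    """
    # Cumulative sums of (theta - delta_j); category 0 has an empty sum of 0.
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # A three-category item with illustrative thresholds at -1 and 1.
    print(pcm_category_probs(0.0, [-1.0, 1.0]))
```

    With a single threshold the expression reduces to the dichotomous Rasch model, which is one quick way to sanity-check an implementation.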

  12. Rapid and Accurate Behavioral Health Diagnostic Screening: Initial Validation Study of a Web-Based, Self-Report Tool (the SAGE-SR)

    PubMed Central

    Purcell, Susan E; Rhea, Karen; Maier, Philip; First, Michael; Zweede, Lisa; Sinisterra, Manuela; Nunn, M Brad; Austin, Marie-Paule; Brodey, Inger S

    2018-01-01

    Background The Structured Clinical Interview for DSM (SCID) is considered the gold standard assessment for accurate, reliable psychiatric diagnoses; however, because of its length, complexity, and training required, the SCID is rarely used outside of research. Objective This paper aims to describe the development and initial validation of a Web-based, self-report screening instrument (the Screening Assessment for Guiding Evaluation-Self-Report, SAGE-SR) based on the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) and the SCID-5-Clinician Version (CV) intended to make accurate, broad-based behavioral health diagnostic screening more accessible within clinical care. Methods First, study staff drafted approximately 1200 self-report items representing individual granular symptoms in the diagnostic criteria for the 8 primary SCID-CV modules. An expert panel iteratively reviewed, critiqued, and revised items. The resulting items were iteratively administered and revised through 3 rounds of cognitive interviewing with community mental health center participants. In the first 2 rounds, the SCID was also administered to participants to directly compare their Likert self-report and SCID responses. A second expert panel evaluated the final pool of items from cognitive interviewing and criteria in the DSM-5 to construct the SAGE-SR, a computerized adaptive instrument that uses branching logic from a screener section to administer appropriate follow-up questions to refine the differential diagnoses. The SAGE-SR was administered to healthy controls and outpatient mental health clinic clients to assess test duration and test-retest reliability. Cutoff scores for screening into follow-up diagnostic sections and criteria for inclusion of diagnoses in the differential diagnosis were evaluated. 
Results The expert panel reduced the initial 1200 test items to 664 items that panel members agreed collectively represented the SCID items from the 8 targeted modules and DSM criteria for the covered diagnoses. These 664 items were iteratively submitted to 3 rounds of cognitive interviewing with 50 community mental health center participants; the expert panel reviewed session summaries and agreed on a final set of 661 clear and concise self-report items representing the desired criteria in the DSM-5. The SAGE-SR constructed from this item pool took an average of 14 min to complete in a nonclinical sample versus 24 min in a clinical sample. Responses to individual items can be combined to generate DSM criteria endorsements and differential diagnoses, as well as provide indices of individual symptom severity. Preliminary measures of test-retest reliability in a small, nonclinical sample were promising, with good to excellent reliability for screener items in 11 of 13 diagnostic screening modules (intraclass correlation coefficient [ICC] or kappa coefficients ranging from .60 to .90), with mania achieving fair test-retest reliability (ICC=.50) and other substance use endorsed too infrequently for analysis. Conclusions The SAGE-SR is a computerized adaptive self-report instrument designed to provide rigorous differential diagnostic information to clinicians. PMID:29572204

  13. Dissociable loss of the representations in visual short-term memory.

    PubMed

    Li, Jie

    2016-01-01

    The present study investigated in what manner the information in visual short-term memory (VSTM) is lost. Participants memorized four items, one of which was later given higher priority by a retro-cue. Participants were then required to detect a possible change, either large or small, that occurred to one of the items. The results showed that detection of a small change was poorer for the uncued items than for the cued item, yet a large change to any of the four memory items could be detected perfectly, indicating that the uncued representations lost some detailed information yet still retained some basic features in VSTM. The present study suggests that after being encoded into VSTM, the information is not lost in an object-based manner; rather, features of an item are still dissociable, so that they can be lost separately.

  14. Development of an abbreviated Career Indecision Profile-65 using item response theory: The CIP-Short.

    PubMed

    Xu, Hui; Tracey, Terence J G

    2017-03-01

    The current study developed an abbreviated version of the Career Indecision Profile-65 (CIP-65; Hacker, Carr, Abrams, & Brown, 2013) by using item response theory. In order to improve the efficiency of the CIP-65 in measuring career indecision, the individual item performance of the CIP-65 was examined with respect to the ordering of response occurrence and gender differential item functioning. The best 5 items of each scale of the CIP-65 (i.e., neuroticism/negative affectivity, choice/commitment anxiety, lack of readiness, and interpersonal conflicts) were retained in the CIP-Short using a sample of 588 college students. A validation sample (N = 174) supported the reliability and structural validity of the CIP-Short. The convergent and divergent validity of the CIP-Short was additionally supported in the findings of a hypothesized differential relational pattern in a separate sample (N = 360). While the current study supported the CIP-Short being a sound brief measure of career indecision, the limitations of this study and suggestions for future research were discussed as well. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  15. Development and Validation of the Homeostasis Concept Inventory

    PubMed Central

    McFarland, Jenny L.; Price, Rebecca M.; Wenderoth, Mary Pat; Martinková, Patrícia; Cliff, William; Michael, Joel; Modell, Harold; Wright, Ann

    2017-01-01

    We present the Homeostasis Concept Inventory (HCI), a 20-item multiple-choice instrument that assesses how well undergraduates understand this critical physiological concept. We used an iterative process to develop a set of questions based on elements in the Homeostasis Concept Framework. This process involved faculty experts and undergraduate students from associate’s colleges, primarily undergraduate institutions, regional and research-intensive universities, and professional schools. Statistical results provided strong evidence for the validity and reliability of the HCI. We found that graduate students performed better than undergraduates, biology majors performed better than nonmajors, and students performed better after receiving instruction about homeostasis. We used differential item analysis to assess whether students from different genders, races/ethnicities, and English language status performed differently on individual items of the HCI. We found no evidence of differential item functioning, suggesting that the items do not incorporate cultural or gender biases that would impact students’ performance on the test. Instructors can use the HCI to guide their teaching and student learning of homeostasis, a core concept of physiology. PMID:28572177

  16. Secondary Psychometric Examination of the Dimensional Obsessive-Compulsive Scale: Classical Testing, Item Response Theory, and Differential Item Functioning.

    PubMed

    Thibodeau, Michel A; Leonard, Rachel C; Abramowitz, Jonathan S; Riemann, Bradley C

    2015-12-01

    The Dimensional Obsessive-Compulsive Scale (DOCS) is a promising measure of obsessive-compulsive disorder (OCD) symptoms but has received minimal psychometric attention. We evaluated the utility and reliability of DOCS scores. The study included 832 students and 300 patients with OCD. Confirmatory factor analysis supported the originally proposed four-factor structure. DOCS total and subscale scores exhibited good to excellent internal consistency in both samples (α = .82 to α = .96). Patient DOCS total scores reduced substantially during treatment (t = 16.01, d = 1.02). DOCS total scores discriminated between students and patients (sensitivity = 0.76, 1 - specificity = 0.23). The measure did not exhibit gender-based differential item functioning as tested by Mantel-Haenszel chi-square tests. Expected response options for each item were plotted as a function of item response theory and demonstrated that DOCS scores incrementally discriminate OCD symptoms ranging from low to extremely high severity. Incremental differences in DOCS scores appear to represent unbiased and reliable differences in true OCD symptom severity. © The Author(s) 2014.
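    As a rough illustration of the Mantel-Haenszel chi-square test mentioned in this abstract, the sketch below computes the statistic for a single dichotomously scored item, matching respondents on a total score. It is an assumption-laden example, not the authors' procedure: DOCS items are rated on more than two categories, so the study presumably used a polytomous variant, and the function name `mantel_haenszel` and the group labels "ref" and "focal" are hypothetical.

```python
from collections import defaultdict


def mantel_haenszel(responses, groups, matching_scores):
    """Mantel-Haenszel chi-square and common odds ratio for one 0/1 item.

    responses: 0/1 item scores.
    groups: "ref" or "focal" for each respondent (hypothetical labels).
    matching_scores: stratum key per respondent, e.g. the total test score.
    """
    # One 2x2 table per stratum, stored as [A, B, C, D]:
    # A = ref correct, B = ref incorrect, C = focal correct, D = focal incorrect.
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for y, g, s in zip(responses, groups, matching_scores):
        idx = (0 if g == "ref" else 2) + (0 if y == 1 else 1)
        tables[s][idx] += 1

    sum_a = sum_expected = sum_var = num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        if n < 2:
            continue  # a stratum with one respondent carries no information
        sum_a += a
        sum_expected += (a + b) * (a + c) / n
        sum_var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        num += a * d / n
        den += b * c / n

    # Continuity-corrected statistic, compared against chi-square with 1 df.
    chi2 = (abs(sum_a - sum_expected) - 0.5) ** 2 / sum_var if sum_var > 0 else 0.0
    odds_ratio = num / den if den > 0 else float("inf")
    return chi2, odds_ratio
```

    A statistic above 3.84 (the 5% critical value of chi-square with one degree of freedom) would flag the item; a common odds ratio near 1 indicates similar performance for matched respondents in both groups.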

  17. Negative Symptom Dimensions of the Positive and Negative Syndrome Scale Across Geographical Regions: Implications for Social, Linguistic, and Cultural Consistency.

    PubMed

    Khan, Anzalee; Liharska, Lora; Harvey, Philip D; Atkins, Alexandra; Ulshen, Daniel; Keefe, Richard S E

    2017-12-01

    Objective: Recognizing the discrete dimensions that underlie negative symptoms in schizophrenia and how these dimensions are understood across localities might result in better understanding and treatment of these symptoms. To this end, the objectives of this study were to 1) identify the Positive and Negative Syndrome Scale negative symptom dimensions of expressive deficits and experiential deficits and 2) analyze performance on these dimensions over 15 geographical regions to determine whether the items defining them manifest similar reliability across these regions. Design: Data were obtained for the baseline Positive and Negative Syndrome Scale visits of 6,889 subjects across 15 geographical regions. Using confirmatory factor analysis, we examined whether a two-factor negative symptom structure that is found in schizophrenia (experiential deficits and expressive deficits) would be replicated in our sample, and using differential item functioning, we tested the degree to which specific items from each negative symptom subfactor performed across geographical regions in comparison with the United States. Results: The two-factor negative symptom solution was replicated in this sample. Most geographical regions showed moderate-to-large differential item functioning for Positive and Negative Syndrome Scale expressive deficit items, especially N3 Poor Rapport, as compared with Positive and Negative Syndrome Scale experiential deficit items, showing that these items might be interpreted or scored differently in different regions. Across countries, except for India, the differential item functioning values did not favor raters in the United States. Conclusion: These results suggest that the Positive and Negative Syndrome Scale negative symptom factor can be better represented by a two-factor model than by a single-factor model. 
Additionally, the results show significant differences in responses to items representing the Positive and Negative Syndrome Scale expressive factors, but not the experiential factors, across regions. This could be due to a lack of equivalence between the original and translated versions, cultural differences with the interpretation of items, dissimilarities in rater training, or diversity in the understanding of scoring anchors. Knowing which items are challenging for raters across regions can help to guide Positive and Negative Syndrome Scale training and improve the results of international clinical trials aimed at negative symptoms.

  18. A Review of the Effects on IRT Item Parameter Estimates with a Focus on Misbehaving Common Items in Test Equating

    PubMed Central

    Michaelides, Michalis P.

    2010-01-01

    Many studies have investigated the topic of change or drift in item parameter estimates in the context of item response theory (IRT). Content effects, such as instructional variation and curricular emphasis, as well as context effects, such as the wording, position, or exposure of an item have been found to impact item parameter estimates. The issue becomes more critical when items with estimates exhibiting differential behavior across test administrations are used as common for deriving equating transformations. This paper reviews the types of effects on IRT item parameter estimates and focuses on the impact of misbehaving or aberrant common items on equating transformations. Implications relating to test validity and the judgmental nature of the decision to keep or discard aberrant common items are discussed, with recommendations for future research into more informed and formal ways of dealing with misbehaving common items. PMID:21833230

  19. A Review of the Effects on IRT Item Parameter Estimates with a Focus on Misbehaving Common Items in Test Equating.

    PubMed

    Michaelides, Michalis P

    2010-01-01

    Many studies have investigated the topic of change or drift in item parameter estimates in the context of item response theory (IRT). Content effects, such as instructional variation and curricular emphasis, as well as context effects, such as the wording, position, or exposure of an item have been found to impact item parameter estimates. The issue becomes more critical when items with estimates exhibiting differential behavior across test administrations are used as common for deriving equating transformations. This paper reviews the types of effects on IRT item parameter estimates and focuses on the impact of misbehaving or aberrant common items on equating transformations. Implications relating to test validity and the judgmental nature of the decision to keep or discard aberrant common items are discussed, with recommendations for future research into more informed and formal ways of dealing with misbehaving common items.

  20. Type I Error Inflation in DIF Identification with Mantel-Haenszel: An Explanation and a Solution

    ERIC Educational Resources Information Center

    Magis, David; De Boeck, Paul

    2014-01-01

    It is known that sum score-based methods for the identification of differential item functioning (DIF), such as the Mantel-Haenszel (MH) approach, can be affected by Type I error inflation in the absence of any DIF effect. This may happen when the items differ in discrimination and when there is item impact. On the other hand, outlier DIF methods…

  1. [Development and validation of an inventory of ego functions and self regulation (Hannover Self-Regulation Inventory, HSRI)].

    PubMed

    Jäger, B; Schmid-Ott, G; Ernst, G; Dölle-Lange, E; Sack, M

    2012-06-01

    The aim of this study was to construct and validate a short self-rating questionnaire for the assessment of ego functions and ability of self regulation. An item pool of 120 items covering 6 postulated dimensions was reduced by two steps in independent samples (n = 136 + 470) via factor and item analyses to the final version consisting of 35 items. The 5 resulting questionnaire scales "interpersonal disturbances", "frustration tolerance and impulse control", "identity disturbances", "affect differentiation and affect tolerance" and "self-esteem" were well interpretable and showed in confirmatory factor analysis the best fit to the data (CHI²/df = 3.48; RMSEA = 0.73). Total scores were found to differentiate well between diagnostic groups of patients with more or less ego pathology (FANOVA = 9.8; df = 11; p < 0.001), thus proving good concurrent validity. Reliability was shown by testing internal consistency and test-retest correlations. The "Hannover self-regulation questionnaire" (HSRQ) evidently is an appropriate and reliable screening instrument in order to assess ego functions and capacities of self regulation in an economic and user-friendly means. The scale structure allows differentiated diagnostics of weak vs. stable ego functions and may be used for detailed therapy planning. © Georg Thieme Verlag KG Stuttgart · New York.

  2. Using the Rasch Measurement Model in Psychometric Analysis of the Family Effectiveness Measure

    PubMed Central

    McCreary, Linda L.; Conrad, Karen M.; Conrad, Kendon J.; Scott, Christy K; Funk, Rodney R.; Dennis, Michael L.

    2013-01-01

    Background Valid assessment of family functioning can play a vital role in optimizing client outcomes. Because family functioning is influenced by family structure, socioeconomic context, and culture, existing measures of family functioning--primarily developed with nuclear, middle class European American families--may not be valid assessments of families in diverse populations. The Family Effectiveness Measure was developed to address this limitation. Objectives To test the Family Effectiveness Measure with data from a primarily low-income African American convenience sample, using the Rasch measurement model. Method A sample of 607 adult women completed the measure. Rasch analysis was used to assess unidimensionality, response category functioning, item fit, person reliability, differential item functioning by race and parental status, and item hierarchy. Criterion-related validity was tested using correlations with five other variables related to family functioning. Results The Family Effectiveness Measure measures two separate constructs: The effective family functioning construct was a psychometrically sound measure of the target construct that was more efficient due to the deletion of 22 items. The ineffective family functioning construct consisted of 16 of those deleted items but was not as strong psychometrically. Items in both constructs evidenced no differential item functioning by race. Criterion-related validity was supported for both. Discussion In contrast to the prevailing conceptualization that family functioning is a single construct, assessed by positively and negatively worded items, use of the Rasch analysis suggested the existence of two constructs. While the effective family functioning is a strong and efficient measure of family functioning, the ineffective family functioning will require additional item development and psychometric testing. PMID:23636342

  3. Geriatric Anxiety Scale: item response theory analysis, differential item functioning, and creation of a ten-item short form (GAS-10).

    PubMed

    Mueller, Anne E; Segal, Daniel L; Gavett, Brandon; Marty, Meghan A; Yochim, Brian; June, Andrea; Coolidge, Frederick L

    2015-07-01

    The Geriatric Anxiety Scale (GAS; Segal, D. L., June, A., Payne, M., Coolidge, F. L. and Yochim, B. (2010). Journal of Anxiety Disorders, 24, 709-714. doi:10.1016/j.janxdis.2010.05.002) is a self-report measure of anxiety that was designed to address unique issues associated with anxiety assessment in older adults. This study is the first to use item response theory (IRT) to examine the psychometric properties of a measure of anxiety in older adults. A large sample of older adults (n = 581; mean age = 72.32 years, SD = 7.64 years, range = 60 to 96 years; 64% women; 88% European American) completed the GAS. IRT properties were examined. The presence of differential item functioning (DIF) or measurement bias by age and sex was assessed, and a ten-item short form of the GAS (called the GAS-10) was created. All GAS items had discrimination parameters of 1.07 or greater. Items from the somatic subscale tended to have lower discrimination parameters than items on the cognitive or affective subscales. Two items were flagged for DIF, but the impact of the DIF was negligible. Women scored significantly higher than men on the GAS and its subscales. Participants in the young-old group (60 to 79 years old) scored significantly higher on the cognitive subscale than participants in the old-old group (80 years old and older). Results from the IRT analyses indicated that the GAS and GAS-10 have strong psychometric properties among older adults. We conclude by discussing implications and future research directions.

  4. Differential Item Functioning (DIF) among Spanish-Speaking English Language Learners (ELLs) in State Science Tests

    NASA Astrophysics Data System (ADS)

    Ilich, Maria O.

    Psychometricians and test developers evaluate standardized tests for potential bias against groups of test-takers by using differential item functioning (DIF). English language learners (ELLs) are a diverse group of students whose native language is not English. While they are still learning the English language, they must take their standardized tests for their school subjects, including science, in English. In this study, linguistic complexity was examined as a possible source of DIF that may result in test scores that confound science knowledge with a lack of English proficiency among ELLs. Two years of fifth-grade state science tests were analyzed for evidence of DIF using two DIF methods, Simultaneous Item Bias Test (SIBTest) and logistic regression. The tests presented a unique challenge in that the test items were grouped together into testlets, that is, groups of items referring to a scientific scenario to measure knowledge of different science content or skills. Very large samples of 10,256 students in 2006 and 13,571 students in 2007 were examined. Half of each sample was composed of Spanish-speaking ELLs; the balance comprised native English speakers. The two DIF methods were in agreement about the items that favored non-ELLs and the items that favored ELLs. Logistic regression effect sizes were all negligible, while SIBTest flagged items with low to high DIF. A decrease in socioeconomic status and Spanish-speaking ELL diversity may have led to inconsistent SIBTest effect sizes for items used in both testing years. The DIF results for the testlets suggested that ELLs lacked sufficient opportunity to learn science content. The DIF results further suggest that those constructed response test items requiring the student to draw a conclusion about a scientific investigation or to plan a new investigation tended to favor ELLs.

  5. Development and validation of a vision-specific quality-of-life questionnaire for Timor-Leste.

    PubMed

    du Toit, Rènée; Palagyi, Anna; Ramke, Jacqueline; Brian, Garry; Lamoureux, Ecosse L

    2008-10-01

    To develop and determine the reliability and validity of a vision-specific quality-of-life instrument (TL-VSQOL) designed to assess the impact of distance and near vision impairment in adults living in Timor-Leste. A vision-specific quality-of-life questionnaire was developed, piloted, and administered to 704 Timorese aged ≥40 years during a population-based eye health rapid assessment. Rasch analysis was performed on the data of 457 participants with presenting near vision worse than N8 (78.5%) and/or distance vision worse than 6/18 (69.8%). Unidimensionality, item fit to the model, response category performance, differential item functioning, and targeting of items to participants were assessed. Initially, the questionnaire lacked fit to the Rasch model. Removal of two items concerning emotional well-being resulted in a fit of the data (overall item-trait interaction: χ² (df) = 81 (51); mean (SD) person and item fit residual values: -0.30 (1.02) and -0.32 (1.46)), and good targeting of person ability and item difficulty was evident. Poorer distance and near visual acuities were significantly associated with worse quality-of-life scores (P < 0.001). Person separation reliability was substantial (0.93), indicating that the instrument can discriminate between groups with normal and impaired vision. All 17 items were free of differential item functioning, and there was no evidence of multidimensionality. This 17-item TL-VSQOL has high reliability, construct, and criterion validity and effective targeting. It can effectively assess the impact on quality of life of adult Timorese with distance and near vision impairment. The TL-VSQOL could be adapted for use in other low-resource settings.

  6. The Longer We Have to Forget the More We Remember: The Ironic Effect of Postcue Duration in Item-Based Directed Forgetting

    ERIC Educational Resources Information Center

    Bancroft, Tyler D.; Hockley, William E.; Farquhar, Riley

    2013-01-01

    The effects of the duration of remember and forget cues were examined to test the differential rehearsal account of item-based directed forgetting. In Experiments 1 and 2, cues were shown for 300, 600, or 900 ms, and a directed forgetting effect (better recognition of remember than forget items) was found at each duration. In addition, recognition…

  7. Age-Related Differences in Recognition Memory for Items and Associations: Contribution of Individual Differences in Working Memory and Metamemory

    PubMed Central

    Bender, Andrew R.; Raz, Naftali

    2012-01-01

    Ability to form new associations between unrelated items is particularly sensitive to aging, but the reasons for such differential vulnerability are unclear. In this study, we examined the role of objective and subjective factors (working memory and beliefs about memory strategies) on differential relations of age with recognition of items and associations. Healthy adults (N = 100, age 21 to 79) studied word pairs, completed item and association recognition tests, and rated the effectiveness of shallow (e.g., repetition) and deep (e.g., imagery or sentence generation) encoding strategies. Advanced age was associated with reduced working memory (WM) capacity and poorer associative recognition. In addition, reduced WM capacity, beliefs in the utility of ineffective encoding strategies, and lack of endorsement of effective ones were independently associated with impaired associative memory. Thus, maladaptive beliefs about memory in conjunction with reduced cognitive resources account in part for differences in associative memory commonly attributed to aging. PMID:22251381

  8. Differential Performance by English Language Learners on an Inquiry-Based Science Assessment

    NASA Astrophysics Data System (ADS)

    Turkan, Sultan; Liu, Ou Lydia

    2012-10-01

    The performance of English language learners (ELLs) has been a concern given the rapidly changing demographics in US K-12 education. This study aimed to examine whether students' English language status has an impact on their inquiry science performance. Differential item functioning (DIF) analysis was conducted with regard to ELL status on an inquiry-based science assessment, using a multifaceted Rasch DIF model. A total of 1,396 seventh- and eighth-grade students took the science test, including 313 ELL students. The results showed that, overall, non-ELLs significantly outperformed ELLs. Of the four items that showed DIF, three favored non-ELLs while one favored ELLs. The item that favored ELLs provided a graphic representation of a science concept within a family context. There is some evidence that constructed-response items may help ELLs articulate scientific reasoning using their own words. Assessment developers and teachers should pay attention to the possible interaction between linguistic challenges and science content when designing assessment for and providing instruction to ELLs.

  9. Disparities in Sense of Community: True race differences or differential item functioning?

    PubMed Central

    Coffman, Donna L.; BeLue, Rhonda

    2009-01-01

    The sense of community index (SCI) has been widely used to measure psychological sense of community (SOC). Furthermore, SOC has been found to differ among racial groups. Since different ethnic groups have different cultural and historical experiences that may lead to different interpretations of measurement items, it is important to know whether the instrument used to measure the construct of interest has equivalency in measurement across groups or if the instrument exhibits differential item functioning (DIF). Examining DIF in the SCI helps assure that subgroup comparisons identify true differences in SOC between Blacks and Whites. We did not find DIF between races, but we did find that the SCI question ‘I feel at home in my neighborhood’ was a more reliable measure of SOC for Whites than for Blacks. In other words, this item has less measurement error for Whites than for Blacks. Therefore, differences on the SCI may be attributable to true differences in SOC between races rather than DIF. PMID:19890462

  10. A Psychometric Evaluation of the DSM-IV Criteria for Antisocial Personality Disorder: Dimensionality, Local Reliability, and Differential Item Functioning Across Gender.

    PubMed

    Paap, Muirne C S; Braeken, Johan; Pedersen, Geir; Urnes, Øyvind; Karterud, Sigmund; Wilberg, Theresa; Hummelen, Benjamin

    2017-12-01

    This study aims at evaluating the psychometric properties of the antisocial personality disorder (ASPD) criteria in a large sample of patients, most of whom had one or more personality disorders (PD). PD diagnoses were assessed by experienced clinicians using the Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders, 4th edition, Axis II PDs. Analyses were performed within an item response theory framework. Results of the analyses indicated that ASPD is a unidimensional construct that can be measured reliably at the upper range of the latent trait scale. Differential item functioning across gender was restricted to two criteria and had little impact on the latent ASPD trait level. Patients fulfilling both the adult ASPD criteria and the conduct disorder criteria had similar latent trait distributions as patients fulfilling only the adult ASPD criteria. Overall, the ASPD items fit the purpose of a diagnostic instrument well, that is, distinguishing patients with moderate from those with high antisocial personality scores.

  11. Multiple determinants of lifespan memory differences

    PubMed Central

    Henson, Richard N.; Campbell, Karen L.; Davis, Simon W.; Taylor, Jason R.; Emery, Tina; Erzinclioglu, Sharon; Tyler, Lorraine K.; Brayne, Carol; Bullmore, Edward T.; Calder, Andrew C.; Cusack, Rhodri; Dalgleish, Tim; Duncan, John; Matthews, Fiona E.; Marslen-Wilson, William D.; Rowe, James B.; Shafto, Meredith A.; Cheung, Teresa; Geerligs, Linda; McCarrey, Anna; Mustafa, Abdur; Price, Darren; Samu, David; Treder, Matthias; Tsvetanov, Kamen A.; van Belle, Janna; Williams, Nitin; Bates, Lauren; Gadie, Andrew; Gerbase, Sofia; Georgieva, Stanimira; Hanley, Claire; Parkin, Beth; Troy, David; Auer, Tibor; Correia, Marta; Gao, Lu; Green, Emma; Henriques, Rafael; Allen, Jodie; Amery, Gillian; Amunts, Liana; Barcroft, Anne; Castle, Amanda; Dias, Cheryl; Dowrick, Jonathan; Fair, Melissa; Fisher, Hayley; Goulding, Anna; Grewa, Adarsh; Hale, Geoff; Hilton, Andrew; Johnson, Frances; Johnston, Patricia; Kavanagh-Williamson, Thea; Kwasniewska, Magdalena; McMinn, Alison; Norman, Kim; Penrose, Jessica; Roby, Fiona; Rowland, Diane; Sargeant, John; Squire, Maggie; Stevens, Beth; Stoddart, Aldabra; Stone, Cheryl; Thompson, Tracy; Yazlik, Ozlem; Barnes, Dan; Dixon, Marie; Hillman, Jaya; Mitchell, Joanne; Villis, Laura; Kievit, Rogier A.

    2016-01-01

    Memory problems are among the most common complaints as people grow older. Using structural equation modeling of commensurate scores of anterograde memory from a large (N = 315), population-derived sample (www.cam-can.org), we provide evidence for three memory factors that are supported by distinct brain regions and show differential sensitivity to age. Associative memory and item memory are dramatically affected by age, even after adjusting for education level and fluid intelligence, whereas visual priming is not. Associative memory and item memory are differentially affected by emotional valence, and the age-related decline in associative memory is faster for negative than for positive or neutral stimuli. Gray-matter volume in the hippocampus, parahippocampus and fusiform cortex, and a white-matter index for the fornix, uncinate fasciculus and inferior longitudinal fasciculus, show differential contributions to the three memory factors. Together, these data demonstrate the extent to which differential ageing of the brain leads to differential patterns of memory loss. PMID:27600595

  12. Comparison of Factor Simplicity Indices for Dichotomous Data: DETECT R, Bentler's Simplicity Index, and the Loading Simplicity Index

    ERIC Educational Resources Information Center

    Finch, Holmes; Stage, Alan Kirk; Monahan, Patrick

    2008-01-01

    A primary assumption underlying several of the common methods for modeling item response data is unidimensionality, that is, test items tap into only one latent trait. This assumption can be assessed several ways, using nonlinear factor analysis and DETECT, a method based on the item conditional covariances. When multidimensionality is identified,…

  13. Rapid and Accurate Behavioral Health Diagnostic Screening: Initial Validation Study of a Web-Based, Self-Report Tool (the SAGE-SR).

    PubMed

    Brodey, Benjamin; Purcell, Susan E; Rhea, Karen; Maier, Philip; First, Michael; Zweede, Lisa; Sinisterra, Manuela; Nunn, M Brad; Austin, Marie-Paule; Brodey, Inger S

    2018-03-23

    The Structured Clinical Interview for DSM (SCID) is considered the gold standard assessment for accurate, reliable psychiatric diagnoses; however, because of its length, complexity, and training required, the SCID is rarely used outside of research. This paper aims to describe the development and initial validation of a Web-based, self-report screening instrument (the Screening Assessment for Guiding Evaluation-Self-Report, SAGE-SR) based on the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) and the SCID-5-Clinician Version (CV) intended to make accurate, broad-based behavioral health diagnostic screening more accessible within clinical care. First, study staff drafted approximately 1200 self-report items representing individual granular symptoms in the diagnostic criteria for the 8 primary SCID-CV modules. An expert panel iteratively reviewed, critiqued, and revised items. The resulting items were iteratively administered and revised through 3 rounds of cognitive interviewing with community mental health center participants. In the first 2 rounds, the SCID was also administered to participants to directly compare their Likert self-report and SCID responses. A second expert panel evaluated the final pool of items from cognitive interviewing and criteria in the DSM-5 to construct the SAGE-SR, a computerized adaptive instrument that uses branching logic from a screener section to administer appropriate follow-up questions to refine the differential diagnoses. The SAGE-SR was administered to healthy controls and outpatient mental health clinic clients to assess test duration and test-retest reliability. Cutoff scores for screening into follow-up diagnostic sections and criteria for inclusion of diagnoses in the differential diagnosis were evaluated. 
    The expert panel reduced the initial 1200 test items to 664 items that panel members agreed collectively represented the SCID items from the 8 targeted modules and DSM criteria for the covered diagnoses. These 664 items were iteratively submitted to 3 rounds of cognitive interviewing with 50 community mental health center participants; the expert panel reviewed session summaries and agreed on a final set of 661 clear and concise self-report items representing the desired criteria in the DSM-5. The SAGE-SR constructed from this item pool took an average of 14 min to complete in a nonclinical sample versus 24 min in a clinical sample. Responses to individual items can be combined to generate DSM criteria endorsements and differential diagnoses, as well as provide indices of individual symptom severity. Preliminary measures of test-retest reliability in a small, nonclinical sample were promising, with good to excellent reliability for screener items in 11 of 13 diagnostic screening modules (intraclass correlation coefficient [ICC] or kappa coefficients ranging from .60 to .90), with mania achieving fair test-retest reliability (ICC=.50) and other substance use endorsed too infrequently for analysis. The SAGE-SR is a computerized adaptive self-report instrument designed to provide rigorous differential diagnostic information to clinicians. ©Benjamin Brodey, Susan E Purcell, Karen Rhea, Philip Maier, Michael First, Lisa Zweede, Manuela Sinisterra, M Brad Nunn, Marie-Paule Austin, Inger S Brodey. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 23.03.2018.

  14. Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating

    ERIC Educational Resources Information Center

    He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei

    2013-01-01

    Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…

  15. Robust Scale Transformation Methods in IRT True Score Equating under Common-Item Nonequivalent Groups Design

    ERIC Educational Resources Information Center

    He, Yong

    2013-01-01

    Common test items play an important role in equating multiple test forms under the common-item nonequivalent groups design. Inconsistent item parameter estimates among common items can lead to large bias in equated scores for IRT true score equating. Current methods extensively focus on detection and elimination of outlying common items, which…

  16. Solving Differential Equations Analytically. Elementary Differential Equations. Modules and Monographs in Undergraduate Mathematics and Its Applications Project. UMAP Unit 335.

    ERIC Educational Resources Information Center

    Goldston, J. W.

    This unit introduces analytic solutions of ordinary differential equations. The objective is to enable the student to decide whether a given function solves a given differential equation. Examples of problems from biology and chemistry are covered. Problem sets, quizzes, and a model exam are included, and answers to all items are provided. The…

  17. Item Analyses of Memory Differences

    PubMed Central

    Salthouse, Timothy A.

    2017-01-01

    Objective: Although performance on memory and other cognitive tests is usually assessed with a score aggregated across multiple items, potentially valuable information is also available at the level of individual items. Method: The current study illustrates how analyses of variance with item as one of the factors, and memorability analyses in which item accuracy in one group is plotted as a function of item accuracy in another group, can provide a more detailed characterization of the nature of group differences in memory. Data are reported for two memory tasks, word recall and story memory, across age, ability, repetition, delay, and longitudinal contrasts. Results: The item-level analyses revealed evidence for largely uniform differences across items in the age, ability, and longitudinal contrasts, but differential patterns across items in the repetition contrast, and unsystematic item relations in the delay contrast. Conclusion: Analyses at the level of individual items have the potential to indicate the manner by which group differences in the aggregate test score are achieved. PMID:27618285

  18. Increasing the power for detecting impairment in older adults with the Faces subtest from Wechsler Memory Scale-III: an empirical trial.

    PubMed

    Levy, Boaz

    2006-10-01

    Empirical studies have questioned the validity of the Faces subtest from the WMS-III for detecting impairment in visual memory, particularly among the elderly. A recent examination of the test norms revealed a significant age related floor effect already emerging on Faces I (immediate recall), implying excessive difficulty in the acquisition phase among unimpaired older adults. The current study compared the concurrent validity of the Faces subtest with an alternative measure between 16 Alzheimer's patients and 16 controls. The alternative measure was designed to facilitate acquisition by reducing the sequence of item presentation. Other changes aimed at increasing the retrieval challenge, decreasing error due to guessing and standardizing the administration. Analyses converged to indicate that the alternative measure provided a considerably greater differentiation than the Faces subtest between Alzheimer's patients and controls. Steps for revising the Faces subtest are discussed.

  19. A Differential Item Functioning (DIF) Analysis of the Communicative Participation Item Bank (CPIB): Comparing Individuals with Parkinson's Disease from the United States and New Zealand

    ERIC Educational Resources Information Center

    Baylor, Carolyn; McAuliffe, Megan J.; Hughes, Louise E.; Yorkston, Kathryn; Anderson, Tim; Kim, Jiseon; Amtmann, Dagmar

    2014-01-01

    Purpose: To examine the cross-cultural applicability of the Communicative Participation Item Bank (CPIB) through a comparison of respondents with Parkinson's disease (PD) from the United States and New Zealand. Method: A total of 428 respondents--218 from the United States and 210 from New Zealand--completed the self-report CPIB and a series of…

  20. Alternative Matching Scores to Control Type I Error of the Mantel-Haenszel Procedure for DIF in Dichotomously Scored Items Conforming to 3PL IRT and Nonparametric 4PBCB Models

    ERIC Educational Resources Information Center

    Monahan, Patrick O.; Ankenmann, Robert D.

    2010-01-01

    When the matching score is either less than perfectly reliable or not a sufficient statistic for determining latent proficiency in data conforming to item response theory (IRT) models, Type I error (TIE) inflation may occur for the Mantel-Haenszel (MH) procedure or any differential item functioning (DIF) procedure that matches on summed-item…
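    The Mantel-Haenszel procedure named in this abstract can be illustrated with a minimal sketch. It assumes examinees have already been grouped into strata by matching score, and that each stratum supplies a 2x2 table of counts. The function name `mantel_haenszel_dif` and the tuple layout are hypothetical choices for illustration, not anything taken from the record itself; the -2.35 scaling is the conventional delta-scale transformation of the common odds ratio.

```python
import math


def mantel_haenszel_dif(strata):
    """Return (alpha_MH, MH_D_DIF) for one dichotomous item.

    strata: iterable of (A, B, C, D) counts, one tuple per matching-score level
        A = reference group, correct    B = reference group, incorrect
        C = focal group, correct        D = focal group, incorrect
    (This layout is an assumption made for the sketch.)
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    if numerator == 0 or denominator == 0:
        raise ValueError("common odds ratio is undefined for these counts")
    alpha = numerator / denominator   # common odds ratio across strata
    delta = -2.35 * math.log(alpha)   # delta-scale effect size (MH D-DIF)
    return alpha, delta


if __name__ == "__main__":
    # Two score levels with identical odds in both groups: no DIF expected.
    print(mantel_haenszel_dif([(10, 10, 10, 10), (20, 5, 20, 5)]))
```

    An odds ratio near 1 (delta near 0) suggests no DIF at matched score levels; a negative delta indicates the item favors the reference group under this layout. The sketch omits the chi-square significance test and any correction for the matching-score reliability problem the abstract discusses.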

  1. Monitoring

    DOEpatents

    Orr, Christopher Henry; Luff, Craig Janson; Dockray, Thomas; Macarthur, Duncan Whittemore

    2004-11-23

    The invention provides apparatus and methods which facilitate movement of an instrument relative to an item or location being monitored and/or the item or location relative to the instrument, whilst successfully excluding extraneous ions from the detection location. Thus, ions generated by emissions from the item or location can successfully be monitored during movement. The technique employs sealing to exclude such ions, for instance, through an electro-field which attracts and discharges the ions prior to their entering the detecting location and/or using a magnetic field configured to repel the ions away from the detecting location.

  2. Item Response Theory and Health Outcomes Measurement in the 21st Century

    PubMed Central

    Hays, Ron D.; Morales, Leo S.; Reise, Steve P.

    2006-01-01

    Item response theory (IRT) has a number of potential advantages over classical test theory in assessing self-reported health outcomes. IRT models yield invariant item and latent trait estimates (within a linear transformation), standard errors conditional on trait level, and trait estimates anchored to item content. IRT also facilitates evaluation of differential item functioning, inclusion of items with different response formats in the same scale, and assessment of person fit and is ideally suited for implementing computer adaptive testing. Finally, IRT methods can be helpful in developing better health outcome measures and in assessing change over time. These issues are reviewed, along with a discussion of some of the methodological and practical challenges in applying IRT methods. PMID:10982088
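    As a rough illustration of how IRT supports DIF evaluation, the sketch below assumes a two-parameter logistic (2PL) model and compares the item characteristic curves of a reference and a focal group through a numerically integrated signed area. The function names, the integration range, and the step count are illustrative assumptions; the record does not prescribe any particular model or DIF index.

```python
import math


def irf_2pl(theta, a, b):
    """Probability of a correct/endorsed response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def signed_area(a, b_ref, b_focal, lo=-6.0, hi=6.0, steps=1200):
    """Trapezoid-rule area between reference and focal item curves.

    Assumes both groups share the discrimination parameter `a`
    (uniform DIF); lo, hi and steps are arbitrary illustrative settings.
    """
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        theta = lo + i * width
        gap = irf_2pl(theta, a, b_ref) - irf_2pl(theta, a, b_focal)
        weight = 0.5 if i in (0, steps) else 1.0  # half weight at the endpoints
        total += weight * gap * width
    return total


if __name__ == "__main__":
    # The item is half a logit harder for the focal group.
    print(signed_area(a=1.0, b_ref=0.0, b_focal=0.5))
```

    With equal discriminations, the signed area over a wide trait range approaches the difference in difficulty parameters, so a value near zero suggests the item behaves alike in both groups. Estimating the parameters themselves requires dedicated IRT software and is outside this sketch.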

  3. Using Shaping to Increase Foods Consumed by Children with Autism

    ERIC Educational Resources Information Center

    Hodges, Abby; Davis, Tonya; Crandall, Madison; Phipps, Laura; Weston, Regan

    2017-01-01

    The current study used differential reinforcement and shaping to increase the variety of foods accepted by children with autism who demonstrated significant feeding inflexibility. Participants were introduced to four new food items via a hierarchical exposure, which involved systematically increasing the desired response with the food item. Level…

  4. Comparing DIF Methods for Data with Dual Dependency

    ERIC Educational Resources Information Center

    Jin, Ying; Kang, Minsoo

    2016-01-01

    Background: The current study compared four differential item functioning (DIF) methods to examine their performances in terms of accounting for dual dependency (i.e., person and item clustering effects) simultaneously by a simulation study, which is not sufficiently studied under the current DIF literature. The four methods compared are logistic…

  5. Modeling the Discrimination Power of Physics Items

    ERIC Educational Resources Information Center

    Mesic, Vanes

    2011-01-01

    For the purposes of tailoring physics instruction in accordance with the needs and abilities of the students it is useful to explore the knowledge structure of students of different ability levels. In order to precisely differentiate the successive, characteristic states of student achievement it is necessary to use test items that possess…

  6. Are Teacher Course Evaluations Biased against Faculty That Teach Quantitative Methods Courses?

    ERIC Educational Resources Information Center

    Royal, Kenneth D.; Stockdale, Myrah R.

    2015-01-01

    The present study investigated graduate students' responses to teacher/course evaluations (TCE) to determine if students' responses were inherently biased against faculty who teach quantitative methods courses. Item response theory (IRT) and Differential Item Functioning (DIF) techniques were utilized for data analysis. Results indicate students…

  7. A Comparison of Strategies for Estimating Conditional DIF

    ERIC Educational Resources Information Center

    Moses, Tim; Miao, Jing; Dorans, Neil J.

    2010-01-01

    In this study, the accuracies of four strategies were compared for estimating conditional differential item functioning (DIF), including raw data, logistic regression, log-linear models, and kernel smoothing. Real data simulations were used to evaluate the estimation strategies across six items, DIF and No DIF situations, and four sample size…

  8. IRTs of the ABCs: Children's Letter Name Acquisition

    ERIC Educational Resources Information Center

    Phillips, Beth M.; Piasta, Shayne B.; Anthony, Jason L.; Lonigan, Christopher J.; Francis, David J.

    2012-01-01

    We examined the developmental sequence of letter name knowledge acquisition by children from 2 to 5 years of age. Data from 2 samples representing diverse regions, ethnicity, and socioeconomic backgrounds (ns=1074 and 500) were analyzed using item response theory (IRT) and differential item functioning techniques. Results from factor analyses…

  9. Psychometric evaluation of Persian Nomophobia Questionnaire: Differential item functioning and measurement invariance across gender.

    PubMed

    Lin, Chung-Ying; Griffiths, Mark D; Pakpour, Amir H

    2018-03-01

    Background and aims: Research examining problematic mobile phone use has increased markedly over the past 5 years and has been related to "no mobile phone phobia" (so-called nomophobia). The 20-item Nomophobia Questionnaire (NMP-Q) is the only instrument that assesses nomophobia with an underlying theoretical structure and robust psychometric testing. This study aimed to confirm the construct validity of the Persian NMP-Q using Rasch and confirmatory factor analysis (CFA) models. Methods: After ensuring the linguistic validity, Rasch models were used to examine the unidimensionality of each Persian NMP-Q factor among 3,216 Iranian adolescents and CFAs were used to confirm its four-factor structure. Differential item functioning (DIF) and multigroup CFA were used to examine whether males and females interpreted the NMP-Q similarly, including item content and NMP-Q structure. Results: Each factor was unidimensional according to the Rasch findings, and the four-factor structure was supported by CFA. Two items did not quite fit the Rasch models (Item 14: "I would be nervous because I could not know if someone had tried to get a hold of me;" Item 9: "If I could not check my smartphone for a while, I would feel a desire to check it"). No DIF items were found across gender and measurement invariance was supported in multigroup CFA across gender. Conclusions: Due to the satisfactory psychometric properties, it is concluded that the Persian NMP-Q can be used to assess nomophobia among adolescents. Moreover, NMP-Q users may compare its scores between genders in the knowledge that there are no score differences contributed by different understandings of NMP-Q items.

  10. Examination of the PROMIS upper extremity item bank.

    PubMed

    Hung, Man; Voss, Maren W; Bounsanga, Jerry; Crum, Anthony B; Tyser, Andrew R

    Clinical measurement. The psychometric properties of the PROMIS v1.2 UE item bank were tested on various samples prior to its release, but have not been fully evaluated among the orthopaedic population. This study assesses the performance of the UE item bank within the UE orthopaedic patient population. The UE item bank was administered to 1197 adult patients presenting to a tertiary orthopaedic clinic specializing in hand and UE conditions and was examined using traditional statistics and Rasch analysis. The UE item bank fits a unidimensional model (outfit MNSQ range from 0.64 to 1.70) and has adequate reliabilities (person = 0.84; item = 0.82) and local independence (item residual correlations range from -0.37 to 0.34). Only one item exhibits gender differential item functioning. Most items target low levels of function. The UE item bank is a useful clinical assessment tool. Additional items covering higher functions are needed to enhance validity. Supplemental testing is recommended for patients at higher levels of function until more high function UE items are developed. 2c. Copyright © 2016 Hanley & Belfus. Published by Elsevier Inc. All rights reserved.

  11. Item Response Theory Analyses of the Cambridge Face Memory Test (CFMT)

    PubMed Central

    Cho, Sun-Joo; Wilmer, Jeremy; Herzmann, Grit; McGugin, Rankin; Fiset, Daniel; Van Gulick, Ana E.; Ryan, Katie; Gauthier, Isabel

    2014-01-01

    We evaluated the psychometric properties of the Cambridge face memory test (CFMT; Duchaine & Nakayama, 2006). First, we assessed the dimensionality of the test with a bi-factor exploratory factor analysis (EFA). This EFA revealed a general factor and three specific factors clustered by targets of CFMT. However, the three specific factors appeared to be minor factors that can be ignored. Second, we fit a unidimensional item response model. This item response model showed that the CFMT items could discriminate individuals at different ability levels and covered a wide range of the ability continuum. We found the CFMT to be particularly precise for a wide range of ability levels. Third, we implemented item response theory (IRT) differential item functioning (DIF) analyses for each gender group and two age groups (Age ≤ 20 versus Age > 21). This DIF analysis suggested little evidence of consequential differential functioning on the CFMT for these groups, supporting the use of the test to compare older to younger, or male to female, individuals. Fourth, we tested for a gender difference on the latent facial recognition ability with an explanatory item response model. We found a significant but small gender difference on the latent ability for face recognition, which was higher for women than men by 0.184, at age mean 23.2, controlling for linear and quadratic age effects. Finally, we discuss the practical considerations of the use of total scores versus IRT scale scores in applications of the CFMT. PMID:25642930

  12. Evaluating Differential Item Functioning in the English General Practice Patient Survey: Comparison of South Asian and White British Subgroups.

    PubMed

    Setodji, Claude M; Elliott, Marc N; Abel, Gary; Burt, Jenni; Roland, Martin; Campbell, John

    2015-09-01

    To evaluate two 5-item patient experience scales from the English General Practice (GP) Patient Survey for evidence of differential item functioning (DIF) given prior evidence of substantially worse reported health care experiences for South Asian compared with white British respondents. A national survey of English patients' primary care experiences. We used classic test and item response theory analysis to examine the possibility of DIF by patient ethnicity (South Asian, white British) after controlling for age, sex, health status, and quality of life in the English GP Patient Survey conducted in 2011/2012. Data were available for 873,051 respondents (818,219 white British/54,832 South Asian from 7795 English practices) who answered items relating to experiences of GP or nurses' care. Internal consistency reliability was high and similar for South Asian and white British patients. White British patients reported better average experiences than South Asians, but there was no evidence of DIF or different item response curves for white British and South Asian respondents, even in sensitivity analyses using matched samples. All communication items in the English GP Patient Survey showed similar South Asian versus white British differences, with no evidence of DIF. In contrast, differences due to scale use or expectations are typically variable rather than constant across scales. While other possibilities remain, these findings increase the likelihood that the observed negative responses of South Asian patients to this national survey reflect true differences in their experiences of care.

  13. Neural correlates of differential retrieval orientation: Sustained and item-related components.

    PubMed

    Woodruff, C Chad; Uncapher, Melina R; Rugg, Michael D

    2006-01-01

    Retrieval orientation refers to a cognitive state that biases processing of retrieval cues in service of a specific goal. The present study used a mixed fMRI design to investigate whether adoption of different retrieval orientations - as indexed by differences in the activity elicited by retrieval cues corresponding to unstudied items - is associated with differences in the state-related activity sustained across a block of test trials sharing a common retrieval goal. Subjects studied mixed lists comprising visually presented words and pictures. They then undertook a series of short test blocks in which all test items were visually presented words. The blocks varied according to whether the test items were used to cue retrieval of studied words or studied pictures. In several regions, neural activity elicited by correctly classified new items differed according to whether words or pictures were the targeted material. The loci of these effects suggest that one factor driving differential cue processing is modulation of the degree of overlap between cue and targeted memory representations. In addition to these item-related effects, neural activity sustained throughout the test blocks also differed according to the nature of the targeted material. These findings indicate that the adoption of different retrieval orientations is associated with distinct neural states. The loci of these sustained effects were distinct from those where new item activity varied, suggesting that the effects may play a role in biasing retrieval cue processing in favor of the current retrieval goal.

  14. Remembered but Unused: The Accessory Items in Working Memory that Do Not Guide Attention

    ERIC Educational Resources Information Center

    Peters, Judith C.; Goebel, Rainer; Roelfsema, Pieter R.

    2009-01-01

    If we search for an item, a representation of this item in our working memory guides attention to matching items in the visual scene. We can hold multiple items in working memory. Do all these items guide attention in parallel? We asked participants to detect a target object in a stream of objects while they maintained a second item in memory for…

  15. Differential Item Functioning in the SF-36 Physical Functioning and Mental Health Sub-Scales: A Population-Based Investigation in the Canadian Multicentre Osteoporosis Study.

    PubMed

    Lix, Lisa M; Wu, Xiuyun; Hopman, Wilma; Mayo, Nancy; Sajobi, Tolulope T; Liu, Juxin; Prior, Jerilynn C; Papaioannou, Alexandra; Josse, Robert G; Towheed, Tanveer E; Davison, K Shawn; Sawatzky, Richard

    2016-01-01

    Self-reported health status measures, like the Short Form 36-item Health Survey (SF-36), can provide rich information about the overall health of a population and its components, such as physical, mental, and social health. However, differential item functioning (DIF), which arises when population sub-groups with the same underlying (i.e., latent) level of health have different measured item response probabilities, may compromise the comparability of these measures. The purpose of this study was to test for DIF on the SF-36 physical functioning (PF) and mental health (MH) sub-scale items in a Canadian population-based sample. Study data were from the prospective Canadian Multicentre Osteoporosis Study (CaMos), which collected baseline data in 1996-1997. DIF was tested using a multiple indicators multiple causes (MIMIC) method. Confirmatory factor analysis defined the latent variable measurement model for the item responses and latent variable regression with demographic and health status covariates (i.e., sex, age group, body weight, self-perceived general health) produced estimates of the magnitude of DIF effects. The CaMos cohort consisted of 9423 respondents; 69.4% were female and 51.7% were less than 65 years. Eight of 10 items on the PF sub-scale and four of five items on the MH sub-scale exhibited DIF. Large DIF effects were observed on PF sub-scale items about vigorous and moderate activities, lifting and carrying groceries, walking one block, and bathing or dressing. On the MH sub-scale items, all DIF effects were small or moderate in size. SF-36 PF and MH sub-scale scores were not comparable across population sub-groups defined by demographic and health status variables due to the effects of DIF, although the magnitude of this bias was not large for most items. We recommend testing and adjusting for DIF to ensure comparability of the SF-36 in population-based investigations.

  16. Detecting a Gender-Related DIF Using Logistic Regression and Transformed Item Difficulty

    ERIC Educational Resources Information Center

    Abedlaziz, Nabeel; Ismail, Wail; Hussin, Zaharah

    2011-01-01

    Test items are designed to provide information about the examinees. Difficult items are designed to be more demanding and easy items are less so. However, sometimes, test items carry with their demands other than those intended by the test developer (Scheuneman & Gerritz, 1990). When personal attributes such as gender systematically affect…

  17. Harmonizing Measures of Cognitive Performance Across International Surveys of Aging Using Item Response Theory.

    PubMed

    Chan, Kitty S; Gross, Alden L; Pezzin, Liliana E; Brandt, Jason; Kasper, Judith D

    2015-12-01

    To harmonize measures of cognitive performance using item response theory (IRT) across two international aging studies. Data for persons ≥65 years from the Health and Retirement Study (HRS, N = 9,471) and the English Longitudinal Study of Aging (ELSA, N = 5,444). Cognitive performance measures varied (HRS fielded 25, ELSA 13); 9 were in common. Measurement precision was examined for IRT scores based on (a) common items, (b) common items adjusted for differential item functioning (DIF), and (c) DIF-adjusted all items. Three common items (day of date, immediate word recall, and delayed word recall) demonstrated DIF by survey. Adding survey-specific items improved precision but mainly for HRS respondents at lower cognitive levels. IRT offers a feasible strategy for harmonizing cognitive performance measures across other surveys and for other multi-item constructs of interest in studies of aging. Practical implications depend on sample distribution and the difficulty mix of in-common and survey-specific items. © The Author(s) 2015.

  18. Construct Validity of the Multidimensional Structure of Bullying and Victimization: An Application of Exploratory Structural Equation Modeling

    ERIC Educational Resources Information Center

    Marsh, Herbert W.; Nagengast, Benjamin; Morin, Alexandre J. S.; Parada, Roberto H.; Craven, Rhonda G.; Hamilton, Linda R.

    2011-01-01

    Existing research posits multiple dimensions of bullying and victimization but has not identified well-differentiated facets of these constructs that meet standards of good measurement: goodness of fit, measurement invariance, lack of differential item functioning, and well-differentiated factors that are not so highly correlated as to detract…

  19. Application of Item Response Theory to Tests of Substance-related Associative Memory

    PubMed Central

    Shono, Yusuke; Grenard, Jerry L.; Ames, Susan L.; Stacy, Alan W.

    2015-01-01

    A substance-related word association test (WAT) is one of the commonly used indirect tests of substance-related implicit associative memory and has been shown to predict substance use. This study applied an item response theory (IRT) modeling approach to evaluate psychometric properties of the alcohol- and marijuana-related WATs and their items among 775 ethnically diverse at-risk adolescents. After examining the IRT assumptions, item fit, and differential item functioning (DIF) across gender and age groups, the original 18 WAT items were reduced to 14- and 15-items in the alcohol- and marijuana-related WAT, respectively. Thereafter, unidimensional one- and two-parameter logistic models (1PL and 2PL models) were fitted to the revised WAT items. The results demonstrated that both alcohol- and marijuana-related WATs have good psychometric properties. These results were discussed in light of the framework of a unified concept of construct validity (Messick, 1975, 1989, 1995). PMID:25134051
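The one- and two-parameter logistic models named in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, and fixing the 1PL discrimination at 1.0 is an assumed convention (a 1PL model only requires that all items share one discrimination value).

```python
import math

def irf_2pl(theta, a, b):
    """2PL item response function: probability of a keyed response
    for a person at latent level theta, given item discrimination a
    and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def irf_1pl(theta, b):
    """1PL item response function: every item shares one discrimination.
    Assumption for this sketch: the shared value is fixed at 1.0."""
    return irf_2pl(theta, 1.0, b)

if __name__ == "__main__":
    # When theta equals the item difficulty, the probability is 0.5.
    print(irf_2pl(0.0, 1.5, 0.0))
    # A more discriminating item separates people above b more sharply.
    print(irf_2pl(1.0, 2.0, 0.0), irf_1pl(1.0, 0.0))
```

In this framing, DIF would show up as an item whose a or b differs between groups (for example, gender or age) after the groups are placed on a common latent scale.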

  20. Detection of small orientation changes and the precision of visual working memory.

    PubMed

    Salmela, Viljami R; Saarinen, Jussi

    2013-01-14

    We investigated the precision of orientation representations with two tasks, change detection and recall. Previously change detection has been measured only with relatively large orientation changes compared to psychophysical thresholds. In the first experiment, we measured the observers' ability (d') to detect small changes in orientation (5-30°) with 1-4 Gabor items. With one item even a 10° change was well detected (average d'=2.5). As the amount of change increased to 30°, the d' increased to 5.2. When the number of items was increased, the d's gradually decreased. In the second experiment, we used a recall task and the observers adjusted the orientation of a probe Gabor to match the orientation of a Gabor held in the memory. The standard deviation (s.d.) of errors was calculated from the Gaussian distribution fitted to the data. As the number of items increased from 1 to 6, the s.d. increased from 8.6° to 19.6°. Even with six items, the observers did not make any random adjustments. The results show a square root relation between the d'/s.d. and the number of items. The d' in change detection is directly proportional to the square root of (1/n) and the orientation change. The increase of the s.d. in recall task is inversely proportional to square root of (1/n). The results suggest that limited resources and precision of representations, without additional assumptions, determine the memory performance. Copyright © 2012 Elsevier Ltd. All rights reserved.
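The square-root relations stated in the abstract above can be written out as a small sketch. It is only a literal reading of the stated proportionalities, not the authors' fitted model: the function names and the constant `k` are hypothetical.

```python
import math

def predicted_dprime(k, delta_deg, n_items):
    """d' taken as proportional to the orientation change (degrees)
    and to sqrt(1/n); k is a hypothetical proportionality constant."""
    return k * delta_deg * math.sqrt(1.0 / n_items)

def predicted_sd(sd_one_item, n_items):
    """Recall s.d. taken as inversely proportional to sqrt(1/n),
    i.e. growing with the square root of the number of items."""
    return sd_one_item / math.sqrt(1.0 / n_items)

if __name__ == "__main__":
    # Scaling the reported one-item s.d. (8.6 degrees) up to six items.
    print(round(predicted_sd(8.6, 6), 1))
```

Starting from the reported one-item s.d. of 8.6°, this simple scaling gives about 21.1° for six items, in the neighbourhood of the reported 19.6°.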

  1. A 67-Item Stress Resilience item bank showing high content validity was developed in a psychosomatic sample.

    PubMed

    Obbarius, Nina; Fischer, Felix; Obbarius, Alexander; Nolte, Sandra; Liegl, Gregor; Rose, Matthias

    2018-04-10

    To develop the first item bank to measure Stress Resilience (SR) in clinical populations. Qualitative item development resulted in an initial pool of 131 items covering a broad theoretical SR concept. These items were tested in n=521 patients at a psychosomatic outpatient clinic. Exploratory and Confirmatory Factor Analysis (CFA), as well as other state-of-the-art item analyses and IRT were used for item evaluation and calibration of the final item bank. Out of the initial item pool of 131 items, we excluded 64 items (54 factor loading <.5, 4 residual correlations >.3, 2 non-discriminative Item Response Curves, 4 Differential Item Functioning). The final set of 67 items indicated sufficient model fit in CFA and IRT analyses. Additionally, a 10-item short form with high measurement precision (SE≤.32 in a theta range between -1.8 and +1.5) was derived. Both the SR item bank and the SR short form were highly correlated with an existing static legacy tool (Connor-Davidson Resilience Scale). The final SR item bank and 10-item short form showed good psychometric properties. When further validated, they will be ready to be used within a framework of Computer-Adaptive Tests for a comprehensive assessment of the Stress-Construct. Copyright © 2018. Published by Elsevier Inc.

  2. Screening for hazardous drinking using the Michigan Alcohol Screening Test-Geriatric Version (MAST-G) in elderly persons with acute cerebrovascular accidents.

    PubMed

    Johnson-Greene, Doug; McCaul, Mary E; Roger, Patricia

    2009-09-01

    Effective and valid screening methods are needed to identify hazardous drinking in elderly persons with new onset acute medical illness. The goal of the current study was to examine the effectiveness of the Michigan Alcohol Screening Test-Geriatric Version (MAST-G) in identifying hazardous drinking among elderly patients with acute cerebrovascular accidents (CVA) and to compare the effectiveness of 2 shorter versions of the MAST-G with the full instrument. The study sample included 100 men and women who averaged 12 days posthemorrhagic or ischemic CVA admitted to a rehabilitation unit and who were at least 50 years of age and free of substance use other than alcohol. This cross-sectional validation study compared the 24-item full MAST-G, the 10-item Short MAST-G (SMAST-G), and a 2-item regression analysis derived Mini MAST-G (MMAST-G) to the reference standard of hazardous drinking during the past 3 months. Alcohol use was collected using the Timeline Followback (TLFB). Recent and lifetime alcohol-related consequences were collected using the Short Inventory of Problems (SIP). Nearly one-third (28%) of the study sample met the World Health Organization (WHO) criteria for hazardous drinking. Moderately strong associations were found for the MAST-G, SMAST-G, and MMAST-G with alcohol quantity and frequency and recent and lifetime alcohol consequences. All 3 MAST-G versions could differentiate hazardous from nonhazardous drinkers and had nearly identical area under the curve characteristics. Comparable sensitivity was found across the 3 MAST-G measures. The optimal screening threshold for hazardous drinking was 5 for the MAST-G, 2 for the SMAST-G, and 1 for the MMAST-G. The 10-item SMAST-G and 2-item MMAST-G are brief screening tests that show comparable effectiveness in detecting hazardous drinking in elderly patients with acute CVA compared with the full 24-item MAST-G. Implications for research and clinical practice are discussed.

  3. Assessing the Straightforwardly-Worded Brief Fear of Negative Evaluation Scale for Differential Item Functioning Across Gender and Ethnicity.

    PubMed

    Harpole, Jared K; Levinson, Cheri A; Woods, Carol M; Rodebaugh, Thomas L; Weeks, Justin W; Brown, Patrick J; Heimberg, Richard G; Menatti, Andrew R; Blanco, Carlos; Schneier, Franklin; Liebowitz, Michael

    2015-06-01

    The Brief Fear of Negative Evaluation Scale (BFNE; Leary, Personality and Social Psychology Bulletin, 9, 371-375, 1983) assesses fear and worry about receiving negative evaluation from others. Rodebaugh et al. (Psychological Assessment, 16, 169-181, 2004) found that the BFNE is composed of a reverse-worded factor (BFNE-R) and straightforwardly-worded factor (BFNE-S). Further, they found the BFNE-S to have better psychometric properties and provide more information than the BFNE-R. Currently there is a lack of research regarding the measurement invariance of the BFNE-S across gender and ethnicity with respect to item thresholds. The present study uses item response theory (IRT) to test the BFNE-S for differential item functioning (DIF) related to gender and ethnicity (White, Asian, and Black). Six data sets consisting of clinical, community, and undergraduate participants were utilized (N = 2,109). The factor structure of the BFNE-S was confirmed using categorical confirmatory factor analysis, IRT model assumptions were tested, and the BFNE-S was evaluated for DIF. Item nine demonstrated significant non-uniform DIF between White and Black participants. No other items showed significant uniform or non-uniform DIF across gender or ethnicity. Results suggest the BFNE-S can be used reliably with men and women and Asian and White participants. More research is needed to understand the implications of using the BFNE-S with Black participants.

  4. An interfering Go/No-go task does not affect accuracy in a Concealed Information Test.

    PubMed

    Ambach, Wolfgang; Stark, Rudolf; Peper, Martin; Vaitl, Dieter

    2008-04-01

    Following the idea that response inhibition processes play a central role in concealing information, the present study investigated the influence of a Go/No-go task as an interfering mental activity, performed parallel to the Concealed Information Test (CIT), on the detectability of concealed information. 40 undergraduate students participated in a mock-crime experiment and simultaneously performed a CIT and a Go/No-go task. Electrodermal activity (EDA), respiration line length (RLL), heart rate (HR) and finger pulse waveform length (FPWL) were registered. Reaction times were recorded as behavioral measures in the Go/No-go task as well as in the CIT. As a within-subject control condition, the CIT was also applied without an additional task. The parallel task did not influence the mean differences of the physiological measures of the mock-crime-related probe and the irrelevant items. This finding might possibly be due to the fact that the applied parallel task induced a tonic rather than a phasic mental activity, which did not influence differential responding to CIT items. No physiological evidence for an interaction between the parallel task and sub-processes of deception (e.g. inhibition) was found. Subjects' performance in the Go/No-go parallel task did not contribute to the detection of concealed information. Generalizability needs further investigations of different variations of the parallel task.

  5. Evaluation of psychometric properties and differential item functioning of 8-item Child Perceptions Questionnaires using item response theory.

    PubMed

    Yau, David T W; Wong, May C M; Lam, K F; McGrath, Colman

    2015-08-19

    Four-factor structure of the two 8-item short forms of Child Perceptions Questionnaire CPQ11-14 (RSF:8 and ISF:8) has been confirmed. However, the sum scores are typically reported in practice as a proxy of Oral health-related Quality of Life (OHRQoL), which implied a unidimensional structure. This study first assessed the unidimensionality of 8-item short forms of CPQ11-14. Item response theory (IRT) was employed to offer an alternative and complementary approach of validation and to overcome the limitations of classical test theory assumptions. A random sample of 649 12-year-old school children in Hong Kong was analyzed. Unidimensionality of the scale was tested by confirmatory factor analysis (CFA), principal component analysis (PCA) and local dependency (LD) statistic. Graded response model was fitted to the data. Contribution of each item to the scale was assessed by item information function (IIF). Reliability of the scale was assessed by test information function (TIF). Differential item functioning (DIF) across gender was identified by Wald test and expected score functions. Both CPQ11-14 RSF:8 and ISF:8 did not deviate much from the unidimensionality assumption. Results from CFA indicated acceptable fit of the one-factor model. PCA indicated that the first principal component explained >30% of the total variation with high factor loadings for both RSF:8 and ISF:8. Almost all LD statistics were <10, indicating the absence of local dependency. Flat and low IIFs were observed in the oral symptoms items suggesting little contribution of information to the scale and item removal caused little practical impact. Comparing the TIFs, RSF:8 showed slightly better information than ISF:8. In addition to oral symptoms items, the item "Concerned with what other people think" demonstrated a uniform DIF (p < 0.001). The expected score functions were not much different between boys and girls. Items related to oral symptoms were not informative to OHRQoL and deletion of these items is suggested. The impact of DIF across gender on the overall score was minimal. CPQ11-14 RSF:8 performed slightly better than ISF:8 in measurement precision. The 6-item short forms suggested by IRT validation should be further investigated to ensure their robustness, responsiveness and discriminative performance.

  6. Sequential detection of learning in cognitive diagnosis.

    PubMed

    Ye, Sangbeak; Fellouris, Georgios; Culpepper, Steven; Douglas, Jeff

    2016-05-01

    In order to look more closely at the many particular skills examinees utilize to answer items, cognitive diagnosis models have received much attention, and perhaps are preferable to item response models that ordinarily involve just one or a few broadly defined skills, when the objective is to hasten learning. If these fine-grained skills can be identified, a sharpened focus on learning and remediation can be achieved. The focus here is on how to detect when learning has taken place for a particular attribute and efficiently guide a student through a sequence of items to ultimately attain mastery of all attributes while administering as few items as possible. This can be seen as a problem in sequential change-point detection for which there is a long history and a well-developed literature. Though some ad hoc rules for determining learning may be used, such as stopping after M consecutive items have been successfully answered, more efficient methods that are optimal under various conditions are available. The CUSUM, Shiryaev-Roberts and Shiryaev procedures can dramatically reduce the time required to detect learning while maintaining rigorous Type I error control, and they are studied in this context through simulation. Future directions for modelling and detection of learning are discussed. © 2016 The British Psychological Society.
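The CUSUM procedure mentioned in the abstract above can be sketched for a stream of right/wrong item responses. This is a generic textbook-style CUSUM under stated assumptions, not the procedure as specified in the paper: `p0` (probability of a correct answer before the attribute is learned), `p1` (probability after learning), and `threshold` are hypothetical parameters chosen for illustration.

```python
import math

def cusum_alarm(responses, p0, p1, threshold):
    """Return the 1-based index of the item at which the CUSUM statistic
    first reaches `threshold`, or None if it never does.

    responses: iterable of 0/1 scores (1 = correct).
    p0, p1:    assumed success probabilities before and after learning.
    """
    llr_correct = math.log(p1 / p0)              # evidence from a correct answer
    llr_wrong = math.log((1 - p1) / (1 - p0))    # evidence from a wrong answer
    stat = 0.0
    for i, x in enumerate(responses, start=1):
        # Accumulate log-likelihood ratios, resetting at zero so that
        # early failures do not count against later evidence of learning.
        stat = max(0.0, stat + (llr_correct if x else llr_wrong))
        if stat >= threshold:
            return i
    return None

if __name__ == "__main__":
    print(cusum_alarm([0, 0, 1, 0, 1, 1, 1], p0=0.2, p1=0.8, threshold=4.0))
```

Unlike an ad hoc "M consecutive correct answers" rule, the statistic weighs each response by how much more likely it is under the learned state, and the threshold trades off detection delay against false alarms.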

  7. Early-Emerging Social Adaptive Skills in Toddlers with Autism Spectrum Disorders: An Item Analysis

    ERIC Educational Resources Information Center

    Ventola, Pamela; Saulnier, Celine A.; Steinberg, Elizabeth; Chawarska, Katarzyna; Klin, Ami

    2014-01-01

    Individuals with ASD have significant impairments in adaptive skills, particularly adaptive socialization skills. The present study examined the extent to which 20 items from the Vineland Adaptive Behavior Scales-Socialization Domain differentiated between ASD and developmentally delayed (DD) groups. Participants included 108 toddlers with ASD or…

  8. Effects of Learning Experience on Forgetting Rates of Item and Associative Memories

    ERIC Educational Resources Information Center

    Yang, Jiongjiong; Zhan, Lexia; Wang, Yingying; Du, Xiaoya; Zhou, Wenxi; Ning, Xueling; Sun, Qing; Moscovitch, Morris

    2016-01-01

    Are associative memories forgotten more quickly than item memories, and does the level of original learning differentially influence forgetting rates? In this study, we addressed these questions by having participants learn single words and word pairs once (Experiment 1), three times (Experiment 2), and six times (Experiment 3) in a massed…

  9. Effect of Purification Procedures on DIF Analysis in IRTPRO

    ERIC Educational Resources Information Center

    Fikis, David R. J.; Oshima, T. C.

    2017-01-01

    Purification of the test has been a well-accepted procedure in enhancing the performance of tests for differential item functioning (DIF). As defined by Lord, purification requires reestimation of ability parameters after removing DIF items before conducting the final DIF analysis. IRTPRO 3 is a recently updated program for analyses in item…

  10. Dynamic switching between semantic and episodic memory systems.

    PubMed

    Kompus, Kristiina; Olsson, Carl-Johan; Larsson, Anne; Nyberg, Lars

    2009-09-01

    It has been suggested that episodic and semantic long-term memory systems interact during retrieval. Here we examined the flexibility of memory retrieval in an associative task taxing memories of different strength, assumed to differentially engage episodic and semantic memory. Healthy volunteers were pre-trained on a set of 36 face-name pairs over a 6-week period. Another set of 36 items was shown only once during the same time period. About 3 months after the training period all items were presented in a randomly intermixed order in an event-related fMRI study of face-name memory. Once presented items differentially activated anterior cingulate cortex and a right prefrontal region that previously have been associated with episodic retrieval mode. High-familiar items were associated with stronger activation of posterior cortices and a left frontal region. These findings fit a model of memory retrieval by which early processes determine, on a trial-by-trial basis, if the task can be solved by the default semantic system. If not, there is a dynamic shift to cognitive control processes that guide retrieval from episodic memory.

  11. Adolescent Depression: Differential Symptom Presentations in Deaf and Hard-of-Hearing Youth Using the Patient Health Questionnaire-9.

    PubMed

    Bozzay, Melanie L; O'Leary, Kimberly N; De Nadai, Alessandro S; Gryglewicz, Kim; Romero, Gabriela; Karver, Marc S

    2017-04-01

    The present study examined differences in symptom presentation in screening for pediatric depression via evaluation of the Patient Health Questionnaire-9 (PHQ-9). In particular, we examined whether PHQ-9 items function differentially among deaf and hard-of-hearing (DHH; n = 75) and hearing (n = 75) youth based on participants recruited from crisis assessment services. Multiple indicators multiple causes models were used to examine whether items of the PHQ-9 functioned differently between groups as well as whether there were group differences in the mean severity of depressive symptoms. Results indicate that DHH youth were more likely to endorse psychosomatic items, and less likely to endorse an affective item. These findings indicate that the PHQ-9 functions differently when used with DHH youth. Implications of these findings are discussed, including both for future work with the PHQ-9 and with regard to the conceptualization of depression across hearing groups. © The Author 2017. Published by Oxford University Press. All rights reserved.

  12. The Piper Fatigue Scale-12 (PFS-12): psychometric findings and item reduction in a cohort of breast cancer survivors.

    PubMed

    Reeve, Bryce B; Stover, Angela M; Alfano, Catherine M; Smith, Ashley Wilder; Ballard-Barbash, Rachel; Bernstein, Leslie; McTiernan, Anne; Baumgartner, Kathy B; Piper, Barbara F

    2012-11-01

    Brief, valid measures of fatigue, a prevalent and distressing cancer symptom, are needed for use in research. This study's primary aim was to create a shortened version of the revised Piper Fatigue Scale (PFS-R) based on data from a diverse cohort of breast cancer survivors. A secondary aim was to determine whether the PFS captured multiple distinct aspects of fatigue (a multidimensional model) or a single overall fatigue factor (a unidimensional model). Breast cancer survivors (n = 799; stages in situ through IIIa; ages 29-86 years) were recruited through three SEER registries (New Mexico, Western Washington, and Los Angeles, CA) as part of the Health, Eating, Activity, and Lifestyle (HEAL) study. Fatigue was measured approximately 3 years post-diagnosis using the 22-item PFS-R that has four subscales (Behavior, Affect, Sensory, and Cognition). Confirmatory factor analysis was used to compare unidimensional and multidimensional models. Six criteria were used to make item selections to shorten the PFS-R: scale's content validity, items' relationship with fatigue, content redundancy, differential item functioning by race and/or education, scale reliability, and literacy demand. Factor analyses supported the original 4-factor structure. There was also evidence from the bi-factor model for a dominant underlying fatigue factor. Six items tested positive for differential item functioning between African-American and Caucasian survivors. Four additional items either showed poor association, local dependence, or content validity concerns. After removing these 10 items, the reliability of the PFS-12 subscales ranged from 0.87 to 0.89, compared to 0.90-0.94 prior to item removal. The newly developed PFS-12 can be used to assess fatigue in African-American and Caucasian breast cancer survivors and reduces response burden without compromising reliability or validity. This is the first study to determine PFS literacy demand and to compare PFS-R responses in African-American and Caucasian breast cancer survivors. Further testing in diverse populations is warranted.

  13. Evaluation of Internal Construct Validity and Unidimensionality of the Brachial Assessment Tool, A Patient-Reported Outcome Measure for Brachial Plexus Injury.

    PubMed

    Hill, Bridget; Pallant, Julie; Williams, Gavin; Olver, John; Ferris, Scott; Bialocerkowski, Andrea

    2016-12-01

    To evaluate the internal construct validity and dimensionality of a new patient-reported outcome measure for people with traumatic brachial plexus injury (BPI) based on the International Classification of Functioning, Disability and Health definition of activity. Cross-sectional study. Outpatient clinics. Adults (age range, 18-82y) with a traumatic BPI (N=106). There were 106 people with BPI who completed a 51-item 5-response questionnaire. Responses were analyzed in 4 phases (missing responses, item correlations, exploratory factor analysis, and Rasch analysis) to evaluate the properties of fit to the Rasch model, threshold response, local dependency, dimensionality, differential item functioning, and targeting. Not applicable, as this study addresses the development of an outcome measure. Six items were deleted for missing responses, and 10 were deleted for high interitem correlations >.81. The remaining 35 items, while demonstrating fit to the Rasch model, showed evidence of local dependency and multidimensionality. Items were divided into 3 subscales: dressing and grooming (8 items), arm and hand (17 items), and no hand (6 items). All 3 subscales demonstrated fit to the model with no local dependency, minimal disordered thresholds, no unidimensionality or differential item functioning for age, time postinjury, or self-selected dominance. Subscales were combined into 3 subtests and demonstrated fit to the model, no misfit, and unidimensionality, allowing calculation of a summary score. This preliminary analysis supports the internal construct validity of the Brachial Assessment Tool, a unidimensional targeted 4-response patient-reported outcome measure designed to solely assess activity after traumatic BPI regardless of level of injury, age at recruitment, premorbid limb dominance, and time postinjury. Further examination is required to determine test-retest reliability and responsiveness. Copyright © 2016 American Congress of Rehabilitation Medicine. 
    Published by Elsevier Inc. All rights reserved.

  14. Method variation in the impact of missing data on response shift detection.

    PubMed

    Schwartz, Carolyn E; Sajobi, Tolulope T; Verdam, Mathilde G E; Sebille, Veronique; Lix, Lisa M; Guilleux, Alice; Sprangers, Mirjam A G

    2015-03-01

    Missing data due to attrition or item non-response can result in biased estimates and loss of power in longitudinal quality-of-life (QOL) research. The impact of missing data on response shift (RS) detection is relatively unknown. This overview article synthesizes the findings of three methods tested in this special section regarding the impact of missing data patterns on RS detection in incomplete longitudinal data. The RS detection methods investigated include: (1) Relative importance analysis to detect reprioritization RS in stroke caregivers; (2) Oort's structural equation modeling (SEM) to detect recalibration, reprioritization, and reconceptualization RS in cancer patients; and (3) Rasch-based item-response theory-based (IRT) models as compared to SEM models to detect recalibration and reprioritization RS in hospitalized chronic disease patients. Each method dealt with missing data differently, either with imputation (1), attrition-based multi-group analysis (2), or probabilistic analysis that is robust to missingness due to the specific objectivity property (3). Relative importance analyses were sensitive to the type and amount of missing data and imputation method, with multiple imputation showing the largest RS effects. The attrition-based multi-group SEM revealed differential effects of both the changes in health-related QOL and the occurrence of response shift by attrition stratum, and enabled a more complete interpretation of findings. The IRT RS algorithm found evidence of small recalibration and reprioritization effects in General Health, whereas SEM mostly evidenced small recalibration effects. These differences may be due to differences between the two methods in handling of missing data. 
    Missing data imputation techniques result in different conclusions about the presence of reprioritization RS using the relative importance method, while the attrition-based SEM approach highlighted different recalibration and reprioritization RS effects by attrition group. The IRT analyses detected more recalibration and reprioritization RS effects than SEM, presumably due to IRT's robustness to missing data. Future research should apply simulation techniques in order to make conclusive statements about the impacts of missing data according to the type and amount of RS.

  15. Comparison of Procedures for Detecting Test-Item Bias with Both Internal and External Ability Criteria.

    ERIC Educational Resources Information Center

    Shepard, Lorrie, And Others

    1981-01-01

    Sixteen approaches for detecting item bias were compared on samples of Black, White, and Chicano elementary school pupils using the Lorge-Thorndike and Raven's Coloured Progressive Matrices tests. Recommendations for practical use are made. (JKS)

  16. Perceptual integration of motion and form information: evidence of parallel-continuous processing.

    PubMed

    von Mühlenen, A; Müller, H J

    2000-04-01

    In three visual search experiments, the processes involved in the efficient detection of motion-form conjunction targets were investigated. Experiment 1 was designed to estimate the relative contributions of stationary and moving nontargets to the search rate. Search rates were primarily determined by the number of moving nontargets; stationary nontargets sharing the target form also exerted a significant effect, but this was only about half as strong as that of moving nontargets; stationary nontargets not sharing the target form had little influence. In Experiments 2 and 3, the effects of display factors influencing the visual (form) quality of moving items (movement speed and item size) were examined. Increasing the speed of the moving items (> 1.5 degrees/sec) facilitated target detection when the task required segregation of the moving from the stationary items. When no segregation was necessary, increasing the movement speed impaired performance: With large display items, motion speed had little effect on target detection, but with small items, search efficiency declined when items moved faster than 1.5 degrees/sec. This pattern indicates that moving nontargets exert a strong effect on the search rate (Experiment 1) because of the loss of visual quality for moving items above a certain movement speed. A parallel-continuous processing account of motion-form conjunction search is proposed, which combines aspects of Guided Search (Wolfe, 1994) and attentional engagement theory (Duncan & Humphreys, 1989).

  17. Examining Multiple Sources of Differential Item Functioning on the Clinician & Group CAHPS® Survey

    PubMed Central

    Rodriguez, Hector P; Crane, Paul K

    2011-01-01

    Objective: To evaluate psychometric properties of a widely used patient experience survey. Data Sources: English-language responses to the Clinician & Group Consumer Assessment of Healthcare Providers and Systems (CG-CAHPS®) survey (n = 12,244) from a 2008 quality improvement initiative involving eight southern California medical groups. Methods: We used an iterative hybrid ordinal logistic regression/item response theory differential item functioning (DIF) algorithm to identify items with DIF related to patient sociodemographic characteristics, duration of the physician–patient relationship, number of physician visits, and self-rated physical and mental health. We accounted for all sources of DIF and determined its cumulative impact. Principal Findings: The upper end of the CG-CAHPS® performance range is measured with low precision. With sensitive settings, some items were found to have DIF. However, overall DIF impact was negligible, as 0.14 percent of participants had salient DIF impact. Latinos who spoke predominantly English at home had the highest prevalence of salient DIF impact at 0.26 percent. Conclusions: The CG-CAHPS® functions similarly across commercially insured respondents from diverse backgrounds. Consequently, previously documented racial and ethnic group differences likely reflect true differences rather than measurement bias. The impact of low precision at the upper end of the scale should be clarified. PMID:22092021

  18. A comparison of three methods of assessing differential item functioning (DIF) in the Hospital Anxiety Depression Scale: ordinal logistic regression, Rasch analysis and the Mantel chi-square procedure.

    PubMed

    Cameron, Isobel M; Scott, Neil W; Adler, Mats; Reid, Ian C

    2014-12-01

    It is important for clinical practice and research that measurement scales of well-being and quality of life exhibit only minimal differential item functioning (DIF). DIF occurs where different groups of people endorse items in a scale to different extents after being matched by the intended scale attribute. We investigate the equivalence or otherwise of common methods of assessing DIF. Three methods of measuring age- and sex-related DIF (ordinal logistic regression, Rasch analysis and Mantel χ² procedure) were applied to Hospital Anxiety Depression Scale (HADS) data pertaining to a sample of 1,068 patients consulting primary care practitioners. Three items were flagged by all three approaches as having either age- or sex-related DIF with a consistent direction of effect; a further three items identified did not meet stricter criteria for important DIF using at least one method. When applying strict criteria for significant DIF, ordinal logistic regression was slightly less sensitive. Ordinal logistic regression, Rasch analysis and contingency table methods yielded consistent results when identifying DIF in the HADS depression and HADS anxiety scales. Regardless of methods applied, investigators should use a combination of statistical significance, magnitude of the DIF effect and investigator judgement when interpreting the results.
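    The matching idea in the DIF definition above can be illustrated with a small sketch. It is not the analysis used in the study: it assumes a dichotomous (endorse / not endorse) item and computes the Mantel-Haenszel common odds ratio across matched score levels, whereas the HADS items are ordinal and the authors used the Mantel chi-square procedure. The function name and the example counts are hypothetical.

```python
def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across matched score levels (illustrative only).

    Each stratum is a tuple (a, b, c, d) of counts at one matched score level:
      a = reference group, item endorsed
      b = reference group, item not endorsed
      c = focal group, item endorsed
      d = focal group, item not endorsed
    A value near 1 suggests no DIF on this item; values far from 1 suggest
    that matched respondents in the two groups endorse it at different rates.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level carries no information
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio undefined: no informative strata")
    return numerator / denominator


# Hypothetical counts for two matched score levels.
no_dif = [(10, 10, 10, 10), (30, 10, 15, 5)]
with_dif = [(20, 10, 10, 20), (30, 10, 15, 20)]
print(mantel_haenszel_odds_ratio(no_dif))
print(mantel_haenszel_odds_ratio(with_dif))
```

    In practice, as the abstract notes, a statistic like this would be read alongside statistical significance, effect magnitude and investigator judgement rather than on its own.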

  19. Detection of abnormal item based on time intervals for recommender systems.

    PubMed

    Gao, Min; Yuan, Quan; Ling, Bin; Xiong, Qingyu

    2014-01-01

    With the rapid development of e-business, personalized recommendation has become a core competence for enterprises seeking to gain profits and improve customer satisfaction. Although collaborative filtering is the most successful approach for building a recommender system, it suffers from "shilling" attacks. In recent years, research on shilling attacks has advanced considerably. However, existing approaches suffer from serious problems of attack-model dependency and high computational cost. To address these problems, this paper proposes an approach for the detection of abnormal items. Two common features of all attack models are analyzed first. A revised bottom-up discretized approach based on time intervals and these features is then proposed for detection. The distributions of ratings in different time intervals are compared to detect anomalies using the chi-square (χ²) distribution. We evaluated our approach on four types of items, which are defined according to the life cycles of these items. The experimental results show that the proposed approach achieves a high detection rate with low computational cost when the number of attack profiles is more than 15. It improves the efficiency of shilling attack detection by narrowing down the suspicious users.
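    The comparison of rating distributions between time intervals can be sketched as a Pearson chi-square statistic on a 2 x k table. This is only an illustration of that one step under simple assumptions, not the authors' revised bottom-up discretization; the function name and the example rating counts are hypothetical.

```python
def chi_square_statistic(counts_a, counts_b):
    """Pearson chi-square statistic comparing two rating histograms.

    counts_a and counts_b hold the number of ratings at each rating level
    (for example 1 to 5 stars) observed in two time intervals for one item.
    A large value means the two distributions differ more than expected
    if both intervals shared a single underlying rating distribution.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both histograms need the same rating levels")
    total_a = sum(counts_a)
    total_b = sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each interval needs at least one rating")
    grand_total = total_a + total_b

    statistic = 0.0
    for a, b in zip(counts_a, counts_b):
        column_total = a + b
        if column_total == 0:
            continue  # rating level unused in both intervals
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * column_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical 1-5 star counts for one item in two time intervals.
earlier = [5, 10, 20, 10, 5]
later = [1, 2, 4, 8, 35]   # sudden burst of 5-star ratings
print(chi_square_statistic(earlier, later))
```

    The statistic would then be compared against a chi-square critical value with k - 1 degrees of freedom, where k is the number of rating levels in use.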

  20. Assessing normative cut points through differential item functioning analysis: an example from the adaptation of the Middlesex Elderly Assessment of Mental State (MEAMS) for use as a cognitive screening test in Turkey.

    PubMed

    Tennant, Alan; Küçükdeveci, Ayse A; Kutlay, Sehim; Elhan, Atilla H

    2006-03-23

    The Middlesex Elderly Assessment of Mental State (MEAMS) was developed as a screening test to detect cognitive impairment in the elderly. It includes 12 subtests, each having a 'pass score'. A series of tasks were undertaken to adapt the measure for use in the adult population in Turkey and to determine the validity of existing cut points for passing subtests, given the wide range of educational level in the Turkish population. This study focuses on identifying and validating the scoring system of the MEAMS for Turkish adult population. After the translation procedure, 350 normal subjects and 158 acquired brain injury patients were assessed by the Turkish version of MEAMS. Initially, appropriate pass scores for the normal population were determined through ANOVA post-hoc tests according to age, gender and education. Rasch analysis was then used to test the internal construct validity of the scale and the validity of the cut points for pass scores on the pooled data by using Differential Item Functioning (DIF) analysis within the framework of the Rasch model. Data with the initially modified pass scores were analyzed. DIF was found for certain subtests by age and education, but not for gender. Following this, pass scores were further adjusted and data re-fitted to the model. All subtests were found to fit the Rasch model (mean item fit 0.184, SD 0.319; person fit -0.224, SD 0.557) and DIF was then found to be absent. Thus the final pass scores for all subtests were determined. The MEAMS offers a valid assessment of cognitive state for the adult Turkish population, and the revised cut points accommodate for age and education. Further studies are required to ascertain the validity in different diagnostic groups.

  1. Investigating Separate and Concurrent Approaches for Item Parameter Drift in 3PL Item Response Theory Equating

    ERIC Educational Resources Information Center

    Arce-Ferrer, Alvaro J.; Bulut, Okan

    2017-01-01

    This study examines separate and concurrent approaches to combine the detection of item parameter drift (IPD) and the estimation of scale transformation coefficients in the context of the common item nonequivalent groups design with the three-parameter item response theory equating. The study uses real and synthetic data sets to compare the two…

  2. Is the Factor Observed in Investigations on the Item-Position Effect Actually the Difficulty Factor?

    PubMed

    Schweizer, Karl; Troche, Stefan

    2018-02-01

    In confirmatory factor analysis quite similar models of measurement serve the detection of the difficulty factor and the factor due to the item-position effect. The item-position effect refers to the increasing dependency among the responses to successively presented items of a test whereas the difficulty factor is ascribed to the wide range of item difficulties. The similarity of the models of measurement hampers the dissociation of these factors. Since the item-position effect should theoretically be independent of the item difficulties, the statistical ex post manipulation of the difficulties should enable the discrimination of the two types of factors. This method was investigated in two studies. In the first study, Advanced Progressive Matrices (APM) data of 300 participants were investigated. As expected, the factor thought to be due to the item-position effect was observed. In the second study, using data simulated to show the major characteristics of the APM data, the wide range of items with various difficulties was set to zero to reduce the likelihood of detecting the difficulty factor. Despite this reduction, however, the factor now identified as item-position factor, was observed in virtually all simulated datasets.

  3. The Role of Medial Temporal Lobe Regions in Incidental and Intentional Retrieval of Item and Relational Information in Aging.

    PubMed

    Wang, Wei-Chun; Giovanello, Kelly S

    2016-06-01

    Considerable neuropsychological and neuroimaging work indicates that the medial temporal lobes are critical for both item and relational memory retrieval. However, there remain outstanding issues in the literature, namely the extent to which medial temporal lobe regions are differentially recruited during incidental and intentional retrieval of item and relational information, and the extent to which aging may affect these neural substrates. The current fMRI study sought to address these questions; participants incidentally encoded word pairs embedded in sentences and incidental item and relational retrieval were assessed through speeded reading of intact, rearranged, and new word-pair sentences, while intentional item and relational retrieval were assessed through old/new associative recognition of a separate set of intact, rearranged, and new word pairs. Results indicated that, in both younger and older adults, anterior hippocampus and perirhinal cortex indexed incidental and intentional item retrieval in the same manner. In contrast, posterior hippocampus supported incidental and intentional relational retrieval in both age groups and an adjacent cluster in posterior hippocampus was recruited during both forms of relational retrieval for older, but not younger, adults. Our findings suggest that while medial temporal lobe regions do not differentiate between incidental and intentional forms of retrieval, there are distinct roles for anterior and posterior medial temporal lobe regions during retrieval of item and relational information, respectively, and further indicate that posterior regions may, under certain conditions, be over-recruited in healthy aging. © 2016 Wiley Periodicals, Inc.

  4. Clinical characteristics of patients with major depressive disorder with and without hypothyroidism: a comparative study.

    PubMed

    Mowla, Arash; Kalantarhormozi, Mohammad Reza; Khazraee, Samaneh

    2011-01-01

    Differentiating major depressive disorder (MDD) without hypothyroidism from MDD associated with hypothyroidism can be challenging. Therefore some authors have suggested that thyroid function should be tested in all depressed patients. This study compared the clinical characteristics of patients with MDD associated with hypothyroidism with those of patients with MDD without hypothyroidism. Thyroid function tests were administered to 75 patients (60 female and 15 male) who met DSM-IV criteria for MDD. The 15 patients with hypothyroidism (8 with subclinical hypothyroidism and 7 with overt hypothyroidism) were compared with the other 60 patients with regard to depressive characteristics. The primary measure of depressive signs and symptoms used to assess depression severity and symptoms was the Hamilton Rating Scale for Depression, first 17 items (Ham-D-17). Baseline demographic data, including age and sex, were also compared. The two groups did not differ significantly in severity of overall depression at baseline, as measured by total score on the Ham-D-17 (P=0.471, Z=0.970). Patients with MDD without hypothyroidism had worse scores on item 1 (depressed mood), item 2 (feelings of guilt), item 3 (suicidality), item 6 (late insomnia), and item 16 (loss of weight). In contrast, depressed patients with hypothyroidism had more severe anxiety symptoms and greater agitation (items 9, 10, and 11). Our results may help clinicians differentiate MDD associated with hypothyroidism from MDD without hypothyroidism. Depressed patients with hypothyroidism had more anxiety symptoms and greater agitation, but they had fewer severe core depressive symptoms and biological signs of MDD. (Journal of Psychiatric Practice. 2011;17:67-71).

  5. Mayo-Portland adaptability inventory: comparing psychometrics in cerebrovascular accident to traumatic brain injury.

    PubMed

    Malec, James F; Kean, Jacob; Altman, Irwin M; Swick, Shannon

    2012-12-01

    (1) To evaluate the measurement reliability and construct validity of the Mayo-Portland Adaptability Inventory, 4th revision (MPAI-4) in a sample consisting exclusively of patients with cerebrovascular accident (CVA) using single parameter (Rasch) item-response methods; (2) to examine the differential item functioning (DIF) by sex within the CVA population; and (3) to examine DIF and differential test functioning (DTF) across traumatic brain injury (TBI) and CVA samples. Retrospective psychometric analysis of rating scale data. Home- and community-based brain injury rehabilitation program. Individuals post-CVA (n=861) and individuals with TBI (n=603). Not applicable. MPAI-4. Item data on admission to community-based rehabilitation were submitted to Rasch, DIF, and DTF analyses. The final calibration in the CVA sample revealed satisfactory reliability/separation for persons (.91/3.16) and items (1.00/23.64). DIF showed that items for pain, anger, audition, and memory were associated with higher levels of disability for CVA than TBI patients; whereas, self-care, mobility, and use of hands indicated greater overall disability for TBI patients. DTF analyses showed a high degree of association between the 2 sets of items (R=.92; R²=.85) and, at most, a 3.7 point difference in raw scores. The MPAI-4 demonstrates satisfactory psychometric properties for use with individuals with CVA applying for interdisciplinary posthospital rehabilitation. DIF reveals clinically meaningful differences between CVA and TBI groups that should be considered in results at the item and subscale level. Copyright © 2012 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.

  6. Construct and Differential Item Functioning in the Assessment of Prescription Opioid Use Disorders among American Adolescents

    ERIC Educational Resources Information Center

    Wu, Li-Tzy; Ringwalt, Christopher L.; Yang, Chongming; Reeve, Bryce B.; Pan, Jeng-Jong; Blazer, Dan G.

    2009-01-01

    DSM-IV's hierarchical distinction between abuse of and dependence on prescription opioids is not supported since the symptoms of abuse in adolescents are not less severe than dependence. The finding is based on the examination of the DSM-IV criteria for opioid use disorders using item response theory.

  7. Differential Item Functioning on a Measure of Perceptions of Preparation for Teachers, Teacher Candidates, and Program Personnel

    ERIC Educational Resources Information Center

    Donovan, Courtney; Green, Kathy E.; Seidel, Kent

    2017-01-01

    Core competencies essential for effective teaching were identified via a literature review and a review of standards for teacher education, and vetted by state groups with interests in teacher education. Survey items based on these competencies asked teacher candidates, graduates, and teacher education program faculty how well the program prepared…

  8. Measurement of Teen Dating Violence Attitudes: An Item Response Theory Evaluation of Differential Item Functioning According to Gender

    ERIC Educational Resources Information Center

    Edelen, Maria Orlando; McCaffrey, Daniel F.; Marshall, Grant N.; Jaycox, Lisa H.

    2009-01-01

    Accurate assessment of attitudes about intimate partner violence is important for evaluation of prevention and early intervention programs. Assessment of attitudes about cross-gender interactions is particularly susceptible to bias because it requires specifying the gender of the perpetrator and the victim. As it is likely that respondents will…

  9. Testing the Item-Order Account of Design Effects Using the Production Effect

    ERIC Educational Resources Information Center

    Jonker, Tanya R.; Levene, Merrick; MacLeod, Colin M.

    2014-01-01

    A number of memory phenomena evident in recall in within-subject, mixed-lists designs are reduced or eliminated in between-subject, pure-list designs. The item-order account (McDaniel & Bugg, 2008) proposes that differential retention of order information might underlie this pattern. According to this account, order information may be encoded…

  10. Testing for DIF in a Model with Single Peaked Item Characteristic Curves: The PARELLA Model.

    ERIC Educational Resources Information Center

    Hoijtink, Herbert; Molenaar, Ivo W.

    1992-01-01

    The PARallELogram Analysis (PARELLA) model is a probabilistic parallelogram model that can be used for the measurement of latent attitudes or latent preferences. A method is presented for testing for differential item functioning (DIF) for the PARELLA model using the approach of D. Thissen and others (1988). (SLD)

  11. Disparities in Sense of Community: True Race Differences or Differential Item Functioning?

    ERIC Educational Resources Information Center

    Coffman, Donna L.; BeLue, Rhonda

    2009-01-01

    The sense of community index (SCI) has been widely used to measure psychological sense of community (SOC). Furthermore, SOC has been found to differ among racial groups. Because different ethnic groups have different cultural and historical experiences that may lead to different interpretations of measurement items, it is important to know whether…

  12. Small-Sample DIF Estimation Using SIBTEST, Cochran's Z, and Log-Linear Smoothing

    ERIC Educational Resources Information Center

    Lei, Pui-Wa; Li, Hongli

    2013-01-01

    Minimum sample sizes of about 200 to 250 per group are often recommended for differential item functioning (DIF) analyses. However, there are times when sample sizes for one or both groups of interest are smaller than 200 due to practical constraints. This study attempts to examine the performance of Simultaneous Item Bias Test (SIBTEST),…

  13. Examination of a Social-Networking Site Activities Scale (SNSAS) Using Rasch Analysis

    ERIC Educational Resources Information Center

    Alhaythami, Hassan; Karpinski, Aryn; Kirschner, Paul; Bolden, Edward

    2017-01-01

    This study examined the psychometric properties of a social-networking site (SNS) activities scale (SNSAS) using Rasch Analysis. Items were also examined with Rasch Principal Components Analysis (PCA) and Differential Item Functioning (DIF) across groups of university students (i.e., males and females from the United States [US] and Europe; N =…

  14. Examining Gender DIF on a Multiple-Choice Test of Mathematics: A Confirmatory Approach.

    ERIC Educational Resources Information Center

    Ryan, Katherine E.; Fan, Meichu

    1996-01-01

    Results for 3,244 female and 3,033 male junior high school students from the Second International Mathematics Study show that applied items in algebra, geometry, and computation were easier for males but arithmetic items were differentially easier for females. Implications of these findings for assessment and instruction are discussed. (SLD)

  15. An Anthropologist among the Psychometricians: Assessment Events, Ethnography, and Differential Item Functioning in the Mongolian Gobi

    ERIC Educational Resources Information Center

    Maddox, Bryan; Zumbo, Bruno D.; Tay-Lim, Brenda; Qu, Demin

    2015-01-01

    This article explores the potential for ethnographic observations to inform the analysis of test item performance. In 2010, a standardized, large-scale adult literacy assessment took place in Mongolia as part of the United Nations Educational, Scientific and Cultural Organization Literacy Assessment and Monitoring Programme (LAMP). In a novel form…

  16. Evaluation of MIMIC-Model Methods for DIF Testing with Comparison to Two-Group Analysis

    ERIC Educational Resources Information Center

    Woods, Carol M.

    2009-01-01

    Differential item functioning (DIF) occurs when an item on a test or questionnaire has different measurement properties for 1 group of people versus another, irrespective of mean differences on the construct. This study focuses on the use of multiple-indicator multiple-cause (MIMIC) structural equation models for DIF testing, parameterized as item…

  17. A Mixture Rasch Model with a Covariate: A Simulation Study via Bayesian Markov Chain Monte Carlo Estimation

    ERIC Educational Resources Information Center

    Dai, Yunyun

    2013-01-01

    Mixtures of item response theory (IRT) models have been proposed as a technique to explore response patterns in test data related to cognitive strategies, instructional sensitivity, and differential item functioning (DIF). Estimation proves challenging due to difficulties in identification and questions of effect size needed to recover underlying…

  18. To Sum or Not to Sum: Taxometric Analysis with Ordered Categorical Assessment Items

    ERIC Educational Resources Information Center

    Walters, Glenn D.; Ruscio, John

    2009-01-01

    Meehl's taxometric method has been shown to differentiate between categorical and dimensional data, but there are many ways to implement taxometric procedures. When analyzing the ordered categorical data typically provided by assessment instruments, summing items to form input indicators has been a popular practice for more than 20 years. A Monte…

  19. Examining Measurement Invariance and Differential Item Functioning with Discrete Latent Construct Indicators: A Note on a Multiple Testing Procedure

    ERIC Educational Resources Information Center

    Raykov, Tenko; Dimitrov, Dimiter M.; Marcoulides, George A.; Li, Tatyana; Menold, Natalja

    2018-01-01

    A latent variable modeling method for studying measurement invariance when evaluating latent constructs with multiple binary or binary scored items with no guessing is outlined. The approach extends the continuous indicator procedure described by Raykov and colleagues, utilizes similarly the false discovery rate approach to multiple testing, and…

  20. Influence of item distribution pattern and abundance on efficiency of benthic core sampling

    USGS Publications Warehouse

    Behney, Adam C.; O'Shaughnessy, Ryan; Eichholz, Michael W.; Stafford, Joshua D.

    2014-01-01

    Core sampling is a commonly used method to estimate benthic item density, but little information exists about factors influencing the accuracy and time-efficiency of this method. We simulated core sampling in a Geographic Information System framework by generating points (benthic items) and polygons (core samplers) to assess how sample size (number of core samples), core sampler size (cm²), distribution of benthic items, and item density affected the bias and precision of estimates of density, the detection probability of items, and the time-costs. When items were distributed randomly versus clumped, bias decreased and precision increased with increasing sample size and increased slightly with increasing core sampler size. Bias and precision were only affected by benthic item density at very low values (500–1,000 items/m²). Detection probability (the probability of capturing ≥ 1 item in a core sample if it is available for sampling) was substantially greater when items were distributed randomly as opposed to clumped. Taking more small diameter core samples was always more time-efficient than taking fewer large diameter samples. We are unable to present a single, optimal sample size, but provide information for researchers and managers to derive optimal sample sizes dependent on their research goals and environmental conditions.
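    For the randomly distributed case, the detection probability defined above has a simple closed form that can serve as a rough check. The sketch below assumes complete spatial randomness (item counts per core follow a Poisson distribution), which is an assumption made here and not a description of the authors' GIS simulation; the function name and example values are hypothetical.

```python
import math


def detection_probability(density_per_m2, core_area_cm2):
    """Probability that one core sample captures at least one item.

    Assumes items are scattered completely at random, so the number of
    items in a core is Poisson with mean = density * core area. Clumped
    distributions are not covered by this formula.
    """
    if density_per_m2 < 0 or core_area_cm2 < 0:
        raise ValueError("density and core area must be non-negative")
    core_area_m2 = core_area_cm2 / 10_000.0   # 1 m^2 = 10,000 cm^2
    expected_items = density_per_m2 * core_area_m2
    return 1.0 - math.exp(-expected_items)


# Hypothetical values: 1,000 items per square metre, 10 cm^2 core sampler.
print(detection_probability(1000, 10))
```

    Under this assumption the probability depends only on the expected number of items per core, so halving the density while doubling the core area leaves it unchanged.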

  1. Cross-informant and cross-national equivalence using item-response theory (IRT) linking: A case study using the behavioral assessment for children of African heritage in the United States and Jamaica.

    PubMed

    Lambert, Michael Canute; Ferguson, Gail M; Rowan, George T

    2016-03-01

    Cross-national study of adolescents' psychological adjustment requires measures that permit reliable and valid assessment across informants and nations, but such measures are virtually nonexistent. Item-response-theory-based linking is a promising yet underutilized methodological procedure that permits more accurate assessment across informants and nations. To demonstrate this procedure, the Resilience Scale of the Behavioral Assessment for Children of African Heritage (Lambert et al., 2005) was administered to 250 African American and 294 Jamaican nonreferred adolescents and their caregivers. Multiple items without significant differential item functioning emerged, allowing scale linking across informants and nations. Calibrating item parameters via item response theory linking can permit cross-informant cross-national assessment of youth. (c) 2016 APA, all rights reserved.

  2. Multiple Hypnotizabilities: Differentiating the Building Blocks of Hypnotic Response

    ERIC Educational Resources Information Center

    Woody, Erik Z.; Barnier, Amanda J.; McConkey, Kevin M.

    2005-01-01

    Although hypnotizability can be conceptualized as involving component subskills, standard measures do not differentiate them from a more general unitary trait, partly because the measures include limited sets of dichotomous items. To overcome this, the authors applied full-information factor analysis, a sophisticated analytic approach for…

  3. Tree-Based Global Model Tests for Polytomous Rasch Models

    ERIC Educational Resources Information Center

    Komboz, Basil; Strobl, Carolin; Zeileis, Achim

    2018-01-01

    Psychometric measurement models are only valid if measurement invariance holds between test takers of different groups. Global model tests, such as the well-established likelihood ratio (LR) test, are sensitive to violations of measurement invariance, such as differential item functioning and differential step functioning. However, these…

  4. Measuring psychological trauma after spinal cord injury: Development and psychometric characteristics of the SCI-QOL Psychological Trauma item bank and short form

    PubMed Central

    Kisala, Pamela A.; Victorson, David; Pace, Natalie; Heinemann, Allen W.; Choi, Seung W.; Tulsky, David S.

    2015-01-01

    Objective: To describe the development and psychometric properties of the SCI-QOL Psychological Trauma item bank and short form. Design: Using a mixed-methods design, we developed and tested a Psychological Trauma item bank with patient and provider focus groups, cognitive interviews, and item response theory based analytic approaches, including tests of model fit, differential item functioning (DIF) and precision. Setting: We tested a 31-item pool at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Veterans Administration hospital. Participants: A total of 716 individuals with SCI completed the trauma items. Results: The 31 items fit a unidimensional model (CFI=0.952; RMSEA=0.061) and demonstrated good precision (theta range between 0.6 and 2.5). Nine items demonstrated negligible DIF with little impact on score estimates. The final calibrated item bank contains 19 items. Conclusion: The SCI-QOL Psychological Trauma item bank is a psychometrically robust measurement tool from which a short form and a computer adaptive test (CAT) version are available. PMID:26010967

  5. Assessing psychological well-being: self-report instruments for the NIH Toolbox.

    PubMed

    Salsman, John M; Lai, Jin-Shei; Hendrie, Hugh C; Butt, Zeeshan; Zill, Nicholas; Pilkonis, Paul A; Peterson, Christopher; Stoney, Catherine M; Brouwers, Pim; Cella, David

    2014-02-01

    Psychological well-being (PWB) has a significant relationship with physical and mental health. As a part of the NIH Toolbox for the Assessment of Neurological and Behavioral Function, we developed self-report item banks and short forms to assess PWB. Expert feedback and literature review informed the selection of PWB concepts and the development of item pools for positive affect, life satisfaction, and meaning and purpose. Items were tested with a community-dwelling US Internet panel sample of adults aged 18 and above (N = 552). Classical and item response theory (IRT) approaches were used to evaluate unidimensionality, fit of items to the overall measure, and calibrations of those items, including differential item function (DIF). IRT-calibrated item banks were produced for positive affect (34 items), life satisfaction (16 items), and meaning and purpose (18 items). Their psychometric properties were supported based on the results of factor analysis, fit statistics, and DIF evaluation. All banks measured the concepts precisely (reliability ≥0.90) for more than 98% of participants. These adult scales and item banks for PWB provide the flexibility, efficiency, and precision necessary to promote future epidemiological, observational, and intervention research on the relationship of PWB with physical and mental health.

  6. Restricted interests and teacher presentation of items.

    PubMed

    Stocco, Corey S; Thompson, Rachel H; Rodriguez, Nicole M

    2011-01-01

    Restricted and repetitive behavior (RRB) is more pervasive, prevalent, frequent, and severe in individuals with autism spectrum disorders (ASDs) than in their typical peers. One subtype of RRB is restricted interests in items or activities, which is evident in the manner in which individuals engage with items (e.g., repetitious wheel spinning), the types of items or activities they select (e.g., preoccupation with a phone book), or the range of items or activities they select (i.e., narrow range of items). We sought to describe the relation between restricted interests and teacher presentation of items. Overall, we observed 5 teachers interacting with 2 pairs of students diagnosed with an ASD. Each pair included 1 student with restricted interests. During these observations, teachers were free to present any items from an array of 4 stimuli selected by experimenters. We recorded student responses to teacher presentation of items and analyzed the data to determine the relation between teacher presentation of items and the consequences for presentation provided by the students. Teacher presentation of items corresponded with differential responses provided by students with ASD, and those with restricted preferences experienced a narrower array of items.

  7. Detection and segmentation of multiple touching product inspection items

    NASA Astrophysics Data System (ADS)

    Casasent, David P.; Talukder, Ashit; Cox, Westley; Chang, Hsuan-Ting; Weber, David

    1996-12-01

    X-ray images of pistachio nuts on conveyor trays for product inspection are considered. The first step in such a processor is to locate each individual item and place it in a separate file for input to a classifier to determine the quality of each nut. This paper considers new techniques to: detect each item (each nut can be in any orientation, so we employ new rotation-invariant filters to locate each item independent of its orientation); produce separate image files for each item [a new blob coloring algorithm provides this for isolated (non-touching) input items]; segment touching or overlapping input items into separate image files (we use a morphological watershed transform to achieve this); and apply morphological processing to remove the shell and produce an image of only the nutmeat. Each of these operations and algorithms is detailed, and quantitative data for each are presented for the x-ray image nut inspection problem noted. These techniques are of general use in many different product inspection problems in agriculture and other areas.
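
The blob coloring step for isolated items can be illustrated with a generic connected-component labeling sketch. This is a textbook flood-fill version, not the new algorithm the paper proposes; the function name `label_blobs`, the 4-connectivity choice, and the binary input grid are assumptions.

```python
from collections import deque


def label_blobs(mask):
    """Label 4-connected foreground regions of a binary grid.

    mask is a list of rows of 0/1 values. Returns (labels, n_blobs), where
    labels has the same shape as mask, 0 for background and 1..n_blobs
    for each separate item.
    """
    rows = len(mask)
    cols = len(mask[0]) if mask else 0
    labels = [[0] * cols for _ in range(rows)]
    n_blobs = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                # Unlabeled foreground pixel: start a new blob and flood-fill it.
                n_blobs += 1
                labels[r][c] = n_blobs
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = n_blobs
                            queue.append((ny, nx))
    return labels, n_blobs
```

Two touching nuts would receive a single label under this scheme, which is consistent with the abstract's use of a separate watershed step for touching or overlapping items.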

  8. A Procedure To Detect Test Bias Present Simultaneously in Several Items.

    ERIC Educational Resources Information Center

    Shealy, Robin; Stout, William

    A statistical procedure is presented that is designed to test for unidirectional test bias existing simultaneously in several items of an ability test, based on the assumption that test bias is incipient within the two groups' ability differences. The proposed procedure--Simultaneous Item Bias (SIB)--is based on a multidimensional item response…

  9. Use of expert consensus to improve atherogenic dyslipidemia management.

    PubMed

    Millán Núñez-Cortés, Jesús; Pedro-Botet, Juan; Brea-Hernando, Ángel; Díaz-Rodríguez, Ángel; González-Santos, Pedro; Hernández-Mijares, Antonio; Mantilla-Morató, Teresa; Pintó-Sala, Xavier; Simó, Rafael

    2014-01-01

    Although atherogenic dyslipidemia is a recognized cardiovascular risk factor, it is often underassessed and thus undertreated and poorly controlled in clinical practice. The objective of this study was to reach a multidisciplinary consensus for the establishment of a set of clinical recommendations on atherogenic dyslipidemia to optimize its prevention, early detection, diagnostic evaluation, therapeutic approach, and follow-up. After a review of the scientific evidence, a scientific committee formulated 87 recommendations related to atherogenic dyslipidemia, which were grouped into 5 subject areas: general concepts (10 items), impact and epidemiology (4 items), cardiovascular risk (32 items), detection and diagnosis (19 items), and treatment (22 items). A 2-round modified Delphi method was conducted to compare the opinions of a panel of 65 specialists in cardiology (23%), endocrinology (24.6%), family medicine (27.7%), and internal medicine (24.6%) on these issues. After the first round, the panel reached consensus on 65 of the 87 items discussed, and agreed on 76 items by the end of the second round. Insufficient consensus was reached on 3 items related to the detection and diagnosis of atherogenic dyslipidemia and 3 items related to the therapeutic goals to be achieved in these patients. The external assessment conducted by experts on atherogenic dyslipidemia showed a high level of professional agreement with the proposed clinical recommendations. These recommendations represent a useful tool for improving the clinical management of patients with atherogenic dyslipidemia. A detailed analysis of the current scientific evidence is required for those statements that eluded consensus. Copyright © 2013 Sociedad Española de Cardiología. Published by Elsevier Espana. All rights reserved.

  10. Teacher Perceived Difficulty in Implementing Differentiated Instructional Strategies in Primary School

    ERIC Educational Resources Information Center

    Gaitas, Sérgio; Alves Martins, Margarida

    2017-01-01

    This study analyses teacher perceived difficulty in implementing differentiated instructional strategies in regular classes. The participants were 273 Portuguese primary school teachers with teaching experience ranging from 1 to 33 years. A 39-item questionnaire was used to evaluate teacher perceived difficulty in relation to different…

  11. Item Response Theory Analysis of the Psychopathic Personality Inventory-Revised.

    PubMed

    Eichenbaum, Alexander E; Marcus, David K; French, Brian F

    2017-06-01

    This study examined item and scale functioning in the Psychopathic Personality Inventory-Revised (PPI-R) using an item response theory analysis. PPI-R protocols from 1,052 college student participants (348 male, 704 female) were analyzed. Analyses were conducted on the 131 self-report items comprising the PPI-R's eight content scales, using a graded response model. Scales collected a majority of their information about respondents possessing higher than average levels of the traits being measured. Each scale contained at least some items that evidenced limited ability to differentiate between respondents with differing levels of the trait being measured. Moreover, 80 items (61.1%) yielded significantly different responses between men and women presumably possessing similar levels of the trait being measured. Item performance was also influenced by the scoring format (directly scored vs. reverse-scored) of the items. Overall, the results suggest that the PPI-R, despite identifying psychopathic personality traits in individuals possessing high levels of those traits, may not identify these traits equally well for men and women, and scores are likely influenced by the scoring format of the individual item and scale.

  12. Development of an Item Bank for the Assessment of Knowledge on Biology in Argentine University Students.

    PubMed

    Cupani, Marcos; Zamparella, Tatiana Castro; Piumatti, Gisella; Vinculado, Grupo

    The calibration of item banks provides the basis for computerized adaptive testing that ensures high diagnostic precision and minimizes participants' test burden. This study aims to develop a bank of items to measure the level of Knowledge on Biology using the Rasch model. The sample consisted of 1219 participants that studied in different faculties of the National University of Cordoba (mean age = 21.85 years, SD = 4.66; 66.9% are women). The items were organized in different forms and into separate subtests, with some common items across subtests. The students were told they had to answer 60 questions of knowledge on biology. Evaluation of Rasch model fit (Zstd >|2.0|), differential item functioning, dimensionality, local independence, item and person separation (>2.0), and reliability (>.80) resulted in a bank of 180 items with good psychometric properties. The bank provides items with a wide range of content coverage and may serve as a sound basis for computerized adaptive testing applications. The contribution of this work is significant in the field of educational assessment in Argentina.

  13. A Comparison of Linking Methods for Estimating National Trends in International Comparative Large-Scale Assessments in the Presence of Cross-national DIF

    ERIC Educational Resources Information Center

    Sachse, Karoline A.; Roppelt, Alexander; Haag, Nicole

    2016-01-01

    Trend estimation in international comparative large-scale assessments relies on measurement invariance between countries. However, cross-national differential item functioning (DIF) has been repeatedly documented. We ran a simulation study using national item parameters, which required trends to be computed separately for each country, to compare…

  14. An Attenuation of the "Normal" Category Effect in Patients with Alzheimer's Disease: A Review and Bootstrap Analysis

    ERIC Educational Resources Information Center

    Moreno-Martinez, F. Javier; Laws, Kieth R.

    2007-01-01

    There is a consensus that Alzheimer's disease (AD) impairs semantic information, with one of the first markers being anomia i.e. an impaired ability to name items. Doubts remain, however, about whether this naming impairment differentially affects items from the living and nonliving knowledge domains. Most studies have reported an impairment for…

  15. Evaluating the Mathematics Interest Inventory Using Item Response Theory: Differential Item Functioning across Gender and Ethnicities

    ERIC Educational Resources Information Center

    Wei, Tianlan; Chesnut, Steven R.; Barnard-Brak, Lucy; Stevens, Tara; Olivárez, Arturo, Jr.

    2014-01-01

    As the United States has begun to lag behind other developed countries in performance on mathematics and science, researchers have sought to explain this with theories of teaching, knowledge, and motivation. We expand this examination by further analyzing a measure of interest that has been linked to student performance in mathematics and…

  16. Small-Sample DIF Estimation Using Log-Linear Smoothing: A SIBTEST Application. Research Report. ETS RR-07-10

    ERIC Educational Resources Information Center

    Puhan, Gautam; Moses, Tim P.; Yu, Lei; Dorans, Neil J.

    2007-01-01

    The purpose of the current study was to examine whether log-linear smoothing of observed score distributions in small samples results in more accurate differential item functioning (DIF) estimates under the simultaneous item bias test (SIBTEST) framework. Data from a teacher certification test were analyzed using White candidates in the reference…

  17. Propensity Score Matching Helps to Understand Sources of DIF and Mathematics Performance Differences of Indonesian, Turkish, Australian, and Dutch Students in PISA

    ERIC Educational Resources Information Center

    Arikan, Serkan; van de Vijver, Fons J. R.; Yagmur, Kutlay

    2018-01-01

    We examined Differential Item Functioning (DIF) and the size of cross-cultural performance differences in the Programme for International Student Assessment (PISA) 2012 mathematics data before and after application of propensity score matching. The mathematics performance of Indonesian, Turkish, Australian, and Dutch students on released items was…

  18. Investigating Causal DIF via Propensity Score Methods

    ERIC Educational Resources Information Center

    Liu, Yan; Zumbo, Bruno D.; Gustafson, Paul; Huang, Yi; Kroc, Edward; Wu, Amery D.

    2016-01-01

    A variety of differential item functioning (DIF) methods have been proposed and used for ensuring that a test is fair to all test takers in a target population in the situations of, for example, a test being translated to other languages. However, once a method flags an item as DIF, it is difficult to conclude that the grouping variable (e.g.,…

  19. Contextual Differential Item Functioning: Examining the Validity of Teaching Self-Efficacy Instruments Using Hierarchical Generalized Linear Modeling

    ERIC Educational Resources Information Center

    Zhao, Jing

    2012-01-01

    The purpose of the study is to further investigate the validity of instruments used for collecting preservice teachers' perceptions of self-efficacy adapting the three-level IRT model described in Cheong's study (2006). The focus of the present study is to investigate whether the polytomously-scored items on the preservice teachers' self-efficacy…

  20. Comparing Future Teachers' Beliefs across Countries: Approximate Measurement Invariance with Bayesian Elastic Constraints for Local Item Dependence and Differential Item Functioning

    ERIC Educational Resources Information Center

    Braeken, Johan; Blömeke, Sigrid

    2016-01-01

    Using data from the international Teacher Education and Development Study: Learning to Teach Mathematics (TEDS-M), the measurement equivalence of teachers' beliefs across countries is investigated for the case of "mathematics-as-a fixed-ability". Measurement equivalence is a crucial topic in all international large-scale assessments and…

  1. The Psychometric Structure of Items Assessing Autogynephilia.

    PubMed

    Hsu, Kevin J; Rosenthal, A M; Bailey, J Michael

    2015-07-01

    Autogynephilia, or paraphilic sexual arousal in a man to the thought or image of himself as a woman, manifests in a variety of different behaviors and fantasies. We examined the psychometric structure of 22 items assessing five known types of autogynephilia by subjecting them to exploratory factor analysis in a sample of 149 autogynephilic men. Results of oblique factor analyses supported the ability to distinguish five group factors with suitable items. Results of hierarchical factor analyses suggest that the five group factors were strongly underlain by a general factor of autogynephilia. Because the general factor accounted for a much greater amount of the total variance of the 22 items than did the group factors, the types of autogynephilia that a man has seem less important than the degree to which he has autogynephilia. However, the five types of autogynephilia remain conceptually useful because meaningful distinctions were found among them, including differential rates of endorsement and differential ability to predict other relevant variables like gender dysphoria. Factor-derived scales and subscales demonstrated good internal consistency reliabilities, and validity, with large differences found between autogynephilic men and heterosexual male controls. Future research should attempt to replicate our findings, which were mostly exploratory.

  2. Conditional Covariance Theory and DETECT for Polytomous Items. Research Report. ETS RR-04-50

    ERIC Educational Resources Information Center

    Zhang, Jinming

    2004-01-01

    This paper extends the theory of conditional covariances to polytomous items. It has been mathematically proven that under some mild conditions, commonly assumed in the analysis of response data, the conditional covariance of two items, dichotomously or polytomously scored, is positive if the two items are dimensionally homogeneous and negative…

  3. Price, promotion, and availability of nutrition information: a descriptive study of a popular fast food chain in New York City.

    PubMed

    Basch, Corey Hannah; Ethan, Danna; Rajan, Sonali

    2013-08-25

    Legislation in NYC requires chain restaurants to post calorie information on menu boards in an effort to help consumers make more informed decisions about food and beverage items they are purchasing. While this is a step in the right direction in light of the current obesity epidemic, there are other issues that warrant attention in a fast food setting, namely the pricing of healthy food options, promotional strategies, and access to comprehensive nutrition information. This study focused on a popular fast-food chain in NYC. The study's aims were threefold: (1) to determine the cost differential between the healthiest meal item on the chain's general menu and meal items available specifically on a reduced cost menu for one dollar (US$1.00); (2) to identify and describe the promotions advertised in the windows of these restaurants, as well as the nutrition content of promoted items; and (3) to ascertain availability of comprehensive nutrition information to consumers within the restaurants. We found the healthiest meal item to be significantly higher in price than less nutritious meal items available for $1.00 (t=146.9, p<.001), with the mean cost differential equal to $4.33 (95% CI: $4.27, $4.39). Window promotions generally advertised less healthful menu items, which may aid in priming customers to purchase these versus more healthful options. Comprehensive nutrition information beyond calorie counts was not readily accessible prior to purchasing. In addition to improving access to comprehensive nutrition information, advertising more of and lowering the prices of nutritious options may encourage consumers to purchase healthier foods in a fast food setting. Additional research in this area is needed in other geographic locations and restaurant chains. 

  4. Negligible impact of differential item functioning between Black and White dialysis patients on the Kidney Disease Quality of Life 36-item short form survey (KDQOL™-36).

    PubMed

    Peipert, John D; Bentler, Peter; Klicko, Kristi; Hays, Ron D

    2018-05-14

    Black dialysis patients report better health-related quality of life (HRQOL) than White patients, which may be explained if Black and White patients respond systematically differently to HRQOL survey items. We examined differential item functioning (DIF) of the Kidney Disease Quality of Life 36-item (KDQOL™-36) Burden of Kidney Disease, Symptoms and Problems with Kidney Disease, and Effects of Kidney Disease scales between Black (n = 18,404) and White (n = 21,439) dialysis patients. We fit multiple group confirmatory factor analysis models with increasing invariance: a Configural model (invariant factor structure), a Metric model (invariant factor loadings), and a Scalar model (invariant intercepts). Criteria for invariance included non-significant χ² tests, > 0.002 difference in the models' CFI, and > 0.015 difference in RMSEA and SRMR. Next, starting with a fully invariant model, we freed loadings and intercepts item-by-item to determine if DIF impacted estimated KDQOL™-36 scale means. ΔCFI was 0.006 between the metric and scalar models but was reduced to 0.001 when we freed intercepts for the burdens and symptoms and problems of kidney disease scales. In comparison to standardized means of 0 in the White group, those for the Black group on the Burdens, Symptoms and Problems, and Effects of Kidney Disease scales were 0.218, 0.061, and 0.161, respectively. When loadings and thresholds were released sequentially, differences in means between models ranged between 0.001 and 0.048. Despite some DIF, impacts on KDQOL™-36 responses appear to be minimal. We conclude that the KDQOL™-36 is appropriate to make substantive comparisons of HRQOL between Black and White dialysis patients.

  5. Psychometric Validation of the Pulmonary Arterial Hypertension-Symptoms and Impact (PAH-SYMPACT®) Questionnaire: Results of the SYMPHONY Trial.

    PubMed

    Chin, Kelly M; Gomberg-Maitland, Mardi; Channick, Richard N; Cuttica, Michael J; Fischer, Aryeh; Frantz, Robert P; Hunsche, Elke; Kleinman, Leah; McConnell, John W; McLaughlin, Vallerie V; Miller, Chad E; Zamanian, Roham T; Zastrow, Michael S; Badesch, David B

    2018-04-26

    Disease-specific patient-reported outcome (PRO) instruments are important in assessing the impact of disease and treatment. PAH-SYMPACT® is the first questionnaire for quantifying pulmonary arterial hypertension (PAH) symptoms and impacts developed following the 2009 FDA PRO guidance; previous qualitative research with PAH patients supported its initial content validity. Content finalization and psychometric validation were conducted using data from SYMPHONY, a single-arm, 16-week study with macitentan 10 mg in US patients with PAH. Item performance, Rasch, and factor analyses were used to select final item content of the PRO and define its domain structure. Internal consistency, test-retest reliability, known-group and construct validity, sensitivity to change, and influence of oxygen on item performance were evaluated. Data from 278 patients (79% female, mean age 60 years) were analyzed. Following removal of redundant/misfitting items, the final questionnaire has 11 symptom items across 2 domains (cardiopulmonary and cardiovascular symptoms) and 11 impact items across 2 domains (physical and cognitive/emotional impacts). Differential item function analysis confirmed PRO scoring is unaffected by oxygen use. For all 4 domains, internal consistency reliability was high (Cronbach's alpha >0.80) and scores were highly reproducible in stable patients (intra-class correlation coefficient 0.84-0.94). Correlations with CAMPHOR and SF-36 were moderate-to-high (|r| = 0.34-0.80). The questionnaire differentiated well between patients with different disease severity levels, and was sensitive to improvements in clinician- and patient-reported disease severity. The PAH-SYMPACT® is a brief, disease-specific PRO instrument possessing good psychometric properties which can be administered in clinical practice and clinical studies. Copyright © 2018. Published by Elsevier Inc.

  6. Rasch analysis of the participation scale (P-scale): usefulness of the P-scale to a rehabilitation services network.

    PubMed

    Souza, Mariana Angélica Peixoto; Coster, Wendy Jane; Mancini, Marisa Cotta; Dutra, Fabiana Caetano Martins Silva; Kramer, Jessica; Sampaio, Rosana Ferreira

    2017-12-08

    A person's participation is acknowledged as an important outcome of the rehabilitation process. The Participation Scale (P-Scale) is an instrument that was designed to assess the participation of individuals with a health condition or disability. The scale was developed in an effort to better describe the participation of people living in middle-income and low-income countries. The aim of this study was to use Rasch analysis to examine whether the Participation Scale is suitable to assess the perceived ability to take part in participation situations by patients with diverse levels of function. The sample comprised 302 patients from a public rehabilitation services network. Participants had orthopaedic or neurological health conditions, were at least 18 years old, and completed the Participation Scale. Rasch analysis was conducted using the Winsteps software. The mean age of all participants was 45.5 years (standard deviation = 14.4), 52% were male, 86% had orthopaedic conditions, and 52% had chronic symptoms. Rasch analysis was performed using a dichotomous rating scale, and only one item showed misfit. Dimensionality analysis supported the existence of only one Rasch dimension. The person separation index was 1.51, and the item separation index was 6.38. Items N2 and N14 showed Differential Item Functioning between men and women. Items N6 and N12 showed Differential Item Functioning between acute and chronic conditions. The item difficulty range was -1.78 to 2.09 logits, while the sample ability range was -2.41 to 4.61 logits. The P-Scale was found to be useful as a screening tool for participation problems reported by patients in a rehabilitation context, despite some issues that should be addressed to further improve the scale.
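
To make the reported logit ranges concrete, the following sketch evaluates the standard dichotomous Rasch model at the extremes given in the abstract. The function name `rasch_probability` is hypothetical, and reading the result as "probability of a response scored 1" is an assumption about how the scale was coded.

```python
import math


def rasch_probability(ability, difficulty):
    """Dichotomous Rasch model: P(X = 1) for a person and an item, both in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Extremes reported in the abstract (logits).
lowest_ability, highest_ability = -2.41, 4.61
easiest_item, hardest_item = -1.78, 2.09

for ability in (lowest_ability, highest_ability):
    for difficulty in (easiest_item, hardest_item):
        p = rasch_probability(ability, difficulty)
        print(f"ability {ability:+.2f}, difficulty {difficulty:+.2f}: P = {p:.3f}")
```

Because the highest person ability (4.61) lies well above the hardest item (2.09), the model predicts very little variation in responses for the most able respondents, which is in line with the modest person separation index of 1.51 reported above.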

  7. Price, Promotion, and Availability of Nutrition Information: A Descriptive Study of a Popular Fast Food Chain in New York City

    PubMed Central

    Basch, Corey Hannah; Ethan, Danna; Rajan, Sonali

    2013-01-01

    Legislation in NYC requires chain restaurants to post calorie information on menu boards in an effort to help consumers make more informed decisions about food and beverage items they are purchasing. While this is a step in the right direction in light of the current obesity epidemic, there are other issues that warrant attention in a fast food setting, namely the pricing of healthy food options, promotional strategies, and access to comprehensive nutrition information. This study focused on a popular fast-food chain in NYC. The study’s aims were threefold: (1) to determine the cost differential between the healthiest meal item on the chain’s general menu and meal items available specifically on a reduced cost menu for one dollar (US$1.00); (2) to identify and describe the promotions advertised in the windows of these restaurants, as well as the nutrition content of promoted items; and (3) to ascertain availability of comprehensive nutrition information to consumers within the restaurants. We found the healthiest meal item to be significantly higher in price than less nutritious meal items available for $1.00 (t = 146.9, p < .001), with the mean cost differential equal to $4.33 (95% CI $4.27, $4.39). Window promotions generally advertised less healthful menu items, which may aid in priming customers to purchase these versus more healthful options. Comprehensive nutrition information beyond calorie counts was not readily accessible prior to purchasing. In addition to improving access to comprehensive nutrition information, advertising more of and lowering the prices of nutritious options may encourage consumers to purchase healthier foods in a fast food setting. Additional research in this area is needed in other geographic locations and restaurant chains. PMID:24171876

  8. [Differential item functioning: a bibliometric analysis of journals published in Spanish].

    PubMed

    Guilera, Georgina; Gómez, Juana; Hidalgo, M Dolores

    2006-11-01

    This study aims to provide an overview of scientific productivity with respect to articles published in Spanish on the issue of DIF. The documents included in the study were identified using the Psicodoc database, as well as the Science Citation Index and Social Science Citation Index from the Web of Science. The analyses carried out are focused mainly on presenting the frequencies and percentages of publications with respect to various bibliometric indicators. The results reveal that interest in the issue of DIF has increased, and that the universities are the most productive institutions. The majority of articles have been published in the journal Psicothema.

  9. Are Atypical Things More Popular?

    PubMed

    Berger, Jonah; Packard, Grant

    2018-04-01

    Why do some cultural items become popular? Although some researchers have argued that success is random, we suggest that how similar items are to each other plays an important role. Using natural language processing of thousands of songs, we examined the relationship between lyrical differentiation (i.e., atypicality) and song popularity. Results indicated that the more different a song's lyrics are from its genre, the more popular it becomes. This relationship is weaker in genres where lyrics matter less (e.g., dance) or where differentiation matters less (e.g., pop) and occurs for lyrical topics but not style. The results shed light on cultural dynamics, why things become popular, and the psychological foundations of culture more broadly.

  10. Measuring patient-provider communication skills in Rwanda: Selection, adaptation and assessment of psychometric properties of the Communication Assessment Tool.

    PubMed

    Cubaka, Vincent Kalumire; Schriver, Michael; Vedsted, Peter; Makoul, Gregory; Kallestrup, Per

    2018-04-23

    To identify, adapt and validate a measure for providers' communication and interpersonal skills in Rwanda. After selection, translation and piloting of the measure, structural validity, test-retest reliability, and differential item functioning were assessed. Identification and adaptation: The 14-item Communication Assessment Tool (CAT) was selected and adapted. Content validation found all items highly relevant in the local context except two, which were retained upon understanding the reasoning applied by patients. Eleven providers and 291 patients were involved in the field-testing. Confirmatory factor analysis showed a good fit for the original one factor model. Test-retest reliability assessment revealed a mean quadratic weighted Kappa = 0.81 (range: 0.69-0.89, N = 57). The average proportion of excellent scores was 15.7% (SD: 24.7, range: 9.9-21.8%, N = 180). Differential item functioning was not observed except for item 1, which focuses on greetings, for age groups (p = 0.02, N = 180). The Kinyarwanda version of CAT (K-CAT) is a reliable and valid patient-reported measure of providers' communication and interpersonal skills. K-CAT was validated on nurses and its use on other types of providers may require further validation. K-CAT is expected to be a valuable feedback tool for providers in practice and in training. Copyright © 2018 Elsevier B.V. All rights reserved.
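
The test-retest statistic reported above is a quadratic weighted kappa. As a minimal sketch of how such a coefficient is commonly computed for one item (the function name, the 0-based category coding, and the toy ratings are assumptions for illustration, not the study's data or code):

```python
def quadratic_weighted_kappa(first, second, n_categories):
    """Quadratic weighted kappa for two ratings of the same subjects.

    `first` and `second` are equal-length lists of integer categories
    coded 0 .. n_categories - 1 (hypothetical coding for this sketch).
    """
    n = len(first)
    # Observed contingency table of test vs. retest ratings.
    observed = [[0] * n_categories for _ in range(n_categories)]
    for a, b in zip(first, second):
        observed[a][b] += 1
    row_totals = [sum(observed[i]) for i in range(n_categories)]
    col_totals = [sum(observed[i][j] for i in range(n_categories))
                  for j in range(n_categories)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            # Disagreement weight grows with the squared category distance.
            weight = (i - j) ** 2 / (n_categories - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    # Undefined (division by zero) if all ratings fall in one category.
    return 1.0 - weighted_observed / weighted_expected


# Toy example: identical ratings on both occasions give perfect agreement.
kappa_same = quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4], 5)
# Toy example: fully reversed ratings on a 3-point scale.
kappa_reversed = quadratic_weighted_kappa([0, 1, 2], [2, 1, 0], 3)
```

Averaging the per-item coefficients would give a mean kappa of the kind the abstract reports, although the abstract does not say exactly how the authors aggregated.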

  11. Optimal segmentation and packaging process

    DOEpatents

    Kostelnik, Kevin M.; Meservey, Richard H.; Landon, Mark D.

    1999-01-01

    A process for improving packaging efficiency uses three-dimensional, computer-simulated models with various optimization algorithms to determine the optimal segmentation process and packaging configurations based on constraints including container limitations. The present invention is applied to a process for decontaminating, decommissioning (D&D), and remediating a nuclear facility involving the segmentation and packaging of contaminated items in waste containers in order to minimize the number of cuts, maximize packaging density, and reduce worker radiation exposure. A three-dimensional, computer-simulated facility model of the contaminated items is created. The contaminated items are differentiated. The optimal location, orientation and sequence of the segmentation and packaging of the contaminated items are determined using the simulated model, the algorithms, and various constraints including container limitations. The cut locations and orientations are transposed to the simulated model. The contaminated items are actually segmented and packaged. The segmentation and packaging may be simulated beforehand. In addition, the contaminated items may be cataloged and recorded.

  12. "A violation of the conditional independence assumption in the two-high-threshold Model of recognition memory": Correction to Chen, Starns, and Rotello (2015).

    PubMed

    2016-01-01

    Reports an error in "A violation of the conditional independence assumption in the two-high-threshold model of recognition memory" by Tina Chen, Jeffrey J. Starns and Caren M. Rotello (Journal of Experimental Psychology: Learning, Memory, and Cognition, 2015[Jul], Vol 41[4], 1215-1222). In the article, Chen et al. compared three models: a continuous signal detection model (SDT), a standard two-high-threshold discrete-state model in which detect states always led to correct responses (2HT), and a full-mapping version of the 2HT model in which detect states could lead to either correct or incorrect responses. After publication, Rani Moran (personal communication, April 21, 2015) identified two errors that impact the reported fit statistics for the Bayesian information criterion (BIC) metric of all models as well as the Akaike information criterion (AIC) results for the full-mapping model. The errors are described in the erratum. (The following abstract of the original article appeared in record 2014-56216-001.) The 2-high-threshold (2HT) model of recognition memory assumes that test items result in distinct internal states: they are either detected or not, and the probability of responding at a particular confidence level that an item is "old" or "new" depends on the state-response mapping parameters. The mapping parameters are independent of the probability that an item yields a particular state (e.g., both strong and weak items that are detected as old have the same probability of producing a highest-confidence "old" response). We tested this conditional independence assumption by presenting nouns 1, 2, or 4 times. To maximize the strength of some items, "superstrong" items were repeated 4 times and encoded in conjunction with pleasantness, imageability, anagram, and survival processing tasks. The 2HT model failed to simultaneously capture the response rate data for all item classes, demonstrating that the data violated the conditional independence assumption. In contrast, a Gaussian signal detection model, which posits that the level of confidence that an item is "old" or "new" is a function of its continuous strength value, provided a good account of the data. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  13. Is the Hospital Anxiety and Depression Scale (HADS) a valid measure in a general population 65-80 years old? A psychometric evaluation study.

    PubMed

    Djukanovic, Ingrid; Carlsson, Jörg; Årestedt, Kristofer

    2017-10-04

    The HADS (Hospital Anxiety and Depression Scale) aims to measure symptoms of anxiety (HADS Anxiety) and depression (HADS Depression). The HADS is widely used but has shown ambiguous results regarding both the factor structure and sex differences in the prevalence of depressive symptoms. There is also a lack of psychometric evaluations of the HADS in non-clinical samples of older people. The aim of the study was to evaluate the factor structure of the HADS in a general population 65-80 years old and to examine the possible presence of differential item functioning (DIF) with respect to sex. This study was based on data from a Swedish sample, randomized from the total population in the age group 65-80 years (n = 6659). Confirmatory factor analyses (CFA) were performed to examine the factor structure. Ordinal regression analyses were conducted to detect DIF for sex. Reliability was examined with both ordinal and traditional Cronbach's alpha. The CFA showed that a two-factor model with cross-loadings for two items (7 and 8) had excellent model fit. Internal consistency was good in both subscales, measured with ordinal and traditional alpha. Floor effects were present for all items. No indication of meaningful DIF regarding sex was found for any of the subscales. HADS Anxiety and HADS Depression are unidimensional measures with acceptable internal consistency and are invariant with regard to sex. Despite pronounced ceiling effects and cross-loadings for items 7 and 8, the hypothesized two-factor model of HADS can be recommended to assess psychological distress among a general population 65-80 years old.

  14. Basic needs of Universiti Utara Malaysia students

    NASA Astrophysics Data System (ADS)

    Ismail, Suzilah; Ahmad, Yuhaniz; Enn, Chang Tzu

    2017-11-01

    Basic needs are defined as goods or services that are essential for humans to live and function. Wants, on the other hand, are goods or services that are not necessary but that we desire or wish for in order to fulfil our needs. In university, students' needs and wants are not always easily detectable because they differ across generations of students. Students' desires are also shaped by peer interactions, course needs and cultural differences. For example, older generations required a typewriter, whereas newer generations need a laptop. Many university students have difficulty differentiating between basic needs and wants. This leads to financial management problems, which can affect their academic performance. The purpose of this study is to identify the basic needs of Universiti Utara Malaysia (UUM) students. Based on past studies conducted by 3 universities, 12 items related to students' basic needs were identified. However, only 9 items were considered relevant to UUM students. A focus group of 18 students from different backgrounds was studied using in-depth interviews to validate the 9 items of basic needs. The findings indicated food, clothing, books, stationery, photocopying, printing & binding, information & communication technology (ICT), mobile phone bills, transportation and others (which includes toiletries, groceries, sport, & entertainment) as the 9 items. The findings also revealed that students' basic needs for ICT include not only a laptop and printer but also a smartphone. As for clothing, requirements differ according to the program the student majors in. Business students need full business attire, law students need a proper robe for moot courts, and curriculum activities require students to be in uniform. These are basic needs and not desires or wants.

  15. Detection of Item Preknowledge Using Likelihood Ratio Test and Score Test

    ERIC Educational Resources Information Center

    Sinharay, Sandip

    2017-01-01

    An increasing concern of producers of educational assessments is fraudulent behavior during the assessment (van der Linden, 2009). Benefiting from item preknowledge (e.g., Eckerly, 2017; McLeod, Lewis, & Thissen, 2003) is one type of fraudulent behavior. This article suggests two new test statistics for detecting individuals who may have…

  16. Measurement equivalence and differential item functioning in family psychology.

    PubMed

    Bingenheimer, Jeffrey B; Raudenbush, Stephen W; Leventhal, Tama; Brooks-Gunn, Jeanne

    2005-09-01

    Several hypotheses in family psychology involve comparisons of sociocultural groups. Yet the potential for cross-cultural inequivalence in widely used psychological measurement instruments threatens the validity of inferences about group differences. Methods for dealing with these issues have been developed via the framework of item response theory. These methods deal with an important type of measurement inequivalence, called differential item functioning (DIF). The authors introduce DIF analytic methods, linking them to a well-established framework for conceptualizing cross-cultural measurement equivalence in psychology (C.H. Hui and H.C. Triandis, 1985). They illustrate the use of DIF methods using data from the Project on Human Development in Chicago Neighborhoods (PHDCN). Focusing on the Caregiver Warmth and Environmental Organization scales from the PHDCN's adaptation of the Home Observation for Measurement of the Environment Inventory, the authors obtain results that exemplify the range of outcomes that may result when these methods are applied to psychological measurement instruments. (c) 2005 APA, all rights reserved

  17. [Evaluation of the factorial and metric equivalence of the Sexual Assertiveness Scale (SAS) by sex].

    PubMed

    Sierra, Juan Carlos; Santos-Iglesias, Pablo; Vallejo-Medina, Pablo

    2012-05-01

    Sexual assertiveness refers to the ability to initiate sexual activity, refuse unwanted sexual activity, and use contraceptive methods to avoid sexually transmitted diseases, developing healthy sexual behaviors. The Sexual Assertiveness Scale (SAS) assesses these three dimensions. The purpose of this study is to evaluate, using structural equation modeling and differential item functioning, the equivalence of the scale between men and women. Standard scores are also provided. A total of 4,034 participants from 21 Spanish provinces took part in the study. Quota sampling method was used. Results indicate a strict equivalent dimensionality of the Sexual Assertiveness Scale across sexes. One item was flagged by differential item functioning, although it does not affect the scale. Therefore, there is no significant bias in the scale when comparing across sexes. Standard scores show similar Initiation assertiveness scores for men and women, and higher scores on Refusal and Sexually Transmitted Disease Prevention for women. This scale can be used on men and women with sufficient psychometric guarantees.

  18. An in-depth psychometric analysis of the Connor-Davidson Resilience Scale: calibration with Rasch-Andrich model.

    PubMed

    Arias González, Víctor B; Crespo Sierra, María Teresa; Arias Martínez, Benito; Martínez-Molina, Agustín; Ponce, Fernando P

    2015-09-23

    The Connor-Davidson Resilience Scale (CD-RISC) is inarguably one of the best-known instruments in the field of resilience assessment. However, the criteria for the psychometric quality of the instrument were based only on classical test theory. This paper focused on calibrating the CD-RISC with a nonclinical sample of 444 adults using the Rasch-Andrich Rating Scale Model, in order to clarify its structure and analyze its psychometric properties at the item level. Two items showed misfit to the model and were eliminated. The remaining 22 items essentially form a unidimensional scale. The CD-RISC has good psychometric properties. The fit of both the items and the persons to the Rasch model was good, and the response categories were functioning properly. Two of the items showed differential item functioning. The CD-RISC has an obvious ceiling effect, which suggests including more difficult items in future versions of the scale.

  19. Standardized UXO Technology Demonstration Site Scoring Record NO. 934 Technology Type/Platform: EM61 MKII/Towed

    DTIC Science & Technology

    2009-07-01

    nonferrous metallic objects. The applicability of the instrument for ordnance and explosives (OE) detection has been widely demonstrated at sites...was cleared of all metallic items. This clearing of the metallic anomalies from the 2 acre Active Response Demonstration Site was broken into three...with their Multiple Towed Array Detection System (MTADS). This system is known for its effectiveness and ability to detect metallic items. Once the

  20. Computer-adaptive test to measure community reintegration of Veterans.

    PubMed

    Resnik, Linda; Tian, Feng; Ni, Pengsheng; Jette, Alan

    2012-01-01

    The Community Reintegration of Injured Service Members (CRIS) measure consists of three scales measuring extent of, perceived limitations in, and satisfaction with community reintegration. Length of the CRIS may be a barrier to its widespread use. Using item response theory (IRT) and computer-adaptive test (CAT) methodologies, this study developed and evaluated a briefer community reintegration measure called the CRIS-CAT. Large item banks for each CRIS scale were constructed. A convenience sample of 517 Veterans responded to all items. Exploratory and confirmatory factor analyses (CFAs) were used to identify the dimensionality within each domain, and IRT methods were used to calibrate items. Accuracy and precision of CATs of different lengths were compared with the full-item bank, and data were examined for differential item functioning (DIF). CFAs supported unidimensionality of scales. Acceptable item fit statistics were found for final models. Accuracy of 10-, 15-, 20-, and variable-item CATs for all three scales was 0.88 or above. CAT precision increased with number of items administered and decreased at the upper ranges of each scale. Three items exhibited moderate DIF by sex. The CRIS-CAT demonstrated promising measurement properties and is recommended for use in community reintegration assessment.

  1. Scientific literacy: Factor structure and gender differences

    NASA Astrophysics Data System (ADS)

    Manhart, James Joseph

    The purpose of this study was to investigate the factor structure of scientific literacy and to document any gender differences with respect to each factor. Participants included 1139 students (574 females, 565 males) in grades 9 through 12 who were taking a science class at one of four Midwestern high schools. Based on National Science Education Standards, a 100 item multiple-choice test was constructed to assess scientific literacy. Confirmatory factor analysis of item parcels suggested a three factor model was the best way to explain the data resulting from the administration of this test. The factors were labeled constructs of science, abilities necessary to do scientific inquiry, and social aspects of science. Gender differences with respect to these factors were examined using analysis of variance procedures. Because differential enrollment in science classes could cause gender differences in grades 11 and 12, parallel analyses were conducted on the grades 9 and 10 subsample and the grades 11 and 12 subsample. However, the results of the two analyses were similar. The most consistent gender difference observed was that females performed better than males on the social aspects of science factor. Males tended to perform better than females on the constructs of science factor, although no consistent gender difference was noted for items dealing with life science. With respect to the abilities necessary to do scientific inquiry factor, females tended to perform better than males in grades 9 and 10, while no consistent gender difference was observed in grades 11 and 12. Gender differences were also examined using the Mantel-Haenszel procedure to flag individual items that functioned differently for females and males of the same ability. Twelve items were flagged for grades 9 and 10 (8 in favor of females, 4 in favor of males). Fourteen items were flagged for grades 11 and 12 (7 in favor of females, 7 in favor of males). All of the flagged items exhibited only small to moderate differential item functioning (DIF). Only three items were similarly flagged in both subsamples, one item from each factor.
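
The Mantel-Haenszel procedure mentioned in this abstract pools 2x2 tables (group by correct/incorrect) across matched ability strata. The following is a minimal sketch of the common odds ratio and the widely used delta-scale transform for a single item; the function name, the tuple layout, and the toy counts are assumptions for illustration and are not taken from the study:

```python
import math


def mantel_haenszel_dif(strata):
    """Mantel-Haenszel common odds ratio and delta-scale DIF index.

    `strata` is a list of (a, b, c, d) counts, one tuple per matched
    total-score level (hypothetical layout for this sketch):
      a = reference group correct,  b = reference group incorrect,
      c = focal group correct,      d = focal group incorrect.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    # Undefined (division by zero) if no stratum has b * c > 0.
    odds_ratio = numerator / denominator
    # Delta-scale transform: negative values indicate the item favors
    # the reference group, positive values the focal group.
    delta = -2.35 * math.log(odds_ratio)
    return odds_ratio, delta


# Toy example: both groups perform identically within every stratum.
no_dif_ratio, no_dif_delta = mantel_haenszel_dif(
    [(30, 10, 30, 10), (20, 20, 20, 20)]
)
# Toy example: the reference group does better at the same score level.
dif_ratio, dif_delta = mantel_haenszel_dif([(40, 10, 20, 30)])
```

Labels such as "small" or "moderate" DIF are usually assigned by comparing the absolute delta with conventional cut-offs together with a significance test; the abstract does not state which thresholds were applied, so none are hard-coded here.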

  2. What Does Ipsilateral Delay Activity Reflect? Inferences from Slow Potentials in a Lateralized Visual Working Memory Task

    ERIC Educational Resources Information Center

    Arend, Anna M.; Zimmer, Hubert D.

    2011-01-01

    In the lateralized change detection task, two item arrays are presented, one on each side of the display. Participants have to remember the items in the relevant hemifield and ignore the items in the irrelevant hemifield. A difference wave between contralateral and ipsilateral slow potentials with respect to the relevant items, the contralateral…

  3. Is the Factor Observed in Investigations on the Item-Position Effect Actually the Difficulty Factor?

    ERIC Educational Resources Information Center

    Schweizer, Karl; Troche, Stefan

    2018-01-01

    In confirmatory factor analysis quite similar models of measurement serve the detection of the difficulty factor and the factor due to the item-position effect. The item-position effect refers to the increasing dependency among the responses to successively presented items of a test whereas the difficulty factor is ascribed to the wide range of…

  4. Three Classes of Nonparametric Differential Step Functioning Effect Estimators

    ERIC Educational Resources Information Center

    Penfield, Randall D.

    2008-01-01

    The examination of measurement invariance in polytomous items is complicated by the possibility that the magnitude and sign of lack of invariance may vary across the steps underlying the set of polytomous response options, a concept referred to as differential step functioning (DSF). This article describes three classes of nonparametric DSF effect…

  5. Social Concepts and Judgments: A Semantic Differential Analysis of the Concepts Feminist, Man, and Woman

    ERIC Educational Resources Information Center

    Pierce, W. David; Sydie, R. A.; Stratkotter, Rainer

    2003-01-01

    Male and female participants (N = 274) made judgments about the social concepts of "feminist," "man," and "woman" on 63 semantic differential items. Factor analysis identified three basic dimensions termed evaluative, potency, and activity as well as two secondary factors called expressiveness and sexuality. Results for the evaluative dimension…

  6. Examining Sources of Gender DIF in Mathematics Assessments Using a Confirmatory Multidimensional Model Approach

    ERIC Educational Resources Information Center

    Mendes-Barnett, Sharon; Ercikan, Kadriye

    2006-01-01

    This study contributes to understanding sources of gender differential item functioning (DIF) on mathematics tests. This study focused on identifying sources of DIF and differential bundle functioning for boys and girls on the British Columbia Principles of Mathematics Exam (Grade 12) using a confirmatory SIBTEST approach based on a…

  7. Sex Differential Item Functioning in the Inventory of Early Development III Social-Emotional Skills

    ERIC Educational Resources Information Center

    Beaver, Jessica L.; French, Brian F.; Finch, W. Holmes; Ullrich-French, Sarah C.

    2014-01-01

    Social-emotional (SE) skills in the early developmental years of children influence outcomes in psychological, behavioral, and learning domains. The adult ratings of a child's SE skills can be influenced by sex stereotypes. These rating differences could lead to differential conclusions about developmental progress or risk. To ensure that…

  8. Comparing the Lexical Features of EAP Students' Essays by Prompt and Rating

    ERIC Educational Resources Information Center

    Lavallée, Maxime; McDonough, Kim

    2015-01-01

    Previous research has shown that high frequency lexical items, such as AWL words and formulaic expressions, may differentiate between texts written by expert and novice writers (Chen & Baker, 2010; Hancioglu, 2009), and that lexical features related to breadth, depth, and accessibility differentiate among texts from L2 writers of different…

  9. Child-rearing in the context of childhood cancer: perspectives of parents and professionals.

    PubMed

    Long, Kristin A; Keeley, Lauren; Reiter-Purtill, Jennifer; Vannatta, Kathryn; Gerhardt, Cynthia A; Noll, Robert B

    2014-02-01

    Elevated distress has been well documented among parents of children with cancer. Family systems theories suggest that cancer-related stressors and parental distress have the potential to affect child-rearing practices, but this topic has received limited empirical attention. The present work examined self-reported child-rearing practices among mothers and fathers of children with cancer and matched comparisons. Medical and psychosocial professionals with expertise in pediatric oncology selected items from the Child-Rearing Practices Report (CRPR) likely to differentiate parents of children with cancer from matched comparison parents. Then, responses on these targeted items were compared between parents of children with cancer (94 mothers, 67 fathers) and matched comparisons (98 mothers, 75 fathers). Effect sizes of between-group differences were compared for mothers versus fathers. Pediatric oncology healthcare providers predicted that 14 items would differentiate child-rearing practices of parents of children with cancer from parents of typically developing children. Differences emerged on six of the 14 CRPR items. Parents of children with cancer reported higher levels of spoiling and concern about their child's health and development than comparison parents. Items assessing overprotection and emotional responsiveness did not distinguish the two groups of parents. The effect size for the group difference between mothers in the cancer versus comparison groups was significantly greater than that for fathers on one item related to worry about the child's health. Parents of children with cancer report differences in some, but not all, domains of child-rearing, as predicted by healthcare professionals. © 2013 Wiley Periodicals, Inc.

  10. Measuring Advance Care Planning: Optimizing the Advance Care Planning Engagement Survey.

    PubMed

    Sudore, Rebecca L; Heyland, Daren K; Barnes, Deborah E; Howard, Michelle; Fassbender, Konrad; Robinson, Carole A; Boscardin, John; You, John J

    2017-04-01

    A validated 82-item Advance Care Planning (ACP) Engagement Survey measures a broad range of behaviors. However, concise surveys are needed. The objective of this study was to validate shorter versions of the survey. The survey included 57 process (e.g., readiness) and 25 action items (e.g., discussions). For item reduction, we systematically eliminated questions based on face validity, item nonresponse, redundancy, ceiling effects, and factor analysis. We assessed internal consistency (Cronbach's alpha) and construct validity with cross-sectional correlations and the ability of the progressively shorter survey versions to detect change one week after exposure to an ACP intervention (Pearson correlation coefficients). Five hundred one participants (four Canadian and three US sites) were included in item reduction (mean age 69 years [±10], 41% nonwhite). Because of high correlations between readiness and action items, all action items were removed. Because of high correlations and ceiling effects, two process items were removed. Successive factor analysis then created 55-, 34-, 15-, nine-, and four-item versions; 664 participants (from three US ACP clinical trials) were included in validity analysis (age 65 years [±8], 72% nonwhite, 34% Spanish speaking). Cronbach's alphas were high for all versions (four items 0.84; 55 items 0.97). Compared with the original survey, cross-sectional correlations were high (four items 0.85; 55 items 0.97) as were delta correlations (four items 0.68; 55 items 0.93). Shorter versions of the ACP Engagement Survey are valid, internally consistent, and able to detect change across a broad range of ACP behaviors for English and Spanish speakers. Shorter ACP surveys can efficiently measure broad ACP behaviors in research and clinical settings. Published by Elsevier Inc.
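
Internal consistency in this abstract is summarized with Cronbach's alpha. A minimal sketch of the standard formula follows, assuming complete data with one row per respondent and one column per item; the function name and the toy responses are illustrative only and are not the survey's data:

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items score matrix.

    `responses` is a list of rows, each row holding one respondent's
    scores on all k items (no missing values assumed).
    """
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of the summed scale score.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy example: two items that always agree yield alpha = 1.
alpha_perfect = cronbach_alpha([[1, 1], [2, 2], [3, 3]])
```

Because alpha tends to rise with the number of items when inter-item correlations are similar, a lower value for a four-item version than for a 55-item version is the expected direction.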

  11. Nickel and cobalt release from jewellery and metal clothing items in Korea.

    PubMed

    Cheong, Seung Hyun; Choi, You Won; Choi, Hae Young; Byun, Ji Yeon

    2014-01-01

    In Korea, the prevalence of nickel allergy has shown a sharply increasing trend. Cobalt contact allergy is often associated with concomitant reactions to nickel, and is more common in Korea than in western countries. The aim of the present study was to investigate the prevalence of items that release nickel and cobalt on the Korean market. A total of 471 items that included 193 branded jewellery, 202 non-branded jewellery and 76 metal clothing items were sampled and studied with a dimethylglyoxime (DMG) test and a cobalt spot test to detect nickel and cobalt release, respectively. Nickel release was detected in 47.8% of the tested items. The positive rates in the DMG test were 12.4% for the branded jewellery, 70.8% for the non-branded jewellery, and 76.3% for the metal clothing items. Cobalt release was found in 6.2% of items. Among the types of jewellery, belts and hair pins showed higher positive rates in both the DMG test and the cobalt spot test. Our study shows that the prevalence of items that release nickel or cobalt among jewellery and metal clothing items is high in Korea. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
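
The overall nickel-release rate in this abstract can be cross-checked against the three subgroup rates as a sample-size-weighted average. The short sketch below uses only the counts and percentages quoted in the abstract; the variable names are illustrative:

```python
# (number of items sampled, reported DMG-positive rate in percent)
groups = {
    "branded jewellery": (193, 12.4),
    "non-branded jewellery": (202, 70.8),
    "metal clothing items": (76, 76.3),
}

total_items = sum(count for count, _ in groups.values())
# Approximate number of positive items implied by each rounded percentage.
implied_positives = sum(count * rate / 100 for count, rate in groups.values())
pooled_rate = 100 * implied_positives / total_items
```

The pooled figure agrees, to one decimal place, with the 47.8% reported for all 471 items.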

  12. DIF Analysis across Genders for Reading Comprehension Part of English Language Achievement Exam as a Foreign Language

    ERIC Educational Resources Information Center

    Ögretmen, Tuncay

    2015-01-01

    The purpose of this study is to carry out differential item functioning (DIF) analysis for content areas of a reading comprehension subtest using four area indices within the Item Response Theory (IRT) framework. The differences in the magnitudes of the area indices were compared based on the subject areas. The DIF analysis was carried out across…

  13. The Langer-Improved Wald Test for DIF Testing with Multiple Groups: Evaluation and Comparison to Two-Group IRT

    ERIC Educational Resources Information Center

    Woods, Carol M.; Cai, Li; Wang, Mian

    2013-01-01

    Differential item functioning (DIF) occurs when the probability of responding in a particular category to an item differs for members of different groups who are matched on the construct being measured. The identification of DIF is important for valid measurement. This research evaluates an improved version of Lord's χ²…

  14. An Analysis of Differential Response Patterns on the Peabody Picture Vocabulary Test-IIIB in Struggling Adult Readers and Third-Grade Children

    ERIC Educational Resources Information Center

    Pae, Hye K.; Greenberg, Daphne; Williams, Rihana S.

    2012-01-01

    This study examines the Peabody Picture Vocabulary Test-IIIB (PPVT-IIIB) performance of 130 adults identified as struggling readers, in comparison to 175 third-grade children. Response patterns to the items on the PPVT-IIIB by these two groups were investigated, focusing on items, semantic categories, and lexical features, including word length,…

  15. Conservativeness in Rejection of the Null Hypothesis when Using the Continuity Correction in the MH Chi-Square Test in DIF Applications

    ERIC Educational Resources Information Center

    Paek, Insu

    2010-01-01

    Conservative bias in rejection of a null hypothesis from using the continuity correction in the Mantel-Haenszel (MH) procedure was examined through simulation in a differential item functioning (DIF) investigation context in which statistical testing uses a prespecified level α for the decision on an item with respect to DIF. The standard MH…

  16. Differential Weight Procedure of the Conditional P.D.F. Approach for Estimating the Operating Characteristics of Discrete Item Responses.

    ERIC Educational Resources Information Center

    Samejima, Fumiko

    A method is proposed that increases the accuracies of estimation of the operating characteristics of discrete item responses, especially when the true operating characteristic is represented by a steep curve, and also at the lower and upper ends of the ability distribution where the estimation tends to be inaccurate because of the smaller number…

  17. The Consequences of Differentiation in Episodic Memory: Similarity and the Strength Based Mirror Effect

    ERIC Educational Resources Information Center

    Criss, Amy H.

    2006-01-01

    When items on one list receive more encoding than items on another list, the improvement in performance usually manifests as an increase in the hit rate and a decrease in the false alarm rate (FAR). A common account of this strength based mirror effect is that participants adopt a more strict criterion following a strongly than weakly encoded list…

  18. Evaluating the Comparability of Paper-and-Pencil and Computerized Versions of a Large-Scale Certification Test. Research Report. ETS RR-05-21

    ERIC Educational Resources Information Center

    Puhan, Gautam; Boughton, Keith A.; Kim, Sooyeon

    2005-01-01

    The study evaluated the comparability of two versions of a teacher certification test: a paper-and-pencil test (PPT) and computer-based test (CBT). Standardized mean difference (SMD) and differential item functioning (DIF) analyses were used as measures of comparability at the test and item levels, respectively. Results indicated that effect sizes…

  19. Construct validity of the items on the Stroke Specific Quality of Life (SS-QOL) questionnaire that evaluate the participation component of the International Classification of Functioning, Disability and Health.

    PubMed

    Silva, Soraia Micaela; Corrêa, Fernanda Ishida; Pereira, Gabriela Santos; Faria, Christina Danielli Coelho de Morais; Corrêa, João Carlos Ferrari

    2018-01-01

    The aim was to analyze the construct validity and internal consistency of the Stroke Specific Quality of Life (SS-QOL) items that address the participation component of the ICF, and to analyze the ceiling and floor effects. One hundred subjects were analyzed: 85 community-dwelling and 15 institutionalized individuals. Construct validity was analyzed using classic psychometrics: (1) comparison of known groups (individuals without restriction to participation vs. those with restriction to participation) using the Mann-Whitney test and (2) convergent validity, i.e., the correlation between the scores on the SS-QOL items that address participation and the subscale scores of measures used to evaluate similar constructs and concepts [the Short-Form Health Survey (SF-36), Functional Independence Measure (FIM) and grip strength test]. Spearman's correlation coefficients were calculated for this analysis. Cronbach's α was used for the analysis of internal consistency, and both the ceiling and floor effects were analyzed. The level of significance for all analyses was α = 0.05. The a priori hypotheses regarding construct validity were partially demonstrated, as only five of the eight domains exhibited positive moderate to strong correlations (r > 0.40) with measures that address constructs similar to those addressed on the SS-QOL questionnaire. The ceiling and floor effects were considered adequate for the total SS-QOL score, but beyond acceptable standards for some domains. The 26 items of the SS-QOL questionnaire measure a multidimensional construct and therefore do not only address participation. However, the items demonstrated adequate internal consistency and are capable of differentiating individuals with and without restriction to participation.
    Implications for rehabilitation: The 26 items of the SS-QOL questionnaire demonstrated adequate internal consistency and are capable of differentiating individuals with and without restriction to participation. The present findings can guide healthcare professionals regarding the selection of an assessment tool for the evaluation of post-stroke participation. The findings can lead to consistent and standardized evaluations, which facilitate comparisons and discussion on functional health and social participation after stroke.

  20. Computerized Adaptive Testing Provides Reliable and Efficient Depression Measurement Using the CES-D Scale

    PubMed Central

    2017-01-01

    Background: The Center for Epidemiologic Studies Depression Scale (CES-D) is a measure of depressive symptomatology which is widely used internationally. Though previous attempts were made to shorten the CES-D scale, few have attempted to develop a Computerized Adaptive Test (CAT) version of the CES-D. Objective: The aim of this study was to provide evidence on the efficiency and accuracy of the CES-D when administered as a CAT, using an American sample. Methods: We obtained a sample of 2060 responses to the CES-D from US participants using the myPersonality application. The average age of participants was 26 years (range 19-77). We randomly split the sample into two groups to evaluate and validate the psychometric models. We used evaluation group data (n=1018) to assess dimensionality with both confirmatory factor analysis (CFA) and Mokken analysis. We conducted further psychometric assessments using item response theory (IRT), including assessments of item and scale fit to Samejima’s graded response model (GRM), local dependency and differential item functioning. We subsequently conducted two CAT simulations to evaluate the CES-D CAT using the validation group (n=1042). Results: Initial CFA results indicated a poor fit to the model, and Mokken analysis revealed 3 items which did not conform to the same dimension as the rest of the items. We removed the 3 items and fit the remaining 17 items to the GRM. We found no evidence of differential item functioning (DIF) between age and gender groups. Estimates of the CES-D trait score provided by the simulated CAT algorithm and the trait score derived from the original scale were highly correlated. The second CAT simulation, conducted using real participant data, demonstrated higher precision at the higher levels of the depression spectrum. Conclusions: Depression assessments using the CES-D CAT can be more accurate and efficient than those made using the fixed-length assessment. PMID:28931496
