Item Purification Does Not Always Improve DIF Detection: A Counterexample with Angoff's Delta Plot
ERIC Educational Resources Information Center
Magis, David; Facon, Bruno
2013-01-01
Item purification is an iterative process that is often advocated as improving the identification of items affected by differential item functioning (DIF). With test-score-based DIF detection methods, item purification iteratively removes the items currently flagged as DIF from the test scores to get purified sets of items, unaffected by DIF. The…
Explaining Crossing DIF in Polytomous Items Using Differential Step Functioning Effects
ERIC Educational Resources Information Center
Penfield, Randall D.
2010-01-01
Crossing, or intersecting, differential item functioning (DIF) is a form of nonuniform DIF that exists when the sign of the between-group difference in expected item performance changes across the latent trait continuum. The presence of crossing DIF presents a problem for many statistics developed for evaluating DIF because positive and negative…
Assessment of Differential Item Functioning in Testlet-Based Items Using the Rasch Testlet Model
ERIC Educational Resources Information Center
Wang, Wen-Chung; Wilson, Mark
2005-01-01
This study presents a procedure for detecting differential item functioning (DIF) for dichotomous and polytomous items in testlet-based tests, whereby DIF is taken into account by adding DIF parameters into the Rasch testlet model. Simulations were conducted to assess recovery of the DIF and other parameters. Two independent variables, test type…
Detection of Differential Item Functioning Using the Lasso Approach
ERIC Educational Resources Information Center
Magis, David; Tuerlinckx, Francis; De Boeck, Paul
2015-01-01
This article proposes a novel approach to detect differential item functioning (DIF) among dichotomously scored items. Unlike standard DIF methods that perform an item-by-item analysis, we propose the "LR lasso DIF method": a logistic regression (LR) model is formulated for all item responses. The model contains item-specific intercepts,…
NASA Astrophysics Data System (ADS)
Roth, Wolff-Michael; Oliveri, Maria Elena; Dallie Sandilands, Debra; Lyons-Thomas, Juliette; Ercikan, Kadriye
2013-03-01
Even if national and international assessments are designed to be comparable, subsequent psychometric analyses often reveal differential item functioning (DIF). Central to achieving comparability is to examine the presence of DIF, and if DIF is found, to investigate its sources to ensure that differentially functioning items do not lead to bias. In this study, sources of DIF were examined using think-aloud protocols. The think-aloud protocols of expert reviewers were conducted for comparing the English and French versions of 40 items previously identified as DIF (N = 20) and non-DIF (N = 20). Three highly trained and experienced experts in verifying and accepting/rejecting multi-lingual versions of curriculum and testing materials for government purposes participated in this study. Although there is a considerable amount of agreement in the identification of differentially functioning items, experts do not consistently identify and distinguish DIF and non-DIF items. Our analyses of the think-aloud protocols identified particular linguistic, general pedagogical, content-related, and cognitive factors related to sources of DIF. Implications are provided for the process of arriving at the identification of DIF, prior to the actual administration of tests at national and international levels.
Aggregating Polytomous DIF Results over Multiple Test Administrations
ERIC Educational Resources Information Center
Zwick, Rebecca; Ye, Lei; Isham, Steven
2018-01-01
In typical differential item functioning (DIF) assessments, an item's DIF status is not influenced by its status in previous test administrations. An item that has shown DIF at multiple administrations may be treated the same way as an item that has shown DIF in only the most recent administration. Therefore, much useful information about the…
ERIC Educational Resources Information Center
Fukuhara, Hirotaka; Kamata, Akihito
2011-01-01
A differential item functioning (DIF) detection method for testlet-based data was proposed and evaluated in this study. The proposed DIF model is an extension of a bifactor multidimensional item response theory (MIRT) model for testlets. Unlike traditional item response theory (IRT) DIF models, the proposed model takes testlet effects into…
An NCME Instructional Module on Latent DIF Analysis Using Mixture Item Response Models
ERIC Educational Resources Information Center
Cho, Sun-Joo; Suh, Youngsuk; Lee, Woo-yeol
2016-01-01
The purpose of this ITEMS module is to provide an introduction to differential item functioning (DIF) analysis using mixture item response models. The mixture item response models for DIF analysis involve comparing item profiles across latent groups, instead of manifest groups. First, an overview of DIF analysis based on latent groups, called…
Applying a Mixed Methods Framework to Differential Item Function Analyses
ERIC Educational Resources Information Center
Hitchcock, John H.; Johanson, George A.
2015-01-01
Understanding the reason(s) for Differential Item Functioning (DIF) in the context of measurement is difficult. Although identifying potential DIF items is typically a statistical endeavor, understanding the reasons for DIF (and item repair or replacement) might require investigations that can be informed by qualitative work. Such work is…
ERIC Educational Resources Information Center
Penfield, Randall D.; Alvarez, Karina; Lee, Okhee
2009-01-01
The assessment of differential item functioning (DIF) in polytomous items addresses between-group differences in measurement properties at the item level, but typically does not inform which score levels may be involved in the DIF effect. The framework of differential step functioning (DSF) addresses this issue by examining between-group…
Hagquist, Curt; Andrich, David
2017-09-19
Rasch analysis with a focus on Differential Item Functioning (DIF) is increasingly used for examination of psychometric properties of health outcome measures. To take account of DIF in order to retain precision of measurement, splitting DIF items into separate sample-specific items has become a frequently used technique. The purpose of the paper is to present and summarise recent advances of analysis of DIF in a unified methodology. In particular, the paper focuses on the use of analysis of variance (ANOVA) as a method to simultaneously detect uniform and non-uniform DIF, the need to distinguish between real and artificial DIF and the trade-off between reliability and validity. An illustrative example from health research is used to demonstrate how DIF, in this case between genders, can be identified, quantified and under specific circumstances accounted for using the Rasch model. Rasch analyses of DIF were conducted of a composite measure of psychosomatic problems using Swedish data from the Health Behaviour in School-aged Children study for grade 9 students collected during the 1985-2014 time periods. The procedures demonstrate how DIF can be identified efficiently by ANOVA of residuals, and how the magnitude of DIF can be quantified and potentially accounted for by resolving items according to identifiable groups and using principles of test equating on the resolved items. The results of the analysis also show that the real DIF in some items does affect person measurement estimates. Firstly, in order to distinguish between real and artificial DIF, the items showing DIF initially should not be resolved simultaneously but sequentially. Secondly, while resolving instead of deleting a DIF item may retain reliability, both options may affect the content validity negatively. Resolving items with DIF is not justified if the source of the DIF is relevant for the content of the variable; then resolving DIF may deteriorate the validity of the instrument. Generally, decisions on resolving items to deal with DIF should also rely on external information.
ERIC Educational Resources Information Center
Tay, Louis; Huang, Qiming; Vermunt, Jeroen K.
2016-01-01
In large-scale testing, the use of multigroup approaches is limited for assessing differential item functioning (DIF) across multiple variables as DIF is examined for each variable separately. In contrast, the item response theory with covariate (IRT-C) procedure can be used to examine DIF across multiple variables (covariates) simultaneously. To…
ERIC Educational Resources Information Center
Bilir, Mustafa Kuzey
2009-01-01
This study uses a new psychometric model (mixture item response theory-MIMIC model) that simultaneously estimates differential item functioning (DIF) across manifest groups and latent classes. Current DIF detection methods investigate DIF from only one side, either across manifest groups (e.g., gender, ethnicity, etc.), or across latent classes…
Crane, Paul K; Gibbons, Laura E; Jolley, Lance; van Belle, Gerald
2006-11-01
We present an ordinal logistic regression model for identification of items with differential item functioning (DIF) and apply this model to a Mini-Mental State Examination (MMSE) dataset. We employ item response theory ability estimation in our models. Three nested ordinal logistic regression models are applied to each item. Model testing begins with examination of the statistical significance of the interaction term between ability and the group indicator, consistent with nonuniform DIF. Then we turn our attention to the coefficient of the ability term in models with and without the group term. If including the group term has a marked effect on that coefficient, we declare that it has uniform DIF. We examined DIF related to language of test administration in addition to self-reported race, Hispanic ethnicity, age, years of education, and sex. We used PARSCALE for IRT analyses and STATA for ordinal logistic regression approaches. We used an iterative technique for adjusting IRT ability estimates on the basis of DIF findings. Five items were found to have DIF related to language. These same items also had DIF related to other covariates. The ordinal logistic regression approach to DIF detection, when combined with IRT ability estimates, provides a reasonable alternative for DIF detection. There appear to be several items with significant DIF related to language of test administration in the MMSE. More attention needs to be paid to the specific criteria used to determine whether an item has DIF, not just the technique used to identify DIF.
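The three nested models described in the abstract above can be written out as follows. This is only a sketch under stated assumptions: the cumulative-logit parameterization, the symbols (item response Y with categories k, IRT ability estimate θ, group indicator G, cut-points α_k, coefficients β), and the threshold δ are illustrative choices and are not taken from the abstract, which says only that a "marked effect" on the ability coefficient indicates uniform DIF.

```latex
% Assumed notation: Y = ordinal item response, \theta = IRT ability estimate,
% G = group indicator, \alpha_k = category cut-points (all hypothetical symbols).
\begin{align}
\text{Model 1:}\quad \operatorname{logit} P(Y \le k) &= \alpha_k + \beta_1^{(1)}\,\theta \\
\text{Model 2:}\quad \operatorname{logit} P(Y \le k) &= \alpha_k + \beta_1^{(2)}\,\theta + \beta_2\, G \\
\text{Model 3:}\quad \operatorname{logit} P(Y \le k) &= \alpha_k + \beta_1^{(3)}\,\theta + \beta_2\, G + \beta_3\,(\theta \times G)
\end{align}
% Nonuniform DIF: test H_0 : \beta_3 = 0 in Model 3 (the ability-by-group interaction).
% Uniform DIF: compare the ability coefficient without and with the group term,
% flagging the item when the relative change exceeds an analyst-chosen threshold \delta:
\[
\left| \frac{\beta_1^{(2)} - \beta_1^{(1)}}{\beta_1^{(1)}} \right| > \delta
\]
```

Written this way, the interaction test and the coefficient-change check are two separate decisions, which is consistent with the abstract's closing point that the criteria used to declare DIF matter as much as the detection technique.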
ERIC Educational Resources Information Center
Penfield, Randall D.; Algina, James
2006-01-01
One approach to measuring unsigned differential test functioning is to estimate the variance of the differential item functioning (DIF) effect across the items of the test. This article proposes two estimators of the DIF effect variance for tests containing dichotomous and polytomous items. The proposed estimators are direct extensions of the…
Teresi, Jeanne A.; Ocepek-Welikson, Katja; Kleinman, Marjorie; Ramirez, Mildred; Kim, Giyeon
2017-01-01
Short form measures from the Patient Reported Outcomes Measurement Information System® (PROMIS®) are used widely. The present study was among the first to examine differential item functioning (DIF) in the PROMIS Depression short form scales in a sample of over 5000 racially/ethnically diverse patients with cancer. DIF analyses were conducted across different racial/ethnic, educational, age, gender and language groups. Methods: DIF hypotheses, generated by content experts, informed the evaluation of the DIF analyses. The graded item response theory (IRT) model was used to evaluate the five-level ordinal items. The primary tests of DIF were Wald tests; sensitivity analyses were conducted using the IRT ordinal logistic regression procedure. Magnitude was evaluated using expected item score functions, and the non-compensatory differential item functioning (NCDIF) and T1 indexes, both based on group differences in the item curves. Aggregate impact was evaluated with expected scale score (test) response functions; individual impact was assessed through examination of differences in DIF adjusted and unadjusted depression estimates. Results: Many items evidenced DIF; however, only a few had slightly elevated magnitude. No items evidenced salient DIF with respect to NCDIF and the scale-level impact was minimal for all group comparisons. The following short form items might be targeted for further study because they were also hypothesized to evidence DIF. One item showed slightly higher magnitude of DIF for age: nothing to look forward to; conditional on depression, this item was more likely to be endorsed in the depressed direction by individuals in older groups as contrasted with the cohort aged 21 to 49. This item was also hypothesized to show age DIF. Only one item (failure) showed DIF of slightly higher magnitude (just above threshold) for Whites vs. Asians/Pacific Islanders in the direction of higher likelihood of endorsement for Asians/Pacific Islanders. This item was also hypothesized to show DIF for minority groups. The impact of DIF was negligible. Conditional on depression, the items worthless and hopeless were more likely to be endorsed in the depressed direction by respondents with less than high school education vs. those with a graduate degree; the magnitude of DIF was slightly above the T1 threshold, but not that of NCDIF. These items were also hypothesized to show DIF in the direction of more feelings of worthlessness by groups with lower education. While the magnitude and aggregate impact of DIF was small, in a few instances, individual impact was observed. Information provided was relatively high, particularly in the middle upper (depressed) tail of the distribution. Reliability estimates were high (> 0.90) across all studied groups, regardless of estimation method. Conclusions: This was the first study to evaluate measurement equivalence of the PROMIS Depression short forms across large samples of ethnically diverse groups. There were few items with DIF, and none of high magnitude, thus supporting the use of PROMIS Depression short form measures across such groups. These results could be informative for those using the short forms in minority populations or clinicians evaluating individuals with the depression short forms. PMID:28553573
ERIC Educational Resources Information Center
French, Brian F.; Maller, Susan J.
2007-01-01
Two unresolved implementation issues with logistic regression (LR) for differential item functioning (DIF) detection include ability purification and effect size use. Purification is suggested to control inaccuracies in DIF detection as a result of DIF items in the ability estimate. Additionally, effect size use may be beneficial in controlling…
Testing for Nonuniform Differential Item Functioning with Multiple Indicator Multiple Cause Models
ERIC Educational Resources Information Center
Woods, Carol M.; Grimm, Kevin J.
2011-01-01
In extant literature, multiple indicator multiple cause (MIMIC) models have been presented for identifying items that display uniform differential item functioning (DIF) only, not nonuniform DIF. This article addresses, for apparently the first time, the use of MIMIC models for testing both uniform and nonuniform DIF with categorical indicators. A…
ERIC Educational Resources Information Center
Keiffer, Elizabeth Ann
2011-01-01
A differential item functioning (DIF) simulation study was conducted to explore the type and level of impact that contamination had on type I error and power rates in DIF analyses when the suspect item favored the same or opposite group as the DIF items in the matching subtest. Type I error and power rates were displayed separately for the…
ERIC Educational Resources Information Center
Chalmers, R. Philip; Counsell, Alyssa; Flora, David B.
2016-01-01
Differential test functioning, or DTF, occurs when one or more items in a test demonstrate differential item functioning (DIF) and the aggregate of these effects is witnessed at the test level. In many applications, DTF can be more important than DIF when the overall effects of DIF at the test level can be quantified. However, optimal statistical…
ERIC Educational Resources Information Center
Zhang, Yanling; Dorans, Neil J.; Matthews-López, Joy L.
2005-01-01
Statistical procedures for detecting differential item functioning (DIF) are often used as an initial step to screen items for construct irrelevant variance. This research applies a DIF dissection method and a two-way classification scheme to SAT Reasoning Test™ verbal section data and explores the effects of deleting sizable DIF items on reported…
Item-focussed Trees for the Identification of Items in Differential Item Functioning.
Tutz, Gerhard; Berger, Moritz
2016-09-01
A novel method for the identification of differential item functioning (DIF) by means of recursive partitioning techniques is proposed. We assume an extension of the Rasch model that allows for DIF being induced by an arbitrary number of covariates for each item. Recursive partitioning on the item level results in one tree for each item and leads to simultaneous selection of items and variables that induce DIF. For each item, it is possible to detect groups of subjects with different item difficulties, defined by combinations of characteristics that are not pre-specified. The way a DIF item is determined by covariates is visualized in a small tree and therefore easily accessible. An algorithm is proposed that is based on permutation tests. Various simulation studies, including the comparison with traditional approaches to identify items with DIF, show the applicability and the competitive performance of the method. Two applications illustrate the usefulness and the advantages of the new method.
Differential Item Functioning Analysis Using Rasch Item Information Functions
ERIC Educational Resources Information Center
Wyse, Adam E.; Mapuranga, Raymond
2009-01-01
Differential item functioning (DIF) analysis is a statistical technique used for ensuring the equity and fairness of educational assessments. This study formulates a new DIF analysis method using the information similarity index (ISI). ISI compares item information functions when data fits the Rasch model. Through simulations and an international…
Detection of Uniform and Nonuniform Differential Item Functioning by Item-Focused Trees
ERIC Educational Resources Information Center
Berger, Moritz; Tutz, Gerhard
2016-01-01
Detection of differential item functioning (DIF) by use of the logistic modeling approach has a long tradition. One big advantage of the approach is that it can be used to investigate nonuniform (NUDIF) as well as uniform DIF (UDIF). The classical approach allows one to detect DIF by distinguishing between multiple groups. We propose an…
Lix, Lisa M; Wu, Xiuyun; Hopman, Wilma; Mayo, Nancy; Sajobi, Tolulope T; Liu, Juxin; Prior, Jerilynn C; Papaioannou, Alexandra; Josse, Robert G; Towheed, Tanveer E; Davison, K Shawn; Sawatzky, Richard
2016-01-01
Self-reported health status measures, like the Short Form 36-item Health Survey (SF-36), can provide rich information about the overall health of a population and its components, such as physical, mental, and social health. However, differential item functioning (DIF), which arises when population sub-groups with the same underlying (i.e., latent) level of health have different measured item response probabilities, may compromise the comparability of these measures. The purpose of this study was to test for DIF on the SF-36 physical functioning (PF) and mental health (MH) sub-scale items in a Canadian population-based sample. Study data were from the prospective Canadian Multicentre Osteoporosis Study (CaMos), which collected baseline data in 1996-1997. DIF was tested using a multiple indicators multiple causes (MIMIC) method. Confirmatory factor analysis defined the latent variable measurement model for the item responses and latent variable regression with demographic and health status covariates (i.e., sex, age group, body weight, self-perceived general health) produced estimates of the magnitude of DIF effects. The CaMos cohort consisted of 9423 respondents; 69.4% were female and 51.7% were less than 65 years. Eight of 10 items on the PF sub-scale and four of five items on the MH sub-scale exhibited DIF. Large DIF effects were observed on PF sub-scale items about vigorous and moderate activities, lifting and carrying groceries, walking one block, and bathing or dressing. On the MH sub-scale items, all DIF effects were small or moderate in size. SF-36 PF and MH sub-scale scores were not comparable across population sub-groups defined by demographic and health status variables due to the effects of DIF, although the magnitude of this bias was not large for most items. We recommend testing and adjusting for DIF to ensure comparability of the SF-36 in population-based investigations.
ERIC Educational Resources Information Center
Tan, Xuan; Xiang, Bihua; Dorans, Neil J.; Qu, Yanxuan
2010-01-01
The nature of the matching criterion (usually the total score) in the study of differential item functioning (DIF) has been shown to impact the accuracy of different DIF detection procedures. One of the topics related to the nature of the matching criterion is whether the studied item should be included. Although many studies exist that suggest…
ERIC Educational Resources Information Center
Shih, Ching-Lin; Wang, Wen-Chung
2009-01-01
The multiple indicators, multiple causes (MIMIC) method with a pure short anchor was proposed to detect differential item functioning (DIF). A simulation study showed that the MIMIC method with an anchor of 1, 2, 4, or 10 DIF-free items yielded a well-controlled Type I error rate even when such tests contained as many as 40% DIF items. In general,…
Fischer, H Felix; Wahl, Inka; Nolte, Sandra; Liegl, Gregor; Brähler, Elmar; Löwe, Bernd; Rose, Matthias
2017-12-01
To investigate differential item functioning (DIF) of PROMIS Depression items between US and German samples we compared data from the US PROMIS calibration sample (n = 780), a German general population survey (n = 2,500) and a German clinical sample (n = 621). DIF was assessed in an ordinal logistic regression framework, with 0.02 as criterion for R²-change and 0.096 for Raju's non-compensatory DIF. Item parameters were initially fixed to the PROMIS Depression metric; we used plausible values to account for uncertainty in depression estimates. Only four items showed DIF. Accounting for DIF led to negligible effects for the full item bank as well as a post hoc simulated computer-adaptive test (< 0.1 point on the PROMIS metric [mean = 50, standard deviation = 10]), while the effect on the short forms was small (< 1 point). The mean depression severity (43.6) in the German general population sample was considerably lower compared to the US reference value of 50. Overall, we found little evidence for language DIF between US and German samples, which could be addressed by either replacing the DIF items by items not showing DIF or by scoring the short form in German samples with the corrected item parameters reported. Copyright © 2016 John Wiley & Sons, Ltd.
ERIC Educational Resources Information Center
Lee, HwaYoung; Beretvas, S. Natasha
2014-01-01
Conventional differential item functioning (DIF) detection methods (e.g., the Mantel-Haenszel test) can be used to detect DIF only across observed groups, such as gender or ethnicity. However, research has found that DIF is not typically fully explained by an observed variable. True sources of DIF may include unobserved, latent variables, such as…
ERIC Educational Resources Information Center
Walker, Cindy M.; Gocer Sahin, Sakine
2017-01-01
The theoretical reason for the presence of differential item functioning (DIF) is that data are multidimensional and two groups of examinees differ in their underlying ability distribution for the secondary dimension(s). Therefore, the purpose of this study was to determine how much the secondary ability distributions must differ before DIF is…
Screening Test Items for Differential Item Functioning
ERIC Educational Resources Information Center
Longford, Nicholas T.
2014-01-01
A method for medical screening is adapted to differential item functioning (DIF). Its essential elements are explicit declarations of the level of DIF that is acceptable and of the loss function that quantifies the consequences of the two kinds of inappropriate classification of an item. Instead of a single level and a single function, sets of…
Differential item functioning by sex and race in the Hogan Personality Inventory.
Sheppard, Richard; Han, Kyunghee; Colarelli, Stephen M; Dai, Guangdong; King, Daniel W
2006-12-01
The authors examined measurement bias in the Hogan Personality Inventory by investigating differential item functioning (DIF) across sex and two racial groups (Caucasian and Black). The sample consisted of 1,579 Caucasians (1,023 men, 556 women) and 523 Blacks (321 men, 202 women) who were applying for entry-level, unskilled jobs in factories. Although the group mean differences were trivial, more than a third of the items showed DIF by sex (38.4%) and by race (37.3%). A content analysis of potentially biased items indicated that the themes of items displaying DIF were slightly more cohesive for sex than for race. The authors discuss possible explanations for differing clustering tendencies of items displaying DIF and some practical and theoretical implications of DIF in the development and interpretation of personality inventories.
Watt, Torquil; Groenvold, Mogens; Hegedüs, Laszlo; Bonnema, Steen Joop; Rasmussen, Åse Krogh; Feldt-Rasmussen, Ulla; Bjorner, Jakob Bue
2014-02-01
To evaluate the extent of differential item functioning (DIF) within the thyroid-specific quality of life patient-reported outcome measure, ThyPRO, according to sex, age, education and thyroid diagnosis. A total of 838 patients with benign thyroid diseases completed the ThyPRO questionnaire (84 five-point items, 13 scales). Uniform and nonuniform DIF were investigated using ordinal logistic regression, testing for both statistical significance and magnitude (ΔR² > 0.02). Scale level was estimated by the sum score, after purification. Twenty instances of DIF in 17 of the 84 items were found. Eight according to diagnosis, where the goiter scale was the one most affected, possibly due to differing perceptions in patients with auto-immune thyroid diseases compared to patients with simple goiter. Eight DIFs according to age were found, of which 5 were in positively worded items, which younger patients were more likely to endorse; one according to gender: women were more likely to report crying, and three according to educational level. The vast majority of DIF had only minor influence on the scale scores (0.1-2.3 points on the 0-100 scales), but two DIF corresponded to a difference of 4.6 and 9.8, respectively. Ordinal logistic regression identified DIF in 17 of 84 items. The potential impact of this on the present scales was low, but items displaying DIF could be avoided when developing abbreviated scales, where the potential impact of DIF (due to fewer items) will be larger.
Multidimensional Extension of Multiple Indicators Multiple Causes Models to Detect DIF
ERIC Educational Resources Information Center
Lee, Soo; Bulut, Okan; Suh, Youngsuk
2017-01-01
A number of studies have found multiple indicators multiple causes (MIMIC) models to be an effective tool in detecting uniform differential item functioning (DIF) for individual items and item bundles. A recently developed MIMIC-interaction model is capable of detecting both uniform and nonuniform DIF in the unidimensional item response theory…
Gibbons, Laura E; McCurry, Susan; Rhoads, Kristoffer; Masaki, Kamal; White, Lon; Borenstein, Amy R; Larson, Eric B; Crane, Paul K
2009-02-01
The Cognitive Abilities Screening Instrument (CASI) was designed for use in cross-cultural studies of Japanese and Japanese-American elderly in Japan and the U.S.A. The measurement equivalence in Japanese and English had not been confirmed in prior studies. We analyzed the 40 CASI items for differential item functioning (DIF) related to test language, as well as self-reported proficiency with written Japanese, age, and educational attainment in two large epidemiologic studies of Japanese-American elderly: the Kame Project (n = 1708) and the Honolulu-Asia Aging Study (HAAS; n = 3148). DIF was present if the demographic groups differed in the probability of success on an item, after controlling for their underlying cognitive functioning ability. While seven CASI items had DIF related to language of testing in Kame (registration of one item; recall of one item; similes; judgment; repeating a phrase; reading and performing a command; and following a three-step instruction), the impact of DIF on participants' scores was minimal. Mean scores for Japanese and English speakers in Kame changed by <0.1 SD after accounting for DIF related to test language. In HAAS, insufficient numbers of participants were tested in Japanese to assess DIF related to test language. In both studies, DIF related to written Japanese proficiency, age, and educational attainment had minimal impact. To the extent that DIF could be assessed, the CASI appeared to meet the goal of measuring cognitive function equivalently in Japanese and English. Stratified data collection would be needed to confirm this conclusion. DIF assessment should be used in other studies with multiple language groups to confirm that measures function equivalently or, if not, form scores that account for DIF.
NASA Astrophysics Data System (ADS)
Greenberg, Ariela Caren
Differential item functioning (DIF) and differential distractor functioning (DDF) are methods used to screen for item bias (Camilli & Shepard, 1994; Penfield, 2008). Using an applied empirical example, this mixed-methods study examined the congruency and relationship of DIF and DDF methods in screening multiple-choice items. Data for Study I were drawn from item responses of 271 female and 236 male low-income children on a preschool science assessment. Item analyses employed a common statistical approach of the Mantel-Haenszel log-odds ratio (MH-LOR) to detect DIF in dichotomously scored items (Holland & Thayer, 1988), and extended the approach to identify DDF (Penfield, 2008). Findings demonstrated that using MH-LOR to detect DIF and DDF supported the theoretical relationship that the magnitude and form of DIF are dependent on the DDF effects, and demonstrated the advantages of studying DIF and DDF in multiple-choice items. A total of 4 items with DIF and DDF and 5 items with only DDF were detected. Study II incorporated an item content review, an important but often overlooked and under-published step of DIF and DDF studies (Camilli & Shepard). Interviews with 25 female and 22 male low-income preschool children and an expert review helped to interpret the DIF and DDF results and their comparison, and determined that a content review process of studied items can reveal reasons for potential item bias that are often congruent with the statistical results. Patterns emerged and are discussed in detail. The quantitative and qualitative analyses were conducted in an applied framework of examining the validity of the preschool science assessment scores for evaluating science programs serving low-income children; however, the techniques can be generalized for use with measures across various disciplines of research.
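For readers unfamiliar with the MH-LOR statistic named in the abstract above, the following is a minimal sketch of how a Mantel-Haenszel log-odds ratio can be computed for one dichotomously scored item. It is not the study's code: the function names, the group labels "ref" and "focal", the use of a matching score as the stratifying variable, and the counts are all hypothetical, and the sign convention (positive values meaning higher odds of success for the reference group) is an assumption.

```python
import math
from collections import defaultdict


def mantel_haenszel_log_odds(responses):
    """Return the Mantel-Haenszel log-odds ratio for a single item.

    `responses` is an iterable of (matching_score, group, correct) tuples,
    where group is "ref" or "focal" (hypothetical labels) and correct is 0 or 1.
    """
    # One 2x2 table per matching-score stratum, stored as [A, B, C, D]:
    # A = reference correct, B = reference incorrect,
    # C = focal correct,     D = focal incorrect.
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for score, group, correct in responses:
        index = (0 if group == "ref" else 2) + (0 if correct else 1)
        tables[score][index] += 1

    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n

    if numerator == 0 or denominator == 0:
        raise ValueError("common odds ratio is undefined for these data")
    # Common odds ratio pooled over strata, returned on the log scale.
    return math.log(numerator / denominator)


def expand_counts(counts_by_stratum):
    """Turn {stratum: (A, B, C, D)} counts into individual response tuples."""
    rows = []
    for score, (a, b, c, d) in counts_by_stratum.items():
        rows += [(score, "ref", 1)] * a + [(score, "ref", 0)] * b
        rows += [(score, "focal", 1)] * c + [(score, "focal", 0)] * d
    return rows


# Made-up counts for two score strata, each with an odds ratio of 3.
example = expand_counts({1: (30, 10, 20, 20), 2: (12, 8, 10, 20)})
print(round(mantel_haenszel_log_odds(example), 4))
```

A value near zero would suggest no uniform DIF after matching on the score; how large a departure counts as DIF is a separate decision that this sketch does not make.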
ERIC Educational Resources Information Center
Moses, Tim; Miao, Jing; Dorans, Neil
2010-01-01
This study compared the accuracies of four differential item functioning (DIF) estimation methods, where each method makes use of only one of the following: raw data, logistic regression, loglinear models, or kernel smoothing. The major focus was on the estimation strategies' potential for estimating score-level, conditional DIF. A secondary focus…
Detection of Gender-Based Differential Item Functioning in a Mathematics Performance Assessment.
ERIC Educational Resources Information Center
Wang, Ning; Lane, Suzanne
This study used three different differential item functioning (DIF) procedures to examine the extent to which items in a mathematics performance assessment functioned differently for matched gender groups. In addition to examining the appropriateness of individual items in terms of DIF with respect to gender, an attempt was made to identify…
Cameron, Isobel M; Scott, Neil W; Adler, Mats; Reid, Ian C
2014-12-01
It is important for clinical practice and research that measurement scales of well-being and quality of life exhibit only minimal differential item functioning (DIF). DIF occurs where different groups of people endorse items in a scale to different extents after being matched by the intended scale attribute. We investigate the equivalence or otherwise of common methods of assessing DIF. Three methods of measuring age- and sex-related DIF (ordinal logistic regression, Rasch analysis and Mantel χ² procedure) were applied to Hospital Anxiety Depression Scale (HADS) data pertaining to a sample of 1,068 patients consulting primary care practitioners. Three items were flagged by all three approaches as having either age- or sex-related DIF with a consistent direction of effect; a further three items identified did not meet stricter criteria for important DIF using at least one method. When applying strict criteria for significant DIF, ordinal logistic regression was slightly less sensitive. Ordinal logistic regression, Rasch analysis and contingency table methods yielded consistent results when identifying DIF in the HADS depression and HADS anxiety scales. Regardless of methods applied, investigators should use a combination of statistical significance, magnitude of the DIF effect and investigator judgement when interpreting the results.
Type I Error Inflation in DIF Identification with Mantel-Haenszel: An Explanation and a Solution
ERIC Educational Resources Information Center
Magis, David; De Boeck, Paul
2014-01-01
It is known that sum score-based methods for the identification of differential item functioning (DIF), such as the Mantel-Haenszel (MH) approach, can be affected by Type I error inflation in the absence of any DIF effect. This may happen when the items differ in discrimination and when there is item impact. On the other hand, outlier DIF methods…
A Monte Carlo Study of an Iterative Wald Test Procedure for DIF Analysis
ERIC Educational Resources Information Center
Cao, Mengyang; Tay, Louis; Liu, Yaowu
2017-01-01
This study examined the performance of a proposed iterative Wald approach for detecting differential item functioning (DIF) between two groups when preknowledge of anchor items is absent. The iterative approach utilizes the Wald-2 approach to identify anchor items and then iteratively tests for DIF items with the Wald-1 approach. Monte Carlo…
A Comparison of Strategies for Estimating Conditional DIF
ERIC Educational Resources Information Center
Moses, Tim; Miao, Jing; Dorans, Neil J.
2010-01-01
In this study, the accuracies of four strategies were compared for estimating conditional differential item functioning (DIF), including raw data, logistic regression, log-linear models, and kernel smoothing. Real data simulations were used to evaluate the estimation strategies across six items, DIF and No DIF situations, and four sample size…
Likelihood-Ratio DIF Testing: Effects of Nonnormality
ERIC Educational Resources Information Center
Woods, Carol M.
2008-01-01
Differential item functioning (DIF) occurs when an item has different measurement properties for members of one group versus another. Likelihood-ratio (LR) tests for DIF based on item response theory (IRT) involve statistically comparing IRT models that vary with respect to their constraints. A simulation study evaluated how violation of the…
ERIC Educational Resources Information Center
Scheuneman, Janice Dowd; Gerritz, Kalle
1990-01-01
Differential item functioning (DIF) methodology for revealing sources of item difficulty and performance characteristics of different groups was explored. A total of 150 Scholastic Aptitude Test items and 132 Graduate Record Examination general test items were analyzed. DIF was evaluated for males and females and Blacks and Whites. (SLD)
ERIC Educational Resources Information Center
Tay, Louis; Vermunt, Jeroen K.; Wang, Chun
2013-01-01
We evaluate the item response theory with covariates (IRT-C) procedure for assessing differential item functioning (DIF) without preknowledge of anchor items (Tay, Newman, & Vermunt, 2011). This procedure begins with a fully constrained baseline model, and candidate items are tested for uniform and/or nonuniform DIF using the Wald statistic.…
Real and Artificial Differential Item Functioning in Polytomous Items
ERIC Educational Resources Information Center
Andrich, David; Hagquist, Curt
2015-01-01
Differential item functioning (DIF) for an item between two groups is present if, for the same person location on a variable, persons from different groups have different expected values for their responses. Applying only to dichotomously scored items in the popular Mantel-Haenszel (MH) method for detecting DIF in which persons are classified by…
ERIC Educational Resources Information Center
Thurman, Carol
2009-01-01
The increased use of polytomous item formats has led assessment developers to pay greater attention to the detection of differential item functioning (DIF) in these items. DIF occurs when an item performs differently for two contrasting groups of respondents (e.g., males versus females) after controlling for differences in the abilities of the…
ERIC Educational Resources Information Center
Benítez, Isabel; Padilla, José-Luis
2014-01-01
Differential item functioning (DIF) can undermine the validity of cross-lingual comparisons. While a lot of efficient statistics for detecting DIF are available, few general findings have been found to explain DIF results. The objective of the article was to study DIF sources by using a mixed method design. The design involves a quantitative phase…
Ramsay-Curve Differential Item Functioning
ERIC Educational Resources Information Center
Woods, Carol M.
2011-01-01
Differential item functioning (DIF) occurs when an item on a test, questionnaire, or interview has different measurement properties for one group of people versus another, irrespective of true group-mean differences on the constructs being measured. This article is focused on item response theory based likelihood ratio testing for DIF (IRT-LR or…
A Comparison of Lord's Chi Square and Raju's Area Measures in Detection of DIF.
ERIC Educational Resources Information Center
Cohen, Allan S.; Kim, Seock-Ho
1993-01-01
The effectiveness of two statistical tests of the area between item response functions (exact signed area and exact unsigned area) estimated in different samples, a measure of differential item functioning (DIF), was compared with Lord's chi square. Lord's chi square was found the most effective in determining DIF. (SLD)
Langer, Michelle M.; Hill, Cheryl D.; Thissen, David; Burwinkle, Tasha M.; Varni, James W.; DeWalt, Darren A.
2008-01-01
Objective: To demonstrate the value of item response theory (IRT) and differential item functioning (DIF) methods in examining a health-related quality of life (HRQOL) measure in children and adolescents. Study Design and Setting: This illustration uses data from 5,429 children using the four subscales of the PedsQL™ 4.0 Generic Core Scales. The IRT model-based likelihood ratio test was used to detect and evaluate DIF between healthy children and children with a chronic condition. Results: DIF was detected for a majority of items but cancelled out at the total test score level due to opposing directions of DIF. Post-hoc analysis indicated that this pattern of results may be due to multidimensionality. We discuss issues in detecting and handling DIF. Conclusion: This paper describes how to perform DIF analyses in validating a questionnaire to ensure that scores have equivalent meaning across subgroups. It offers insight into ways information gained through the analysis can be used to evaluate an existing scale. PMID:18226750
Anchor Selection Strategies for DIF Analysis: Review, Assessment, and New Approaches
ERIC Educational Resources Information Center
Kopf, Julia; Zeileis, Achim; Strobl, Carolin
2015-01-01
Differential item functioning (DIF) indicates the violation of the invariance assumption, for instance, in models based on item response theory (IRT). For item-wise DIF analysis using IRT, a common metric for the item parameters of the groups that are to be compared (e.g., for the reference and the focal group) is necessary. In the Rasch model,…
NASA Astrophysics Data System (ADS)
Ilich, Maria O.
Psychometricians and test developers evaluate standardized tests for potential bias against groups of test-takers by using differential item functioning (DIF). English language learners (ELLs) are a diverse group of students whose native language is not English. While they are still learning the English language, they must take their standardized tests for their school subjects, including science, in English. In this study, linguistic complexity was examined as a possible source of DIF that may result in test scores that confound science knowledge with a lack of English proficiency among ELLs. Two years of fifth-grade state science tests were analyzed for evidence of DIF using two DIF methods, Simultaneous Item Bias Test (SIBTest) and logistic regression. The tests presented a unique challenge in that the test items were grouped together into testlets, that is, groups of items referring to a scientific scenario to measure knowledge of different science content or skills. Very large samples of 10,256 students in 2006 and 13,571 students in 2007 were examined. Half of each sample was composed of Spanish-speaking ELLs; the balance consisted of native English speakers. The two DIF methods were in agreement about the items that favored non-ELLs and the items that favored ELLs. Logistic regression effect sizes were all negligible, while SIBTest flagged items with low to high DIF. A decrease in socioeconomic status and Spanish-speaking ELL diversity may have led to inconsistent SIBTest effect sizes for items used in both testing years. The DIF results for the testlets suggested that ELLs lacked sufficient opportunity to learn science content. The DIF results further suggest that those constructed response test items requiring the student to draw a conclusion about a scientific investigation or to plan a new investigation tended to favor ELLs.
Assessment of Differential Item Functioning in the Experiences of Discrimination Index
Cunningham, Timothy J.; Berkman, Lisa F.; Gortmaker, Steven L.; Kiefe, Catarina I.; Jacobs, David R.; Seeman, Teresa E.; Kawachi, Ichiro
2011-01-01
The psychometric properties of instruments used to measure self-reported experiences of discrimination in epidemiologic studies are rarely assessed, especially regarding construct validity. The authors used 2000–2001 data from the Coronary Artery Risk Development in Young Adults (CARDIA) Study to examine differential item functioning (DIF) in 2 versions of the Experiences of Discrimination (EOD) Index, an index measuring self-reported experiences of racial/ethnic and gender discrimination. DIF may confound interpretation of subgroup differences. Large DIF was observed for 2 of 7 racial/ethnic discrimination items: White participants reported more racial/ethnic discrimination for the “at school” item, and black participants reported more racial/ethnic discrimination for the “getting housing” item. The large DIF by race/ethnicity in the index for racial/ethnic discrimination probably reflects item impact and is the result of valid group differences between blacks and whites regarding their respective experiences of discrimination. The authors also observed large DIF by race/ethnicity for 3 of 7 gender discrimination items. This is more likely to have been due to item bias. Users of the EOD Index must consider the advantages and disadvantages of DIF adjustment (omitting items, constructing separate measures, and retaining items). The EOD Index has substantial usefulness as an instrument that can assess self-reported experiences of discrimination. PMID:22038104
Effect of Purification Procedures on DIF Analysis in IRTPRO
ERIC Educational Resources Information Center
Fikis, David R. J.; Oshima, T. C.
2017-01-01
Purification of the test has been a well-accepted procedure in enhancing the performance of tests for differential item functioning (DIF). As defined by Lord, purification requires reestimation of ability parameters after removing DIF items before conducting the final DIF analysis. IRTPRO 3 is a recently updated program for analyses in item…
Examining Multiple Sources of Differential Item Functioning on the Clinician & Group CAHPS® Survey
Rodriguez, Hector P; Crane, Paul K
2011-01-01
Objective: To evaluate psychometric properties of a widely used patient experience survey. Data Sources: English-language responses to the Clinician & Group Consumer Assessment of Healthcare Providers and Systems (CG-CAHPS®) survey (n = 12,244) from a 2008 quality improvement initiative involving eight southern California medical groups. Methods: We used an iterative hybrid ordinal logistic regression/item response theory differential item functioning (DIF) algorithm to identify items with DIF related to patient sociodemographic characteristics, duration of the physician–patient relationship, number of physician visits, and self-rated physical and mental health. We accounted for all sources of DIF and determined its cumulative impact. Principal Findings: The upper end of the CG-CAHPS® performance range is measured with low precision. With sensitive settings, some items were found to have DIF. However, overall DIF impact was negligible, as 0.14 percent of participants had salient DIF impact. Latinos who spoke predominantly English at home had the highest prevalence of salient DIF impact at 0.26 percent. Conclusions: The CG-CAHPS® functions similarly across commercially insured respondents from diverse backgrounds. Consequently, previously documented racial and ethnic group differences likely reflect true differences rather than measurement bias. The impact of low precision at the upper end of the scale should be clarified. PMID:22092021
Differential Item Functioning: Its Consequences. Research Report. ETS RR-10-01
ERIC Educational Resources Information Center
Lee, Yi-Hsuan; Zhang, Jinming
2010-01-01
This report examines the consequences of differential item functioning (DIF) using simulated data. Its impact on total score, item response theory (IRT) ability estimate, and test reliability was evaluated in various testing scenarios created by manipulating the following four factors: test length, percentage of DIF items per form, sample sizes of…
Real and Artificial Differential Item Functioning
ERIC Educational Resources Information Center
Andrich, David; Hagquist, Curt
2012-01-01
The literature in modern test theory on procedures for identifying items with differential item functioning (DIF) among two groups of persons includes the Mantel-Haenszel (MH) procedure. Generally, it is not recognized explicitly that if there is real DIF in some items which favor one group, then as an artifact of this procedure, artificial DIF…
ERIC Educational Resources Information Center
Shih, Ching-Lin; Liu, Tien-Hsiang; Wang, Wen-Chung
2014-01-01
The simultaneous item bias test (SIBTEST) method regression procedure and the differential item functioning (DIF)-free-then-DIF strategy are applied to the logistic regression (LR) method simultaneously in this study. These procedures are used to adjust the effects of matching true score on observed score and to better control the Type I error…
DIF Detection Using Multiple-Group Categorical CFA with Minimum Free Baseline Approach
ERIC Educational Resources Information Center
Chang, Yu-Wei; Huang, Wei-Kang; Tsai, Rung-Ching
2015-01-01
The aim of this study is to assess the efficiency of using the multiple-group categorical confirmatory factor analysis (MCCFA) and the robust chi-square difference test in differential item functioning (DIF) detection for polytomous items under the minimum free baseline strategy. While testing for DIF items, despite the strong assumption that all…
Teresi, Jeanne A.; Ocepek-Welikson, Katja; Kleinman, Marjorie; Ramirez, Mildred; Kim, Giyeon
2017-01-01
This is the first study of the measurement equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Anxiety short forms in a large ethnically diverse sample. The psychometric properties and differential item functioning (DIF) were examined across different racial/ethnic, educational, age, gender and language groups. Methods: These data are from individuals selected from cancer registries in the United States. For the analyses of race/ethnicity the reference group was non-Hispanic Whites (n = 2,263), the studied groups were non-Hispanic Blacks (n = 1,117), Hispanics (n = 1,043) and Asians/Pacific Islanders (n = 907). Within the Hispanic subsample, there were 335 interviews conducted in Spanish and 703 in English. The 11 anxiety items were from the PROMIS emotional disturbance item bank. DIF hypotheses were generated by content experts who rated whether or not they expected DIF to be present, and the direction of the DIF with respect to several comparison groups. The primary method used for DIF detection was the Wald test for examination of group differences in item response theory (IRT) item parameters accompanied by magnitude measures. Expected item scores were examined as measures of magnitude. The method used for quantification of the difference in the average expected item scores was the non-compensatory DIF (NCDIF) index. DIF impact was examined using expected scale score functions. Additionally, precision and reliabilities were examined using several methods. Results: Although not hypothesized to show DIF for Asians/Pacific Islanders, every item evidenced DIF by at least one method. Two items showed DIF of higher magnitude for Asians/Pacific Islanders vs. Whites: “Many situations made me worry” and “I felt anxious”. However, the magnitude of DIF was small and the NCDIF statistics were not above threshold. The impact of DIF was negligible.
For education, six items were identified with consistent DIF across methods: fearful, anxious, worried, hard to focus, uneasy and tense. However, the NCDIF was not above threshold and the impact of DIF on the scale was trivial. No items showed high magnitude DIF for gender. Two items showed slightly higher magnitude for age (although not above the cutoff): worried and fearful. The scale level impact was trivial. Only one item showed DIF with the Wald test after the Bonferroni correction for the language comparisons: “I felt fearful”. Two additional items were flagged in sensitivity analyses after Bonferroni correction: anxious and many situations made me worry. The latter item also showed DIF of higher magnitude, with an NCDIF value (0.144) above threshold. Individual impact was relatively small. Conclusions: Although many items from the PROMIS short form anxiety measures were flagged with DIF, item level magnitude was low and scale level DIF impact was minimal; however, three items (anxious, worried and many situations made me worry) might be singled out for further study. It is concluded that the PROMIS Anxiety short form evidenced good psychometric properties, was relatively invariant across the groups studied, and performed well among ethnically diverse subgroups of Blacks, Hispanic, White non-Hispanic and Asians/Pacific Islanders. In general more research with the Asians/Pacific Islanders group is needed. Further study of subgroups within these broad categories is recommended. PMID:28649483
Hu, Jinxiang; Ward, Michael M
2017-09-01
To determine if persons with arthritis differ systematically from persons without arthritis in how they respond to questions on three depression questionnaires, which include somatic items such as fatigue and sleep disturbance. We extracted data on the Centers for Epidemiological Studies Depression (CES-D) scale, the Patient Health Questionnaire-9 (PHQ-9), and the Kessler-6 (K-6) scale from three large population-based national surveys. We assessed items on these questionnaires for differential item functioning (DIF) between persons with and without self-reported physician-diagnosed arthritis using multiple indicator multiple cause models, which controlled for the underlying level of depression and important confounders. We also examined if DIF by arthritis status was similar between women and men. Although five items of the CES-D, one item of the PHQ-9, and five items of the K-6 scale had evidence of DIF based on statistical comparisons, the magnitude of each difference was less than the threshold of a small effect. The statistical differences were a function of the very large sample sizes in the surveys. Effect sizes for DIF were similar between women and men except for two items on the Patient Health Questionnaire-9. For each questionnaire, DIF accounted for 8% or less of the arthritis-depression association, and excluding items with DIF did not reduce the difference in depression scores between those with and without arthritis. Persons with arthritis respond to items on the CES-D, PHQ-9, and K-6 depression scales similarly to persons without arthritis, despite the inclusion of somatic items in these scales.
ERIC Educational Resources Information Center
Myers, Nicholas D.; Wolfe, Edward W.; Feltz, Deborah L.; Penfield, Randall D.
2006-01-01
This study (a) provided a conceptual introduction to differential item functioning (DIF), (b) introduced the multifaceted Rasch rating scale model (MRSM) and an associated statistical procedure for identifying DIF in rating scale items, and (c) applied this procedure to previously collected data from American coaches who responded to the coaching…
Differential Item Functioning Analysis of the 2003-04 NHANES Physical Activity Questionnaire
ERIC Educational Resources Information Center
Gao, Yong; Zhu, Weimo
2011-01-01
Using differential item functioning (DIF) analyses, this study examined whether there were any DIF items in the National Health and Nutrition Examination Survey (NHANES) physical activity (PA) questionnaire. A subset of adult data from the 2003-04 NHANES study (n = 3,083) was used. PA items related to respondents' occupational, transportation,…
Using a Mixture IRT Model to Understand English Learner Performance on Large-Scale Assessments
ERIC Educational Resources Information Center
Shea, Christine A.
2013-01-01
The purpose of this study was to determine whether an eighth grade state-level math assessment contained items that function differentially (DIF) for English Learner students (EL) as compared to English Only students (EO) and if so, what factors might have caused DIF. To determine this, Differential Item Functioning (DIF) analysis was employed.…
DIFAS: Differential Item Functioning Analysis System. Computer Program Exchange
ERIC Educational Resources Information Center
Penfield, Randall D.
2005-01-01
Differential item functioning (DIF) is an important consideration in assessing the validity of test scores (Camilli & Shepard, 1994). A variety of statistical procedures have been developed to assess DIF in tests of dichotomous (Hills, 1989; Millsap & Everson, 1993) and polytomous (Penfield & Lam, 2000; Potenza & Dorans, 1995) items. Some of these…
ERIC Educational Resources Information Center
Fidalgo, Angel M.
2011-01-01
Mantel-Haenszel (MH) methods constitute one of the most popular nonparametric differential item functioning (DIF) detection procedures. GMHDIF has been developed to provide an easy-to-use program for conducting DIF analyses. Some of the advantages of this program are that (a) it performs two-stage DIF analyses in multiple groups simultaneously;…
Jafari, Peyman; Sharafi, Zahra; Bagheri, Zahra; Shalileh, Sara
2014-06-01
Measurement equivalence is a necessary assumption for meaningful comparison of pediatric quality of life rated by children and parents. In this study, differential item functioning (DIF) analysis is used to examine whether children and their parents respond consistently to the items in the KINDer Lebensqualitätsfragebogen (KINDL; in German, Children Quality of Life Questionnaire). Two DIF detection methods, graded response model (GRM) and ordinal logistic regression (OLR), were applied for comparability. The KINDL was completed by 1,086 school children and 1,061 of their parents. While the GRM revealed that 12 out of the 24 items were flagged with DIF, the OLR identified 14 out of the 24 items with DIF. Seven items with DIF and five items without DIF were common across the two methods, yielding a total agreement rate of 50%. This study revealed that parent proxy-reports cannot be used as a substitute for a child's ratings in the KINDL.
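The agreement figure in the abstract above follows directly from the reported counts, and the arithmetic can be checked with a few lines. This is only an illustrative sketch: the function name `agreement_rate` and the variable names are hypothetical, and the method-specific counts are derived from the abstract's totals rather than reported by the authors.

```python
def agreement_rate(both_flagged, neither_flagged, n_items):
    """Share of items on which two DIF methods reach the same decision."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return (both_flagged + neither_flagged) / n_items


# Counts stated in the abstract.
n_items = 24
grm_flagged = 12   # items flagged by the graded response model
olr_flagged = 14   # items flagged by ordinal logistic regression
both = 7           # flagged by both methods
neither = 5        # flagged by neither method

# Derived counts (not stated in the abstract): items flagged by one method only.
grm_only = grm_flagged - both
olr_only = olr_flagged - both

# The four cells must account for every item.
assert both + grm_only + olr_only + neither == n_items

rate = agreement_rate(both, neither, n_items)
print(f"{rate:.0%}")  # prints 50%
```

The remaining 12 items are those on which the two methods disagree, which is why the agreement rate stops at one half.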
Gibbons, Laura E.; Crane, Paul K.; Mehta, Kala M.; Pedraza, Otto; Tang, Yuxiao; Manly, Jennifer J.; Narasimhalu, Kaavya; Teresi, Jeanne; Jones, Richard N.; Mungas, Dan
2012-01-01
Differential item functioning (DIF) occurs when a test item has different statistical properties in subgroups, controlling for the underlying ability measured by the test. DIF assessment is necessary when evaluating measurement bias in tests used across different language groups. However, other factors such as educational attainment can differ across language groups, and DIF due to these other factors may also exist. How to conduct DIF analyses in the presence of multiple, correlated factors remains largely unexplored. This study assessed DIF related to Spanish versus English language in a 44-item object naming test. Data come from a community-based sample of 1,755 Spanish- and English-speaking older adults. We compared simultaneous accounting, a new strategy for handling differences in educational attainment across language groups, with existing methods. Compared to other methods, simultaneously accounting for language- and education-related DIF yielded salient differences in some object naming scores, particularly for Spanish speakers with at least 9 years of education. Accounting for factors that vary across language groups can be important when assessing language DIF. The use of simultaneous accounting will be relevant to other cross-cultural studies in cognition and in other fields, including health-related quality of life. PMID:22900138
ERIC Educational Resources Information Center
Drabinová, Adéla; Martinková, Patrícia
2017-01-01
In this article we present a general approach not relying on item response theory models (non-IRT) to detect differential item functioning (DIF) in dichotomous items in the presence of guessing. The proposed nonlinear regression (NLR) procedure for DIF detection is an extension of the method based on logistic regression. As a non-IRT approach, NLR can…
ERIC Educational Resources Information Center
Magis, David; De Boeck, Paul
2011-01-01
We focus on the identification of differential item functioning (DIF) when more than two groups of examinees are considered. We propose to consider items as elements of a multivariate space, where DIF items are outlying elements. Following this approach, the situation of multiple groups is a quite natural case. A robust statistics technique is…
Using Loss Functions for DIF Detection: An Empirical Bayes Approach.
ERIC Educational Resources Information Center
Zwick, Rebecca; Thayer, Dorothy; Lewis, Charles
2000-01-01
Studied a method for flagging differential item functioning (DIF) based on loss functions. Builds on earlier research that led to the development of an empirical Bayes enhancement to the Mantel-Haenszel DIF analysis. Tested the method through simulation and found its performance better than some commonly used DIF classification systems. (SLD)
ERIC Educational Resources Information Center
Hou, Likun; de la Torre, Jimmy; Nandakumar, Ratna
2014-01-01
Analyzing examinees' responses using cognitive diagnostic models (CDMs) has the advantage of providing diagnostic information. To ensure the validity of the results from these models, differential item functioning (DIF) in CDMs needs to be investigated. In this article, the Wald test is proposed to examine DIF in the context of CDMs. This study…
Testing for Differential Item Functioning with Measures of Partial Association
ERIC Educational Resources Information Center
Woods, Carol M.
2009-01-01
Differential item functioning (DIF) occurs when an item on a test or questionnaire has different measurement properties for one group of people versus another, irrespective of mean differences on the construct. There are many methods available for DIF assessment. The present article is focused on indices of partial association. A family of average…
Comparing DIF Methods for Data with Dual Dependency
ERIC Educational Resources Information Center
Jin, Ying; Kang, Minsoo
2016-01-01
Background: The current study used a simulation to compare how well four differential item functioning (DIF) methods account for dual dependency (i.e., person and item clustering effects) simultaneously, a topic that has not been sufficiently studied in the current DIF literature. The four methods compared are logistic…
ERIC Educational Resources Information Center
Grover, Raman K.; Ercikan, Kadriye
2017-01-01
In gender differential item functioning (DIF) research it is assumed that all members of a gender group have similar item response patterns and therefore generalizations from group level to subgroup and individual levels can be made accurately. However, DIF items do not necessarily disadvantage every member of a gender group to the same degree,…
Zampetakis, Leonidas A.; Bakatsaki, Maria; Litos, Charalambos; Kafetsios, Konstantinos G.; Moustakis, Vassilis
2017-01-01
Over the past years the percentage of female entrepreneurs has increased, yet it is still far below that for males. Although various attempts have been made to explain differences in men's and women's entrepreneurial attitudes and intentions, the extent to which those differences are due to self-report biases has not yet been considered. The present study utilized Differential Item Functioning (DIF) to compare men's and women's reporting on entrepreneurial intentions. DIF occurs in situations where members of different groups show differing probabilities of endorsing an item despite possessing the same level of the ability that the item is intended to measure. Drawing on the theory of planned behavior (TPB), the present study investigated whether constructs such as entrepreneurial attitudes, perceived behavioral control, subjective norms and intention would show gender differences and whether these gender differences could be explained by DIF. DIF methods applied to a dataset of 1800 Greek participants (50.4% female) indicated that differences at the item level are almost non-existent. Moreover, the differential test functioning (DTF) analysis, which allows assessing the overall impact of DIF effects with all items being taken into account simultaneously, suggested that the effect of DIF across all the items for each scale was negligible. Future research should consider that measurement invariance can be assumed when using TPB constructs for the study of entrepreneurial motivation independent of gender. PMID:28386244
Zampetakis, Leonidas A; Bakatsaki, Maria; Litos, Charalambos; Kafetsios, Konstantinos G; Moustakis, Vassilis
2017-01-01
Over the past years the percentage of female entrepreneurs has increased, yet it is still far below of that for males. Although various attempts have been made to explain differences in mens' and women's entrepreneurial attitudes and intentions, the extent to which those differences are due to self-report biases has not been yet considered. The present study utilized Differential Item Functioning (DIF) to compare men and women's reporting on entrepreneurial intentions. DIF occurs in situations where members of different groups show differing probabilities of endorsing an item despite possessing the same level of the ability that the item is intended to measure. Drawing on the theory of planned behavior (TPB), the present study investigated whether constructs such as entrepreneurial attitudes, perceived behavioral control, subjective norms and intention would show gender differences and whether these gender differences could be explained by DIF. Using DIF methods on a dataset of 1800 Greek participants (50.4% female) indicated that differences at the item-level are almost non-existent. Moreover, the differential test functioning (DTF) analysis, which allows assessing the overall impact of DIF effects with all items being taken into account simultaneously, suggested that the effect of DIF across all the items for each scale was negligible. Future research should consider that measurement invariance can be assumed when using TPB constructs for the study of entrepreneurial motivation independent of gender.
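The definition of DIF in the abstract above (different endorsement probabilities at the same trait level) can be illustrated with a minimal sketch. It assumes a two-parameter logistic item response function, which the abstract does not name as the study's model. The function name `endorsement_probability` and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def endorsement_probability(theta, discrimination, difficulty):
    """Two-parameter logistic (2PL) probability of endorsing an item."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


# Hypothetical values: both respondents have the same trait level, but the
# item is 0.4 logits "harder" for the focal group (an invented DIF shift).
theta = 0.5
p_reference = endorsement_probability(theta, 1.2, 0.0)
p_focal = endorsement_probability(theta, 1.2, 0.4)

# A non-zero gap at equal theta is what uniform DIF looks like in this model.
dif_gap = p_reference - p_focal
print(round(p_reference, 3), round(p_focal, 3), round(dif_gap, 3))
```

Setting the two difficulties equal makes the gap vanish, which corresponds to an item that functions the same way in both groups.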
ERIC Educational Resources Information Center
French, Brian F.; Gotch, Chad M.
2013-01-01
The Brigance Comprehensive Inventory of Basic Skills-II (CIBS-II) is a diagnostic battery intended for children in grades 1 through 6. The aim of this study was to test for item invariance, or differential item functioning (DIF), of the CIBS-II across sex in the standardization sample through the use of item response theory DIF detection…
ERIC Educational Resources Information Center
Dimitrov, Dimiter M.
2017-01-01
This article offers an approach to examining differential item functioning (DIF) under its item response theory (IRT) treatment in the framework of confirmatory factor analysis (CFA). The approach is based on integrating IRT- and CFA-based testing of DIF and using bias-corrected bootstrap confidence intervals with a syntax code in Mplus.
Assessment of Differential Item Functioning under Cognitive Diagnosis Models: The DINA Model Example
ERIC Educational Resources Information Center
Li, Xiaomin; Wang, Wen-Chung
2015-01-01
The assessment of differential item functioning (DIF) is routinely conducted to ensure test fairness and validity. Although many DIF assessment methods have been developed in the context of classical test theory and item response theory, they are not applicable for cognitive diagnosis models (CDMs), as the underlying latent attributes of CDMs are…
Martinková, Patrícia; Drabinová, Adéla; Liaw, Yuan-Ling; Sanders, Elizabeth A.; McFarland, Jenny L.; Price, Rebecca M.
2017-01-01
We provide a tutorial on differential item functioning (DIF) analysis, an analytic method useful for identifying potentially biased items in assessments. After explaining a number of methodological approaches, we test for gender bias in two scenarios that demonstrate why DIF analysis is crucial for developing assessments, particularly because simply comparing two groups’ total scores can lead to incorrect conclusions about test fairness. First, a significant difference between groups on total scores can exist even when items are not biased, as we illustrate with data collected during the validation of the Homeostasis Concept Inventory. Second, item bias can exist even when the two groups have exactly the same distribution of total scores, as we illustrate with a simulated data set. We also present a brief overview of how DIF analysis has been used in the biology education literature to illustrate the way DIF items need to be reevaluated by content experts to determine whether they should be revised or removed from the assessment. Finally, we conclude by arguing that DIF analysis should be used routinely to evaluate items in developing conceptual assessments. These steps will ensure more equitable—and therefore more valid—scores from conceptual assessments. PMID:28572182
ERIC Educational Resources Information Center
Jin, Ying; Myers, Nicholas D.; Ahn, Soyeon
2014-01-01
Previous research has demonstrated that differential item functioning (DIF) methods that do not account for multilevel data structure could result in too frequent rejection of the null hypothesis (i.e., no DIF) when the intraclass correlation coefficient (ρ) of the studied item was the same as the ρ of the total score. The current study extended…
Sharafi, Zahra; Mousavi, Amin; Ayatollahi, Seyyed Mohammad Taghi; Jafari, Peyman
2017-01-01
Background: The purpose of this study was to evaluate the effectiveness of two methods of detecting differential item functioning (DIF) in the presence of multilevel data and polytomously scored items. The assessment of DIF with multilevel data (e.g., patients nested within hospitals, hospitals nested within districts) from large-scale assessment programs has received considerable attention but very few studies evaluated the effect of hierarchical structure of data on DIF detection for polytomously scored items. Methods: The ordinal logistic regression (OLR) and hierarchical ordinal logistic regression (HOLR) were utilized to assess DIF in simulated and real multilevel polytomous data. Six factors (DIF magnitude, grouping variable, intraclass correlation coefficient, number of clusters, number of participants per cluster, and item discrimination parameter) with a fully crossed design were considered in the simulation study. Furthermore, data of Pediatric Quality of Life Inventory™ (PedsQL™) 4.0 collected from 576 healthy school children were analyzed. Results: Overall, results indicate that both methods performed equivalently in terms of controlling Type I error and detection power rates. Conclusions: The current study showed negligible difference between OLR and HOLR in detecting DIF with polytomously scored items in a hierarchical structure. Implications and considerations while analyzing real data were also discussed. PMID:29312463
A Monte Carlo Study Investigating Missing Data, Differential Item Functioning, and Effect Size
ERIC Educational Resources Information Center
Garrett, Phyllis
2009-01-01
The use of polytomous items in assessments has increased over the years, and as a result, the validity of these assessments has been a concern. Differential item functioning (DIF) and missing data are two factors that may adversely affect assessment validity. Both factors have been studied separately, but DIF and missing data are likely to occur…
A Methodology for Zumbo's Third Generation DIF Analyses and the Ecology of Item Responding
ERIC Educational Resources Information Center
Zumbo, Bruno D.; Liu, Yan; Wu, Amery D.; Shear, Benjamin R.; Olvera Astivia, Oscar L.; Ark, Tavinder K.
2015-01-01
Methods for detecting differential item functioning (DIF) and item bias are typically used in the process of item analysis when developing new measures; adapting existing measures for different populations, languages, or cultures; or more generally validating test score inferences. In 2007 in "Language Assessment Quarterly," Zumbo…
Evaluation of MIMIC-Model Methods for DIF Testing with Comparison to Two-Group Analysis
ERIC Educational Resources Information Center
Woods, Carol M.
2009-01-01
Differential item functioning (DIF) occurs when an item on a test or questionnaire has different measurement properties for 1 group of people versus another, irrespective of mean differences on the construct. This study focuses on the use of multiple-indicator multiple-cause (MIMIC) structural equation models for DIF testing, parameterized as item…
NASA Astrophysics Data System (ADS)
Qian, Xiaoyu
Science is an area where a large achievement gap has been observed between White and minority students, and between male and female students. The science minority gap has continued, as indicated by the National Assessment of Educational Progress and the Trends in International Mathematics and Science Study (TIMSS). TIMSS also shows a gender gap favoring males emerging at the eighth grade. Both gaps continue to be wider in the number of doctoral degrees and full professorships awarded (NSF, 2008). The current study investigated both minority and gender achievement gaps in science utilizing a multi-level differential item functioning (DIF) methodology (Kamata, 2001) within a fully Bayesian framework. All dichotomously coded items from the TIMSS 2007 science assessment at eighth grade were analyzed. Both gender DIF and minority DIF were studied. Multi-level models were employed to identify DIF items and sources of DIF at both student and teacher levels. The study found that several student variables were potential sources of achievement gaps. It was also found that gender DIF favoring male students was more noticeable in the content areas of physics and earth science than biology and chemistry. In terms of item type, the majority of these gender DIF items were multiple choice rather than constructed response items. Female students also performed less well on items requiring visual-spatial ability. Minority students performed significantly worse on physics and earth science items as well. A higher percentage of minority DIF items in earth science and biology were constructed response rather than multiple choice items, indicating that literacy may be the cause of minority DIF. Three-level model results suggested that some teacher variables may be the cause of DIF variations from teacher to teacher.
It is essential for both middle school science teachers and science educators to find instructional methods that work more effectively to improve science achievement of both female and minority students. Physics and earth science are two areas to be improved for both groups. Curriculum and instruction need to enhance female students' learning interests and give them opportunities to improve their visual perception skills. Science instruction should address improving minority students' literacy skills while teaching science.
Terluin, Berend; Brouwers, Evelien P M; Marchand, Miquelle A G; de Vet, Henrica C W
2018-05-01
Many paper-and-pencil (P&P) questionnaires have been migrated to electronic platforms. Differential item and test functioning (DIF and DTF) analysis constitutes a superior research design to assess measurement equivalence across modes of administration. The purpose of this study was to demonstrate an item response theory (IRT)-based DIF and DTF analysis to assess the measurement equivalence of a Web-based version and the original P&P format of the Four-Dimensional Symptom Questionnaire (4DSQ), measuring distress, depression, anxiety, and somatization. The P&P group (n = 2031) and the Web group (n = 958) consisted of primary care psychology clients. Unidimensionality and local independence of the 4DSQ scales were examined using IRT and Yen's Q3. Bifactor modeling was used to assess the scales' essential unidimensionality. Measurement equivalence was assessed using IRT-based DIF analysis using a 3-stage approach: linking on the latent mean and variance, selection of anchor items, and DIF testing using the Wald test. DTF was evaluated by comparing expected scale scores as a function of the latent trait. The 4DSQ scales proved to be essentially unidimensional in both modalities. Five items, belonging to the distress and somatization scales, displayed small amounts of DIF. DTF analysis revealed that the impact of DIF on the scale level was negligible. IRT-based DIF and DTF analysis is demonstrated as a way to assess the equivalence of Web-based and P&P questionnaire modalities. Data obtained with the Web-based 4DSQ are equivalent to data obtained with the P&P version.
Using Mixed Methods to Interpret Differential Item Functioning
ERIC Educational Resources Information Center
Benítez, Isabel; Padilla, José-Luis; Hidalgo Montesinos, María Dolores; Sireci, Stephen G.
2016-01-01
Analysis of differential item functioning (DIF) is often used to determine if cross-lingual assessments are equivalent across languages. However, evidence on the causes of cross-lingual DIF is still elusive. Expert appraisal is a qualitative method useful for obtaining detailed information about problematic elements in the different linguistic…
Babiar, Tasha Calvert
2011-01-01
Traditionally, women and minorities have not been fully represented in science and engineering. Numerous studies have attributed these differences to gaps in science achievement as measured by various standardized tests. Rather than describe mean group differences in science achievement across multiple cultures, this study focused on an in-depth item-level analysis across two countries: Spain and the United States. This study investigated eighth-grade gender differences on science items across the two countries. A secondary purpose of the study was to explore the nature of gender differences using the many-faceted Rasch Model as a way to estimate gender DIF. A secondary analysis of data from the Third International Mathematics and Science Study (TIMSS) was used to address three questions: 1) Does gender DIF in science achievement exist? 2) Is there a relationship between gender DIF and characteristics of the science items? 3) Do the relationships between item characteristics and gender DIF in science items replicate across countries? Participants included 7,087 eighth-grade students from the United States and 3,855 students from Spain who participated in TIMSS. The Facets program (Linacre and Wright, 1992) was used to estimate gender DIF. The results of the analysis indicate that the content of the item seemed to be related to gender DIF. The analysis also suggests that there is a relationship between gender DIF and item format. No pattern of gender DIF related to cognitive demand was found. The general pattern of gender DIF was similar across the two countries used in the analysis. The strength of item-level analysis as opposed to group mean difference analysis is that gender differences can be detected at the item level, even when no mean differences can be detected at the group level.
ERIC Educational Resources Information Center
Ercikan, Kadriye; Arim, Rubab; Law, Danielle; Domene, Jose; Gagnon, France; Lacroix, Serge
2010-01-01
This paper demonstrates and discusses the use of think aloud protocols (TAPs) as an approach for examining and confirming sources of differential item functioning (DIF). The TAPs are used to investigate to what extent surface characteristics of the items that are identified by expert reviews as sources of DIF are supported by empirical evidence…
Detecting DIF in Polytomous Items Using MACS, IRT and Ordinal Logistic Regression
ERIC Educational Resources Information Center
Elosua, Paula; Wells, Craig
2013-01-01
The purpose of the present study was to compare the Type I error rate and power of two model-based procedures, the mean and covariance structure model (MACS) and the item response theory (IRT), and an observed-score based procedure, ordinal logistic regression, for detecting differential item functioning (DIF) in polytomous items. A simulation…
Application of a Method of Estimating DIF for Polytomous Test Items.
ERIC Educational Resources Information Center
Camilli, Gregory; Congdon, Peter
1999-01-01
Demonstrates a method for studying differential item functioning (DIF) that can be used with dichotomous or polytomous items and that is valid for data that follow a partial credit Item Response Theory model. A simulation study shows that positively biased Type I error rates are in accord with results from previous studies. (SLD)
ERIC Educational Resources Information Center
Monahan, Patrick O.; McHorney, Colleen A.; Stump, Timothy E.; Perkins, Anthony J.
2007-01-01
Previous methodological and applied studies that used binary logistic regression (LR) for detection of differential item functioning (DIF) in dichotomously scored items either did not report an effect size or did not employ several useful measures of DIF magnitude derived from the LR model. Equations are provided for these effect size indices.…
Investigating Causal DIF via Propensity Score Methods
ERIC Educational Resources Information Center
Liu, Yan; Zumbo, Bruno D.; Gustafson, Paul; Huang, Yi; Kroc, Edward; Wu, Amery D.
2016-01-01
A variety of differential item functioning (DIF) methods have been proposed and used for ensuring that a test is fair to all test takers in a target population in the situations of, for example, a test being translated to other languages. However, once a method flags an item as DIF, it is difficult to conclude that the grouping variable (e.g.,…
Differential item functioning analysis of the Vanderbilt Expertise Test for cars
Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W.; Van Gulick, Ana Beth; Gauthier, Isabel
2015-01-01
The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge. PMID:26418499
Abdin, Edimansyah; Subramaniam, Mythily; Picco, Louisa; Pang, Shirlene; Vaingankar, Janhavi Ajit; Shahwan, Shazana; Sagayadevan, Vathsala; Zhang, Yunjue; Chong, Siow Ann
2017-04-01
The present study aims to examine the impact of chronic conditions after adjusting for differential item functioning (DIF) on the various aspects of health-related quality of life (HRQoL) in a multi-ethnic Asian population in Singapore. Data on 3006 participants from a nation-wide cross-sectional survey of mental health literacy conducted in Singapore were used. A Multiple Indicators Multiple Causes model was used to investigate the effects of chronic medical conditions on various HRQoL dimensions assessed with the 36-item Medical Outcomes Study Short Form Health Survey (SF-36) after adjusting for DIF. Twenty out of 36 items were detected with DIF for chronic conditions including high blood pressure, cardiovascular disorders, diabetes, cancer, neurological disorders and ulcer as well as for a few demographic factors such as age, gender and marital status. Twenty significant associations between chronic conditions and SF-36 domains were observed. After controlling for all chronic conditions, socio-demographic and DIF items, a significant association emerged between cardiovascular disorders and physical functioning, while the association between diabetes and ulcer and general health became nonsignificant. All other associations remained statistically significant. Our findings provide useful information and important implications of DIF on the impact of chronic conditions on HRQoL. We found the impact of DIF with respect to the impact of chronic conditions on HRQoL to be minimal after accounting for measurement bias in this multiracial Asian population.
ERIC Educational Resources Information Center
Lee, HyeSun; Geisinger, Kurt F.
2016-01-01
The current study investigated the impact of matching criterion purification on the accuracy of differential item functioning (DIF) detection in large-scale assessments. The three matching approaches for DIF analyses (block-level matching, pooled booklet matching, and equated pooled booklet matching) were employed with the Mantel-Haenszel…
ERIC Educational Resources Information Center
Finch, W. Holmes; Hernández Finch, Maria E.; French, Brian F.
2016-01-01
Differential item functioning (DIF) assessment is key in score validation. When DIF is present scores may not accurately reflect the construct of interest for some groups of examinees, leading to incorrect conclusions from the scores. Given rising immigration, and the increased reliance of educational policymakers on cross-national assessments…
Using Multiple-Variable Matching to Identify Cultural Sources of Differential Item Functioning
ERIC Educational Resources Information Center
Wu, Amery D.; Ercikan, Kadriye
2006-01-01
Identifying the sources of differential item functioning (DIF) in international assessments is very challenging, because such sources are often nebulous and intertwined. Even though researchers frequently focus on test translation and content area, few actually go beyond these factors to investigate other cultural sources of DIF. This article…
Bjorner, Jakob Bue; Pejtersen, Jan Hyld
2010-02-01
To evaluate the construct validity of the Copenhagen Psychosocial Questionnaire II (COPSOQ II) by means of tests for differential item functioning (DIF) and differential item effect (DIE). We used a Danish general population postal survey (n = 4,732 with 3,517 wage earners) with a one-year register based follow up for long-term sickness absence. DIF was evaluated against age, gender, education, social class, public/private sector employment, and job type using ordinal logistic regression. DIE was evaluated against job satisfaction and self-rated health (using ordinal logistic regression), against depressive symptoms, burnout, and stress (using multiple linear regression), and against long-term sick leave (using a proportional hazards model). We used a cross-validation approach to counter the risk of significant results due to multiple testing. Out of 1,052 tests, we found 599 significant instances of DIF/DIE, 69 of which showed both practical and statistical significance across two independent samples. Most DIF occurred for job type (in 20 cases), while we found little DIF for age, gender, education, social class and sector. DIE seemed to pertain to particular items, which showed DIE in the same direction for several outcome variables. The results allowed a preliminary identification of items that have a positive impact on construct validity and items that have a negative impact on construct validity. These results can be used to develop better short-form measures and to improve the conceptual framework, items and scales of the COPSOQ II. We conclude that tests of DIF and DIE are useful for evaluating construct validity.
Adjusting for cross-cultural differences in computer-adaptive tests of quality of life.
Gibbons, C J; Skevington, S M
2018-04-01
Previous studies using the WHOQOL measures have demonstrated that the relationship between individual items and the underlying quality of life (QoL) construct may differ between cultures. If unaccounted for, these differing relationships can lead to measurement bias which, in turn, can undermine the reliability of results. We used item response theory (IRT) to assess differential item functioning (DIF) in WHOQOL data from diverse language versions collected in UK, Zimbabwe, Russia, and India (total N = 1332). Data were fitted to the partial credit 'Rasch' model. We used four item banks previously derived from the WHOQOL-100 measure, which provided excellent measurement for physical, psychological, social, and environmental quality of life domains (40 items overall). Cross-cultural differential item functioning was assessed using analysis of variance for item residuals and post hoc Tukey tests. Simulated computer-adaptive tests (CATs) were conducted to assess the efficiency and precision of the four item banks. Splitting item parameters by DIF results in four linked item banks without DIF or other breaches of IRT model assumptions. Simulated CATs were more precise and efficient than longer paper-based alternatives. Assessing differential item functioning using item response theory can identify measurement invariance between cultures which, if uncontrolled, may undermine accurate comparisons in computer-adaptive testing assessments of QoL. We demonstrate how compensating for DIF using item anchoring allowed data from all four countries to be compared on a common metric, thus facilitating assessments which were both sensitive to cultural nuance and comparable between countries.
Use of Automated Scoring Features to Generate Hypotheses Regarding Language-Based DIF
ERIC Educational Resources Information Center
Shermis, Mark D.; Mao, Liyang; Mulholland, Matthew; Kieftenbeld, Vincent
2017-01-01
This study uses the feature sets employed by two automated scoring engines to determine if a "linguistic profile" could be formulated that would help identify items that are likely to exhibit differential item functioning (DIF) based on linguistic features. Sixteen items were administered to 1200 students where demographic information…
Differential Item Functioning Amplification and Cancellation in a Reading Test
ERIC Educational Resources Information Center
Bao, Han; Dayton, C. Mitchell; Hendrickson, Amy B.
2009-01-01
When testlet effects and item idiosyncratic features are both considered to be the reasons for DIF in educational tests using testlets (Wainer & Kiely, 1987) or item bundles (Rosenbaum, 1988), it is interesting to investigate the phenomena of DIF amplification and cancellation due to the interactive effects of these two factors. This research…
ERIC Educational Resources Information Center
Ögretmen, Tuncay
2015-01-01
The purpose of this study is to carry out differential item functioning (DIF) analysis for content areas of a reading comprehension subtest using four area indices within Item Response Theory (IRT) framework. The differences in the magnitudes of the area indices were compared based on the subject areas. The DIF analysis was carried out across…
ERIC Educational Resources Information Center
Woods, Carol M.; Cai, Li; Wang, Mian
2013-01-01
Differential item functioning (DIF) occurs when the probability of responding in a particular category to an item differs for members of different groups who are matched on the construct being measured. The identification of DIF is important for valid measurement. This research evaluates an improved version of Lord's χ²…
ERIC Educational Resources Information Center
Paek, Insu
2010-01-01
Conservative bias in rejection of a null hypothesis from using the continuity correction in the Mantel-Haenszel (MH) procedure was examined through simulation in a differential item functioning (DIF) investigation context in which statistical testing uses a prespecified level α for the decision on an item with respect to DIF. The standard MH…
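As a rough illustration of the continuity correction discussed in the abstract above, the following sketch computes the Mantel-Haenszel chi-square for one studied item with and without the 0.5 correction. It is not the simulation code from the study. The function name `mantel_haenszel_chi2`, the clamping of the corrected deviation at zero, and the counts are all assumptions made for this example.

```python
def mantel_haenszel_chi2(tables, continuity_correction=True):
    """MH chi-square over score strata.

    Each table is (ref_right, ref_wrong, foc_right, foc_wrong): counts of
    correct/incorrect answers for reference and focal groups in one stratum.
    """
    sum_a = sum_e = sum_var = 0.0
    for ref_right, ref_wrong, foc_right, foc_wrong in tables:
        n_ref = ref_right + ref_wrong
        n_foc = foc_right + foc_wrong
        m_right = ref_right + foc_right
        m_wrong = ref_wrong + foc_wrong
        total = n_ref + n_foc
        if total < 2:
            continue  # the hypergeometric variance is undefined here
        sum_a += ref_right
        sum_e += n_ref * m_right / total
        sum_var += n_ref * n_foc * m_right * m_wrong / (total ** 2 * (total - 1))
    if sum_var == 0.0:
        raise ValueError("no informative strata")
    deviation = abs(sum_a - sum_e)
    if continuity_correction:
        # Assumption: clamp at zero so the correction never inflates the statistic.
        deviation = max(0.0, deviation - 0.5)
    return deviation ** 2 / sum_var


# Hypothetical counts for three matched score strata.
strata = [(30, 20, 20, 30), (40, 10, 30, 20), (45, 5, 40, 10)]
chi2_corrected = mantel_haenszel_chi2(strata, continuity_correction=True)
chi2_uncorrected = mantel_haenszel_chi2(strata, continuity_correction=False)
print(chi2_corrected, chi2_uncorrected)
```

With these counts the corrected statistic is smaller than the uncorrected one, which is the direction of the conservative bias the abstract refers to.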
ERIC Educational Resources Information Center
Seo, Dong Gi; Hao, Shiqi
2016-01-01
Differential item/test functioning (DIF/DTF) are routine procedures to detect item/test unfairness as an explanation for group performance difference. However, unequal sample sizes and small sample sizes have an impact on the statistical power of the DIF/DTF detection procedures. Furthermore, DIF/DTF cannot be used for two test forms without…
Differential Item Functioning Detection Across Two Methods of Defining Group Comparisons
Sari, Halil Ibrahim
2014-01-01
This study compares two methods of defining groups for the detection of differential item functioning (DIF): (a) pairwise comparisons and (b) composite group comparisons. We aim to emphasize and empirically support the notion that the choice of pairwise versus composite group definitions in DIF is a reflection of how one defines fairness in DIF studies. In this study, a simulation was conducted based on data from a 60-item ACT Mathematics test (ACT; Hanson & Béguin). The unsigned area measure method (Raju) was used as the DIF detection method. An application to operational data was also completed in the study, as well as a comparison of observed Type I error rates and false discovery rates across the two methods of defining groups. Results indicate that the amount of flagged DIF or interpretations about DIF in all conditions were not the same across the two methods, and there may be some benefits to using composite group approaches. The results are discussed in connection to differing definitions of fairness. Recommendations for practice are made. PMID:29795837
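The unsigned area measure named in the abstract above is the area between the item characteristic curves of two groups. Raju derived closed-form expressions for it. The sketch below instead approximates the area numerically with the trapezoidal rule, so it is an illustration of the idea and not the study's procedure. The helper names `icc` and `unsigned_area`, the integration limits, and the item parameters are assumptions.

```python
import math


def icc(theta, a, b, c=0.0):
    """Logistic item characteristic curve (3PL; c=0 gives the 2PL)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def unsigned_area(ref_params, foc_params, lower=-12.0, upper=12.0, steps=4800):
    """Trapezoidal approximation of the area between two ICCs."""
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps + 1):
        theta = lower + i * width
        gap = abs(icc(theta, *ref_params) - icc(theta, *foc_params))
        weight = 0.5 if i in (0, steps) else 1.0  # half weight at both ends
        total += weight * gap
    return total * width


# Hypothetical (a, b) parameters: equal discrimination, difficulties 0.5 apart.
area = unsigned_area((1.0, 0.0), (1.0, 0.5))
print(area)
```

When the discriminations are equal and there is no guessing, the area between the curves equals the absolute difference in difficulty, so the result here should be close to 0.5.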
Paz, Sylvia H; Spritzer, Karen L; Reise, Steven P; Hays, Ron D
2017-06-01
About 70% of Latinos, 5 years old or older, in the United States speak Spanish at home. Measurement equivalence of the PROMIS® pain interference (PI) item bank by language of administration (English versus Spanish) has not been evaluated. A sample of 527 adult Spanish-speaking Latinos completed the Spanish version of the 41-item PROMIS® pain interference item bank. We evaluate dimensionality, monotonicity and local independence of the Spanish-language items. Then we evaluate differential item functioning (DIF) using ordinal logistic regression with item response theory scores estimated from DIF-free "anchor" items. One of the 41 items in the Spanish version of the PROMIS® PI item bank was identified as having significant uniform DIF. English- and Spanish-speaking subjects with the same level of pain interference responded differently to 1 of the 41 items in the PROMIS® PI item bank. This item was not retained due to proprietary issues. The original English language item parameters can be used when estimating PROMIS® PI scores.
Exploring Crossing Differential Item Functioning by Gender in Mathematics Assessment
ERIC Educational Resources Information Center
Ong, Yoke Mooi; Williams, Julian; Lamprianou, Iasonas
2015-01-01
The purpose of this article is to explore crossing differential item functioning (DIF) in a test drawn from a national examination of mathematics for 11-year-old pupils in England. An empirical dataset was analyzed to explore DIF by gender in a mathematics assessment. A two-step process involving the logistic regression (LR) procedure for…
Wang, Jing-Jing; Chen, Tzu-An; Baranowski, Tom; Lau, Patrick W C
2017-09-16
This study aimed to evaluate the psychometric properties of four self-efficacy scales (i.e., self-efficacy for fruit (FSE), vegetable (VSE), and water (WSE) intakes, and physical activity (PASE)) and to investigate their differences in item functioning across sex, age, and body weight status groups using item response modeling (IRM) and differential item functioning (DIF). Four self-efficacy scales were administered to 763 Hong Kong Chinese children (55.2% boys) aged 8-13 years. Classical test theory (CTT) was used to examine the reliability and factorial validity of scales. IRM was conducted and DIF analyses were performed to assess the characteristics of item parameter estimates on the basis of children's sex, age and body weight status. All self-efficacy scales demonstrated adequate to excellent internal consistency reliability (Cronbach's α: 0.79-0.91). One FSE misfit item and one PASE misfit item were detected. Small DIF was found for all the scale items across children's age groups. Items with medium to large DIF were detected in different sex and body weight status groups, which will require modification. A Wright map revealed that items covered the range of the distribution of participants' self-efficacy for each scale except VSE. Several self-efficacy scales' items functioned differently by children's sex and body weight status. Additional research is required to modify the four self-efficacy scales to minimize these moderating influences for application.
ERIC Educational Resources Information Center
Zwick, Rebecca
2012-01-01
Differential item functioning (DIF) analysis is a key component in the evaluation of the fairness and validity of educational tests. The goal of this project was to review the status of ETS DIF analysis procedures, focusing on three aspects: (a) the nature and stringency of the statistical rules used to flag items, (b) the minimum sample size…
Fieo, Robert; Ocepek-Welikson, Katja; Kleinman, Marjorie; Eimicke, Joseph P.; Crane, Paul K.; Cella, David; Teresi, Jeanne A.
2017-01-01
Aims The goals of these analyses were to examine the psychometric properties and measurement equivalence of a self-reported cognition measure, the Patient Reported Outcome Measurement Information System® (PROMIS®) Applied Cognition – General Concerns short form. These items are also found in the PROMIS Cognitive Function (version 2) item bank. This scale consists of eight items related to subjective cognitive concerns. Differential item functioning (DIF) analyses of gender, education, race, age, and (Spanish) language were performed using an ethnically diverse sample (n = 5,477) of individuals with cancer. This is the first analysis examining DIF in this item set across ethnic and racial groups. Methods DIF hypotheses were derived by asking content experts to indicate whether they posited DIF for each item and to specify the direction. The principal DIF analytic model was item response theory (IRT) using the graded response model for polytomous data, with accompanying Wald tests and measures of magnitude. Sensitivity analyses were conducted using ordinal logistic regression (OLR) with a latent conditioning variable. IRT-based reliability, precision and information indices were estimated. Results DIF was identified consistently only for the item, brain not working as well as usual. After correction for multiple comparisons, this item showed significant DIF for both the primary and sensitivity analyses. Black respondents and Hispanics in comparison to White non-Hispanic respondents evidenced a lower conditional probability of endorsing the item, brain not working as well as usual. The same pattern was observed for the education grouping variable: as compared to those with a graduate degree, conditioning on overall level of subjective cognitive concerns, those with less than high school education also had a lower probability of endorsing this item. 
DIF was also observed for age for two items after correction for multiple comparisons for both the IRT and OLR-based models: “I have had to work really hard to pay attention or I would make a mistake” and “I have had trouble shifting back and forth between different activities that require thinking”. For both items, conditional on cognitive complaints, older respondents had a higher likelihood than younger respondents of endorsing the item in the cognitive complaints direction. The magnitude and impact of DIF was minimal. The scale showed high precision along much of the subjective cognitive concerns continuum; the overall IRT-based reliability estimate for the total sample was 0.88 and the estimates for subgroups ranged from 0.87 to 0.92. Conclusion Little DIF of high magnitude or impact was observed in the PROMIS Applied Cognition – General Concerns short form item set. One item, “It has seemed like my brain was not working as well as usual” might be singled out for further study. However, in general the short form item set was highly reliable, informative, and invariant across differing race/ethnic, educational, age, gender, and language groups. PMID:28523238
Fieo, Robert; Ocepek-Welikson, Katja; Kleinman, Marjorie; Eimicke, Joseph P; Crane, Paul K; Cella, David; Teresi, Jeanne A
2016-01-01
The goals of these analyses were to examine the psychometric properties and measurement equivalence of a self-reported cognition measure, the Patient Reported Outcome Measurement Information System® (PROMIS®) Applied Cognition - General Concerns short form. These items are also found in the PROMIS Cognitive Function (version 2) item bank. This scale consists of eight items related to subjective cognitive concerns. Differential item functioning (DIF) analyses of gender, education, race, age, and (Spanish) language were performed using an ethnically diverse sample (n = 5,477) of individuals with cancer. This is the first analysis examining DIF in this item set across ethnic and racial groups. DIF hypotheses were derived by asking content experts to indicate whether they posited DIF for each item and to specify the direction. The principal DIF analytic model was item response theory (IRT) using the graded response model for polytomous data, with accompanying Wald tests and measures of magnitude. Sensitivity analyses were conducted using ordinal logistic regression (OLR) with a latent conditioning variable. IRT-based reliability, precision and information indices were estimated. DIF was identified consistently only for the item, brain not working as well as usual. After correction for multiple comparisons, this item showed significant DIF for both the primary and sensitivity analyses. Black respondents and Hispanics in comparison to White non-Hispanic respondents evidenced a lower conditional probability of endorsing the item, brain not working as well as usual. The same pattern was observed for the education grouping variable: as compared to those with a graduate degree, conditioning on overall level of subjective cognitive concerns, those with less than high school education also had a lower probability of endorsing this item.
DIF was also observed for age for two items after correction for multiple comparisons for both the IRT and OLR-based models: "I have had to work really hard to pay attention or I would make a mistake" and "I have had trouble shifting back and forth between different activities that require thinking". For both items, conditional on cognitive complaints, older respondents had a higher likelihood than younger respondents of endorsing the item in the cognitive complaints direction. The magnitude and impact of DIF was minimal. The scale showed high precision along much of the subjective cognitive concerns continuum; the overall IRT-based reliability estimate for the total sample was 0.88 and the estimates for subgroups ranged from 0.87 to 0.92. Little DIF of high magnitude or impact was observed in the PROMIS Applied Cognition - General Concerns short form item set. One item, "It has seemed like my brain was not working as well as usual" might be singled out for further study. However, in general the short form item set was highly reliable, informative, and invariant across differing race/ethnic, educational, age, gender, and language groups.
ERIC Educational Resources Information Center
Carvajal, Jorge; Skorupski, William P.
2010-01-01
This study is an evaluation of the behavior of the Liu-Agresti estimator of the cumulative common odds ratio when identifying differential item functioning (DIF) with polytomously scored test items using small samples. The Liu-Agresti estimator has been proposed by Penfield and Algina as a promising approach for the study of polytomous DIF but no…
IRT-LR-DIF with Estimation of the Focal-Group Density as an Empirical Histogram
ERIC Educational Resources Information Center
Woods, Carol M.
2008-01-01
Item response theory-likelihood ratio-differential item functioning (IRT-LR-DIF) is used to evaluate the degree to which items on a test or questionnaire have different measurement properties for one group of people versus another, irrespective of group-mean differences on the construct. Usually, the latent distribution is presumed normal for both…
Examining the Effectiveness of Test Accommodation Using DIF and a Mixture IRT Model
ERIC Educational Resources Information Center
Cho, Hyun-Jeong; Lee, Jaehoon; Kingston, Neal
2012-01-01
This study examined the validity of test accommodation in third-eighth graders using differential item functioning (DIF) and mixture IRT models. Two data sets were used for these analyses. With the first data set (N = 51,591) we examined whether item type (i.e., story, explanation, straightforward) or item features were associated with item…
Kwakkenbos, Linda; Willems, Linda M; Baron, Murray; Hudson, Marie; Cella, David; van den Ende, Cornelia H M; Thombs, Brett D
2014-01-01
The Functional Assessment of Chronic Illness Therapy-Fatigue (FACIT-F) is commonly used to assess fatigue in rheumatic diseases, and has been shown to discriminate better across levels of the fatigue spectrum than other commonly used measures. The aim of this study was to assess the cross-language measurement equivalence of the English, French, and Dutch versions of the FACIT-F in systemic sclerosis (SSc) patients. The FACIT-F was completed by 871 English-speaking Canadian, 238 French-speaking Canadian and 230 Dutch SSc patients. Confirmatory factor analysis was used to assess the factor structure in the three samples. The Multiple-Indicator Multiple-Cause (MIMIC) model was utilized to assess differential item functioning (DIF), comparing English versus French and versus Dutch patient responses separately. A unidimensional factor model showed good fit in all samples. Comparing French versus English patients, statistically significant, but small-magnitude DIF was found for 3 of 13 items. French patients had 0.04 of a standard deviation (SD) lower latent fatigue scores than English patients and there was an increase of only 0.03 SD after accounting for DIF. For the Dutch versus English comparison, 4 items showed small, but statistically significant, DIF. Dutch patients had 0.20 SD lower latent fatigue scores than English patients. After correcting for DIF, there was a reduction of 0.16 SD in this difference. There was statistically significant DIF in several items, but the overall effect on fatigue scores was minimal. English, French and Dutch versions of the FACIT-F can be reasonably treated as having equivalent scoring metrics.
Kwakkenbos, Linda; Willems, Linda M.; Baron, Murray; Hudson, Marie; Cella, David; van den Ende, Cornelia H. M.; Thombs, Brett D.
2014-01-01
Objective The Functional Assessment of Chronic Illness Therapy-Fatigue (FACIT-F) is commonly used to assess fatigue in rheumatic diseases, and has been shown to discriminate better across levels of the fatigue spectrum than other commonly used measures. The aim of this study was to assess the cross-language measurement equivalence of the English, French, and Dutch versions of the FACIT-F in systemic sclerosis (SSc) patients. Methods The FACIT-F was completed by 871 English-speaking Canadian, 238 French-speaking Canadian and 230 Dutch SSc patients. Confirmatory factor analysis was used to assess the factor structure in the three samples. The Multiple-Indicator Multiple-Cause (MIMIC) model was utilized to assess differential item functioning (DIF), comparing English versus French and versus Dutch patient responses separately. Results A unidimensional factor model showed good fit in all samples. Comparing French versus English patients, statistically significant, but small-magnitude DIF was found for 3 of 13 items. French patients had 0.04 of a standard deviation (SD) lower latent fatigue scores than English patients and there was an increase of only 0.03 SD after accounting for DIF. For the Dutch versus English comparison, 4 items showed small, but statistically significant, DIF. Dutch patients had 0.20 SD lower latent fatigue scores than English patients. After correcting for DIF, there was a reduction of 0.16 SD in this difference. Conclusions There was statistically significant DIF in several items, but the overall effect on fatigue scores was minimal. English, French and Dutch versions of the FACIT-F can be reasonably treated as having equivalent scoring metrics. PMID:24638101
ERIC Educational Resources Information Center
Mendes-Barnett, Sharon; Ercikan, Kadriye
2006-01-01
This study contributes to understanding sources of gender differential item functioning (DIF) on mathematics tests. This study focused on identifying sources of DIF and differential bundle functioning for boys and girls on the British Columbia Principles of Mathematics Exam (Grade 12) using a confirmatory SIBTEST approach based on a…
Deutscher, Daniel; Hart, Dennis L; Crane, Paul K; Dickstein, Ruth
2010-12-01
Comparative effectiveness research across cultures requires unbiased measures that accurately detect clinical differences between patient groups. The purpose of this study was to assess the presence and impact of differential item functioning (DIF) in knee functional status (FS) items administered using computerized adaptive testing (CAT) as a possible cause for observed differences in outcomes between 2 cultural patient groups in a polyglot society. This study was a secondary analysis of prospectively collected data. We evaluated data from 9,134 patients with knee impairments from outpatient physical therapy clinics in Israel. Items were analyzed for DIF related to sex, age, symptom acuity, surgical history, exercise history, and language used to complete the functional survey (Hebrew versus Russian). Several items exhibited DIF, but unadjusted FS estimates and FS estimates that accounted for DIF were essentially equal (intraclass correlation coefficient [2,1]>.999). No individual patient had a difference between unadjusted and adjusted FS estimates as large as the median standard error of the unadjusted estimates. Differences between groups defined by any of the covariates considered were essentially unchanged when using adjusted instead of unadjusted FS estimates. The greatest group-level impact was <0.3% of 1 standard deviation of the unadjusted FS estimates. Complete data where patients answered all items in the scale would have been preferred for DIF analysis, but only CAT data were available. Differences in FS outcomes between groups of patients with knee impairments who answered the knee CAT in Hebrew or Russian in Israel most likely reflected true differences that may reflect societal disparities in this health outcome.
Cook, Karon F; Kallen, Michael A; Bombardier, Charles; Bamer, Alyssa M; Choi, Seung W; Kim, Jiseon; Salem, Rana; Amtmann, Dagmar
2017-01-01
To evaluate whether items of three measures of depressive symptoms function differently in persons with spinal cord injury (SCI) than in persons from a primary care sample. This study was a retrospective analysis of responses to the Patient Health Questionnaire depression scale, the Center for Epidemiological Studies Depression scale, and the National Institutes of Health Patient-Reported Outcomes Measurement Information System (PROMIS®) version 1.0 eight-item depression short form 8b (PROMIS-D). The presence of differential item functioning (DIF) was evaluated using ordinal logistic regression. No items of any of the three target measures were flagged for DIF based on standard criteria. In a follow-up sensitivity analysis, the criterion was changed to make the analysis more sensitive to potential DIF. Scores were corrected for DIF flagged under this criterion. Minimal differences were found between the original scores and those corrected for DIF under the sensitivity criterion. The three depression screening measures evaluated in this study did not perform differently in samples of individuals with SCI compared to general and community samples. Transdiagnostic symptoms did not appear to spuriously inflate depression severity estimates when administered to people with SCI.
Disparities in Sense of Community: True race differences or differential item functioning?
Coffman, Donna L.; BeLue, Rhonda
2009-01-01
The sense of community index (SCI) has been widely used to measure psychological sense of community (SOC). Furthermore, SOC has been found to differ among racial groups. Since different ethnic groups have different cultural and historical experiences that may lead to different interpretations of measurement items, it is important to know whether the instrument used to measure the construct of interest has equivalency in measurement across groups or if the instrument exhibits differential item functioning (DIF). Examining DIF in the SCI helps assure that subgroup comparisons identify true differences in SOC between Blacks and Whites. We did not find DIF between races but we did find that the SCI question ‘I feel at home in my neighborhood’ was a more reliable measure of SOC for Whites than for Blacks. In other words, this item has less measurement error for Whites than for Blacks. Therefore, differences on the SCI may be attributable to true differences in SOC between races rather than DIF. PMID:19890462
ERIC Educational Resources Information Center
Laitusis, Cara Cahalan; Maneckshana, Behroz; Monfils, Lora; Ahlgrim-Delzell, Lynn
2009-01-01
The purpose of this study was to examine Differential Item Functioning (DIF) by disability groups on an on-demand performance assessment for students with severe cognitive impairments. Researchers examined the presence of DIF for two comparisons. One comparison involved students with severe cognitive impairments who served as the reference group…
ERIC Educational Resources Information Center
Polikoff, Morgan S.; May, Henry; Porter, Andrew C.; Elliott, Stephen N.; Goldring, Ellen; Murphy, Joseph
2009-01-01
The Vanderbilt Assessment of Leadership in Education is a 360-degree assessment of the effectiveness of principals' learning-centered leadership behaviors. In this report, we present results from a differential item functioning (DIF) study of the assessment. Using data from a national field trial, we searched for evidence of DIF on school level,…
ERIC Educational Resources Information Center
Sari, Halil Ibrahim; Huggins, Anne Corinne
2015-01-01
This study compares two methods of defining groups for the detection of differential item functioning (DIF): (a) pairwise comparisons and (b) composite group comparisons. We aim to emphasize and empirically support the notion that the choice of pairwise versus composite group definitions in DIF is a reflection of how one defines fairness in DIF…
Use of multilevel logistic regression to identify the causes of differential item functioning.
Balluerka, Nekane; Gorostiaga, Arantxa; Gómez-Benito, Juana; Hidalgo, María Dolores
2010-11-01
Given that a key function of tests is to serve as evaluation instruments and for decision making in the fields of psychology and education, the possibility that some of their items may show differential behaviour is a major concern for psychometricians. In recent decades, important progress has been made as regards the efficacy of techniques designed to detect this differential item functioning (DIF). However, the findings are scant when it comes to explaining its causes. The present study addresses this problem from the perspective of multilevel analysis. Starting from a case study in the area of transcultural comparisons, multilevel logistic regression is used: 1) to identify the item characteristics associated with the presence of DIF; 2) to estimate the proportion of variation in the DIF coefficients that is explained by these characteristics; and 3) to evaluate alternative explanations of the DIF by comparing the explanatory power or fit of different sequential models. The comparison of these models confirmed one of the two alternatives (familiarity with the stimulus) and rejected the other (the topic area) as being a cause of differential functioning with respect to the compared groups.
Teresi, Jeanne A; Ocepek-Welikson, Katja; Cook, Karon F; Kleinman, Marjorie; Ramirez, Mildred; Reid, M Carrington; Siu, Albert
2016-01-01
Reducing the response burden of standardized pain measures is desirable, particularly for individuals who are frail or live with chronic illness, e.g., those suffering from cancer and those in palliative care. The Patient Reported Outcome Measurement Information System® (PROMIS®) project addressed this issue with the provision of computerized adaptive tests (CAT) and short form measures that can be used clinically and in research. Although there has been substantial evaluation of PROMIS item banks, little is known about the performance of PROMIS short forms, particularly in ethnically diverse groups. Reviewed in this article are findings related to the differential item functioning (DIF) and reliability of the PROMIS pain interference short forms across diverse sociodemographic groups. DIF hypotheses were generated for the PROMIS short form pain interference items. Initial analyses tested item response theory (IRT) model assumptions of unidimensionality and local independence. Dimensionality was evaluated using factor analytic methods; local dependence (LD) was tested using IRT-based LD indices. Wald tests were used to examine group differences in IRT parameters, and to test DIF hypotheses. A second DIF-detection method used in sensitivity analyses was based on ordinal logistic regression with a latent IRT-derived conditioning variable. Magnitude and impact of DIF were investigated, and reliability and item and scale information statistics were estimated. The reliability of the short form item set was excellent. However, there were a few items with high local dependency, which affected the estimation of the final discrimination parameters. As a result, the item, "How much did pain interfere with enjoyment of social activities?" was excluded in the DIF analyses for all subgroup comparisons.
No items were hypothesized to show DIF for race and ethnicity; however, five items showed DIF after adjustment for multiple comparisons in both primary and sensitivity analyses: ability to concentrate, enjoyment of recreational activities, tasks away from home, participation in social activities, and socializing with others. The magnitude of DIF was small and the impact negligible. Three items were consistently identified with DIF for education: enjoyment of life, ability to concentrate, and enjoyment of recreational activities. No item showed DIF above the magnitude threshold and the impact of DIF on the overall measure was minimal. No item showed gender DIF after correction for multiple comparisons in the primary analyses. Four items showed consistent age DIF: enjoyment of life, ability to concentrate, day to day activities, and enjoyment of recreational activities, none with primary magnitude values above threshold. Conditional on the pain state, Spanish speakers were hypothesized to report less pain interference on one item, enjoyment of life. The DIF findings confirmed the hypothesis; however, the magnitude was small. Using an arbitrary cutoff point of theta (θ) ≥ 1.0 to classify respondents with acute pain interference, the highest number of changes were for the education groups analyses. There were 231 respondents (4% of the total sample) who changed from the designation of no acute pain interference to acute interference after the DIF adjustment. There was no change in the designations for race/ethnic subgroups, and a small number of changes for respondents aged 65 to 84. Although significant DIF was observed after correction for multiple comparisons, all DIF was of low magnitude and impact. However, some individual-level impact was observed for low education groups. Reliability estimates were high.
Thus, the PROMIS short form pain items examined in this ethnically diverse sample performed relatively well; although one item was problematic and removed from the analyses. It is concluded that the majority of the PROMIS pain interference short form items can be recommended for use among ethnically diverse groups, including those in palliative care and with cancer and chronic illness.
Teresi, Jeanne A.; Ocepek-Welikson, Katja; Cook, Karon F.; Kleinman, Marjorie; Ramirez, Mildred; Reid, M. Carrington; Siu, Albert
2017-01-01
Reducing the response burden of standardized pain measures is desirable, particularly for individuals who are frail or live with chronic illness, e.g., those suffering from cancer and those in palliative care. The Patient Reported Outcome Measurement Information System® (PROMIS®) project addressed this issue with the provision of computerized adaptive tests (CAT) and short form measures that can be used clinically and in research. Although there has been substantial evaluation of PROMIS item banks, little is known about the performance of PROMIS short forms, particularly in ethnically diverse groups. Reviewed in this article are findings related to the differential item functioning (DIF) and reliability of the PROMIS pain interference short forms across diverse sociodemographic groups. Methods DIF hypotheses were generated for the PROMIS short form pain interference items. Initial analyses tested item response theory (IRT) model assumptions of unidimensionality and local independence. Dimensionality was evaluated using factor analytic methods; local dependence (LD) was tested using IRT-based LD indices. Wald tests were used to examine group differences in IRT parameters, and to test DIF hypotheses. A second DIF-detection method used in sensitivity analyses was based on ordinal logistic regression with a latent IRT-derived conditioning variable. Magnitude and impact of DIF were investigated, and reliability and item and scale information statistics were estimated. Results The reliability of the short form item set was excellent. However, there were a few items with high local dependency, which affected the estimation of the final discrimination parameters. As a result, the item, “How much did pain interfere with enjoyment of social activities?” was excluded in the DIF analyses for all subgroup comparisons. 
No items were hypothesized to show DIF for race and ethnicity; however, five items showed DIF after adjustment for multiple comparisons in both primary and sensitivity analyses: ability to concentrate, enjoyment of recreational activities, tasks away from home, participation in social activities, and socializing with others. The magnitude of DIF was small and the impact negligible. Three items were consistently identified with DIF for education: enjoyment of life, ability to concentrate, and enjoyment of recreational activities. No item showed DIF above the magnitude threshold and the impact of DIF on the overall measure was minimal. No item showed gender DIF after correction for multiple comparisons in the primary analyses. Four items showed consistent age DIF: enjoyment of life, ability to concentrate, day to day activities, and enjoyment of recreational activities, none with primary magnitude values above threshold. Conditional on the pain state, Spanish speakers were hypothesized to report less pain interference on one item, enjoyment of life. The DIF findings confirmed the hypothesis; however, the magnitude was small. Using an arbitrary cutoff point of theta (θ) ≥ 1.0 to classify respondents with acute pain interference, the highest number of changes were for the education groups analyses. There were 231 respondents (4% of the total sample) who changed from the designation of no acute pain interference to acute interference after the DIF adjustment. There was no change in the designations for race/ethnic subgroups, and a small number of changes for respondents aged 65 to 84. Conclusions Although significant DIF was observed after correction for multiple comparisons, all DIF was of low magnitude and impact. However, some individual-level impact was observed for low education groups. Reliability estimates were high. 
Thus, the PROMIS short form pain items examined in this ethnically diverse sample performed relatively well; although one item was problematic and removed from the analyses. It is concluded that the majority of the PROMIS pain interference short form items can be recommended for use among ethnically diverse groups, including those in palliative care and with cancer and chronic illness. PMID:28983449
Paz, Sylvia H; Spritzer, Karen L; Morales, Leo S; Hays, Ron D
2013-03-29
To evaluate the equivalence of the PROMIS® wave 1 physical functioning item bank, by age (50 years or older versus 18-49). A total of 114 physical functioning items with 5 response choices were administered to English- (n=1504) and Spanish-language (n=640) adults. Item frequencies, means and standard deviations, item-scale correlations, and internal consistency reliability were estimated. Differential Item Functioning (DIF) by age was evaluated. Thirty of the 114 items were flagged for DIF based on an R-squared of 0.02 or above criterion. The expected total score was higher for those respondents who were 18-49 than those who were 50 or older. Those who were 50 years or older versus 18-49 years old with the same level of physical functioning responded differently to 30 of the 114 items in the PROMIS® physical functioning item bank. This study yields essential information about the equivalence of the physical functioning items in older versus younger individuals.
Teresi, Jeanne A.; Ramirez, Mildred; Lai, Jin-Shei; Silver, Stephanie
2009-01-01
Examination of the equivalence of measures involves several levels, including conceptual equivalence of meaning, as well as quantitative tests of differential item functioning (DIF). The purpose of this review is to examine DIF in patient-reported outcomes. Reviewed were measures of self-reported depression, quality of life (QoL) and general health. Most measures of depression contained large amounts of DIF, and the impact of DIF at the scale level was typically sizeable. The studies of QoL and health measures identified a moderate amount of DIF; however, many of these studies examined only one type of DIF (uniform). Relative to DIF analyses of depression measures, less analysis of the impact of DIF on QoL and health measures was performed, and the authors of these analyses generally did not recommend remedial action, with one notable exception. While these studies represent good beginning efforts to examine measurement equivalence in patient-reported outcome measures, more cross-validation work is required using other (often larger) samples of different ethnic and language groups, as well as other methods that permit more extensive analyses of the type of DIF, together with magnitude and impact. PMID:20165561
ERIC Educational Resources Information Center
Finch, Holmes
2011-01-01
Methods of uniform differential item functioning (DIF) detection have been extensively studied in the complete data case. However, less work has been done examining the performance of these methods when missing item responses are present. Research that has been done in this regard appears to indicate that treating missing item responses as…
The Mediated MIMIC Model for Understanding the Underlying Mechanism of DIF.
Cheng, Ying; Shao, Can; Lathrop, Quinn N
2016-02-01
Due to its flexibility, the multiple-indicator, multiple-causes (MIMIC) model has become an increasingly popular method for the detection of differential item functioning (DIF). In this article, we propose the mediated MIMIC model method to uncover the underlying mechanism of DIF. This method extends the usual MIMIC model by including one variable or multiple variables that may completely or partially mediate the DIF effect. If complete mediation effect is found, the DIF effect is fully accounted for. Through our simulation study, we find that the mediated MIMIC model is very successful in detecting the mediation effect that completely or partially accounts for DIF, while keeping the Type I error rate well controlled for both balanced and unbalanced sample sizes between focal and reference groups. Because it is successful in detecting such mediation effects, the mediated MIMIC model may help explain DIF and give guidance in the revision of a DIF item.
Differential item functioning of the Geriatric Depression Scale in an Asian population.
Broekman, B F P; Nyunt, S Z; Niti, M; Jin, A Z; Ko, S M; Kumar, R; Fones, C S L; Ng, T P
2008-06-01
The Geriatric Depression Scale (GDS) is widely used for screening and assessment of major depressive disorder (MDD). Screening scales are often culture-specific and should be evaluated for item response bias (synonymously differential item functioning, DIF) before use in clinical practice and research in a different population. In this study, we examined DIF associated with age, gender, ethnicity and chronic illness in a heterogeneous Asian population in Singapore. The GDS-15 and Structured Clinical Interview for DSM-IV diagnosis of MDD were independently administered by interviewers on 4253 non-institutionalized community living elderly subjects aged 60 years and above who were users of social service agencies. Multiple Indicator Multiple Cause latent variable modelling was used to identify DIF. We found evidence of significant DIF associated with age, gender, ethnicity and chronic illness for 8 items: dropped many activities and interests, afraid something bad is going to happen, prefer staying home to going out, more problems with memory than most, think it is (not) wonderful to be alive, feel pretty worthless, feel (not) full of energy, feel that situation is hopeless. Limitations were the smaller number of minority Indian and Malay subjects and the self-report of chronic medical illnesses. In a heterogeneous mix of respondents in Singapore, eight items of the GDS-15 showed DIF for age, gender, ethnicity and chronic illness. The awareness and identification of DIF in the GDS-15 provides a rational basis for its use in diverse population groups and guiding the derivation of abbreviated scales.
Church, A Timothy; Alvarez, Juan M; Mai, Nhu T Q; French, Brian F; Katigbak, Marcia S; Ortiz, Fernando A
2011-11-01
Measurement invariance is a prerequisite for confident cross-cultural comparisons of personality profiles. Multigroup confirmatory factor analysis was used to detect differential item functioning (DIF) in factor loadings and intercepts for the Revised NEO Personality Inventory (P. T. Costa, Jr., & R. R. McCrae, 1992) in comparisons of college students in the United States (N = 261), Philippines (N = 268), and Mexico (N = 775). About 40%-50% of the items exhibited some form of DIF and item-level noninvariance often carried forward to the facet level at which scores are compared. After excluding DIF items, some facet scales were too short or unreliable for cross-cultural comparisons, and for some other facets, cultural mean differences were reduced or eliminated. The results indicate that considerable caution is warranted in cross-cultural comparisons of personality profiles.
An Empirical Bayes Approach to Mantel-Haenszel DIF Analysis.
ERIC Educational Resources Information Center
Zwick, Rebecca; Thayer, Dorothy T.; Lewis, Charles
1999-01-01
Developed an empirical Bayes enhancement to Mantel-Haenszel (MH) analysis of differential item functioning (DIF) in which it is assumed that the MH statistics are normally distributed and that the prior distribution of underlying DIF parameters is also normal. (Author/SLD)
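The normal-normal assumption stated in the abstract above implies a standard shrinkage formula, which can be sketched as follows. This is a minimal illustration of generic normal-normal updating and not the authors' implementation; the function name, argument names, and the example numbers are hypothetical, and the prior mean and variance are taken as given here rather than estimated from data.

```python
def eb_posterior(mh_stat, se, prior_mean, prior_var):
    """Posterior mean and variance of an item's DIF parameter.

    Assumes the observed MH statistic is normal around the true DIF
    parameter with standard error `se`, and that the true parameters
    follow a normal prior (hypothetical inputs, for illustration only).
    """
    sampling_var = se ** 2
    # Weight on the observed statistic: larger when the prior is diffuse
    # relative to the sampling error.
    weight = prior_var / (prior_var + sampling_var)
    post_mean = weight * mh_stat + (1.0 - weight) * prior_mean
    post_var = weight * sampling_var
    return post_mean, post_var


# Hypothetical item: observed statistic -1.2 with standard error 0.5,
# prior centered at 0 with variance 0.25.
mean, var = eb_posterior(-1.2, 0.5, 0.0, 0.25)
```

With equal prior and sampling variances the observed statistic is shrunk halfway toward the prior mean, which is why noisy small-sample estimates are pulled most strongly toward zero DIF.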
ERIC Educational Resources Information Center
Paek, Insu; Wilson, Mark
2011-01-01
This study elaborates the Rasch differential item functioning (DIF) model formulation under the marginal maximum likelihood estimation context. Also, the Rasch DIF model performance was examined and compared with the Mantel-Haenszel (MH) procedure in small sample and short test length conditions through simulations. The theoretically known…
Jafari, Peyman; Bagheri, Zahra; Hashemi, Seyyedeh Zahra; Shalileh, Keivan
2013-06-06
Few studies have examined the effect of differential item functioning (DIF) on comparing health-related quality of life (HRQoL) scores across child self-reports and parent proxy-reports. This study aims to determine whether parents and children respond differently to the items in the Persian version of the PedsQL™ 4.0 measure. The PedsQL™ 4.0 Generic Core Scales was completed by 938 child-parent dyads. The graded response model (GRM) was used to detect DIF between parents and children. The IRT analyses were conducted using IRTPRO 2.1. On the whole, our findings showed that 50% (4 out of 8) of the items in the physical subscale and 40% (2 out of 5) in both emotional and school subscales were flagged with DIF. Among the DIF items, 62.5% (5 out of 8) were uniform and the remaining 37.5% (3 out of 8) were non-uniform. Parents and children interpret certain items of the PedsQL™ 4.0 in different ways, except for the social subscale. Hence, we should be cautious about using parent proxy-report as a substitute for a child's ratings.
Jafari, Peyman; Allahyari, Elahe; Salarzadeh, Mina; Bagheri, Zahra
2016-01-01
Child obesity has become a major health concern worldwide. In order to provide successful intervention strategies, it is necessary to understand how obese-overweight children and their parents perceive obesity and its consequences on child's health-related quality of life (HRQoL). This study aimed to assess measurement equivalence of the PedsQL™ 4.0 across obese-overweight children and their parents. The items in the PedsQL™ 4.0 were analysed for differential item functioning (DIF) across obese-overweight children and their parents using an iterative hybrid ordinal logistic regression/item response theory approach. The sample included 647 overweight-obese children and their parents, who completed child and parent reports of the PedsQL™ 4.0, respectively. Overall, 17 out of 23 (74%) items were flagged with DIF across two groups: eight items exhibited uniform DIF and nine items non-uniform DIF. In addition, parents of obese children rated the child's HRQoL significantly lower than their children in all domains of the PedsQL™ 4.0, and this finding did not change whether or not items with uniform DIF were included. Although obese-overweight children and their parents interpret items of the PedsQL™ 4.0 in a conceptually different manner, removing or retaining DIF items in the subscales had no significant effects on group differences. Accordingly, it appears that observed differences in HRQoL scores across child and parent reports are a true difference and not a reflection of measurement artefact.
Stevanovic, Dejan; Jafari, Peyman
2015-01-01
The KIDSCREEN questionnaire for health-related quality of life (HRQOL) assessments in children and adolescents was simultaneously developed across 13 European countries, and it was subsequently translated and culturally adapted to over 30 different languages across the world. The aim of this study was to evaluate the measurement equivalence of the KIDSCREEN-27 across Serbian and Iranian children and adolescents. The items in the KIDSCREEN-27 were analyzed for differential item functioning (DIF) across Iranian and Serbian populations using ordinal logistic regression with three different criteria. The sample included 330 Iranian and 329 Serbian children and adolescents and 330 and 314 of their parents, respectively. Across the two samples, DIF was detected in 16 (59 %) of 27 items in the child self-reports and in 20 (74 %) of 27 items in the parent/proxy report. However, using alternative criteria based on magnitude detected for DIF, only three items in the parent/proxy report showed significant DIF. Our study provided more evidence that the KIDSCREEN-27 possesses DIF items across different cultures, but their impact is probably small, and the questionnaire could be used for cross-cultural HRQOL comparisons.
Validation of a mobility item bank for older patients in primary care.
Cabrero-García, Julio; Ramos-Pichardo, Juan Diego; Muñoz-Mendoza, Carmen Luz; Cabañero-Martínez, María José; González-Llopis, Lorena; Reig-Ferrer, Abilio
2012-12-05
To develop and validate an item bank to measure mobility in older people in primary care and to analyse differential item functioning (DIF) and differential bundle functioning (DBF) by sex. A pool of 48 mobility items was administered by interview to 593 older people attending primary health care practices. The pool contained four domains based on the International Classification of Functioning: changing and maintaining body position, carrying, lifting and pushing, walking and going up and down stairs. The Late Life Mobility item bank consisted of 35 items, and measured with a reliability of 0.90 or more across the full spectrum of mobility, except at the higher end of better functioning. No evidence was found of non-uniform DIF but uniform DIF was observed, mainly for items in the changing and maintaining body position and carrying, lifting and pushing domains. The walking domain did not display DBF, but the other three domains did, principally the carrying, lifting and pushing items. During the design and validation of an item bank to measure mobility in older people, we found that strength (carrying, lifting and pushing) items formed a secondary dimension that produced DBF. More research is needed to determine how best to include strength items in a mobility measure, or whether it would be more appropriate to design separate measures for each construct.
Comparison of IRT Likelihood Ratio Test and Logistic Regression DIF Detection Procedures
ERIC Educational Resources Information Center
Atar, Burcu; Kamata, Akihito
2011-01-01
The Type I error rates and the power of IRT likelihood ratio test and cumulative logit ordinal logistic regression procedures in detecting differential item functioning (DIF) for polytomously scored items were investigated in this Monte Carlo simulation study. For this purpose, 54 simulation conditions (combinations of 3 sample sizes, 2 sample…
Dimensionality and DIF in a Licensure Examination.
ERIC Educational Resources Information Center
Sykes, Robert C.; And Others
The sources of multidimensionality found in several different forms of a licensure examination were studied. The relationship between one source of multidimensionality, differential item functioning (DIF) (or factors producing DIF), and content characteristics was explored in an attempt to isolate aspects of training or curriculum that could…
Kasper, Judith D.; Brandt, Jason; Pezzin, Liliana E.
2012-01-01
Objective. To examine the measurement equivalence of items on disability across three international surveys of aging. Method. Data for persons aged 65 and older were drawn from the Health and Retirement Survey (HRS, n = 10,905), English Longitudinal Study of Aging (ELSA, n = 5,437), and Survey of Health, Ageing and Retirement in Europe (SHARE, n = 13,408). Differential item functioning (DIF) was assessed using item response theory (IRT) methods for activities of daily living (ADL) and instrumental activities of daily living (IADL) items. Results. HRS and SHARE exhibited measurement equivalence, but 6 of 11 items in ELSA demonstrated meaningful DIF. At the scale level, this item-level DIF affected scores reflecting greater disability. IRT methods also spread out score distributions and shifted scores higher (toward greater disability). Results for mean disability differences by demographic characteristics, using original and DIF-adjusted scores, were the same overall but differed for some subgroup comparisons involving ELSA. Discussion. Testing and adjusting for DIF is one means of minimizing measurement error in cross-national survey comparisons. IRT methods were used to evaluate potential measurement bias in disability comparisons across three international surveys of aging. The analysis also suggested DIF was mitigated for scales including both ADL and IADL and that summary indexes (counts of limitations) likely underestimate mean disability in these international populations. PMID:22156662
Haroz, E E; Bolton, P; Gross, A; Chan, K S; Michalopoulos, L; Bass, J
2016-07-01
Prevalence estimates of depression vary between countries, possibly due to differential functioning of items between settings. This study compared the performance of the widely used Hopkins symptom checklist 15-item depression scale (HSCL-15) across multiple settings using item response theory analyses. Data came from adult populations in the low and middle income countries (LMIC) of Colombia, Indonesia, Kurdistan Iraq, Rwanda, Iraq, Thailand (Burmese refugees), and Uganda (N = 4732). Item parameters based on a graded response model were compared across LMIC settings. Differential item functioning (DIF) by setting was evaluated using multiple indicators multiple causes (MIMIC) models. Most items performed well across settings except items related to suicidal ideation and "loss of sexual interest or pleasure," which had low discrimination parameters (suicide: a = 0.31 in Thailand to a = 2.49 in Indonesia; sexual interest: a = 0.74 in Rwanda to a = 1.26 in one region of Kurdistan). Most items showed some degree of DIF, but DIF only impacted aggregate scale-level scores in Indonesia. Thirteen of the 15 HSCL depression items performed well across diverse settings, with most items showing a strong relationship to the underlying trait of depression. The results support the cross-cultural applicability of most of these depression symptoms across LMIC settings. DIF impacted aggregate depression scores in one setting illustrating a possible source of measurement invariance in prevalence estimates.
Malec, James F; Kean, Jacob; Altman, Irwin M; Swick, Shannon
2012-12-01
Objectives: (1) To evaluate the measurement reliability and construct validity of the Mayo-Portland Adaptability Inventory, 4th revision (MPAI-4) in a sample consisting exclusively of patients with cerebrovascular accident (CVA) using single-parameter (Rasch) item-response methods; (2) to examine the differential item functioning (DIF) by sex within the CVA population; and (3) to examine DIF and differential test functioning (DTF) across traumatic brain injury (TBI) and CVA samples. Design: Retrospective psychometric analysis of rating scale data. Setting: Home- and community-based brain injury rehabilitation program. Participants: Individuals post-CVA (n=861) and individuals with TBI (n=603). Interventions: Not applicable. Main Outcome Measure: MPAI-4. Results: Item data on admission to community-based rehabilitation were submitted to Rasch, DIF, and DTF analyses. The final calibration in the CVA sample revealed satisfactory reliability/separation for persons (.91/3.16) and items (1.00/23.64). DIF showed that items for pain, anger, audition, and memory were associated with higher levels of disability for CVA than TBI patients, whereas self-care, mobility, and use of hands indicated greater overall disability for TBI patients. DTF analyses showed a high degree of association between the 2 sets of items (R=.92; R²=.85) and, at most, a 3.7 point difference in raw scores. Conclusions: The MPAI-4 demonstrates satisfactory psychometric properties for use with individuals with CVA applying for interdisciplinary posthospital rehabilitation. DIF reveals clinically meaningful differences between CVA and TBI groups that should be considered in results at the item and subscale level. Copyright © 2012 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Wang, Wen-Chung; Su, Ya-Hui
2004-01-01
In this study we investigated the effects of the average signed area (ASA) between the item characteristic curves of the reference and focal groups and three test purification procedures on the uniform differential item functioning (DIF) detection via the Mantel-Haenszel (M-H) method through Monte Carlo simulations. The results showed that ASA,…
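For readers unfamiliar with the Mantel-Haenszel (M-H) method referred to above, the common odds ratio and its conversion to the ETS delta scale can be sketched as below. This is a minimal illustration under stated assumptions, not the procedure used in the study: the function names and the two-stratum example counts are hypothetical, each stratum is one matched total-score level, and no purification step is included.

```python
import math


def mh_common_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across matched score levels.

    `strata` is an iterable of (a, b, c, d) counts per score level:
    a = reference correct, b = reference incorrect,
    c = focal correct,     d = focal incorrect.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        num += a * d / n
        den += b * c / n
    return num / den


def mh_d_dif(alpha):
    """Convert the common odds ratio to the ETS delta metric.

    Negative values indicate the item is harder for the focal group.
    """
    return -2.35 * math.log(alpha)


# Hypothetical counts for two score levels.
example_strata = [(20, 10, 15, 15), (30, 5, 25, 10)]
alpha_hat = mh_common_odds_ratio(example_strata)
delta_hat = mh_d_dif(alpha_hat)
```

An odds ratio of 1 (delta of 0) corresponds to no uniform DIF after matching on the score.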
Khan, Anzalee; Liharska, Lora; Harvey, Philip; Atkins, Alexandra; Keefe, Richard; Ulshen, Danny
2018-01-01
Abstract Background Recognizing the discrete dimensions that underlie negative symptoms in schizophrenia and how these dimensions are conceptualized across geographical regions may result in better understanding and treatment. The expressive-experiential distinction has been shown to have vast importance in relation to functional outcomes in schizophrenia. Previous studies have shown that the PANSS may not be equivalently rated across countries and cultures, suggesting regional differences in both symptom expression and rater judgment of symptom severity. Items that perform in markedly different ways across demographic, regional, cultural, or clinical severity characteristics may not offer valid representations of the target construct. 1) Will the expressive and experiential dimensions of the PANSS vary over 15 geographical regions and will the item ratings defining each dimension manifest similar reliability across these regions? 2) In large multi-center, international trials where data are combined, which of the two dimensions are disposed to social, linguistic and cultural inconsistency? Methods Data were obtained for the baseline PANSS visits of 6,889 subjects. Using Confirmatory Factor Analysis (CFA), we examined whether the expressive-experiential distinction would be replicated in our sample. We investigated the validity of the expressive-experiential distinction using Differential Item Functioning (DIF; Mantel-Haenszel) across 15 geographical regions – South America-Mexico, Austria-Germany, Belgium-Netherlands, Brazil, Canada, Nordic regions (Denmark, Finland, Norway, Sweden), France, Great Britain, India, Italy, Poland, Eastern Europe (Romania, Slovakia, Ukraine, Croatia, Estonia, Czech Republic), Russia, South Africa, and Spain - as compared to the United States. Results Expressive Deficit: More DIF was observed for items in the Expressive deficit factor than for items relating to experiential deficits. 
The following regions showed at least moderate to large DIF for all items: Austria-Germany, Nordic, France, and Poland. Of all the items, N3 Poor Rapport showed the most moderate and large DIF (n = 13; 86.67%) across countries, with 7 countries reporting large DIF. Similarly, N6 Lack of Spontaneity and Flow of Conversation showed moderate and large DIF for 66.67% of countries (n=10). Experiential Deficit: Item G16 Active Social Avoidance reported negligible DIF for 14 of the 15 countries investigated (93.33%). Large DIF was observed for N2 Emotional Withdrawal and N4 Passive Apathetic Social Withdrawal for Brazil and India. Seven regions demonstrated no DIF across all items of the PANSS experiential deficit factor (South America-Mexico, Belgium-Netherlands, Nordic, Great Britain, Eastern Europe, Russia, and Spain). Overall, far fewer items with large DIF were observed for the PANSS experiential domain. Discussion These results suggest that the PANSS Negative Symptoms Factor can be better represented by a two-factor model than by a single-factor model. Additionally, the results show significant differences in ratings on the PANSS expressive items, but not the experiential items, across regions. This could be due to a lack of equivalence between the original and translated versions, cultural differences in the interpretation of items, rater training, or understanding of scoring anchors. Knowing which items are challenging for raters across regions can help guide PANSS training to improve results of international clinical trials aimed at negative symptoms.
Testing for DIF in a Model with Single Peaked Item Characteristic Curves: The PARELLA Model.
ERIC Educational Resources Information Center
Hoijtink, Herbert; Molenaar, Ivo W.
1992-01-01
The PARallELogram Analysis (PARELLA) model is a probabilistic parallelogram model that can be used for the measurement of latent attitudes or latent preferences. A method is presented for testing for differential item functioning (DIF) for the PARELLA model using the approach of D. Thissen and others (1988). (SLD)
Small-Sample DIF Estimation Using SIBTEST, Cochran's Z, and Log-Linear Smoothing
ERIC Educational Resources Information Center
Lei, Pui-Wa; Li, Hongli
2013-01-01
Minimum sample sizes of about 200 to 250 per group are often recommended for differential item functioning (DIF) analyses. However, there are times when sample sizes for one or both groups of interest are smaller than 200 due to practical constraints. This study attempts to examine the performance of Simultaneous Item Bias Test (SIBTEST),…
A Comparison of Uniform DIF Effect Size Estimators under the MIMIC and Rasch Models
ERIC Educational Resources Information Center
Jin, Ying; Myers, Nicholas D.; Ahn, Soyeon; Penfield, Randall D.
2013-01-01
The Rasch model, a member of a larger group of models within item response theory, is widely used in empirical studies. Detection of uniform differential item functioning (DIF) within the Rasch model typically employs null hypothesis testing with a concomitant consideration of effect size (e.g., signed area [SA]). Parametric equivalence between…
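The signed area (SA) effect size mentioned above has a simple form under the Rasch model: the area between the reference and focal item characteristic curves equals the difference in item difficulties. The sketch below checks this numerically. It is an illustration only, not the estimators compared in the article; the function names, the integration limits, and the example difficulties are assumptions made for the example.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def signed_area(b_ref, b_focal, lo=-20.0, hi=20.0, steps=40000):
    """Trapezoidal estimate of the signed area between two Rasch ICCs.

    Integrates P_ref(theta) - P_focal(theta) over [lo, hi]; the limits
    are wide enough that the truncated tails are negligible.
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        theta = lo + i * h
        diff = rasch_prob(theta, b_ref) - rasch_prob(theta, b_focal)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * diff
    return total * h


# Hypothetical item that is 0.5 logits harder for the focal group.
sa = signed_area(b_ref=0.0, b_focal=0.5)
```

In this sign convention a positive area means the item favors the reference group, and its value should be close to the difficulty difference `b_focal - b_ref`.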
ERIC Educational Resources Information Center
Lee, Soo; Suh, Youngsuk
2018-01-01
Lord's Wald test for differential item functioning (DIF) has not been studied extensively in the context of the multidimensional item response theory (MIRT) framework. In this article, Lord's Wald test was implemented using two estimation approaches, marginal maximum likelihood estimation and Bayesian Markov chain Monte Carlo estimation, to detect…
Examining the Measurement Precision and Invariance of the Revised Get Ready to Read!
Farrington, Amber L.; Lonigan, Christopher J.
2016-01-01
Children's emergent literacy skills are highly predictive of later reading abilities. To determine which children have weaker emergent literacy skills and are in need of intervention, it is necessary to assess emergent literacy skills accurately and reliably. In this study, 1,351 children were administered the Revised Get Ready to Read! (GRTR-R), and an item response theory analysis was used to evaluate the item-level reliability of the measure. Differential item functioning (DIF) analyses were conducted to examine whether items function similarly between subpopulations of children. The GRTR-R had acceptable reliability for children whose ability level was just below the mean. DIF for a small number of items was present for only two comparisons—children who were older versus younger and children who were White versus African American. These results demonstrate that the GRTR-R has acceptable reliability and limited DIF, enabling the screener to identify those at risk for developing reading problems. PMID:23851136
Chan, Kitty S; Gross, Alden L; Pezzin, Liliana E; Brandt, Jason; Kasper, Judith D
2015-12-01
To harmonize measures of cognitive performance using item response theory (IRT) across two international aging studies. Data for persons ≥65 years from the Health and Retirement Study (HRS, N = 9,471) and the English Longitudinal Study of Aging (ELSA, N = 5,444). Cognitive performance measures varied (HRS fielded 25, ELSA 13); 9 were in common. Measurement precision was examined for IRT scores based on (a) common items, (b) common items adjusted for differential item functioning (DIF), and (c) DIF-adjusted all items. Three common items (day of date, immediate word recall, and delayed word recall) demonstrated DIF by survey. Adding survey-specific items improved precision but mainly for HRS respondents at lower cognitive levels. IRT offers a feasible strategy for harmonizing cognitive performance measures across other surveys and for other multi-item constructs of interest in studies of aging. Practical implications depend on sample distribution and the difficulty mix of in-common and survey-specific items. © The Author(s) 2015.
ERIC Educational Resources Information Center
Monahan, Patrick O.; Ankenmann, Robert D.
2010-01-01
When the matching score is either less than perfectly reliable or not a sufficient statistic for determining latent proficiency in data conforming to item response theory (IRT) models, Type I error (TIE) inflation may occur for the Mantel-Haenszel (MH) procedure or any differential item functioning (DIF) procedure that matches on summed-item…
Transforming SIBTEST to Account for Multilevel Data Structures
ERIC Educational Resources Information Center
French, Brian F.; Finch, W. Holmes
2015-01-01
SIBTEST is a differential item functioning (DIF) detection method that is accurate and effective with small samples, in the presence of group mean differences, and for assessment of both uniform and nonuniform DIF. The presence of multilevel data with DIF detection has received increased attention. Ignoring such structure can inflate Type I error.…
ERIC Educational Resources Information Center
Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza
2014-01-01
This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…
A Note on Three Statistical Tests in the Logistic Regression DIF Procedure
ERIC Educational Resources Information Center
Paek, Insu
2012-01-01
Although logistic regression became one of the well-known methods in detecting differential item functioning (DIF), its three statistical tests, the Wald, likelihood ratio (LR), and score tests, which are readily available under the maximum likelihood, do not seem to be consistently distinguished in DIF literature. This paper provides a clarifying…
Gambling-Related Cognition Scale (GRCS): Are skills-based games at a disadvantage?
Lévesque, David; Sévigny, Serge; Giroux, Isabelle; Jacques, Christian
2017-09-01
The Gambling-Related Cognition Scale (GRCS; Raylu & Oei, 2004) was developed to evaluate gambling-related cognitive distortions for all types of gamblers, regardless of their gambling activities (poker, slot machine, etc.). It is therefore imperative to ascertain the validity of its interpretation across different types of gamblers; however, some skills-related items endorsed by players could be interpreted as a cognitive distortion despite the fact that they play skills-related games. Using an intergroup (168 poker players and 73 video lottery terminal [VLT] players) differential item functioning (DIF) analysis, this study examined the possible manifestation of item biases associated with the GRCS. DIF was analyzed with ordinal logistic regressions (OLRs) and Ramsay's (1991) nonparametric kernel smoothing approach with TestGraf. Results show that half of the items display at least moderate DIF between groups and, depending on the type of analysis used, 3 to 7 items displayed large DIF. The 5 items with the most DIF were more significantly endorsed by poker players (uniform DIF) and were all related to skills, knowledge, learning, or probabilities. Poker players' interpretations of some skills-related items may lead to an overestimation of their cognitive distortions due to their total score increased by measurement artifact. Findings indicate that the current structure of the GRCS contains potential biases to be considered when poker players are surveyed. The present study conveys new and important information on bias issues to ponder carefully before using and interpreting the GRCS and other similar wide-range instruments with poker players. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
DIF Trees: Using Classification Trees to Detect Differential Item Functioning
ERIC Educational Resources Information Center
Vaughn, Brandon K.; Wang, Qiu
2010-01-01
A nonparametric tree classification procedure is used to detect differential item functioning for items that are dichotomously scored. Classification trees are shown to be an alternative procedure to detect differential item functioning other than the use of traditional Mantel-Haenszel and logistic regression analysis. A nonparametric…
ERIC Educational Resources Information Center
Sachse, Karoline A.; Roppelt, Alexander; Haag, Nicole
2016-01-01
Trend estimation in international comparative large-scale assessments relies on measurement invariance between countries. However, cross-national differential item functioning (DIF) has been repeatedly documented. We ran a simulation study using national item parameters, which required trends to be computed separately for each country, to compare…
ERIC Educational Resources Information Center
Puhan, Gautam; Moses, Tim P.; Yu, Lei; Dorans, Neil J.
2007-01-01
The purpose of the current study was to examine whether log-linear smoothing of observed score distributions in small samples results in more accurate differential item functioning (DIF) estimates under the simultaneous item bias test (SIBTEST) framework. Data from a teacher certification test were analyzed using White candidates in the reference…
ERIC Educational Resources Information Center
Arikan, Serkan; van de Vijver, Fons J. R.; Yagmur, Kutlay
2018-01-01
We examined Differential Item Functioning (DIF) and the size of cross-cultural performance differences in the Programme for International Student Assessment (PISA) 2012 mathematics data before and after application of propensity score matching. The mathematics performance of Indonesian, Turkish, Australian, and Dutch students on released items was…
Watt, Torquil; Barbesino, Giuseppe; Bjorner, Jakob Bue; Bonnema, Steen Joop; Bukvic, Branka; Drummond, Russell; Groenvold, Mogens; Hegedüs, Laszlo; Kantzer, Valeska; Lasch, Kathryn E; Marcocci, Claudio; Mishra, Anjali; Netea-Maier, Romana; Ekker, Merel; Paunovic, Ivan; Quinn, Terence J; Rasmussen, Åse Krogh; Russell, Audrey; Sabaretnam, Mayilvaganan; Smit, Johannes; Törring, Ove; Zivaljevic, Vladan; Feldt-Rasmussen, Ulla
2015-03-01
Thyroid diseases are common and often affect quality of life (QoL). No cross-culturally validated patient-reported outcome measuring thyroid-related QoL is available. The purpose of the present study was to test the cross-cultural validity of the newly developed thyroid-related patient-reported outcome ThyPRO, using tests for differential item functioning (DIF) according to language version. The ThyPRO consists of 85 items summarized in 13 multi-item scales and one single item. Scales cover physical and mental symptoms, well-being and function as well as social and daily function and cosmetic concerns. Translation applied standard forward-backward methodology with subsequent cognitive interviews and reviews. Responses (N = 1,810) to the ThyPRO were collected in seven countries: UK (n = 166), The Netherlands (n = 147), Serbia (n = 150), Italy (n = 110), India (n = 148), Denmark (n = 902) and Sweden (n = 187). Translated versions were compared pairwise to the English version by examining uniform and nonuniform DIF, i.e., whether patients from different countries respond differently to a particular item, although they have identical level of the concept measured by the item. Analyses were controlled for thyroid diagnosis. DIF was investigated by ordinal logistic regression, testing for both statistical significance and magnitude (ΔR² > 0.02). Scale level was estimated by the sum score, after purification. For twelve of the 84 tested items, DIF was identified in more than one language. Eight of these were small, but four were indicative of possible low translatability. Twenty-one instances of DIF in single languages were identified, indicating potential problems with the particular translation. However, only seven were of a magnitude which could affect scale scores, most of which could be explained by sample differences not controlled for.
The ThyPRO has good cross-cultural validity with only minor cross-cultural invariance and is recommended for use in international multicenter studies.
Harpole, Jared K; Levinson, Cheri A; Woods, Carol M; Rodebaugh, Thomas L; Weeks, Justin W; Brown, Patrick J; Heimberg, Richard G; Menatti, Andrew R; Blanco, Carlos; Schneier, Franklin; Liebowitz, Michael
2015-06-01
The Brief Fear of Negative Evaluation Scale (BFNE; Leary, Personality and Social Psychology Bulletin, 9, 371-375, 1983) assesses fear and worry about receiving negative evaluation from others. Rodebaugh et al. (Psychological Assessment, 16, 169-181, 2004) found that the BFNE is composed of a reverse-worded factor (BFNE-R) and straightforwardly-worded factor (BFNE-S). Further, they found the BFNE-S to have better psychometric properties and provide more information than the BFNE-R. Currently there is a lack of research regarding the measurement invariance of the BFNE-S across gender and ethnicity with respect to item thresholds. The present study uses item response theory (IRT) to test the BFNE-S for differential item functioning (DIF) related to gender and ethnicity (White, Asian, and Black). Six data sets consisting of clinical, community, and undergraduate participants were utilized (N = 2,109). The factor structure of the BFNE-S was confirmed using categorical confirmatory factor analysis, IRT model assumptions were tested, and the BFNE-S was evaluated for DIF. Item nine demonstrated significant non-uniform DIF between White and Black participants. No other items showed significant uniform or non-uniform DIF across gender or ethnicity. Results suggest the BFNE-S can be used reliably with men and women and Asian and White participants. More research is needed to understand the implications of using the BFNE-S with Black participants.
Detecting Gender Bias Through Test Item Analysis
NASA Astrophysics Data System (ADS)
González-Espada, Wilson J.
2009-03-01
Many physical science and physics instructors might not be trained in pedagogically appropriate test construction methods. This could lead to test items that do not measure what they are intended to measure. A subgroup of these items might show bias against some groups of students. This paper describes how the author became aware of potentially biased items against females in his examinations, which led to the exploration of fundamental issues related to item validity, gender bias, and differential item functioning, or DIF. A brief discussion of DIF in the context of university courses, as well as practical suggestions to detect possible gender-biased items, follows.
Item Purification in Differential Item Functioning Using Generalized Linear Mixed Models
ERIC Educational Resources Information Center
Liu, Qian
2011-01-01
For this dissertation, four item purification procedures were implemented onto the generalized linear mixed model for differential item functioning (DIF) analysis, and the performance of these item purification procedures was investigated through a series of simulations. Among the four procedures, forward and generalized linear mixed model (GLMM)…
Jones, Richard N
2006-11-01
Knowledge of the extent to which measurement of adult cognitive functioning differs between Spanish and English language administrations of the Mini-Mental State Examination (MMSE) is critical for inclusive, representative, and valid research of older adults in the United States. We sought to demonstrate the use of an item response theory (IRT) based structural equation model, that is, the MIMIC model (multiple indicators, multiple causes), to evaluate MMSE responses for evidence of differential item functioning (DIF) attributable to language of administration. We studied participants in a dementia case registry study (n = 1546), 42% of whom were examined with the Spanish language MMSE. Twelve of 21 items were identified as having significant uniform DIF. The 4 most discrepant included orientation to season, orientation to state, repeat phrase, and follow command. DIF accounted for two-thirds of the observed difference in underlying level of cognitive functioning between Spanish- and English-language administration groups. Failing to account for measurement differences may lead to spurious inferences regarding language group differences in underlying level of cognitive functioning. The MIMIC model can be used to detect and adjust for such measurement differences in substantive research.
The Mediated MIMIC Model for Understanding the Underlying Mechanism of DIF
ERIC Educational Resources Information Center
Cheng, Ying; Shao, Can; Lathrop, Quinn N.
2016-01-01
Due to its flexibility, the multiple-indicator, multiple-causes (MIMIC) model has become an increasingly popular method for the detection of differential item functioning (DIF). In this article, we propose the mediated MIMIC model method to uncover the underlying mechanism of DIF. This method extends the usual MIMIC model by including one variable…
Rasch Mixture Models for DIF Detection: A Comparison of Old and New Score Specifications
ERIC Educational Resources Information Center
Frick, Hannah; Strobl, Carolin; Zeileis, Achim
2015-01-01
Rasch mixture models can be a useful tool when checking the assumption of measurement invariance for a single Rasch model. They provide advantages compared to manifest differential item functioning (DIF) tests when the DIF groups are only weakly correlated with the manifest covariates available. Unlike in single Rasch models, estimation of Rasch…
A Monte Carlo Study of Skewed Theta Distributions on DIF Indices.
ERIC Educational Resources Information Center
Monaco, Malina
The effects of skewed theta distributions on indices of differential item functioning (DIF) were studied, comparing Mantel Haenszel (N. Mantel and W. Haenszel, 1959) and DFIT (N. S. Raju, W. J. van der Linden, and P. F. Fleer) (noncompensatory DIF). The significance of the study is that in educational and psychological data, the distributions one…
Setodji, Claude M; Elliott, Marc N; Abel, Gary; Burt, Jenni; Roland, Martin; Campbell, John
2015-09-01
To evaluate two 5-item patient experience scales from the English General Practice (GP) Patient Survey for evidence of differential item functioning (DIF) given prior evidence of substantially worse reported health care experiences for South Asian compared with white British respondents. A national survey of English patients' primary care experiences. We used classic test and item response theory analysis to examine the possibility of DIF by patient ethnicity (South Asian, white British) after controlling for age, sex, health status, and quality of life in the English GP Patient Survey conducted in 2011/2012. Data were available for 873,051 respondents (818,219 white British/54,832 South Asian from 7795 English practices) who answered items relating to experiences of GP or nurses' care. Internal consistency reliability was high and similar for South Asian and white British patients. White British patients reported better average experiences than South Asians, but there was no evidence of DIF or different item response curves for white British and South Asian respondents, even in sensitivity analyses using matched samples. All communication items in the English GP Patient Survey showed similar South Asian versus white British differences, with no evidence of DIF. In contrast, differences due to scale use or expectations are typically variable rather than constant across scales. While other possibilities remain, these findings increase the likelihood that the observed negative responses of South Asian patients to this national survey reflect true differences in their experiences of care.
Detecting Differential Person Functioning in Emotional Intelligence
ERIC Educational Resources Information Center
Alsmadi, Yahia M.; Alsmadi, Abdalla A.
2009-01-01
Differential Item Functioning (DIF) is a widely used term in test development literature. It is very important to analyze test data for DIF because it is a serious threat to validity. If the same data matrix is transposed, a similar analysis can be carried out for Differential Person Functioning (DPF). The purpose of this paper is to introduce and…
Wu, Tzu-Yi; Lin, Chung-Ying; Årestedt, Kristofer; Griffiths, Mark D.; Broström, Anders; Pakpour, Amir H.
2017-01-01
Background and aims: The nine-item Internet Gaming Disorder Scale – Short Form (IGDS-SF9) is brief and effective to evaluate Internet Gaming Disorder (IGD) severity. Although its scores show promising psychometric properties, less is known about whether different groups of gamers interpret the items similarly. This study aimed to verify the construct validity of the Persian IGDS-SF9 and examine the scores in relation to gender and hours spent online gaming among 2,363 Iranian adolescents. Methods: Confirmatory factor analysis (CFA) and Rasch analysis were used to examine the construct validity of the IGDS-SF9. The effects of gender and time spent online gaming per week were investigated by multigroup CFA and Rasch differential item functioning (DIF). Results: The unidimensionality of the IGDS-SF9 was supported in both CFA and Rasch. However, Item 4 (fail to control or cease gaming activities) displayed DIF (DIF contrast = 0.55) slightly over the recommended cutoff in Rasch but was invariant in multigroup CFA across gender. Items 4 (DIF contrast = −0.67) and 9 (jeopardize or lose an important thing because of gaming activity; DIF contrast = 0.61) displayed DIF in Rasch and were non-invariant in multigroup CFA across time spent online gaming. Conclusions: Given the Persian IGDS-SF9 was unidimensional, it is concluded that the instrument can be used to assess IGD severity. However, users of the instrument are cautioned concerning the comparisons of the sum scores of the IGDS-SF9 across gender and across adolescents spending different amounts of time online gaming. PMID:28571474
Measuring Attitudes About Intimate Partner Violence Against Women: The ATT-IPV Scale
Yount, Kathryn M.; VanderEnde, Kristin; Zureick-Brown, Sarah; Anh, Hoang Tu; Schuler, Sidney Ruth; Minh, Tran Hung
2014-01-01
In lower-income settings, women more often than men justify intimate partner violence (IPV). Yet, the role of measurement invariance across gender is unstudied. We developed the ATT-IPV scale to measure attitudes about physical violence against wives in 1,055 married men and women ages 18–50 in My Hao district, Vietnam. Across 10 items about transgressions of the wife, women more often than men agreed that a man had good reason to hit his wife (3 % to 92 %; 0 % to 67 %). In random split-half samples, one-factor exploratory factor analysis (EFA) (N1 = 527) and confirmatory factor analysis (CFA) (N2 = 528) models for nine items with sufficient variability had significant loadings (0.575–0.883; 0.502–0.897) and good fit (RMSEA = 0.068, 0.048; CFI = 0.951, 0.978, TLI = 0.935, 0.970). Three items had significant uniform differential item functioning (DIF) by gender, and adjustment for DIF revealed that measurement noninvariance was partially masking men’s lower propensity than women to justify IPV. A CFA model for the six items without DIF had excellent fit (RMSEA = 0.019, CFI = 0.994, TLI = 0.991) and an attitudinal gender gap similar to the DIF-adjusted nine-item model, suggesting that the six-item scale reliably measures attitudes about IPV across gender. Researchers should validate the scale in urban Vietnam and elsewhere and decompose DIF-adjusted gender attitudinal gaps. PMID:24846070
Sandilos, Lia E.; Lewis, Kandia; Komaroff, Eugene; Hammer, Carol Scheffner; Scarpino, Shelley E.; Lopez, Lisa; Rodriguez, Barbara; Goldstein, Brian
2015-01-01
The purpose of this study was to investigate the way in which items on the Woodcock-Muñoz Language Survey Revised (WMLS-R) Spanish and English versions function for bilingual children from different ethnic subgroups who speak different dialects of Spanish. Using data from a sample of 324 bilingual Hispanic families and their children living on the United States mainland, differential item functioning (DIF) was conducted to determine if test items in English and Spanish functioned differently for Mexican, Cuban, and Puerto Rican bilingual children. Data on child and parent language characteristics and children’s scores on Picture Vocabulary and Story Recall subtests in English and Spanish were collected. DIF was not detected for items on the Spanish subtests. Results revealed that some items on English subtests displayed statistically and practically significant DIF. The findings indicate that there are differences in the difficulty level of WMLS-R English-form test items depending on the examinees’ ethnic subgroup membership. This outcome suggests that test developers need to be mindful of potential differences in performance based on ethnic subgroup and dialect when developing standardized language assessments that may be administered to bilingual students. PMID:26705400
Effect of Differential Item Functioning on Test Equating
ERIC Educational Resources Information Center
Kabasakal, Kübra Atalay; Kelecioglu, Hülya
2015-01-01
This study examines the effect of differential item functioning (DIF) items on test equating through multilevel item response models (MIRMs) and traditional IRMs. The performances of three different equating models were investigated under 24 different simulation conditions, and the variables whose effects were examined included sample size, test…
The MIMIC Model as a Tool for Differential Bundle Functioning Detection
ERIC Educational Resources Information Center
Finch, W. Holmes
2012-01-01
Increasingly, researchers interested in identifying potentially biased test items are encouraged to use a confirmatory, rather than exploratory, approach. One such method for confirmatory testing is rooted in differential bundle functioning (DBF), where hypotheses regarding potential differential item functioning (DIF) for sets of items (bundles)…
ERIC Educational Resources Information Center
Wang, Wen-Chung
2004-01-01
Scale indeterminacy in analysis of differential item functioning (DIF) within the framework of item response theory can be resolved by imposing 3 anchor item methods: the equal-mean-difficulty method, the all-other anchor item method, and the constant anchor item method. In this article, applicability and limitations of these 3 methods are…
Roberts, Chris; Zoanetti, Nathan; Rothnie, Imogene
2009-04-01
The multiple mini-interview (MMI) was initially designed to test non-cognitive characteristics related to professionalism in entry-level students. However, it may be testing cognitive reasoning skills. Candidates to medical and dental schools come from diverse backgrounds and it is important for the validity and fairness of the MMI that these background factors do not impact on their scores. A suite of advanced psychometric techniques drawn from item response theory (IRT) was used to validate an MMI question bank in order to establish the conceptual equivalence of the questions. Bias against candidate subgroups of equal ability was investigated using differential item functioning (DIF) analysis. All 39 questions had a good fit to the IRT model. Of the 195 checklist items, none were found to have significant DIF after visual inspection of expected score curves, consideration of the number of applicants per category, and evaluation of the magnitude of the DIF parameter estimates. The question bank contains items that have been studied carefully in terms of model fit and DIF. Questions appear to measure a cognitive unidimensional construct, 'entry-level reasoning skills in professionalism', as suggested by goodness-of-fit statistics. The lack of items exhibiting DIF is encouraging in a contemporary high-stakes admission setting where candidates of diverse personal, cultural and academic backgrounds are assessed by common means. This IRT approach has potential to provide assessment designers with a quality control procedure that extends to the level of checklist items.
ERIC Educational Resources Information Center
Ahmadi, Alireza; Bazvand, Ali Darabi
2016-01-01
Differential Item Functioning (DIF) exists when examinees of equal ability from different groups have different probabilities of successful performance in a certain item. This study examined gender differential item functioning across the PhD Entrance Exam of TEFL (PEET) in Iran, using both logistic regression (LR) and one-parameter item response…
Terluin, Berend; Smits, Niels; Miedema, Baukje
2014-12-01
Translations of questionnaires need to be carefully validated to assure that the translation measures the same construct(s) as the original questionnaire. The four-dimensional symptom questionnaire (4DSQ) is a Dutch self-report questionnaire measuring distress, depression, anxiety and somatization. To evaluate the equivalence of the English version of the 4DSQ. 4DSQ data of English and Dutch speaking general practice attendees were analysed and compared. The English speaking group consisted of 205 attendees, aged 18-64 years, in general practice, in Canada whereas the Dutch group consisted of 302 general practice attendees in the Netherlands. Differential item functioning (DIF) analysis was conducted using the Mantel-Haenszel method and ordinal logistic regression. Differential test functioning (DTF; i.e., the scale impact of DIF) was evaluated using linear regression analysis. DIF was detected in 2/16 distress items, 2/6 depression items, 2/12 anxiety items, and 1/16 somatization items. With respect to mean scale scores, the impact of DIF on the scale level was negligible for all scales. On the anxiety scale DIF caused the English speaking patients with moderate to severe anxiety to score about one point lower than Dutch patients with the same anxiety level. The English 4DSQ measures the same constructs as the original Dutch 4DSQ. The distress, depression and somatization scales can employ the same cut-off points as the corresponding Dutch scales. However, cut-off points of the English 4DSQ anxiety scale should be lowered by one point to retain the same meaning as the Dutch anxiety cut-off points.
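As a rough illustration of the Mantel-Haenszel method named in the abstract above, the following minimal sketch computes the MH common odds ratio for a dichotomous item and its conversion to the ETS delta scale (delta = -2.35 ln alpha). It is not the analysis code of the study. The function name `mantel_haenszel_dif`, the input layout (one 2x2 table per matched total-score level) and the example counts are all assumptions made for illustration.

```python
import math


def mantel_haenszel_dif(strata):
    """Return (alpha_MH, delta_MH) for one dichotomous item.

    `strata` is a list of tuples, one per matched score level:
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    This layout is a hypothetical convention chosen for the sketch.
    """
    num = 0.0  # sum over strata of ref_correct * focal_incorrect / n
    den = 0.0  # sum over strata of ref_incorrect * focal_correct / n
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level carries no information
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("odds ratio is undefined or degenerate for these tables")
    alpha = num / den                # common odds ratio across strata
    delta = -2.35 * math.log(alpha)  # ETS delta metric; negative = harder for the focal group
    return alpha, delta


# Illustrative counts only (not data from any study listed here).
example_strata = [(20, 10, 10, 20), (30, 10, 20, 20)]
alpha_mh, delta_mh = mantel_haenszel_dif(example_strata)
```

An odds ratio near 1 (delta near 0) suggests no uniform DIF at matched score levels. The method does not target nonuniform DIF, which is one reason studies often pair it with logistic regression.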
Küçükdeveci, Ayse A; Sahin, Hülya; Ataman, Sebnem; Griffiths, Bridget; Tennant, Alan
2004-02-15
Guidelines have been established for cross-cultural adaptation of outcome measures. However, invariance across cultures must also be demonstrated through analysis of Differential Item Functioning (DIF). This is tested in the context of a Turkish adaptation of the Health Assessment Questionnaire (HAQ). Internal construct validity of the adapted HAQ is assessed by Rasch analysis; reliability, by internal consistency and the intraclass correlation coefficient; external construct validity, by association with impairments and American College of Rheumatology functional stages. Cross-cultural validity is tested through DIF by comparison with data from the UK version of the HAQ. The adapted version of the HAQ demonstrated good internal construct validity through fit of the data to the Rasch model (mean item fit 0.205; SD 0.998). Reliability was excellent (alpha = 0.97) and external construct validity was confirmed by expected associations. DIF for culture was found in only 1 item. Cross-cultural validity was found to be sufficient for use in international studies between the UK and Turkey. Future adaptation of instruments should include analysis of DIF at the field testing stage in the adaptation process.
The Usefulness of Differential Item Functioning Methodology in Longitudinal Intervention Studies
USDA-ARS's Scientific Manuscript database
Perceived self-efficacy (SE) for engaging in physical activity (PA) is a key variable mediating PA change in interventions. The purpose of this study is to demonstrate the usefulness of item response modeling-based (IRM) differential item functioning (DIF) in the investigation of group differences ...
ERIC Educational Resources Information Center
Beinicke, Andrea; Pässler, Katja; Hell, Benedikt
2014-01-01
The study investigates consequences of eliminating items showing gender-specific differential item functioning (DIF) on the psychometric structure of a standard RIASEC interest inventory. Holland's hexagonal model was tested for structural invariance using a confirmatory methodological approach (confirmatory factor analysis and randomization…
The Effects of Testlets on Reliability and Differential Item Functioning
ERIC Educational Resources Information Center
Teker, Gulsen Tasdelen; Dogan, Nuri
2015-01-01
Reliability and differential item functioning (DIF) analyses were conducted on testlets displaying local item dependence in this study. The data set employed in the research was obtained from the answers given by 1,500 students to the 20 items included in six testlets given in English Proficiency Exam by the School of Foreign Languages of a state…
MIMIC Methods for Assessing Differential Item Functioning in Polytomous Items
ERIC Educational Resources Information Center
Wang, Wen-Chung; Shih, Ching-Lin
2010-01-01
Three multiple indicators-multiple causes (MIMIC) methods, namely, the standard MIMIC method (M-ST), the MIMIC method with scale purification (M-SP), and the MIMIC method with a pure anchor (M-PA), were developed to assess differential item functioning (DIF) in polytomous items. In a series of simulations, it appeared that all three methods…
Identifying Differential Item Functioning in Multi-Stage Computer Adaptive Testing
ERIC Educational Resources Information Center
Gierl, Mark J.; Lai, Hollis; Li, Johnson
2013-01-01
The purpose of this study is to evaluate the performance of CATSIB (Computer Adaptive Testing-Simultaneous Item Bias Test) for detecting differential item functioning (DIF) when items in the matching and studied subtest are administered adaptively in the context of a realistic multi-stage adaptive test (MST). MST was simulated using a 4-item…
Measurement equivalence and differential item functioning in family psychology.
Bingenheimer, Jeffrey B; Raudenbush, Stephen W; Leventhal, Tama; Brooks-Gunn, Jeanne
2005-09-01
Several hypotheses in family psychology involve comparisons of sociocultural groups. Yet the potential for cross-cultural inequivalence in widely used psychological measurement instruments threatens the validity of inferences about group differences. Methods for dealing with these issues have been developed via the framework of item response theory. These methods deal with an important type of measurement inequivalence, called differential item functioning (DIF). The authors introduce DIF analytic methods, linking them to a well-established framework for conceptualizing cross-cultural measurement equivalence in psychology (C.H. Hui and H.C. Triandis, 1985). They illustrate the use of DIF methods using data from the Project on Human Development in Chicago Neighborhoods (PHDCN). Focusing on the Caregiver Warmth and Environmental Organization scales from the PHDCN's adaptation of the Home Observation for Measurement of the Environment Inventory, the authors obtain results that exemplify the range of outcomes that may result when these methods are applied to psychological measurement instruments. (c) 2005 APA, all rights reserved
Gao, Yong; Zhu, Weimo
2011-05-01
The purpose of this study was to identify subgroup-sensitive physical activities (PA) using differential item functioning (DIF) analysis. A sub-unweighted sample of 1857 (men=923 and women=934) from the 2003-2004 National Health and Nutrition Examination Survey PA questionnaire data was used for the analyses. Using the Mantel-Haenszel, the simultaneous item bias test, and the ANOVA DIF methods, 33 specific leisure-time moderate and/or vigorous PA (MVPA) items were analyzed for DIF across race/ethnicity, gender, education, income, and age groups. Many leisure-time MVPA items were identified as large DIF items. When participating in the same amount of leisure-time MVPA, non-Hispanic blacks were more likely to participate in basketball and dance activities than non-Hispanic whites (NHW); NHW were more likely to participate in golf and hiking than non-Hispanic blacks; Hispanics were more likely to participate in dancing, hiking, and soccer than NHW, whereas NHW were more likely to engage in bicycling, golf, swimming, and walking than Hispanics; women were more likely to participate in aerobics, dancing, stretching, and walking than men, whereas men were more likely to engage in basketball, fishing, golf, running, soccer, weightlifting, and hunting than women; educated persons were more likely to participate in jogging and treadmill exercise than less educated persons; persons with higher incomes were more likely to engage in golf than those with lower incomes; and adults (20-59 yr) were more likely to participate in basketball, dancing, jogging, running, and weightlifting than older adults (60+ yr), whereas older adults were more likely to participate in walking and golf than younger adults. DIF methods are able to identify subgroup-sensitive PA and thus provide useful information to help design group-sensitive, targeted interventions for disadvantaged PA subgroups. © 2011 by the American College of Sports Medicine
Effects of Linking Methods on Detection of DIF.
ERIC Educational Resources Information Center
Kim, Seock-Ho; Cohen, Allan S.
1992-01-01
Effects of the following methods for linking metrics on detection of differential item functioning (DIF) were compared: (1) test characteristic curve method (TCC); (2) weighted mean and sigma method; and (3) minimum chi-square method. With large samples, results were essentially the same. With small samples, TCC was most accurate. (SLD)
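To make the idea of linking metrics concrete, here is a minimal sketch of the plain (unweighted) mean and sigma method, a simpler relative of the weighted variant compared in the abstract above. The function names and example difficulty values are hypothetical. The sketch assumes that difficulty estimates for a set of common items are available on both scales.

```python
from statistics import mean, pstdev


def mean_sigma_constants(b_from, b_to):
    """Linking constants (A, B) such that b_to is approximately A * b_from + B.

    b_from, b_to: difficulty estimates of the same common items on the
    source and target scales (unweighted mean/sigma, an assumed simplification).
    """
    A = pstdev(b_to) / pstdev(b_from)     # ratio of difficulty spreads
    B = mean(b_to) - A * mean(b_from)     # shift that aligns the means
    return A, B


def transform_item(a, b, A, B):
    """Place a 2PL item's (discrimination, difficulty) onto the target scale."""
    return a / A, A * b + B


# Illustrative values only: the target scale equals 2 * source + 0.5.
A, B = mean_sigma_constants([-1.0, 0.0, 1.0], [-1.5, 0.5, 2.5])
a_new, b_new = transform_item(1.0, 1.0, A, B)
```

Once both groups' parameter estimates sit on one metric, differences between them can be examined for DIF. Estimation error in A and B carries into that comparison, which may be part of why the choice of linking method matters more in small samples.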
Evaluation of the CATSIB DIF Procedure in a Pretest Setting
ERIC Educational Resources Information Center
Nandakumar, Ratna; Roussos, Louis
2004-01-01
A new procedure, CATSIB, for assessing differential item functioning (DIF) on computerized adaptive tests (CATs) is proposed. CATSIB, a modified SIBTEST procedure, matches test takers on estimated ability and controls for impact-induced Type I error inflation by employing a CAT version of the SIBTEST "regression correction." The…
Unexpected Direction of Differential Item Functioning
ERIC Educational Resources Information Center
Park, Sangwook
2011-01-01
Many studies have been conducted to evaluate the performance of DIF detection methods, when two groups have different ability distributions. Such studies typically have demonstrated factors that are associated with inflation of Type I error rates in DIF detection, such as mean ability differences. However, no study has examined how the direction…
Effect Size Measures for Differential Item Functioning in a Multidimensional IRT Model
ERIC Educational Resources Information Center
Suh, Youngsuk
2016-01-01
This study adapted an effect size measure used for studying differential item functioning (DIF) in unidimensional tests and extended the measure to multidimensional tests. Two effect size measures were considered in a multidimensional item response theory model: signed weighted P-difference and unsigned weighted P-difference. The performance of…
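The abstract above does not give the formulas for the two effect sizes, so the sketch below shows only one plausible unidimensional reading. It takes the signed measure as a weighted average of the difference between reference-group and focal-group item response curves, and the unsigned measure as the same average of absolute differences. The 2PL curves, the normal-density weights on a fixed grid and all function names are assumptions, not the study's definitions.

```python
import math


def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def weighted_p_differences(ref, focal, grid=None):
    """Return (signed, unsigned) weighted P-differences for one item.

    ref, focal: (a, b) parameter tuples for the reference and focal groups.
    Weights are standard-normal densities on the grid, normalised to sum
    to 1 (an assumed weighting scheme).
    """
    if grid is None:
        grid = [(k - 40) / 10 for k in range(81)]  # theta from -4 to 4
    dens = [math.exp(-0.5 * t * t) for t in grid]
    total = sum(dens)
    signed = 0.0
    unsigned = 0.0
    for t, w in zip(grid, dens):
        diff = p_2pl(t, *ref) - p_2pl(t, *focal)
        signed += (w / total) * diff
        unsigned += (w / total) * abs(diff)
    return signed, unsigned


# Uniform DIF: same slope, item harder for the focal group.
s_uni, u_uni = weighted_p_differences((1.0, 0.0), (1.0, 0.5))
# Crossing DIF: same difficulty, different slopes.
s_cross, u_cross = weighted_p_differences((1.5, 0.0), (0.7, 0.0))
```

In the uniform case the two measures coincide. In the crossing case the positive and negative differences cancel in the signed measure but not in the unsigned one, which is the usual motivation for reporting both.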
The Impact of Missing Data on the Detection of Nonuniform Differential Item Functioning
ERIC Educational Resources Information Center
Finch, W. Holmes
2011-01-01
Missing information is a ubiquitous aspect of data analysis, including responses to items on cognitive and affective instruments. Although the broader statistical literature describes missing data methods, relatively little work has focused on this issue in the context of differential item functioning (DIF) detection. Such prior research has…
Chung, Hyewon; Kim, Jiseon; Cook, Karon F; Askew, Robert L; Revicki, Dennis A; Amtmann, Dagmar
2014-02-01
In order to test the difference between group means, the construct measured must have the same meaning for all groups under investigation. This study examined the measurement invariance of responses to the patient-reported outcomes measurement information system (PROMIS) pain behavior (PB) item bank in two samples: the PROMIS calibration sample (Wave 1, N = 426) and a sample recruited from the American Chronic Pain Association (ACPA, N = 750). The ACPA data were collected to increase the number of participants with higher levels of pain. Multi-group confirmatory factor analysis (MG-CFA) and two item response theory (IRT)-based differential item functioning (DIF) approaches were employed to evaluate the existence of measurement invariance. MG-CFA results supported metric invariance of the PROMIS-PB, indicating that unstandardized factor loadings were equal across samples. DIF analyses revealed that the impact of 6 DIF items was negligible. Based on the results of both MG-CFA and IRT-based DIF approaches, we recommend retaining the original parameter estimates obtained from the combined samples.
ERIC Educational Resources Information Center
Ajeigbe, Taiwo Oluwafemi; Afolabi, Eyitayo Rufus Ifedayo
2017-01-01
This study assessed unidimensionality and occurrence of Differential Item Functioning (DIF) in Mathematics and English Language items of Osun State Qualifying Examination. The study made use of secondary data. The results showed that OSQ Mathematics (-0.094 ≤ r ≤ 0.236) and English Language items (-0.095 ≤ r ≤ 0.228) were unidimensional. Also,…
Effect of Multiple Testing Adjustment in Differential Item Functioning Detection
ERIC Educational Resources Information Center
Kim, Jihye; Oshima, T. C.
2013-01-01
In a typical differential item functioning (DIF) analysis, a significance test is conducted for each item. As a test consists of multiple items, such multiple testing may increase the possibility of making a Type I error at least once. The goal of this study was to investigate how to control a Type I error rate and power using adjustment…
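The multiple-testing concern raised in this abstract can be illustrated with a little arithmetic. The sketch below is not taken from the study; it assumes independent item-level tests and uses a plain Bonferroni adjustment purely for illustration. The function names are hypothetical.

```python
def familywise_error_rate(alpha, n_items):
    """Probability of at least one Type I error across n_items tests.

    Assumes the item-level tests are independent (an illustrative
    simplification; DIF tests on one form are usually correlated).
    """
    return 1.0 - (1.0 - alpha) ** n_items


def bonferroni_alpha(alpha, n_items):
    """Per-item significance level under a Bonferroni adjustment."""
    return alpha / n_items


if __name__ == "__main__":
    n_items = 20  # hypothetical test length
    unadjusted = familywise_error_rate(0.05, n_items)
    adjusted = familywise_error_rate(bonferroni_alpha(0.05, n_items), n_items)
    print(f"unadjusted familywise rate: {unadjusted:.3f}")
    print(f"Bonferroni familywise rate: {adjusted:.3f}")
```

With 20 items tested at 0.05 each, the chance of at least one false flag is well over one half under these assumptions, which is why some form of adjustment is usually considered; the price is reduced power for each individual item.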
ERIC Educational Resources Information Center
Zebehazy, Kim T.; Zigmond, Naomi; Zimmerman, George J.
2012-01-01
Introduction: This study investigated differential item functioning (DIF) of test items on Pennsylvania's Alternate System of Assessment (PASA) for students with visual impairments and severe cognitive disabilities and what the reasons for the differences may be. Methods: The Wilcoxon signed ranks test was used to analyze differences in the scores…
RhinAsthma patient perspective: A Rasch validation study.
Molinengo, Giorgia; Baiardini, Ilaria; Braido, Fulvio; Loera, Barbara
2018-02-01
In daily practice, Health-Related Quality of Life (HRQoL) tools are useful for supplementing clinical data with the patient's perspective. To encourage their use by clinicians, the availability of tools that can quickly provide valid results is crucial. A new HRQoL tool has been proposed for patients with asthma and rhinitis: the RhinAsthma Patient Perspective-RAPP. The aim of this study was to evaluate the psychometric robustness of the RAPP using the Item Response Theory (IRT) approach, to evaluate the scalability of items and test whether or not patients use the items response scale correctly. 155 patients (53.5% women, mean age 39.1, range 16-76) were recruited during a multicenter study. RAPP metric properties were investigated using IRT models. Differential item functioning (DIF) was used for gender, age, and asthma control test (ACT). The RAPP adequately fitted the Rating Scale model, demonstrating the equality of the rating scale structure for all items. All statistics on items were satisfactory. The RAPP had adequate internal reliability and showed good ability to discriminate among different groups of participants. DIF analysis indicated that there were no differential item functioning issues for gender. One item showed a DIF by age and four items by ACT. The psychometric evaluation performed using IRT models demonstrated that the RAPP met all the criteria to be considered a reliable and valid method of measurement. From a clinical perspective, this will allow physicians to confidently interpret scores as good indicators of Quality of Life of patients with asthma.
Responding to Claims of Misrepresentation
ERIC Educational Resources Information Center
Santelices, Maria Veronica; Wilson, Mark
2010-01-01
In their paper "Unfair Treatment? The Case of Freedle, the SAT, and the Standardization Approach to Differential Item Functioning" (Santelices & Wilson, 2010), the authors studied claims of differential effects of the SAT on Latinos and African Americans through the methodology of differential item functioning (DIF). Previous…
ERIC Educational Resources Information Center
Magis, David; Raiche, Gilles; Beland, Sebastien; Gerard, Paul
2011-01-01
We present an extension of the logistic regression procedure to identify dichotomous differential item functioning (DIF) in the presence of more than two groups of respondents. Starting from the usual framework of a single focal group, we propose a general approach to estimate the item response functions in each group and to test for the presence…
Item Response Theory Using Hierarchical Generalized Linear Models
ERIC Educational Resources Information Center
Ravand, Hamdollah
2015-01-01
Multilevel models (MLMs) are flexible in that they can be employed to obtain item and person parameters, test for differential item functioning (DIF) and capture both local item and person dependence. Papers on the MLM analysis of item response data have focused mostly on theoretical issues where applications have been add-ons to simulation…
Differential Performance by English Language Learners on an Inquiry-Based Science Assessment
NASA Astrophysics Data System (ADS)
Turkan, Sultan; Liu, Ou Lydia
2012-10-01
The performance of English language learners (ELLs) has been a concern given the rapidly changing demographics in US K-12 education. This study aimed to examine whether students' English language status has an impact on their inquiry science performance. Differential item functioning (DIF) analysis was conducted with regard to ELL status on an inquiry-based science assessment, using a multifaceted Rasch DIF model. A total of 1,396 seventh- and eighth-grade students took the science test, including 313 ELL students. The results showed that, overall, non-ELLs significantly outperformed ELLs. Of the four items that showed DIF, three favored non-ELLs while one favored ELLs. The item that favored ELLs provided a graphic representation of a science concept within a family context. There is some evidence that constructed-response items may help ELLs articulate scientific reasoning using their own words. Assessment developers and teachers should pay attention to the possible interaction between linguistic challenges and science content when designing assessment for and providing instruction to ELLs.
Lúcio, Patrícia Silva; Cogo-Moreira, Hugo; Puglisi, Marina; Polanczyk, Guilherme Vanoni; Little, Todd D
2017-11-01
The present study investigated the psychometric properties of the Raven's Colored Progressive Matrices (CPM) test in a sample of preschoolers from Brazil (n = 582; age: mean = 57 months, SD = 7 months; 46% female). We investigated the plausibility of unidimensionality of the items (confirmatory factor analysis) and differential item functioning (DIF) for sex and age (multiple indicators multiple causes method). We tested four unidimensional models and the one with the best-fit index was a reduced form of the Raven's CPM. The DIF analysis was carried out with the reduced form of the test. A few items presented DIF (two for sex and one for age), confirming that the Raven's CPM items are mostly measurement invariant. There was no effect of sex on the general factor, but increasing age was associated with higher values of the g factor. Future research should indicate if the reduced form is suitable for evaluating the general ability of preschoolers.
Crins, Martine H P; Terwee, Caroline B; Klausch, Thomas; Smits, Niels; de Vet, Henrica C W; Westhovens, Rene; Cella, David; Cook, Karon F; Revicki, Dennis A; van Leeuwen, Jaap; Boers, Maarten; Dekker, Joost; Roorda, Leo D
2017-07-01
The objective of this study was to assess the psychometric properties of the Dutch-Flemish Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank in Dutch patients with chronic pain. A bank of 121 items was administered to 1,247 Dutch patients with chronic pain. Unidimensionality was assessed by fitting a one-factor confirmatory factor analysis and evaluating resulting fit statistics. Items were calibrated with the graded response model and its fit was evaluated. Cross-cultural validity was assessed by testing items for differential item functioning (DIF) based on language (Dutch vs. English). Construct validity was evaluated by calculating correlations between scores on the Dutch-Flemish PROMIS Physical Function measure and scores on generic and disease-specific measures. Results supported the Dutch-Flemish PROMIS Physical Function item bank's unidimensionality (Comparative Fit Index = 0.976, Tucker Lewis Index = 0.976) and model fit. Item thresholds targeted a wide range of the physical function construct (threshold-parameters range: -4.2 to 5.6). Cross-cultural validity was good, as only four items showed DIF for language and their impact on item scores was minimal. Physical Function scores were strongly associated with scores on all other measures (all correlations ≤ -0.60 as expected). The Dutch-Flemish PROMIS Physical Function item bank exhibited good psychometric properties. Development of a computer adaptive test based on the large bank is warranted. Copyright © 2017 Elsevier Inc. All rights reserved.
An Analytical Evaluation of Two Common-Odds Ratios as Population Indicators of DIF.
ERIC Educational Resources Information Center
Pommerich, Mary; And Others
The Mantel-Haenszel (MH) statistic for identifying differential item functioning (DIF) commonly conditions on the observed test score as a surrogate for conditioning on latent ability. When the comparison group distributions are not completely overlapping (i.e., are incongruent), the observed score represents different levels of latent ability…
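For readers unfamiliar with the statistic, the following is a minimal sketch of how a Mantel-Haenszel common odds ratio is computed when examinees are stratified on observed total score, as the abstract describes. It is not the authors' analysis: the counts are invented, the function names are hypothetical, and the conversion to the ETS delta metric (multiplying the log odds ratio by -2.35) is included only as a commonly reported convention.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across score strata.

    Each stratum is a tuple (a, b, c, d):
      a = reference group correct,  b = reference group incorrect,
      c = focal group correct,      d = focal group incorrect.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio undefined: no discordant counts")
    return numerator / denominator


def mh_d_dif(alpha_mh):
    """Log odds ratio rescaled to the ETS delta metric."""
    return -2.35 * math.log(alpha_mh)


if __name__ == "__main__":
    # Two hypothetical observed-score strata (counts are made up).
    strata = [(40, 10, 30, 20), (30, 20, 20, 30)]
    alpha = mantel_haenszel_odds_ratio(strata)
    print(f"alpha_MH = {alpha:.3f}, MH D-DIF = {mh_d_dif(alpha):.3f}")
```

A value of alpha_MH above 1 indicates that, within matched score levels, the reference group has higher odds of a correct response. Because the matching is on observed score rather than latent ability, the statistic inherits the limitation the abstract points to when the group distributions overlap poorly.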
An Analytic Comparison of Effect Sizes for Differential Item Functioning
ERIC Educational Resources Information Center
Demars, Christine E.
2011-01-01
Three types of effect sizes for DIF are described in this exposition: log of the odds-ratio (differences in log-odds), differences in probability-correct, and proportion of variance accounted for. Using these indices involves conceptualizing the degree of DIF in different ways. This integrative review discusses how these measures are impacted in…
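To make the contrast between the first two indices concrete, here is a small sketch under assumptions that are not from the article: a two-parameter logistic (2PL) item, equal discrimination in both groups, and invented parameter values. The third index (proportion of variance accounted for) is not covered. All names are hypothetical.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def log_odds_difference(theta, ref, focal):
    """Reference-minus-focal difference in log-odds at theta."""
    return logit(p_correct(theta, *ref)) - logit(p_correct(theta, *focal))


def probability_difference(theta, ref, focal):
    """Reference-minus-focal difference in probability correct at theta."""
    return p_correct(theta, *ref) - p_correct(theta, *focal)


if __name__ == "__main__":
    ref = (1.0, 0.0)    # hypothetical (a, b) for the reference group
    focal = (1.0, 0.5)  # same discrimination, item harder for focal group
    for theta in (0.0, 3.0):
        print(theta,
              round(log_odds_difference(theta, ref, focal), 3),
              round(probability_difference(theta, ref, focal), 3))
```

In this uniform-DIF setup the log-odds difference is constant across ability, while the probability difference shrinks toward the extremes of the scale. That is one way of seeing why the indices conceptualize the degree of DIF differently.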
ERIC Educational Resources Information Center
Gomez, Rapson
2012-01-01
Objective: Generalized partial credit model, which is based on item response theory (IRT), was used to test differential item functioning (DIF) for the "Diagnostic and Statistical Manual of Mental Disorders" (4th ed.), inattention (IA), and hyperactivity/impulsivity (HI) symptoms across boys and girls. Method: To accomplish this, parents completed…
Stepwise Analysis of Differential Item Functioning Based on Multiple-Group Partial Credit Model.
ERIC Educational Resources Information Center
Muraki, Eiji
1999-01-01
Extended an Item Response Theory (IRT) method for detection of differential item functioning to the partial credit model and applied the method to simulated data using a stepwise procedure. Then applied the stepwise DIF analysis based on the multiple-group partial credit model to writing trend data from the National Assessment of Educational…
Effects of Differential Item Functioning on Examinees' Test Performance and Reliability of Test
ERIC Educational Resources Information Center
Lee, Yi-Hsuan; Zhang, Jinming
2017-01-01
Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The…
ERIC Educational Resources Information Center
Sachse, Karoline A.; Haag, Nicole
2017-01-01
Standard errors computed according to the operational practices of international large-scale assessment studies such as the Programme for International Student Assessment (PISA) or the Trends in International Mathematics and Science Study (TIMSS) may be biased when cross-national differential item functioning (DIF) and item parameter drift are…
Pedraza, Otto; Graff-Radford, Neill R.; Smith, Glenn E.; Ivnik, Robert J.; Willis, Floyd B.; Petersen, Ronald C.; Lucas, John A.
2010-01-01
Scores on the Boston Naming Test (BNT) are frequently lower for African American adults than for Caucasian adults. Although demographically-based norms can mitigate the impact of this discrepancy on the likelihood of erroneous diagnostic impressions, a growing consensus suggests that group norms do not sufficiently address or advance our understanding of the underlying psychometric and sociocultural factors that lead to between-group score discrepancies. Using item response theory and methods to detect differential item functioning (DIF), the current investigation moves beyond comparisons of the summed total score to examine whether the conditional probability of responding correctly to individual BNT items differs between African American and Caucasian adults. Participants included 670 adults age 52 and older who took part in Mayo's Older Americans and Older African Americans Normative Studies. Under a 2-parameter logistic IRT framework and after correction for the false discovery rate, 12 items were shown to demonstrate DIF. Six of these 12 items (“dominoes,” “escalator,” “muzzle,” “latch,” “tripod,” and “palette”) were also identified in additional analyses using hierarchical logistic regression models and represent the strongest evidence for race/ethnicity-based DIF. These findings afford a finer characterization of the psychometric properties of the BNT and expand our understanding of between-group performance. PMID:19570311
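The abstract mentions a correction for the false discovery rate without naming the procedure. One widely used option is the Benjamini-Hochberg step-up rule, sketched below as an assumption rather than as the method the authors applied. The p-values are invented and the function name is hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which tests are flagged.

    Step-up rule: sort the p-values, find the largest rank k with
    p_(k) <= k * q / m, and flag every test ranked k or lower.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            largest_rank = rank
    flagged = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_rank:
            flagged[idx] = True
    return flagged


if __name__ == "__main__":
    # Hypothetical item-level DIF p-values, one per item.
    p_values = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
    print(benjamini_hochberg(p_values, q=0.05))
```

In this made-up example, four items fall below an unadjusted 0.05 threshold but only the two smallest p-values survive the step-up rule, which shows how such a correction can shorten a list of flagged items.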
Gender-Related Differential Item Functioning on a Middle-School Mathematics Performance Assessment.
ERIC Educational Resources Information Center
Lane, Suzanne; And Others
This study examined gender-related differential item functioning (DIF) using a mathematics performance assessment, the QUASAR Cognitive Assessment Instrument (QCAI), administered to middle school students. The QCAI was developed for the Quantitative Understanding: Amplifying Student Achievement and Reading (QUASAR) project, which focuses on…
ERIC Educational Resources Information Center
Holweger, Nancy; Taylor, Grace
The fifth-grade and eighth-grade science items on a state performance assessment were compared for differential item functioning (DIF) due to gender. The grade 5 sample consisted of 8,539 females and 8,029 males and the grade 8 sample consisted of 7,477 females and 7,891 males. A total of 30 fifth grade items and 26 eighth grade items were…
Assessing psychological well-being: self-report instruments for the NIH Toolbox.
Salsman, John M; Lai, Jin-Shei; Hendrie, Hugh C; Butt, Zeeshan; Zill, Nicholas; Pilkonis, Paul A; Peterson, Christopher; Stoney, Catherine M; Brouwers, Pim; Cella, David
2014-02-01
Psychological well-being (PWB) has a significant relationship with physical and mental health. As a part of the NIH Toolbox for the Assessment of Neurological and Behavioral Function, we developed self-report item banks and short forms to assess PWB. Expert feedback and literature review informed the selection of PWB concepts and the development of item pools for positive affect, life satisfaction, and meaning and purpose. Items were tested with a community-dwelling US Internet panel sample of adults aged 18 and above (N = 552). Classical and item response theory (IRT) approaches were used to evaluate unidimensionality, fit of items to the overall measure, and calibrations of those items, including differential item functioning (DIF). IRT-calibrated item banks were produced for positive affect (34 items), life satisfaction (16 items), and meaning and purpose (18 items). Their psychometric properties were supported based on the results of factor analysis, fit statistics, and DIF evaluation. All banks measured the concepts precisely (reliability ≥0.90) for more than 98% of participants. These adult scales and item banks for PWB provide the flexibility, efficiency, and precision necessary to promote future epidemiological, observational, and intervention research on the relationship of PWB with physical and mental health.
Yau, David T W; Wong, May C M; Lam, K F; McGrath, Colman
2015-08-19
A four-factor structure of the two 8-item short forms of the Child Perceptions Questionnaire CPQ11-14 (RSF:8 and ISF:8) has been confirmed. However, the sum scores are typically reported in practice as a proxy of Oral health-related Quality of Life (OHRQoL), which implies a unidimensional structure. This study first assessed the unidimensionality of the 8-item short forms of CPQ11-14. Item response theory (IRT) was employed to offer an alternative and complementary approach of validation and to overcome the limitations of classical test theory assumptions. A random sample of 649 12-year-old school children in Hong Kong was analyzed. Unidimensionality of the scale was tested by confirmatory factor analysis (CFA), principal component analysis (PCA) and the local dependency (LD) statistic. A graded response model was fitted to the data. The contribution of each item to the scale was assessed by the item information function (IIF). Reliability of the scale was assessed by the test information function (TIF). Differential item functioning (DIF) across gender was identified by the Wald test and expected score functions. Both CPQ11-14 RSF:8 and ISF:8 did not deviate much from the unidimensionality assumption. Results from CFA indicated acceptable fit of the one-factor model. PCA indicated that the first principal component explained >30 % of the total variation with high factor loadings for both RSF:8 and ISF:8. Almost all LD statistics were <10, indicating the absence of local dependency. Flat and low IIFs were observed in the oral symptoms items, suggesting little contribution of information to the scale, and item removal caused little practical impact. Comparing the TIFs, RSF:8 showed slightly better information than ISF:8. In addition to the oral symptoms items, the item "Concerned with what other people think" demonstrated a uniform DIF (p < 0.001). The expected score functions were not much different between boys and girls.
Items related to oral symptoms were not informative to OHRQoL and deletion of these items is suggested. The impact of DIF across gender on the overall score was minimal. CPQ11-14 RSF:8 performed slightly better than ISF:8 in measurement precision. The 6-item short forms suggested by IRT validation should be further investigated to ensure their robustness, responsiveness and discriminative performance.
Prisciandaro, James J; Tolliver, Bryan K
2016-11-15
The Young Mania Rating Scale (YMRS) and Montgomery-Asberg Depression Rating Scale (MADRS) are among the most widely used outcome measures for clinical trials of medications for Bipolar Disorder (BD). Nonetheless, very few studies have examined the measurement characteristics of the YMRS and MADRS in individuals with BD using modern psychometric methods. The present study evaluated the YMRS and MADRS in the Systematic Treatment Enhancement Program for BD (STEP-BD) study using Item Response Theory (IRT). Baseline data from 3716 STEP-BD participants were available for the present analysis. The Graded Response Model (GRM) was fit separately to YMRS and MADRS item responses. Differential item functioning (DIF) was examined by regressing a variety of clinically relevant covariates (e.g., sex, substance dependence) on all test items and on the latent symptom severity dimension, within each scale. Both scales: 1) contained several items that provided little or no psychometric information, 2) were inefficient, in that the majority of item response categories did not provide incremental psychometric information, 3) poorly measured participants outside of a narrow band of severity, 4) evidenced DIF for nearly all items, suggesting that item responses were, in part, determined by factors other than symptom severity. Limited to outpatients; DIF analysis only sensitive to certain forms of DIF. The present study provides evidence for significant measurement problems involving the YMRS and MADRS. More work is needed to refine these measures and/or develop suitable alternative measures of BD symptomatology for clinical trials research. Copyright © 2016 Elsevier B.V. All rights reserved.
ERIC Educational Resources Information Center
Ayodele, Alicia Nicole
2017-01-01
Within polytomous items, differential item functioning (DIF) can take on various forms due to the number of response categories. The lack of invariance at this level is referred to as differential step functioning (DSF). The most common DSF methods in the literature are the adjacent category log odds ratio (AC-LOR) estimator and cumulative…
ERIC Educational Resources Information Center
Penfield, Randall D.; Giacobbi, Peter R., Jr.; Myers, Nicholas D.
2007-01-01
One aspect of construct validity is the extent to which the measurement properties of a rating scale are invariant across the groups being compared. An increasingly used method for assessing between-group differences in the measurement properties of items of a scale is the framework of differential item functioning (DIF). In this paper we…
ERIC Educational Resources Information Center
Qi, Cathy Huaqing; Marley, Scott C.
2009-01-01
The study examined whether item bias is present in the "Preschool Language Scale-4" (PLS-4). Participants were 440 children (3-5 years old; 86% English-speaking Hispanic and 14% European American) who were enrolled in Head Start programs. The PLS-4 items were analyzed for differential item functioning (DIF) using logistic regression and…
Curriculum Type as a Differentiating Factor in Medical Licensing Examinations.
ERIC Educational Resources Information Center
Shen, Linjun
This study assessed the effects of the type of medical curriculum on differential item functioning (DIF) and group differences at the test level in Level 1 of the Comprehensive Osteopathic Medical Licensing Examinations (COMLEX). The study also explored the relationship of the DIF and group differences at the test level. There are generally two…
Type I Error Inflation for Detecting DIF in the Presence of Impact
ERIC Educational Resources Information Center
DeMars, Christine E.
2010-01-01
In this brief explication, two challenges for using differential item functioning (DIF) measures when there are large group differences in true proficiency are illustrated. Each of these difficulties may lead to inflated Type I error rates, for very different reasons. One problem is that groups matched on observed score are not necessarily well…
ERIC Educational Resources Information Center
Cauffman, Elizabeth; MacIntosh, Randall
2006-01-01
The juvenile justice system needs a tool that can identify and assess mental health problems among youths quickly with validity and reliability. The goal of this article is to evaluate the racial/ethnic and gender differential item functioning (DIF) of the Massachusetts Youth Screening Instrument-Second Version (MAYSI-2) using the Rasch Model.…
DIF Testing with an Empirical-Histogram Approximation of the Latent Density for Each Group
ERIC Educational Resources Information Center
Woods, Carol M.
2011-01-01
This research introduces, illustrates, and tests a variation of IRT-LR-DIF, called EH-DIF-2, in which the latent density for each group is estimated simultaneously with the item parameters as an empirical histogram (EH). IRT-LR-DIF is used to evaluate the degree to which items have different measurement properties for one group of people versus…
Goh, Shaun K Y; Tham, Elaine K H; Magiati, Iliana; Sim, Litwee; Sanmugam, Shamini; Qiu, Anqi; Daniel, Mary L; Broekman, Birit F P; Rifkin-Graboi, Anne
2017-09-18
The purpose of this study was to improve standardized language assessments among bilingual toddlers by investigating and removing the effects of bias due to unfamiliarity with cultural norms or a distributed language system. The Expressive and Receptive Bayley-III language scales were adapted for use in a multilingual country (Singapore). Differential item functioning (DIF) was applied to data from 459 two-year-olds without atypical language development. This involved investigating if the probability of success on each item varied according to language exposure while holding latent language ability, gender, and socioeconomic status constant. Associations with language, behavioral, and emotional problems were also examined. Five of 16 items showed DIF, 1 of which may be attributed to cultural bias and another to a distributed language system. The remaining 3 items favored toddlers with higher bilingual exposure. Removal of DIF items reduced associations between language scales and emotional and language problems, but improved the validity of the expressive scale from poor to good. Our findings indicate the importance of considering cultural and distributed language bias in standardized language assessments. We discuss possible mechanisms influencing performance on items favoring bilingual exposure, including the potential role of inhibitory processing.
Factorial and Item-Level Invariance of a Principal Perspectives Survey: German and U.S. Principals.
Wang, Chuang; Hancock, Dawson R; Muller, Ulrich
This study examined the factorial and item-level invariance of a survey of principals' job satisfaction and perspectives about reasons and barriers to becoming a principal with a sample of US principals and another sample of German principals. Confirmatory factor analysis (CFA) and differential item functioning (DIF) analysis were employed at the test and item level, respectively. A single group CFA was conducted first, and the model was found to fit the data collected. The factorial invariance between the German and the US principals was tested through three steps: (a) configural invariance; (b) measurement invariance; and (c) structural invariance. The results suggest that the survey is a viable measure of principals' job satisfaction and perspectives about reasons and barriers to becoming a principal because principals from two different cultures shared a similar pattern on all three constructs. The DIF analysis further revealed that 22 out of the 28 items functioned similarly between German and US principals.
Mueller, Anne E; Segal, Daniel L; Gavett, Brandon; Marty, Meghan A; Yochim, Brian; June, Andrea; Coolidge, Frederick L
2015-07-01
The Geriatric Anxiety Scale (GAS; Segal, D. L., June, A., Payne, M., Coolidge, F. L. and Yochim, B. (2010). Journal of Anxiety Disorders, 24, 709-714. doi:10.1016/j.janxdis.2010.05.002) is a self-report measure of anxiety that was designed to address unique issues associated with anxiety assessment in older adults. This study is the first to use item response theory (IRT) to examine the psychometric properties of a measure of anxiety in older adults. A large sample of older adults (n = 581; mean age = 72.32 years, SD = 7.64 years, range = 60 to 96 years; 64% women; 88% European American) completed the GAS. IRT properties were examined. The presence of differential item functioning (DIF) or measurement bias by age and sex was assessed, and a ten-item short form of the GAS (called the GAS-10) was created. All GAS items had discrimination parameters of 1.07 or greater. Items from the somatic subscale tended to have lower discrimination parameters than items on the cognitive or affective subscales. Two items were flagged for DIF, but the impact of the DIF was negligible. Women scored significantly higher than men on the GAS and its subscales. Participants in the young-old group (60 to 79 years old) scored significantly higher on the cognitive subscale than participants in the old-old group (80 years old and older). Results from the IRT analyses indicated that the GAS and GAS-10 have strong psychometric properties among older adults. We conclude by discussing implications and future research directions.
Ayala, Alba; Bilbao, Amaia; Garcia-Perez, Sonia; Escobar, Antonio; Forjaz, Maria João
2018-03-01
The Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) measures the quality of life of patients with osteoarthritis (OA), and there is a specific scale for the physical functioning dimension, the seven-item short version WOMAC-pf. This study describes the application of the Rasch model to explore scale invariance and response stability of the WOMAC-pf short version across affected joint and over time. A sample of 884 patients with OA, from 15 hospitals in Spain, completed the WOMAC-pf before hip or knee surgery (baseline) and at 3, 6 and 12 months post-surgery. Invariance by joint was explored through differential item functioning (DIF) analysis of the Rasch model using baseline data, and time stability (DIF by time) was evaluated in stacked data (each participant is represented four times, once per time point). Mean age of the patients was 69.13 years (SD 10.01), 59.3% of them were women (n = 524), 59.2% had knee OA (n = 523) and 40.8% hip OA (n = 361). The item "putting on socks" showed DIF by joint and time. Fit to the Rasch model using stacked data improved when this item was removed. Good reliability for individual use, local independency and unidimensionality of the models were confirmed. The WOMAC-pf 7-item short version was invariant over time and joint when the item "putting on socks" was removed. Researchers should carefully evaluate this item as it presents problems in scale invariance and stability, which could affect results when comparing data by joint or when computing change scores.
The e-MSWS-12: improving the multiple sclerosis walking scale using item response theory.
Engelhard, Matthew M; Schmidt, Karen M; Engel, Casey E; Brenton, J Nicholas; Patek, Stephen D; Goldman, Myla D
2016-12-01
The Multiple Sclerosis Walking Scale (MSWS-12) is the predominant patient-reported measure of multiple sclerosis (MS)-related walking ability, yet it had not been analyzed using item response theory (IRT), the emerging standard for patient-reported outcome (PRO) validation. This study aims to reduce MSWS-12 measurement error and facilitate computerized adaptive testing by creating an IRT model of the MSWS-12 and distributing it online. MSWS-12 responses from 284 subjects with MS were collected by mail and used to fit and compare several IRT models. Following model selection and assessment, subpopulations based on age and sex were tested for differential item functioning (DIF). Model comparison favored a one-dimensional graded response model (GRM). This model met fit criteria and explained 87% of response variance. The performance of each MSWS-12 item was characterized using category response curves (CRCs) and item information. IRT-based MSWS-12 scores correlated with traditional MSWS-12 scores (r = 0.99) and timed 25-foot walk (T25FW) speed (r = -0.70). Item 2 showed DIF based on age (χ² = 19.02, df = 5, p < 0.01), and Item 11 showed DIF based on sex (χ² = 13.76, df = 5, p = 0.02). MSWS-12 measurement error depends on walking ability, but could be lowered by improving or replacing items with low information or DIF. The e-MSWS-12 includes IRT-based scoring, error checking, and an estimated T25FW derived from MSWS-12 responses. It is available at https://ms-irt.shinyapps.io/e-MSWS-12 .
Cross-Group Equivalence of Interest and Motivation Items in PISA 2012 Turkey Sample
ERIC Educational Resources Information Center
Ardic, Elif Ozlem; Gelbal, Selahattin
2017-01-01
Purpose: The aim of this study was to examine measurement invariance of the interest- and motivation-related items contained in the PISA 2012 student survey with regard to gender, school type, and statistical regions, and to identify the items that show differential item functioning (DIF) across groups. Research Methods: Multiple-group confirmatory…
Tsang, Siny; Schmidt, Karen M.; Vincent, Gina M.; Salekin, Randall T.; Moretti, Marlene M.; Odgers, Candice L.
2014-01-01
This study used an item response theory (IRT) model and a large adolescent sample of justice involved youth (N = 1,007, 38% female) to examine the item functioning of the Psychopathy Checklist – Youth Version (PCL: YV). Items that were most discriminating (or most sensitive to changes) of the latent trait (thought to be psychopathy) among adolescents included “Glibness/superficial charm”, “Lack of remorse”, and “Need for stimulation”, whereas items that were least discriminating included “Pathological lying”, “Failure to accept responsibility”, and “Lacks goals.” The items “Impulsivity” and “Irresponsibility” were the most likely to be rated high among adolescents, whereas “Parasitic lifestyle”, and “Glibness/superficial charm” were the most likely to be rated low. Evidence of differential item functioning (DIF) on four of the 13 items was found between boys and girls. “Failure to accept responsibility” and “Impulsivity” were endorsed more frequently to describe adolescent girls than boys at similar levels of the latent trait, and vice versa for “Grandiose sense of self-worth” and “Lacks goals.” The DIF findings suggest that four PCL: YV items function differently between boys and girls. PMID:25580672
Examining the Validity of the Adapted Alabama Parenting Questionnaire Parent Global Report Version
Maguin, Eugene; Nochajski, Thomas; Dewit, David; Safyer, Andrew
2015-01-01
The purpose of the present study was to comprehensively examine the validity of an adapted version of the parent global report form of the Alabama Parenting Questionnaire (APQ) with respect to its factor structure, relationships with demographic and response style covariates, and differential item functioning (DIF). The APQ was adapted by omitting the Corporal Punishment and the other discipline items. The sample consisted of 674 Canadian and United States families having a 9–12 year old child and at least one parent-figure who had received treatment within the past five years for alcohol problems or met criteria for alcohol abuse or dependence. The primary parent in each family completed the APQ. The four factor CFA model of the four published scales used and the three factor CFA model of those scales from prior research were rejected. Exploratory structural equation modeling was then used. The final three factor model combined the author-defined Involvement and Positive Parenting scales and retained the original Poor Monitoring/Supervision and Inconsistent Discipline scales. However, there were substantial numbers of moderate magnitude cross-loadings and large magnitude residual covariances. Differential item functioning (DIF) was observed for a number of APQ items. Controlling for DIF, response style and demographic variables were related significantly to the factors. PMID:26348028
Rasch Mixture Models for DIF Detection
Strobl, Carolin; Zeileis, Achim
2014-01-01
Rasch mixture models can be a useful tool when checking the assumption of measurement invariance for a single Rasch model. They provide advantages compared to manifest differential item functioning (DIF) tests when the DIF groups are only weakly correlated with the manifest covariates available. Unlike in single Rasch models, estimation of Rasch mixture models is sensitive to the specification of the ability distribution even when the conditional maximum likelihood approach is used. It is demonstrated in a simulation study how differences in ability can influence the latent classes of a Rasch mixture model. If the aim is only DIF detection, it is not of interest to uncover such ability differences as one is only interested in a latent group structure regarding the item difficulties. To avoid any confounding effect of ability differences (or impact), a new score distribution for the Rasch mixture model is introduced here. It ensures the estimation of the Rasch mixture model to be independent of the ability distribution and thus restricts the mixture to be sensitive to latent structure in the item difficulties only. Its usefulness is demonstrated in a simulation study, and its application is illustrated in a study of verbal aggression. PMID:29795819
Age neutrality of the young schema questionnaire in patients with a substance use disorder.
Pauwels, Els; Claes, Laurence; Dierckx, Eva; Debast, Inge; Van Alphen, S P J Bas; Rossi, Gina; Schotte, Chris; Santens, Els; Peuskens, Hendrik
2014-08-01
Young's Schema Focused Therapy (SFT) is gaining popularity in the treatment of older adults. In the context of this therapy, the Young Schema Questionnaire (YSQ) was developed to assess the early maladaptive schemas (EMS). EMS are considered to be relatively stable over time, but research shows that questionnaires often lack face validity in older adults, which makes it difficult to investigate EMS in older adults and their stability across the lifespan. In the present cross-sectional study, we investigated the age neutrality of the Young Schema Questionnaire--Long Form in young (aged 18-34 years), middle-aged (aged 35-59 years), and older (aged 60-75 years) adults in a clinical sample of substance use disorders (N = 321) by examining potential differential item functioning (DIF). While investigating the stability of the schemas, we controlled for substance dependency and clinical symptoms by means of, respectively, the Drug Use Screening Inventory - Revised and the Symptom Checklist-90-R. The Bonferroni-adjusted Liu-Agresti Cumulative Common Log-Odds Ratio confirmed large DIF for six items, divided across five schema scales (Mistrust/Abuse, Subjugation, Entitlement, Enmeshment and Self-sacrifice). Of the six items that presented DIF, only one item showed differential test functioning (Entitlement). Overall results show only 3% DIF, implying age neutrality of the questionnaire. Current results corroborate that most EMS scales are equally measured across age, and reliable comparisons can be made across the lifespan, allowing for good clinical practice and further research on SFT in older adults. Only for Entitlement, Enmeshment, and Insufficient Self-control, caution is needed when comparing mean scores across the age groups.
[Differential item functioning: a bibliometric analysis of journals published in Spanish].
Guilera, Georgina; Gómez, Juana; Hidalgo, M Dolores
2006-11-01
This study aims to provide an overview of scientific productivity with respect to articles published in Spanish on the issue of DIF. The documents included in the study were identified using the Psicodoc database, as well as the Science Citation Index and Social Science Citation Index from the Web of Science. The analyses carried out are focused mainly on presenting the frequencies and percentages of publications with respect to various bibliometric indicators. The results reveal that interest in the issue of DIF has increased, and that the universities are the most productive institutions. The majority of articles have been published in the journal Psicothema.
ERIC Educational Resources Information Center
Gonzalez-Roma, Vicente; Tomas, Ines; Ferreres, Doris; Hernandez, Ana
2005-01-01
The aims of this study were to investigate whether the 6 items of the Physical Appearance Scale (Marsh, Richards, Johnson, Roche, & Tremayne, 1994) show differential item functioning (DIF) across gender groups of adolescents, and to show how this can be done using the multigroup mean and covariance structure (MG-MACS) analysis model. Two samples…
ERIC Educational Resources Information Center
DeMars, Christine E.
2009-01-01
The Mantel-Haenszel (MH) and logistic regression (LR) differential item functioning (DIF) procedures have inflated Type I error rates when there are large mean group differences, short tests, and large sample sizes. When there are large group differences in mean score, groups matched on the observed number-correct score differ on true score,…
ERIC Educational Resources Information Center
Fidalgo, Angel M.; Ferreres, Doris; Muniz, Jose
2004-01-01
Sample-size restrictions limit the contingency table approaches based on asymptotic distributions, such as the Mantel-Haenszel (MH) procedure, for detecting differential item functioning (DIF) in many practical applications. Within this framework, the present study investigated the power and Type I error performance of empirical and inferential…
Power and Sample Size Calculations for Logistic Regression Tests for Differential Item Functioning
ERIC Educational Resources Information Center
Li, Zhushan
2014-01-01
Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…
The MIMIC Method with Scale Purification for Detecting Differential Item Functioning
ERIC Educational Resources Information Center
Wang, Wen-Chung; Shih, Ching-Lin; Yang, Chih-Chien
2009-01-01
This study implements a scale purification procedure onto the standard MIMIC method for differential item functioning (DIF) detection and assesses its performance through a series of simulations. It is found that the MIMIC method with scale purification (denoted as M-SP) outperforms the standard MIMIC method (denoted as M-ST) in controlling…
Cross-cultural validation of the German and Turkish versions of the PHQ-9: an IRT approach.
Reich, Hanna; Rief, Winfried; Brähler, Elmar; Mewes, Ricarda
2018-06-05
The Patient Health Questionnaire's depression module (PHQ-9) is a widely used screening tool to assess depressive disorders. However, cross-linguistic and cross-cultural validation of the PHQ-9 is mostly lacking. This study investigates whether scores on the German and Turkish versions of the PHQ-9 are comparable. Data from Germans without a migration background (German version, n = 1670) and Turkish immigrants in Germany (either German or Turkish version, n = 307) were used. Differential Item Functioning (DIF) was assessed using Item Response Theory (IRT) models. Several items of the PHQ-9 were found to exhibit DIF related to language or ethnicity, e.g. 'sleep problems', 'appetite changes' and 'anhedonia'. However, PHQ-9 sum scores were found to be unbiased, i.e., DIF had no notable impact on scale levels. PHQ-9 sum scores can be compared between Turkish immigrants and Germans without a migration background without any adjustments, regardless of whether they complete the German or the Turkish version.
Jafari, Peyman; Stevanovic, Dejan; Bagheri, Zahra
2016-04-01
This cross-cultural study aimed to assess whether Iranian and Serbian children, and also their parents, perceived the meaning of the items in the KINDL quality of life questionnaire consistently. The sample included 1086 Iranian and 756 Serbian children and adolescents, alongside 1061 and 618 of their parents, respectively. Ordinal logistic regression was used to assess differential item functioning (DIF) of the self- and proxy-reports of the two versions of the KINDL, the Kid-KINDL and the Kiddo-KINDL, across the Iranian and Serbian samples. Statistically significant DIF was flagged for 14 out of 24 (58%) and 20 out of 24 (83%) items in the self-report of the Kid-KINDL and Kiddo-KINDL, respectively. Moreover, 20 out of 24 (83%) items in the proxy reports of both the Kid-KINDL and the Kiddo-KINDL showed DIF across the two samples. Accordingly, considerable caution is warranted when using the KINDL for cross-cultural comparisons.
ERIC Educational Resources Information Center
Banerjee, Jayanti; Papageorgiou, Spiros
2016-01-01
The research reported in this article investigates differential item functioning (DIF) in a listening comprehension test. The study explores the relationship between test-taker age and the items' language domains across multiple test forms. The data comprise test-taker responses (N = 2,861) to a total of 133 unique items, 46 items of which were…
ERIC Educational Resources Information Center
Robitzsch, Alexander; Rupp, Andre A.
2009-01-01
This article describes the results of a simulation study to investigate the impact of missing data on the detection of differential item functioning (DIF). Specifically, it investigates how four methods for dealing with missing data (listwise deletion, zero imputation, two-way imputation, response function imputation) interact with two methods of…
Owens, Sherry; Kristjansson, Alfgeir L; Hunte, Haslyn E R
2015-11-05
We investigated whether individual items on the nine-item Williams' Perceived Everyday Discrimination Scale (EDS) functioned differently by age (<45 vs ≥45) within five racial groups in the United States: Asians (n=2,017); Hispanics (n=2,688); Black Caribbeans (n=1,377); African Americans (n=3,434); and Whites (n=854). We used data from the 2001-2003 National Survey of American Lives and the 2001-2003 National Latino and Asian Studies. Multiple-indicator, multiple-cause models (MIMIC) were used to examine differential item functioning (DIF) on the EDS by age within each racial/ethnic group. Overall, Asian and Hispanic respondents reported less discrimination than Whites; on the other hand, African Americans and Black Caribbeans reported more discrimination than Whites. Regardless of race/ethnicity, the younger respondents (aged <45 years) reported less discrimination than the older respondents (aged ≥45 years). In terms of age by race/ethnicity, the results were mixed for 19 out of 45 tests of DIF (40%). No differences in item function were observed among Black Caribbeans. "Being called names or insulted" and others acting as "if they are afraid" of the respondents were the only two items that did not exhibit differential item functioning by age across all racial/ethnic groups. Overall, our findings suggest that the EDS scale should be used with caution in multi-age multi-racial/ethnic samples.
ERIC Educational Resources Information Center
Hidalgo, Mª Dolores; Gómez-Benito, Juana; Zumbo, Bruno D.
2014-01-01
The authors analyze the effectiveness of the R[superscript 2] and delta log odds ratio effect size measures when using logistic regression analysis to detect differential item functioning (DIF) in dichotomous items. A simulation study was carried out, and the Type I error rate and power estimates under conditions in which only statistical testing…
Morales, Leo S; Flowers, Claudia; Gutierrez, Peter; Kleinman, Marjorie; Teresi, Jeanne A
2006-11-01
To illustrate the application of the Differential Item and Test Functioning (DFIT) method using English and Spanish versions of the Mini-Mental State Examination (MMSE). Study participants were 65 years of age or older and lived in North Manhattan, New York. Of the 1578 study participants who were administered the MMSE, 665 completed it in Spanish. The MMSE contains 20 items that measure the degree of cognitive impairment in the areas of orientation, attention and calculation, registration, recall and language, as well as the ability to follow verbal and written commands. After assessing the dimensionality of the MMSE scale, item response theory person and item parameters were estimated separately for the English and Spanish samples using Samejima's 2-parameter graded response model. Then the DFIT framework was used to assess differential item functioning (DIF) and differential test functioning (DTF). Nine items were found to show DIF; these were items that ask the respondent to name the correct season, day of the month, city, state, and 2 nearby streets, recall 3 objects, repeat the phrase "no ifs, no ands, no buts," follow the command, "close your eyes," and the command, "take the paper in your right hand, fold the paper in half with both hands, and put the paper down in your lap." At the scale level, however, the MMSE did not show differential functioning. Respondents to the English and Spanish versions of the MMSE are comparable on the basis of scale scores. However, assessments based on individual MMSE items may be misleading.
ERIC Educational Resources Information Center
Guler, Nese; Penfield, Randall D.
2009-01-01
In this study, we investigate the logistic regression (LR), Mantel-Haenszel (MH), and Breslow-Day (BD) procedures for the simultaneous detection of both uniform and nonuniform differential item functioning (DIF). A simulation study was used to assess and compare the Type I error rate and power of a combined decision rule (CDR), which assesses DIF…
Comparison of Objective and Subjective Methods on Determination of Differential Item Functioning
ERIC Educational Resources Information Center
Sahin, Melek Gülsah
2017-01-01
The objective of this research is to compare the objective methods often used in the literature for determining differential item functioning (DIF) with the subjective method based on expert opinion, which is used less often in the literature. Mantel-Haenszel (MH), Logistic Regression (LR) and SIBTEST are chosen as objective methods. While the data…
ERIC Educational Resources Information Center
Alavi, Seyed Mohammad; Bordbar, Soodeh
2017-01-01
Differential Item Functioning (DIF) analysis is a key element in evaluating educational test fairness and validity. One of the frequently cited sources of construct-irrelevant variance is gender which has an important role in the university entrance exam; therefore, it causes bias and consequently undermines test validity. The present study aims…
An Introduction to Missing Data in the Context of Differential Item Functioning
ERIC Educational Resources Information Center
Banks, Kathleen
2015-01-01
This article introduces practitioners and researchers to the topic of missing data in the context of differential item functioning (DIF), reviews the current literature on the issue, discusses implications of the review, and offers suggestions for future research. A total of nine studies were reviewed. All of these studies determined what effect…
Differential Item Functioning By Sex and Race in The Hogan Personality Inventory
ERIC Educational Resources Information Center
Sheppard, Richard; Han, Kyunghee; Colarelli, Stephen M.; Dai, Guangdong; King, Daniel W.
2006-01-01
The authors examined measurement bias in the Hogan Personality Inventory by investigating differential item functioning (DIF) across sex and two racial groups (Caucasian and Black). The sample consisted of 1,579 Caucasians (1,023 men, 556 women) and 523 Blacks (321 men, 202 women) who were applying for entry-level, unskilled jobs in factories.…
Srisurapanont, Manit; Arunpongpaisal, Suwanna; Wada, Kiyoshi; Marsden, John; Ali, Robert; Kongsakon, Ronnachai
2011-06-01
The concept of negative symptoms in methamphetamine (MA) psychosis (e.g., poverty of speech, flattened affect, and loss of drive) is still uncertain. This study aimed to use differential item functioning (DIF) statistical techniques to differentiate the severity of psychotic symptoms between MA psychotic and schizophrenic patients. Data of MA psychotic and schizophrenic patients were those of the participants in the WHO Multi-Site Project on Methamphetamine-Induced Psychosis (or WHO-MAIP study) and the Risperidone Long-Acting Injection in Thai Schizophrenic Patients (or RLAI-Thai study), respectively. To confirm the unidimensionality of psychotic syndromes, we applied exploratory and confirmatory factor analyses (EFA and CFA) to the eight items of the Manchester scale. We conducted the DIF analysis of psychotic symptoms observed in both groups by using nonparametric kernel-smoothing techniques of item response theory. A DIF composite index of 0.30 or greater indicated a difference in symptom severity. The analyses included the data of 168 MA psychotic participants and the baseline data of 169 schizophrenic patients. For both data sets, the EFA and CFA suggested a three-factor model of the psychotic symptoms, including negative syndrome (poverty of speech, psychomotor retardation and flattened/incongruous affect), positive syndrome (delusions, hallucinations and incoherent speech) and anxiety/depression syndrome (anxiety and depression). The DIF composite indexes comparing the severity differences of all eight psychotic symptoms were lower than 0.3. The results suggest that, at the same level of syndrome severity (i.e., negative, positive, and anxiety/depression syndromes), the severity of psychotic symptoms, including the negative ones, observed in MA psychotic and schizophrenic patients is almost the same.
ERIC Educational Resources Information Center
Suh, Youngsuk; Talley, Anna E.
2015-01-01
This study compared and illustrated four differential distractor functioning (DDF) detection methods for analyzing multiple-choice items. The log-linear approach, two item response theory-model-based approaches with likelihood ratio tests, and the odds ratio approach were compared to examine the congruence among the four DDF detection methods.…
Detecting a Gender-Related Differential Item Functioning Using Transformed Item Difficulty
ERIC Educational Resources Information Center
Abedalaziz, Nabeel; Leng, Chin Hai; Alahmadi, Ahlam
2014-01-01
The purpose of the study was to examine gender differences in performance on a multiple-choice mathematical ability test, administered within the context of a high school graduation test that was designed to match the eleventh grade curriculum. The transformed item difficulty (TID) method was used to detect gender-related DIF. A random sample of 1400 eleventh…
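As an illustration of the transformed item difficulty (delta plot) idea named in this abstract, the following minimal Python sketch assumes the conventional Angoff formulation, which the abstract itself does not spell out: each proportion correct p is mapped to Δ = 13 + 4·Φ⁻¹(1 − p), a major-axis line is fitted through the paired deltas of the two groups, and items lying far from that line are flagged. The cutoff of 1.5 is a common convention assumed here, and the function names and example proportions are hypothetical.

```python
import math
from statistics import NormalDist, mean

def delta(p):
    """Angoff delta: harder items (low p) get larger values; p = 0.5 maps to 13."""
    return 13.0 + 4.0 * NormalDist().inv_cdf(1.0 - p)

def delta_plot_distances(p_ref, p_focal):
    """Signed perpendicular distance of each item from the major axis of the delta plot."""
    x = [delta(p) for p in p_ref]
    y = [delta(p) for p in p_focal]
    mx, my = mean(x), mean(y)
    n = len(x)
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    # Major-axis (principal-axis) slope and intercept; assumes sxy != 0.
    b = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    a = my - b * mx
    return [(b * xi + a - yi) / math.sqrt(b ** 2 + 1.0) for xi, yi in zip(x, y)]

def flag_items(p_ref, p_focal, cutoff=1.5):
    """Indices of items whose absolute distance exceeds the (assumed) cutoff."""
    distances = delta_plot_distances(p_ref, p_focal)
    return [i for i, d in enumerate(distances) if abs(d) > cutoff]

# Hypothetical proportions correct for five items in two groups;
# the third item is much harder for the second group.
p_boys = [0.85, 0.70, 0.55, 0.40, 0.25]
p_girls = [0.84, 0.71, 0.20, 0.41, 0.24]
print(flag_items(p_boys, p_girls))
```

Because the fitted axis is itself pulled toward any aberrant item, a fixed cutoff is only a rough screen; flagged items would normally be reviewed rather than removed automatically.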
ERIC Educational Resources Information Center
Oliveri, Maria Elena; Olson, Brent F.; Ercikan, Kadriye; Zumbo, Bruno D.
2012-01-01
In this study, the Canadian English and French versions of the Problem-Solving Measure of the Programme for International Student Assessment 2003 were examined to investigate their degree of measurement comparability at the item- and test-levels. Three methods of differential item functioning (DIF) were compared: parametric and nonparametric item…
Examining the validity of the adapted Alabama Parenting Questionnaire-Parent Global Report Version.
Maguin, Eugene; Nochajski, Thomas H; De Wit, David J; Safyer, Andrew
2016-05-01
The purpose of the present study was to comprehensively examine the validity of an adapted version of the parent global report form of the Alabama Parenting Questionnaire (APQ) with respect to its factor structure, relationships with demographic and response style covariates, and differential item functioning (DIF). The APQ was adapted by omitting the corporal punishment and the other discipline items. The sample consisted of 674 Canadian and United States families having a 9- to 12-year-old child and at least 1 parent figure who had received treatment within the past 5 years for alcohol problems or met criteria for alcohol abuse or dependence. The primary parent in each family completed the APQ. The 4-factor CFA model of the 4 published scales used and the 3-factor CFA model of those scales from prior research were rejected. Exploratory structural equation modeling was then used. The final 3-factor model combined the author-defined Involvement and Positive Parenting scales and retained the original Poor Monitoring/Supervision and Inconsistent Discipline scales. However, there were substantial numbers of moderate magnitude cross-loadings and large magnitude residual covariances. Differential item functioning (DIF) was observed for a number of APQ items. Controlling for DIF, response style and demographic variables were related significantly to the factors.
Delisle, Vanessa C.; Kwakkenbos, Linda; Hudson, Marie; Baron, Murray; Thombs, Brett D.
2014-01-01
Objectives: Center for Epidemiologic Studies Depression (CES-D) Scale scores in English- and French-speaking Canadian systemic sclerosis (SSc) patients are commonly pooled in analyses, but no studies have evaluated the metric equivalence of the English and French CES-D. The study objective was to examine the metric equivalence of the CES-D in English- and French-speaking SSc patients. Methods: The CES-D was completed by 1007 English-speaking and 248 French-speaking patients from the Canadian Scleroderma Research Group Registry. Confirmatory factor analysis (CFA) was used to assess the factor structure in both samples. The Multiple-Indicator Multiple-Cause (MIMIC) model was utilized to assess differential item functioning (DIF). Results: A two-factor model (Positive and Negative affect) showed excellent fit in both samples. Statistically significant, but small-magnitude, DIF was found for 3 of 20 CES-D items, including items 3 (Blues), 10 (Fearful), and 11 (Sleep). Prior to accounting for DIF, French-speaking patients had 0.08 of a standard deviation (SD) lower latent scores for the Positive factor (95% confidence interval [CI] −0.25 to 0.08) and 0.09 SD higher scores (95% CI −0.07 to 0.24) for the Negative factor than English-speaking patients. After DIF correction, there was no change on the Positive factor and a non-significant increase of 0.04 SD on the Negative factor for French-speaking patients (difference = 0.13 SD, 95% CI −0.03 to 0.28). Conclusions: The English and French versions of the CES-D, despite minor DIF on several items, are substantively equivalent and can be used in studies that combine data from English- and French-speaking Canadian SSc patients. PMID:25036894
Huang, Frederick Y; Chung, Henry; Kroenke, Kurt; Delucchi, Kevin L; Spitzer, Robert L
2006-06-01
The Patient Health Questionnaire depression scale (PHQ-9) is a well-validated, Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) criterion-based measure for diagnosing depression, assessing severity and monitoring treatment response. The performance of most depression scales including the PHQ-9, however, has not been rigorously evaluated in different racial/ethnic populations. Therefore, we compared the factor structure of the PHQ-9 between different racial/ethnic groups as well as the rates of endorsement and differential item functioning (DIF) of the 9 items of the PHQ-9. The presence of DIF would indicate that responses to an individual item differ significantly between groups, controlling for the level of depression. A combined dataset from 2 separate studies of 5,053 primary care patients including non-Hispanic white (n=2,520), African American (n=598), Chinese American (n=941), and Latino (n=974) patients was used for our analysis. Exploratory principal components factor analysis was used to derive the factor structure of the PHQ-9 in each of the 4 racial/ethnic groups. A generalized Mantel-Haenszel statistic was used to test for DIF. One main factor that included all PHQ-9 items was found in each racial/ethnic group with alpha coefficients ranging from 0.79 to 0.89. Although endorsement rates of individual items were generally similar among the 4 groups, evidence of DIF was found for some items. Our analyses indicate that in African American, Chinese American, Latino, and non-Hispanic white patient groups the PHQ-9 measures a common concept of depression and can be effective for the detection and monitoring of depression in these diverse populations.
Peipert, John D; Bentler, Peter; Klicko, Kristi; Hays, Ron D
2018-05-14
Black dialysis patients report better health-related quality of life (HRQOL) than White patients, which may be explained if Black and White patients respond systematically differently to HRQOL survey items. We examined differential item functioning (DIF) of the Kidney Disease Quality of Life 36-item (KDQOL™-36) Burden of Kidney Disease, Symptoms and Problems with Kidney Disease, and Effects of Kidney Disease scales between Black (n = 18,404) and White (n = 21,439) dialysis patients. We fit multiple group confirmatory factor analysis models with increasing invariance: a Configural model (invariant factor structure), a Metric model (invariant factor loadings), and a Scalar model (invariant intercepts). Criteria for invariance included non-significant χ² tests, > 0.002 difference in the models' CFI, and > 0.015 difference in RMSEA and SRMR. Next, starting with a fully invariant model, we freed loadings and intercepts item-by-item to determine if DIF impacted estimated KDQOL™-36 scale means. ΔCFI was 0.006 between the metric and scalar models but was reduced to 0.001 when we freed intercepts for the burdens and symptoms and problems of kidney disease scales. In comparison to standardized means of 0 in the White group, those for the Black group on the Burdens, Symptoms and Problems, and Effects of Kidney Disease scales were 0.218, 0.061, and 0.161, respectively. When loadings and thresholds were released sequentially, differences in means between models ranged between 0.001 and 0.048. Despite some DIF, impacts on KDQOL™-36 responses appear to be minimal. We conclude that the KDQOL™-36 is appropriate to make substantive comparisons of HRQOL between Black and White dialysis patients.
Development of a PROMIS item bank to measure pain interference.
Amtmann, Dagmar; Cook, Karon F; Jensen, Mark P; Chen, Wen-Hung; Choi, Seung; Revicki, Dennis; Cella, David; Rothrock, Nan; Keefe, Francis; Callahan, Leigh; Lai, Jin-Shei
2010-07-01
This paper describes the psychometric properties of the PROMIS-pain interference (PROMIS-PI) bank. An initial candidate item pool (n=644) was developed and evaluated based on the review of existing instruments, interviews with patients, and consultation with pain experts. From this pool, a candidate item bank of 56 items was selected and responses to the items were collected from large community and clinical samples. A total of 14,848 participants responded to all or a subset of candidate items. The responses were calibrated using an item response theory (IRT) model. A final 41-item bank was evaluated with respect to IRT assumptions, model fit, differential item functioning (DIF), precision, and construct and concurrent validity. Items of the revised bank had good fit to the IRT model (CFI and NNFI/TLI ranged from 0.974 to 0.997), and the data were strongly unidimensional (e.g., ratio of first and second eigenvalue=35). Nine items exhibited statistically significant DIF. However, adjusting for DIF had little practical impact on score estimates and the items were retained without modifying scoring. Scores provided substantial information across levels of pain; for scores in the T-score range 50-80, the reliability was equivalent to 0.96-0.99. Patterns of correlations with other health outcomes supported the construct validity of the item bank. The scores discriminated among persons with different numbers of chronic conditions, disabling conditions, levels of self-reported health, and pain intensity (p<0.0001). The results indicated that the PROMIS-PI items constitute a psychometrically sound bank. Computerized adaptive testing and short forms are available.
Nayak, Madhabika B; Bond, Jason C; Greenfield, Thomas K
2015-01-01
Efficient alcohol screening measures are important to prevent or treat alcohol use disorders (AUDs). We studied different versions of the Alcohol Use Disorders Identification Test (AUDIT) comparing their performance to the full AUDIT and an AUD measure as screeners for alcohol use problems in Goa, India. Data from a general population study on 743 male drinkers aged 18-49 years are reported. Drinkers completed the AUDIT and an AUD measure. We created shorter versions of the AUDIT by (a) collapsing AUDIT item responses into three and two categories and (b) deleting two items with the lowest factor loadings. Each version was evaluated using factor, reliability and validity, and differential item functioning (DIF) analysis by age, education, standard of living index (SLI), and area of residence. A single factor solution was found for each version with lower factor loadings for items on guilt and concern. There were no significant differences among the different AUDIT versions in predicting AUD. No significant DIF was found by education, SLI or area of residence. DIF was observed for the alcohol frequency item by age. The AUDIT may be used with dichotomized response options without loss of predictive validity. A shortened eight-item dichotomized scale can adequately screen for AUDs in Goa when brevity is of paramount importance, although with lower predictive validity. Although the frequency item was endorsed more by older men, there is no evidence that the AUDIT items perform differently in other groups of male drinkers in Goa.
ERIC Educational Resources Information Center
Aryadoust, Vahid
2012-01-01
This article investigates a version of the International English Language Testing System (IELTS) listening test for evidence of differential item functioning (DIF) based on gender, nationality, age, and degree of previous exposure to the test. Overall, the listening construct was found to be underrepresented, which is probably an important cause…
ERIC Educational Resources Information Center
Quesen, Sarah
2016-01-01
When studying differential item functioning (DIF) with students with disabilities (SWD), focal groups typically suffer from small sample size, whereas the reference group population is usually large. This makes it possible for a researcher to select a sample from the reference population to be similar to the focal group on the ability scale. Doing…
A Robust Outlier Approach to Prevent Type I Error Inflation in Differential Item Functioning
ERIC Educational Resources Information Center
Magis, David; De Boeck, Paul
2012-01-01
The identification of differential item functioning (DIF) is often performed by means of statistical approaches that consider the raw scores as proxies for the ability trait level. One of the most popular approaches, the Mantel-Haenszel (MH) method, belongs to this category. However, replacing the ability level by the simple raw score is a source…
ERIC Educational Resources Information Center
Socha, Alan; DeMars, Christine E.; Zilberberg, Anna; Phan, Ha
2015-01-01
The Mantel-Haenszel (MH) procedure is commonly used to detect items that function differentially for groups of examinees from various demographic and linguistic backgrounds--for example, in international assessments. As in some other DIF methods, the total score is used to match examinees on ability. In thin matching, each of the total score…
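The matching idea behind the MH procedure can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the procedure used in any of the studies listed here: it uses thin matching (one stratum per total score), dichotomous items, and hypothetical function and label names (`"ref"`, `"focal"`). The delta transformation uses the conventional constant -2.35.

```python
import math
from collections import defaultdict

def mantel_haenszel_odds_ratio(responses):
    """Common odds ratio for one item across total-score strata.

    responses: iterable of (total_score, group, correct), where group is
    "ref" or "focal" and correct is 0 or 1 (hypothetical input layout).
    """
    # Per stratum: [A, B, C, D] = ref correct, ref incorrect,
    # focal correct, focal incorrect.
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for score, group, correct in responses:
        idx = (0 if group == "ref" else 2) + (0 if correct else 1)
        tables[score][idx] += 1

    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ValueError("odds ratio undefined: no discordant cells")
    return num / den

def mh_delta(alpha):
    """MH D-DIF on the delta scale; negative values favor the reference group."""
    return -2.35 * math.log(alpha)
```

An odds ratio near 1 (delta near 0) suggests no uniform DIF for the item. Strata in which only one group is present contribute nothing to either sum, which is one reason thin matching can be unstable with small focal groups.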
Vélez, Claudia Marcela; Villada Ramírez, Adriana C; Arias, Ana Carolina Amaya; Eslava-Schmalbach, Javier H
2016-01-01
The aim of this study was to validate the PedsQL 4.0™ in Colombian children and adolescents using the Rasch model. The Paediatric Quality of Life Inventory (PedsQL 4.0™) has been shown to be a reliable measure that is sensitive to changes in health status, as well as being quick and easy to use. This was a validation study of a measurement tool. The PedsQL 4.0™ was applied to a convenience sample of 375 children and adolescents between 5 and 17 years old and 500 caregivers of children between 2 and 18 years old in five Colombian cities. The psychometric properties were analysed according to the Rasch model, including adjustment, separation, and differential item functioning (DIF). The Rasch model provided adequate fit to the data. The social dimension, for both versions, had greater difficulty than the physical health dimension. Internal consistency for the items was observed, while for individuals, the values of reliability and separation were lower than the established thresholds. DIF occurred in very few variables, especially when comparing cities. The characteristic curves for the items presented disordered thresholds. The items had adequate internal consistency. Analysis showed adequate individual separation, but disordered thresholds were found in the response categories. No DIF was observed by sex or disease, but it is noteworthy that DIF occurred between cities. Copyright © 2016 Asociación Colombiana de Psiquiatría. Publicado por Elsevier España. All rights reserved.
Are Teacher Course Evaluations Biased against Faculty That Teach Quantitative Methods Courses?
ERIC Educational Resources Information Center
Royal, Kenneth D.; Stockdale, Myrah R.
2015-01-01
The present study investigated graduate students' responses to teacher/course evaluations (TCE) to determine if students' responses were inherently biased against faculty who teach quantitative methods courses. Item response theory (IRT) and Differential Item Functioning (DIF) techniques were utilized for data analysis. Results indicate students…
ERIC Educational Resources Information Center
Immekus, Jason C.; Maller, Susan J.
2009-01-01
The Kaufman Adolescent and Adult Intelligence Test (KAIT[TM]) is an individually administered test of intelligence for individuals ranging in age from 11 to 85+ years. The item response theory-likelihood ratio procedure, based on the two-parameter logistic model, was used to detect differential item functioning (DIF) in the KAIT across males and…
Teresi, Jeanne A.; Jones, Richard N.
2017-01-01
The purpose of this article is to introduce the methods used and challenges confronted by the authors of this two-part series of articles describing the results of analyses of measurement equivalence of the short form scales from the Patient Reported Outcomes Measurement Information System® (PROMIS®). Qualitative and quantitative approaches used to examine differential item functioning (DIF) are reviewed briefly. Qualitative methods focused on generation of DIF hypotheses. The basic quantitative approaches used all rely on a latent variable model, and examine parameters either derived directly from item response theory (IRT) or from structural equation models (SEM). A key methods focus of these articles is to describe state-of-the art approaches to examination of measurement equivalence in eight domains: physical health, pain, fatigue, sleep, depression, anxiety, cognition, and social function. These articles represent the first time that DIF has been examined systematically in the PROMIS short form measures, particularly among ethnically diverse groups. This is also the first set of analyses to examine the performance of PROMIS short forms in patients with cancer. Latent variable model state-of-the-art methods for examining measurement equivalence are introduced briefly in this paper to orient readers to the approaches adopted in this set of papers. Several methodological challenges underlying (DIF-free) anchor item selection and model assumption violations are presented as a backdrop for the articles in this two-part series on measurement equivalence of PROMIS measures. PMID:28983448
Teresi, Jeanne A; Jones, Richard N
2016-01-01
The purpose of this article is to introduce the methods used and challenges confronted by the authors of this two-part series of articles describing the results of analyses of measurement equivalence of the short form scales from the Patient Reported Outcomes Measurement Information System® (PROMIS®). Qualitative and quantitative approaches used to examine differential item functioning (DIF) are reviewed briefly. Qualitative methods focused on generation of DIF hypotheses. The basic quantitative approaches used all rely on a latent variable model, and examine parameters either derived directly from item response theory (IRT) or from structural equation models (SEM). A key methods focus of these articles is to describe state-of-the art approaches to examination of measurement equivalence in eight domains: physical health, pain, fatigue, sleep, depression, anxiety, cognition, and social function. These articles represent the first time that DIF has been examined systematically in the PROMIS short form measures, particularly among ethnically diverse groups. This is also the first set of analyses to examine the performance of PROMIS short forms in patients with cancer. Latent variable model state-of-the-art methods for examining measurement equivalence are introduced briefly in this paper to orient readers to the approaches adopted in this set of papers. Several methodological challenges underlying (DIF-free) anchor item selection and model assumption violations are presented as a backdrop for the articles in this two-part series on measurement equivalence of PROMIS measures.
Kisala, Pamela A.; Victorson, David; Pace, Natalie; Heinemann, Allen W.; Choi, Seung W.; Tulsky, David S.
2015-01-01
Objective: To describe the development and psychometric properties of the SCI-QOL Psychological Trauma item bank and short form. Design: Using a mixed-methods design, we developed and tested a Psychological Trauma item bank with patient and provider focus groups, cognitive interviews, and item response theory based analytic approaches, including tests of model fit, differential item functioning (DIF) and precision. Setting: We tested a 31-item pool at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Veterans Administration hospital. Participants: A total of 716 individuals with SCI completed the trauma items. Results: The 31 items fit a unidimensional model (CFI=0.952; RMSEA=0.061) and demonstrated good precision (theta range between 0.6 and 2.5). Nine items demonstrated negligible DIF with little impact on score estimates. The final calibrated item bank contains 19 items. Conclusion: The SCI-QOL Psychological Trauma item bank is a psychometrically robust measurement tool from which a short form and a computer adaptive test (CAT) version are available. PMID:26010967
Dirven, Linda; Groenvold, Mogens; Taphoorn, Martin J B; Conroy, Thierry; Tomaszewski, Krzysztof A; Young, Teresa; Petersen, Morten Aa
2017-11-01
The European Organisation of Research and Treatment of Cancer (EORTC) Quality of Life Group is developing computerized adaptive testing (CAT) versions of all EORTC Quality of Life Questionnaire (QLQ-C30) scales with the aim of enhancing measurement precision. Here we present the results of the field-testing and psychometric evaluation of the item bank for cognitive functioning (CF). In previous phases (I-III), 44 candidate items were developed measuring CF in cancer patients. In phase IV, these items were psychometrically evaluated in a large sample of international cancer patients. This evaluation included an assessment of dimensionality, fit to the item response theory (IRT) model, differential item functioning (DIF), and measurement properties. A total of 1030 cancer patients completed the 44 candidate items on CF. Of these, 34 items could be included in a unidimensional IRT model, showing an acceptable fit. Although several items showed DIF, these had a negligible impact on CF estimation. Measurement precision of the item bank was much higher than that of the two original QLQ-C30 CF items alone, across the whole continuum. Moreover, CAT measurement may on average reduce study sample sizes by about 35-40% compared to the original QLQ-C30 CF scale, without loss of power. A CF item bank for CAT measurement consisting of 34 items was established, applicable to various cancer patients across countries. This CAT measurement system will facilitate precise and efficient assessment of HRQOL of cancer patients, without loss of comparability of results.
Rasch analysis of the hospital anxiety and depression scale among Chinese cataract patients.
Lin, Xianchai; Chen, Ziyan; Jin, Ling; Gao, Wuyou; Qu, Bo; Zuo, Yajing; Liu, Rongjiao; Yu, Minbin
2017-01-01
To analyze the validity of the Hospital Anxiety and Depression Scale (HADS) in a Chinese cataract population. A total of 275 participants with unilateral or bilateral cataract were recruited to complete the Chinese version of HADS. The patients' demographic and ophthalmic characteristics were documented. Rasch analysis was conducted to examine the model fit statistics, the threshold ordering of the polytomous items, targeting, person separation index and reliability, local dependency, unidimensionality, differential item functioning (DIF) and construct validity of the HADS individual and summary measures. Rasch analysis was performed on the anxiety and depression subscales as well as the HADS-Total score. The items of the original HADS-Anxiety, HADS-Depression and HADS-Total demonstrated evidence of misfit to the Rasch model. Removing item A7 from the anxiety subscale and rescoring item D14 in the depression subscale significantly improved Rasch model fit. A 12-item higher order total scale with further removal of D12 was found to fit the Rasch model. The modified items had ordered response thresholds. No uniform DIF was detected, whereas notable non-uniform DIF in the high-ability group was found. Revised cut-off points were given for the modified anxiety and depression subscales. The modified version of HADS with HADS-A and HADS-D as subscales and HADS-T as a higher-order measure is a reliable and valid instrument that may be useful for assessing anxiety and depression states in the Chinese cataract population.
Psychological distress in cancer survivors: the further development of an item bank.
Smith, Adam B; Armes, Jo; Richardson, Alison; Stark, Dan P
2013-02-01
Assessment of psychological distress by patient report is necessary to meet patients' needs throughout the cancer journey. We have previously developed an item bank to assess psychological distress but not evaluated it for cancer survivors. Our first aim in this study was to test whether we could extend our item bank to include cancer survivors. The second aim was to examine whether the item bank could assess positive affect as a single construct alongside negative psychological symptoms. Responses from 1315 cancer survivors to the Hospital Anxiety and Depression Scale (HADS) and the Positive and Negative Affect Scale (PANAS) were considered for inclusion in a pre-existing item bank created from a heterogeneous sample of 4914 cancer patients. Differential item functioning (DIF) was used to assess whether HADS responses drawn from the two samples were equivalent. Common-item equating was used to anchor the shared (HADS) items, whilst the PANAS items were added. Item fit was evaluated at each stage, and misfitting items were removed. Unidimensionality was assessed with a principal components factor analysis. The DIF analysis did not reveal any differences between the HADS item locations from the two samples. Three misfitting PANAS items were removed, resulting in a final unidimensional bank of 80 items with good internal reliability (α = 0.85). The new item bank is valid for use across the cancer journey, including cancer survivors, and modestly improves the assessment of all levels of psychological distress and positive psychological function. Copyright © 2011 John Wiley & Sons, Ltd.
He, Qiwei; Glas, Cees A W; Veldkamp, Bernard P
2014-06-01
This article explores the generalizability of the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) diagnostic criteria for post-traumatic stress disorder (PTSD) to various subpopulations. Besides identifying the differential symptom functioning (also referred to as differential item functioning [DIF]) related to various background variables such as gender, marital status and educational level, this study emphasizes the importance of evaluating the impact of DIF on population inferences as made in health surveys and clinical trials, and on the diagnosis of individual patients. Using a sample from the National Comorbidity Study-Replication (NCS-R), four symptoms for gender, one symptom for marital status, and three symptoms for educational level were significantly flagged as DIF, but their impact on diagnosis was fairly small. We conclude that the DSM-IV diagnostic criteria for PTSD do not produce substantially biased results in the investigated subpopulations, and there should be few reservations regarding their use. Further, although the impact of DIF (i.e. the influence of differential symptom functioning on diagnostic results) was found to be quite small in the current study, we recommend that diagnosticians always perform a DIF analysis of various subpopulations using the methodology presented here to ensure the diagnostic criteria are valid in their own studies. Copyright © 2014 John Wiley & Sons, Ltd.
Fairness in Computerized Testing: Detecting Item Bias Using CATSIB with Impact Present
ERIC Educational Resources Information Center
Chu, Man-Wai; Lai, Hollis
2013-01-01
In educational assessment, there is an increasing demand for tailoring assessments to individual examinees through computer adaptive tests (CAT). As such, it is particularly important to investigate the fairness of these adaptive testing processes, which require the investigation of differential item function (DIF) to yield information about item…
Nayak, Madhabika B.; Bond, Jason C.; Greenfield, Thomas K.
2015-01-01
Background: Efficient alcohol screening measures are important to prevent or treat alcohol use disorders (AUDs). Objectives: We studied different versions of the Alcohol Use Disorders Identification Test (AUDIT) comparing their performance to the full AUDIT and an AUD measure as screeners for alcohol use problems in Goa, India. Methods: Data from a general population study on 743 male drinkers aged 18 to 49 years are reported. Drinkers completed the AUDIT and an AUD measure. We created shorter versions of the AUDIT by a) collapsing AUDIT item responses into 3 and 2 categories and b) deleting 2 items with the lowest factor loadings. Each version was evaluated using factor, reliability and validity, and differential item functioning (DIF) analysis by age, education, standard of living index (SLI), and area of residence. Results: A single factor solution was found for each version with lower factor loadings for items on guilt and concern. There were no significant differences among the different AUDIT versions in predicting AUD. No significant DIF was found by education, SLI or area of residence. DIF was observed for the alcohol frequency item by age. Conclusions/Importance: The AUDIT may be used with dichotomized response options without loss of predictive validity. A shortened 8-item dichotomized scale can adequately screen for AUDs in Goa when brevity is of paramount importance, although with lower predictive validity. Although the frequency item was endorsed more by older men, there is no evidence that the AUDIT items perform differently in other groups of male drinkers in Goa. PMID:26549791
Computer-adaptive test to measure community reintegration of Veterans.
Resnik, Linda; Tian, Feng; Ni, Pengsheng; Jette, Alan
2012-01-01
The Community Reintegration of Injured Service Members (CRIS) measure consists of three scales measuring extent of, perceived limitations in, and satisfaction with community reintegration. Length of the CRIS may be a barrier to its widespread use. Using item response theory (IRT) and computer-adaptive test (CAT) methodologies, this study developed and evaluated a briefer community reintegration measure called the CRIS-CAT. Large item banks for each CRIS scale were constructed. A convenience sample of 517 Veterans responded to all items. Exploratory and confirmatory factor analyses (CFAs) were used to identify the dimensionality within each domain, and IRT methods were used to calibrate items. Accuracy and precision of CATs of different lengths were compared with the full-item bank, and data were examined for differential item functioning (DIF). CFAs supported unidimensionality of scales. Acceptable item fit statistics were found for final models. Accuracy of 10-, 15-, 20-, and variable-item CATs for all three scales was 0.88 or above. CAT precision increased with number of items administered and decreased at the upper ranges of each scale. Three items exhibited moderate DIF by sex. The CRIS-CAT demonstrated promising measurement properties and is recommended for use in community reintegration assessment.
2010-01-01
Objectives. To evaluate, by age, the performance of 2 disability measures based on needing help: one using 5 classic activities of daily living (ADL) and another using an expanded set of 14 activities including instrumental activities of daily living (IADL), walking, getting outside, and ADL (IADL/ADL). Methods. Guttman and item response theory (IRT) scaling methods are used with a large (N = 25,470) nationally representative household survey of individuals aged 18 years and older. Results. Guttman scalability of the ADL items increases steadily with age, reaching a high level at ages 75 years and older. That is reflected in an IRT model by age-related differential item functioning (DIF) resulting in age-biased measurement of ADL. Guttman scalability of the IADL/ADL items also increases with age but is lower than the ADL. Although age-related DIF also occurs with IADL/ADL items, DIF is lower in magnitude and balances out without causing age bias. Discussion. An IADL/ADL scale measuring need for help is hierarchical, unidimensional, and unbiased by age. It has greater content validity for measuring need for help in the community and shows greater sensitivity by age than the classic ADL measure. As demand for community services is increasing among adults of all ages, an expanded IADL/ADL measure is more useful than ADL. PMID:20100786
Paz, Sylvia H; Spritzer, Karen L; Morales, Leo S; Hays, Ron D
2013-09-01
To evaluate the equivalence of the PROMIS® physical functioning item bank by language of administration (English versus Spanish). The PROMIS® wave 1 English-language physical functioning bank consists of 124 items, and 114 of these were translated into Spanish. Item frequencies, means and standard deviations, item-scale correlations, and internal consistency reliability were calculated. The IRT assumption of unidimensionality was evaluated by fitting a single-factor confirmatory factor analytic model. IRT threshold and discrimination parameters were estimated using Samejima's Graded Response Model. DIF by language of administration was evaluated. Item means ranged from 2.53 (SD = 1.36) to 4.62 (SD = 0.82). Coefficient alpha was 0.99, and item-rest correlations ranged from 0.41 to 0.89. A one-factor model fits the data well (CFI = 0.971, TLI = 0.970, and RMSEA = 0.052). The slope parameters ranged from 0.45 ("Are you able to run 10 miles?") to 4.50 ("Are you able to put on a shirt or blouse?"). The threshold parameters ranged from -1.92 ("How much do physical health problems now limit your usual physical activities (such as walking or climbing stairs)?") to 6.06 ("Are you able to run 10 miles?"). Fifty of the 114 items were flagged for DIF based on an R² of 0.02 or above criterion. The expected total score was higher for Spanish- than English-language respondents. English- and Spanish-speaking subjects with the same level of underlying physical function responded differently to 50 of 114 items. This study has important implications in the study of physical functioning among diverse populations.
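How slope and threshold parameters of the kind reported above translate into response probabilities under Samejima's Graded Response Model can be sketched as follows. This is a minimal illustration with hypothetical function names. It assumes ordered thresholds and categories scored 0 to m (the bank's actual category scoring may differ), and it does not reproduce the authors' estimation or DIF procedure.

```python
import math

def grm_category_probs(theta, slope, thresholds):
    """Category probabilities under the Graded Response Model.

    thresholds must be in increasing order; m thresholds give m + 1 categories.
    P(X >= k) is a logistic curve in theta, and each category probability is
    the difference between adjacent cumulative curves.
    """
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-slope * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]

def expected_item_score(theta, slope, thresholds):
    """Expected score for categories scored 0..m (an assumed scoring)."""
    probs = grm_category_probs(theta, slope, thresholds)
    return sum(k * p for k, p in enumerate(probs))
```

A high slope, such as the 4.50 reported for the shirt-or-blouse item, makes the cumulative curves steep, so respondents shift between categories over a narrow range of theta. A high threshold, such as 6.06 for running 10 miles, means the top category is rarely endorsed.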
Chang, Chih-Cheng; Su, Jian-An; Tsai, Ching-Shu; Yen, Cheng-Fang; Liu, Jiun-Horng; Lin, Chung-Ying
2015-06-01
To examine the psychometrics of the Affiliate Stigma Scale using rigorous psychometric analysis: classical test theory (CTT) (traditional) and Rasch analysis (modern). Differential item functioning (DIF) items were also tested using Rasch analysis. Caregivers of relatives with mental illness (n = 453; mean age: 53.29 ± 13.50 years) were recruited from southern Taiwan. Each participant filled out four questionnaires: Affiliate Stigma Scale, Rosenberg Self-Esteem Scale, Beck Anxiety Inventory, and one background information sheet. CTT analyses showed that the Affiliate Stigma Scale had satisfactory internal consistency (α = 0.85-0.94) and concurrent validity (Rosenberg Self-Esteem Scale: r = -0.52 to -0.46; Beck Anxiety Inventory: r = 0.27-0.34). Rasch analyses supported the unidimensionality of three domains in the Affiliate Stigma Scale and indicated four DIF items (affect domain: 1; cognitive domain: 3) across gender. Our findings, based on rigorous statistical analysis, verified the psychometrics of the Affiliate Stigma Scale and reported its DIF items. We conclude that the three domains of the Affiliate Stigma Scale can be separately used and are suitable for measuring the affiliate stigma of caregivers of relatives with mental illness. Copyright © 2015 Elsevier Inc. All rights reserved.
Wang, Zonghua; Zhou, Juan; Luo, Xingli; Xu, Yan; She, Xi; Chen, Ling; Yin, Honghua; Wang, Xianyuan
2015-01-01
The impact of strabismus on visual function, self-image, self-esteem, and social interactions decreases health-related quality of life (HRQoL). The purpose of this study was to evaluate and refine the adult strabismus quality of life questionnaire (AS-20) by using Rasch analysis among Chinese adult patients with strabismus. We evaluated the fit of the AS-20 to the Rasch model in a Chinese population by assessing unidimensionality, infit and outfit, person and item separation index and reliability, response ordering, targeting and differential item functioning (DIF). The overall AS-20 did not demonstrate unidimensionality; however, unidimensionality was achieved separately in the two Rasch-revised subscales: the psychosocial subscale (11 items) and the function subscale (9 items). Good targeting, optimal item infit and outfit, and no notable local dependence were found for each of the subscales. The rating scale was appropriate for the psychosocial subscale but a reduction to four response categories was required for the function subscale. No significant DIF was revealed for any demographic or clinical factor (e.g., age, gender, and strabismus type). Rasch analysis showed the AS-20 to be a rigorous instrument for measuring health-related quality of life in Chinese strabismus patients, provided that some revisions are made to the subscale construct and response options.
Three approaches to investigating the multidimensional nature of a science assessment
NASA Astrophysics Data System (ADS)
Gokiert, Rebecca Jayne
The purpose of this study was to investigate a multi-method approach for collecting validity evidence about the underlying knowledge and skills measured by a large-scale science assessment. The three approaches included analysis of dimensionality, differential item functioning (DIF), and think-aloud interviews. The specific research questions addressed were: (1) Does the 4-factor model previously found by Hamilton et al. (1995) for the grade 8 sample explain the data? (2) Do the performances of male and female students systematically differ? Are these performance differences captured in the dimensions? (3) Can think-aloud reports aid in the generation of hypotheses about the underlying knowledge and skills that are measured by this test? A confirmatory factor analysis of the 4-factor model revealed good model data fit for both the AB and AC tests. Twenty-four of the 83 AB test items and 16 of the 77 AC test items displayed significant DIF, however, items were found, on average, to favour both males and females equally. There were some systematic differences found across the 4-factors; items favouring males tended to be related to earth and space sciences, stereotypical male related activities, and numerical operations. Conversely, females were found to outperform males on items that required careful reading and attention to detail. Concurrent and retrospective verbal reports (Ericsson & Simon, 1993) were collected from 16 grade 8 students (9 male and 7 female) while they solved 12 DIF items. Four general cognitive processing themes were identified from the student protocols that could be used to explain male and female problem solving. The themes included comprehension (verbal and visual), visualization, background knowledge/experience (school or life), and strategy use. 
There were systematic differences in cognitive processing between the students that answered the items correctly and the students who answered the items incorrectly; however, this did not always correspond with the statistical gender DIF results. Although the multifaceted approach produced interpretable and meaningful validity evidence about the knowledge and skills, these forms of validity evidence only begin to provide a basic understanding of the underlying construct(s) that are being measured.
Examination of a Social-Networking Site Activities Scale (SNSAS) Using Rasch Analysis
ERIC Educational Resources Information Center
Alhaythami, Hassan; Karpinski, Aryn; Kirschner, Paul; Bolden, Edward
2017-01-01
This study examined the psychometric properties of a social-networking site (SNS) activities scale (SNSAS) using Rasch Analysis. Items were also examined with Rasch Principal Components Analysis (PCA) and Differential Item Functioning (DIF) across groups of university students (i.e., males and females from the United States [US] and Europe; N =…
ERIC Educational Resources Information Center
Dai, Yunyun
2013-01-01
Mixtures of item response theory (IRT) models have been proposed as a technique to explore response patterns in test data related to cognitive strategies, instructional sensitivity, and differential item functioning (DIF). Estimation proves challenging due to difficulties in identification and questions of effect size needed to recover underlying…
ERIC Educational Resources Information Center
Baylor, Carolyn; McAuliffe, Megan J.; Hughes, Louise E.; Yorkston, Kathryn; Anderson, Tim; Jiseon, Kim; Amtmann, Dagmar
2014-01-01
Purpose: To examine the cross-cultural applicability of the Communicative Participation Item Bank (CPIB) through a comparison of respondents with Parkinson's disease (PD) from the United States and New Zealand. Method: A total of 428 respondents--218 from the United States and 210 from New Zealand--completed the self-report CPIB and a series of…
Romero, Dulce; Ricarte, Jorge J.; Serrano, Juan P.; Nieto, Marta; Latorre, Jose M.
2018-01-01
The Autobiographical Memory Test (AMT) is the most widely used measure of overgeneral autobiographical memory (OGM). The AMT appears to have good psychometric properties, but more research is needed on the influence and applicability of individual cue words in different languages and populations. To date, no studies have evaluated its usefulness as a measure of OGM in Spanish or older populations. This work aims to analyze the applicability of the AMT in young and older Spanish samples. We administered a Spanish version of the AMT to samples of young (N = 520) and older adults (N = 155). We conducted confirmatory factor analysis (CFA), item response theory-based analysis (IRT) and differential item functioning (DIF) analysis. Results confirm the one-factor structure for the AMT. IRT analysis suggests that both groups find the AMT easy given that they generally perform well, and that it is more precise in individuals who score low on memory specificity. DIF analysis finds that three items differ in their functioning depending on age group. This differential functioning affects the overall AMT scores and, thus, these items should be excluded from the AMT in studies comparing young and older samples. We discuss the possible implications of the samples and cue words used. PMID:29672583
Ros, Laura; Romero, Dulce; Ricarte, Jorge J; Serrano, Juan P; Nieto, Marta; Latorre, Jose M
2018-01-01
The Autobiographical Memory Test (AMT) is the most widely used measure of overgeneral autobiographical memory (OGM). The AMT appears to have good psychometric properties, but more research is needed on the influence and applicability of individual cue words in different languages and populations. To date, no studies have evaluated its usefulness as a measure of OGM in Spanish or older populations. This work aims to analyze the applicability of the AMT in young and older Spanish samples. We administered a Spanish version of the AMT to samples of young (N = 520) and older adults (N = 155). We conducted confirmatory factor analysis (CFA), item response theory-based analysis (IRT) and differential item functioning (DIF) analysis. Results confirm the one-factor structure for the AMT. IRT analysis suggests that both groups find the AMT easy given that they generally perform well, and that it is more precise in individuals who score low on memory specificity. DIF analysis finds that three items differ in their functioning depending on age group. This differential functioning affects the overall AMT scores and, thus, these items should be excluded from the AMT in studies comparing young and older samples. We discuss the possible implications of the samples and cue words used.
Wu, Li-Tzy; Pan, Jeng-Jong; Yang, Chongming; Reeve, Bryce B.; Blazer, Dan G.
2009-01-01
Aim This study applied both item response theory (IRT) and multiple indicators–multiple causes (MIMIC) methods to evaluate item-level psychometric properties of diagnostic questions for hallucinogen use disorders (HUDs), differential item functioning (DIF), and predictors of latent HUD. Methods Data were drawn from 2004–2006 National Surveys on Drug Use and Health. Analyses were based on 1548 past-year hallucinogen users aged 12–17 years. Substance use and symptoms were assessed by audio computer-assisted self-interviewing methods. Results Abuse and dependence criteria empirically were arrayed along a single continuum of severity. All abuse criteria indicated middle-to-high severity on the IRT-defined HUD continuum, while dependence criteria captured a wider range from the lowest (tolerance and time spent) to the highest (taking larger amounts and inability to cut down) severity levels. There was indication of DIF by hallucinogen users’ age, gender, race/ethnicity, and ecstasy use status. Adjusting for DIF, ecstasy users (vs. non-ecstasy hallucinogen users), females (vs. males), and whites (vs. Hispanics) exhibited increased odds of HUD. Conclusions Symptoms of hallucinogen abuse and dependence empirically do not reflect two discrete conditions in adolescents. Trends and problems related to hallucinogen use among girls and whites should be examined further to inform the designs of effective gender-appropriate and culturally sensitive prevention programs. PMID:19896773
Differential item functioning magnitude and impact measures from item response theory models.
Kleinman, Marjorie; Teresi, Jeanne A
2016-01-01
Measures of magnitude and impact of differential item functioning (DIF) at the item and scale level, respectively, are presented and reviewed in this paper. Most measures are based on item response theory models. Magnitude refers to item level effect sizes, whereas impact refers to differences between groups at the scale score level. Reviewed are magnitude measures based on group differences in the expected item scores and impact measures based on differences in the expected scale scores. The similarities among these indices are demonstrated. Various software packages are described that provide magnitude and impact measures, and new software is presented that computes all of the available statistics conveniently in one program with explanations of their relationships to one another.
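The distinction drawn in this abstract between item-level magnitude and scale-level impact can be illustrated with a minimal sketch. It assumes a two-parameter logistic (2PL) model for dichotomous items and averages over a set of focal-group trait values; the function names and the specific averaging scheme are hypothetical illustrations and are not claimed to be the indices or the software reviewed in the paper.

```python
import math

def p_2pl(theta, a, b):
    """Expected score on a dichotomous item under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_magnitude(thetas, ref_params, focal_params):
    """Item-level magnitude: mean signed and mean unsigned difference in
    expected item scores (focal minus reference) over the given thetas."""
    diffs = [p_2pl(t, *focal_params) - p_2pl(t, *ref_params) for t in thetas]
    signed = sum(diffs) / len(diffs)
    unsigned = sum(abs(d) for d in diffs) / len(diffs)
    return signed, unsigned

def scale_impact(thetas, ref_items, focal_items):
    """Scale-level impact: mean difference in expected scale scores, where
    the expected scale score is the sum of expected item scores."""
    total = 0.0
    for t in thetas:
        ref_score = sum(p_2pl(t, a, b) for a, b in ref_items)
        focal_score = sum(p_2pl(t, a, b) for a, b in focal_items)
        total += focal_score - ref_score
    return total / len(thetas)

# Hypothetical (a, b) parameters: item 1 is harder for the focal group,
# item 2 is easier by the same amount, so item-level DIF cancels at theta = 0.
ref_items = [(1.0, 0.0), (1.0, 0.0)]
focal_items = [(1.0, 0.5), (1.0, -0.5)]
signed, unsigned = item_magnitude([-1.0, 0.0, 1.0], ref_items[0], focal_items[0])
impact = scale_impact([0.0], ref_items, focal_items)
```

In this toy setup a single item shows nonzero magnitude while the two-item scale shows essentially no impact at theta = 0, which is one way DIF flagged at the item level can have little effect on scale scores.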
Yount, Kathryn M; VanderEnde, Kristin; Zureick-Brown, Sarah; Minh, Tran Hung; Schuler, Sidney Ruth; Anh, Hoang Tu
2014-06-01
Attitudes about intimate partner violence (IPV) against women are widely surveyed, but attitudes about women's recourse after exposure to IPV are understudied, despite their importance for intervention. Designed through qualitative research and administered in a probability sample of 1,054 married men and women 18 to 50 years in My Hao District, Vietnam, the ATT-RECOURSE scale measures men's and women's attitudes about a wife's recourse after exposure to physical IPV. Data were initially collected for nine items. Exploratory factor analysis (EFA) with one random split-half sample (N1 = 526) revealed a one-factor model with significant loadings (0.316-0.686) for six items capturing a wife's silence, informal recourse, and formal recourse. A confirmatory factor analysis (CFA) with the other random split-half sample (N2 = 528) showed adequate fit for the six-item model and significant factor loadings of similar magnitude to the EFA results (0.412-0.669). For the six items retained, men consistently favored recourse more often than did women (52.4%-66.0% of men vs. 41.9%-55.2% of women). Tests for uniform differential item functioning (DIF) by gender revealed one item with significant uniform DIF, and adjusting for this revealed an even larger gap in men's and women's attitudes, with men favoring recourse, on average, more than women. The six-item ATT-RECOURSE scale is reliable across independent samples and exhibits little uniform DIF by gender, supporting its use in surveys of men and women. Further methodological research is discussed. Research is needed in Vietnam about why women report less favorable attitudes than men regarding women's recourse after physical IPV.
Assessment of the psychometrics of a PROMIS item bank: self-efficacy for managing daily activities
Hong, Ickpyo; Li, Chih-Ying; Romero, Sergio; Gruber-Baldini, Ann L.; Shulman, Lisa M.
2017-01-01
Purpose The aim of this study is to investigate the psychometrics of the Patient-Reported Outcomes Measurement Information System self-efficacy for managing daily activities item bank. Methods The item pool was field tested on a sample of 1087 participants via internet (n = 250) and in-clinic (n = 837) surveys. All participants reported having at least one chronic health condition. The 35-item pool was investigated for dimensionality (confirmatory factor analyses, CFA and exploratory factor analysis, EFA), item-total correlations, local independence, precision, and differential item functioning (DIF) across gender, race, ethnicity, age groups, data collection modes, and neurological chronic conditions (McFadden Pseudo R² less than 10%). Results The item pool met two of the four CFA fit criteria (CFI = 0.952 and SRMR = 0.07). EFA analysis found a dominant first factor (eigenvalue = 24.34) and the ratio of first to second eigenvalue was 12.4. The item pool demonstrated good item-total correlations (0.59–0.85) and acceptable internal consistency (Cronbach’s alpha = 0.97). The item pool maintained its precision (reliability over 0.90) across a wide range of theta (3.70), and there was no significant DIF. Conclusion The findings indicated the item pool has sound psychometric properties and the test items are eligible for development of computerized adaptive testing and short forms. PMID:27048495
Assessment of the psychometrics of a PROMIS item bank: self-efficacy for managing daily activities.
Hong, Ickpyo; Velozo, Craig A; Li, Chih-Ying; Romero, Sergio; Gruber-Baldini, Ann L; Shulman, Lisa M
2016-09-01
The aim of this study is to investigate the psychometrics of the Patient-Reported Outcomes Measurement Information System self-efficacy for managing daily activities item bank. The item pool was field tested on a sample of 1087 participants via internet (n = 250) and in-clinic (n = 837) surveys. All participants reported having at least one chronic health condition. The 35-item pool was investigated for dimensionality (confirmatory factor analyses, CFA and exploratory factor analysis, EFA), item-total correlations, local independence, precision, and differential item functioning (DIF) across gender, race, ethnicity, age groups, data collection modes, and neurological chronic conditions (McFadden Pseudo R² less than 10%). The item pool met two of the four CFA fit criteria (CFI = 0.952 and SRMR = 0.07). EFA analysis found a dominant first factor (eigenvalue = 24.34) and the ratio of first to second eigenvalue was 12.4. The item pool demonstrated good item-total correlations (0.59-0.85) and acceptable internal consistency (Cronbach's alpha = 0.97). The item pool maintained its precision (reliability over 0.90) across a wide range of theta (3.70), and there was no significant DIF. The findings indicated the item pool has sound psychometric properties and the test items are eligible for development of computerized adaptive testing and short forms.
Kalpakjian, Claire Z.; Tate, Denise G.; Kisala, Pamela A.; Tulsky, David S.
2015-01-01
Objective To describe the development and psychometric properties of the Spinal Cord Injury-Quality of Life (SCI-QOL) Self-esteem item bank. Design Using a mixed-methods design, we developed and tested a self-esteem item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory- (IRT) based analytic approaches, including tests of model fit, differential item functioning (DIF) and precision. Setting We tested a pool of 30 items at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital, and the James J. Peters/Bronx Department of Veterans Affairs hospital. Participants A total of 717 individuals with SCI completed the self-esteem items. Results A unidimensional model was observed (CFI = 0.946; RMSEA = 0.087) and measurement precision was good (theta range between −2.7 and 0.7). Eleven items were flagged for DIF; however, effect sizes were negligible with little practical impact on score estimates. The final calibrated item bank resulted in 23 retained items. Conclusion This study indicates that the SCI-QOL Self-esteem item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available. PMID:26010972
Kalpakjian, Claire Z; Tate, Denise G; Kisala, Pamela A; Tulsky, David S
2015-05-01
To describe the development and psychometric properties of the Spinal Cord Injury-Quality of Life (SCI-QOL) Self-esteem item bank. Using a mixed-methods design, we developed and tested a self-esteem item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory-(IRT) based analytic approaches, including tests of model fit, differential item functioning (DIF) and precision. We tested a pool of 30 items at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital, and the James J. Peters/Bronx Department of Veterans Affairs hospital. A total of 717 individuals with SCI completed the self-esteem items. A unidimensional model was observed (CFI=0.946; RMSEA=0.087) and measurement precision was good (theta range between -2.7 and 0.7). Eleven items were flagged for DIF; however, effect sizes were negligible with little practical impact on score estimates. The final calibrated item bank resulted in 23 retained items. This study indicates that the SCI-QOL Self-esteem item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available.
Victorson, David; Tulsky, David S; Kisala, Pamela A; Kalpakjian, Claire Z; Weiland, Brian; Choi, Seung W
2015-05-01
To describe the development and psychometric properties of the Spinal Cord Injury--Quality of Life (SCI-QOL) Resilience item bank and short form. Using a mixed-methods design, we developed and tested a resilience item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory based analytic approaches, including tests of model fit and differential item functioning (DIF). We tested a 32-item pool at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Department of Veterans Affairs medical center. A total of 717 individuals with SCI completed the Resilience items. A unidimensional model was observed (CFI=0.968; RMSEA=0.074) and measurement precision was good (theta range between -3.1 and 0.9). Ten items were flagged for DIF, however, after examination of effect sizes we found this to be negligible with little practical impact on score estimates. The final calibrated item bank resulted in 21 retained items. This study indicates that the SCI-QOL Resilience item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available.
Victorson, David; Tulsky, David S.; Kisala, Pamela A.; Kalpakjian, Claire Z.; Weiland, Brian; Choi, Seung W.
2015-01-01
Objective To describe the development and psychometric properties of the Spinal Cord Injury - Quality of Life (SCI-QOL) Resilience item bank and short form. Design Using a mixed-methods design, we developed and tested a resilience item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory based analytic approaches, including tests of model fit and differential item functioning (DIF). Setting We tested a 32-item pool at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Department of Veterans Affairs medical center. Participants A total of 717 individuals with SCI completed the Resilience items. Results A unidimensional model was observed (CFI = 0.968; RMSEA = 0.074) and measurement precision was good (theta range between −3.1 and 0.9). Ten items were flagged for DIF, however, after examination of effect sizes we found this to be negligible with little practical impact on score estimates. The final calibrated item bank resulted in 21 retained items. Conclusion This study indicates that the SCI-QOL Resilience item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available. PMID:26010971
Debast, Inge; Rossi, Gina; Feenstra, Dineke; Hutsebaut, Joost
2017-04-01
Criterion D of the Diagnostic and Statistical Manual of Mental Disorders (5th ed.; DSM-5; American Psychiatric Association [APA], 2013) refers to a possible onset of personality disorders (PDs) in adolescence and in Section II the development/course in adolescence is described by some typical characteristics for several PDs. Yet, age-specific expressions of PDs are lacking in Section III. We urgently need a developmentally sensitive assessment instrument that differentiates developmental and contextual changes on the one hand from expressions of personality pathology on the other hand. Therefore we investigated which items of the Severity Indices for Personality Problems-118 (SIPP-118) were developmentally sensitive throughout adolescence and adulthood and which could be considered more age-specific markers requiring other content or thresholds over age groups. Applying item response theory (IRT) we detected differential item functioning (DIF) in 36% of the items in matched samples of 639 adolescents versus 639 adults. The DIF across age groups mainly reflected a different degree of symptom expressions for the same underlying level of functioning. The threshold for exhibiting symptoms given a certain degree of personality dysfunction was lower in adolescence for areas of personality functioning related to the Self and Interpersonal domains. Some items also measured a latent construct of personality functioning differently across adolescents and adults. This suggests that several facets of the SIPP-118 do not solely measure aspects of personality pathology in adolescents, but likely include more developmental issues.
ERIC Educational Resources Information Center
Puhan, Gautam; Boughton, Keith A.; Kim, Sooyeon
2005-01-01
The study evaluated the comparability of two versions of a teacher certification test: a paper-and-pencil test (PPT) and computer-based test (CBT). Standardized mean difference (SMD) and differential item functioning (DIF) analyses were used as measures of comparability at the test and item levels, respectively. Results indicated that effect sizes…
Mokkink, Lidwine Brigitta; Galindo-Garre, Francisca; Uitdehaag, Bernard MJ
2016-12-01
The Multiple Sclerosis Walking Scale-12 (MSWS-12) measures walking ability from the patients' perspective. We examined the quality of the MSWS-12 using an item response theory model, the graded response model (GRM). A total of 625 unique Dutch multiple sclerosis (MS) patients were included. After testing for unidimensionality, monotonicity, and absence of local dependence, a GRM was fit and item characteristics were assessed. Differential item functioning (DIF) for the variables gender, age, duration of MS, type of MS and severity of MS, reliability, total test information, and standard error of the trait level (θ) were investigated. Confirmatory factor analysis showed a unidimensional structure of the 12 items of the scale, explaining 88% of the variance. Item 2 did not fit into the GRM model. Reliability was 0.93. Items 8 and 9 (of the 11 and 12 item version respectively) showed DIF on the variable severity, based on the Expanded Disability Status Scale (EDSS). However, the EDSS is strongly related to the content of both items. Our results confirm the good quality of the MSWS-12. The trait level (θ) scores and item parameters of both the 12- and 11-item versions were highly comparable, although we do not suggest changing the content of the MSWS-12.
The Empirical Selection of Anchor Items Using a Multistage Approach
ERIC Educational Resources Information Center
Craig, Brandon
2017-01-01
The purpose of this study was to determine if using a multistage approach for the empirical selection of anchor items would lead to more accurate DIF detection rates than the anchor selection methods proposed by Kopf, Zeileis, & Strobl (2015b). A simulation study was conducted in which the sample size, percentage of DIF, and balance of DIF…
Item Response Theory Analyses of the Cambridge Face Memory Test (CFMT)
Cho, Sun-Joo; Wilmer, Jeremy; Herzmann, Grit; McGugin, Rankin; Fiset, Daniel; Van Gulick, Ana E.; Ryan, Katie; Gauthier, Isabel
2014-01-01
We evaluated the psychometric properties of the Cambridge face memory test (CFMT; Duchaine & Nakayama, 2006). First, we assessed the dimensionality of the test with a bi-factor exploratory factor analysis (EFA). This EFA analysis revealed a general factor and three specific factors clustered by targets of CFMT. However, the three specific factors appeared to be minor factors that can be ignored. Second, we fit a unidimensional item response model. This item response model showed that the CFMT items could discriminate individuals at different ability levels and covered a wide range of the ability continuum. We found the CFMT to be particularly precise for a wide range of ability levels. Third, we implemented item response theory (IRT) differential item functioning (DIF) analyses for each gender group and two age groups (Age ≤ 20 versus Age > 21). This DIF analysis suggested little evidence of consequential differential functioning on the CFMT for these groups, supporting the use of the test to compare older to younger, or male to female, individuals. Fourth, we tested for a gender difference on the latent facial recognition ability with an explanatory item response model. We found a significant but small gender difference on the latent ability for face recognition, which was higher for women than men by 0.184, at age mean 23.2, controlling for linear and quadratic age effects. Finally, we discuss the practical considerations of the use of total scores versus IRT scale scores in applications of the CFMT. PMID:25642930
McGrane, J A; Butow, P N; Sze, M; Eisenbruch, M; Goldstein, D; King, M T
2014-12-01
The purpose of this study was to assess the invariance of a culturally competent multi-lingual unmet needs survey. A cross-sectional study was conducted among immigrants of Arabic-, Chinese- and Greek-speaking backgrounds, and Anglo-Australian-born controls, recruited through Cancer Registries (n = 591) and oncology clinics (n = 900). The survey included four subscales, with newly developed items addressing unmet need in culturally competent health information and patient support (CCHIPS), and items adapted from existing questionnaires addressing physical and daily living (PDL), sexuality (SEX) and survivorship (SURV) unmet need. The survey was translated into Arabic, Chinese and Greek. Rasch analysis was carried out on the four domains. Whilst many items were mistargeted to less prevalent areas of unmet need, causing substantial floor effects in person estimates, reliability indices were acceptable. The CCHIPS domain showed differential item functioning (DIF) for cultural background and language, and the PDL domain showed DIF for treatment phase and gender. The results for SEX and SURV domains were limited by floor effects and missing responses. All domains showed adequate fit to the model after DIF was resolved and a small number of items were deleted. The study highlights the intricacies in designing a culturally competent survey that can be applied to culturally and linguistically diverse groups across different treatment contexts. Overall, the results demonstrate that this survey is somewhat invariant with respect to these factors. Future refinements are suggested to enhance the survey's cultural competence and general validity.
Schneider, Stefan; Choi, Seung W; Junghaenel, Doerte U; Schwartz, Joseph E; Stone, Arthur A
2013-09-01
The Patient-Reported Outcomes (PRO) Measurement Information System (PROMIS®) has developed assessment tools for numerous PROs, most using a 7-day recall format. We examined whether modifying the recall period for use in daily diary research would affect the psychometric characteristics of several PROMIS measures. Daily versions of short-forms for three PROMIS domains (pain interference, fatigue, depression) were administered to a general population sample (n = 100) for 28 days. Analyses used multilevel item response theory (IRT) models. We examined differential item functioning (DIF) across recall periods by comparing the IRT parameters from the daily data with the PROMIS 7-day recall IRT parameters. Additionally, we examined whether the IRT parameters for day-to-day within-person changes are invariant to those for between-person (cross-sectional) differences in PROs. Dimensionality analyses of the daily data suggested a single dimension for each PRO domain, consistent with PROMIS instruments. One-third of the daily items showed uniform DIF when compared with PROMIS 7-day recall, but the impact of DIF on the scale level was minor. IRT parameters for within-person changes differed from between-person parameters for 3 depression items, which were more sensitive for measuring change than between-person differences, but not for pain interference and fatigue items. Notably, mean scores from daily diaries were significantly lower than the PROMIS 7-day recall norms. The results provide initial evidence supporting the adaptation of PROMIS measures for daily diary research. However, scores from daily diaries cannot be directly interpreted on PROMIS norms established for 7-day recall.
Hackett, Michelle; Melgar-Quinonez, Hugo; Uribe, Martha C Alvarez
2008-01-01
Objective We assessed the validity of a locally adapted Colombian Household Food Security Scale (CHFSS) used as a part of the 2006 evaluation of the food supplement component of the Plan for Improving Food and Nutrition in Antioquia, Colombia (MANA – Plan Departamental de Seguridad Alimentaria y Nutricional de Antioquia). Methods Subjects included low-income families with pre-school age children in MANA that responded affirmatively to at least one CHFSS item (n = 1,319). Rasch Modeling was used to evaluate the psychometric characteristics of the items through measure and INFIT values. Differences in CHFSS performance were assessed by area of residency, socioeconomic status and number of children enrolled in MANA. Unidimensionality of a scale by group was further assessed using Differential Item Functioning (DIF). Results Most CHFSS items presented good fitness with most INFIT values within the adequate range of 0.8 to 1.2. Consistency in item measure values between groups was found for all but two items in the comparison by area of residency. Only two adult items exhibited DIF between urban and rural households. Conclusion The results indicate that the adapted CHFSS is a valid tool to assess the household food security of participants in food assistance programs like MANA. PMID:18500988
Hackett, Michelle; Melgar-Quinonez, Hugo; Uribe, Martha C Alvarez
2008-05-23
We assessed the validity of a locally adapted Colombian Household Food Security Scale (CHFSS) used as a part of the 2006 evaluation of the food supplement component of the Plan for Improving Food and Nutrition in Antioquia, Colombia (MANA - Plan Departamental de Seguridad Alimentaria y Nutricional de Antioquia). Subjects included low-income families with pre-school age children in MANA that responded affirmatively to at least one CHFSS item (n = 1,319). Rasch Modeling was used to evaluate the psychometric characteristics of the items through measure and INFIT values. Differences in CHFSS performance were assessed by area of residency, socioeconomic status and number of children enrolled in MANA. Unidimensionality of a scale by group was further assessed using Differential Item Functioning (DIF). Most CHFSS items presented good fitness with most INFIT values within the adequate range of 0.8 to 1.2. Consistency in item measure values between groups was found for all but two items in the comparison by area of residency. Only two adult items exhibited DIF between urban and rural households. The results indicate that the adapted CHFSS is a valid tool to assess the household food security of participants in food assistance programs like MANA.
ERIC Educational Resources Information Center
Sinharay, Sandip; Dorans, Neil J.
2010-01-01
The Mantel-Haenszel (MH) procedure (Mantel and Haenszel) is a popular method for estimating and testing a common two-factor association parameter in a 2 x 2 x K table. Holland and Holland and Thayer described how to use the procedure to detect differential item functioning (DIF) for tests with dichotomously scored items. Wang, Bradlow, Wainer, and…
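As a rough illustration of the MH procedure named in this abstract, the sketch below computes the Mantel-Haenszel common odds ratio across K matched score levels of a 2 x 2 x K table, together with its transformation to the ETS delta scale (MH D-DIF = -2.35 ln α). The function name, the table layout, and the example counts are assumptions made for illustration; the sketch omits the MH chi-square test and any of the refinements the truncated abstract may go on to discuss.

```python
import math

def mantel_haenszel_dif(strata):
    """Estimate the MH common odds ratio for one studied item.

    strata: iterable of (a, b, c, d) counts, one tuple per matched score level:
        a = reference group correct,  b = reference group incorrect,
        c = focal group correct,      d = focal group incorrect.
    Returns (alpha_mh, mh_d_dif), where mh_d_dif = -2.35 * ln(alpha_mh).
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # empty score level contributes nothing
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("odds ratio undefined: a required cell sum is zero")
    alpha_mh = num / den
    mh_d_dif = -2.35 * math.log(alpha_mh)
    return alpha_mh, mh_d_dif

# Hypothetical counts at three matched total-score levels.
example_strata = [(30, 20, 25, 25), (40, 10, 35, 15), (45, 5, 42, 8)]
alpha, delta = mantel_haenszel_dif(example_strata)
```

An odds ratio above 1 (a negative delta) means that, among examinees matched on total score, the reference group has higher odds of answering the item correctly than the focal group.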
Kalibatseva, Z; Leong, F T L; Ham, E H
2014-09-01
Theoretical and clinical publications suggest the existence of cultural differences in the expression and experience of depression. Measurement non-equivalence remains a potential methodological explanation for the lower prevalence of depression among Asian Americans compared to European Americans. This study compared DSM-IV depressive symptoms among Asian Americans and European Americans using secondary data analysis of the Collaborative Psychiatric Epidemiology Surveys (CPES). The Composite International Diagnostic Interview (CIDI) was used for the assessment of depressive symptoms. Of the entire sample, 310 Asian Americans and 1974 European Americans reported depressive symptoms and were included in the analyses. Measurement variance was examined with an item response theory differential item functioning (IRT DIF) analysis. χ² analyses indicated that, compared to Asian Americans, European American participants more frequently endorsed affective symptoms such as 'feeling depressed', 'feeling discouraged' and 'cried more often'. The IRT analysis detected DIF for four out of the 15 depression symptom items. At equal levels of depression, Asian Americans endorsed feeling worthless and appetite changes more easily than European Americans, and European Americans endorsed feeling nervous and crying more often than Asian Americans. Asian Americans did not seem to over-report somatic symptoms; however, European Americans seemed to report more affective symptoms than Asian Americans. The results suggest that there was measurement variance in a few of the depression items.
Doostfatemeh, Marziyeh; Ayatollahi, Seyyed Mohammad Taghi; Jafari, Peyman
2015-08-01
In child-parent agreement studies in the field of paediatric health-related quality of life (HRQoL), little attention has been paid to the effect of gender in parental proxy rating of children's HRQoL. This study aims to test the potential interchangeability of parent dyads in reporting children's HRQoL on both item and scale levels of the PedsQL™ 4.0 instrument, using the approach of differential item functioning (DIF). The PedsQL™ 4.0 Generic Core Scales were completed by 576 father-and-mother dyads. A polytomous item response theory model, graded response model, was used to detect DIF across fathers and mothers. Assessment at item level showed that fathers and mothers perceived the meaning of items of the PedsQL™ 4.0 consistently. Regarding the scale level, a moderate to high level of agreement was observed between mothers' and fathers' reports on all similar subscales. Although the significant mean score differences in total, physical and emotional functioning indicated that fathers gave higher scores to their children, the small effect size implied that this difference may not be practically meaningful. Our findings revealed that discrepancy in parent dyads in rating children's HRQoL is a "real" difference and not an artefact due to measurement non-invariance. Fathers were seen to have slightly different insights into their children, especially for emotional functioning, but overall the results were not all that different. This suggests that paternal proxy-reports can be included in studies along with maternal proxy-reports, and the two may be combined when looking at parent-child agreement. Parent-child agreement studies in Iran are not affected by parents' gender, and therefore, researchers may rely on the assumption of the interchangeability of fathers and mothers in these studies.
Kalpakjian, Claire Z.; Tulsky, David S.; Kisala, Pamela A.; Bombardier, Charles H.
2015-01-01
Objective To develop an item response theory (IRT) calibrated Grief and Loss item bank as part of the Spinal Cord Injury – Quality of Life (SCI-QOL) measurement system. Design A literature review guided framework development of grief/loss. New items were created from focus groups. Items were revised based on expert review and patient feedback and were then field tested. Analyses included confirmatory factor analysis (CFA), graded response IRT modeling and evaluation of differential item functioning (DIF). Setting We tested a 20-item pool at several rehabilitation centers across the United States, including the University of Michigan, Kessler Foundation, Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Department of Veterans Affairs hospital. Participants A total of 717 individuals with SCI answered the grief and loss questions. Results The final calibrated item bank resulted in 17 retained items. A unidimensional model was observed (CFI = 0.976; RMSEA = 0.078) and measurement precision was good (theta range between −1.48 and 2.48). Ten items were flagged for DIF; however, after examination of effect sizes we found this to be negligible with little practical impact on score estimates. Conclusions This study indicates that the SCI-QOL Grief and Loss item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available. PMID:26010969
Calibration of the Spanish PROMIS Smoking Item Banks.
Huang, Wenjing; Stucky, Brian D; Edelen, Maria O; Tucker, Joan S; Shadel, William G; Hansen, Mark; Cai, Li
2016-07-01
The Patient-Reported Outcomes Measurement Information System (PROMIS) Smoking Initiative has developed item banks for assessing six smoking behaviors and biopsychosocial correlates of smoking among adult cigarette smokers. The goal of this study is to evaluate the performance of the Spanish version of the PROMIS smoking item banks as compared to the original banks developed in English. The six PROMIS banks for daily smokers were translated into Spanish and administered to a sample of Spanish-speaking adult daily smokers in the United States (N = 302). We first evaluated the unidimensionality of each bank using confirmatory factor analysis. We then conducted a two-group item response theory calibration, including an item response theory-based Differential Item Functioning (DIF) analysis by language of administration (Spanish vs. English). Finally, we generated full bank and short form scores for the translated banks and evaluated their psychometric performance. Unidimensionality of the Spanish smoking item banks was supported by confirmatory factor analysis results. Out of a total of 109 items that were evaluated for language DIF, seven items in three of the six banks were identified as having levels of DIF that exceeded an established criterion. The psychometric performance of the Spanish daily smoker banks is largely comparable to that of the English versions. The Spanish PROMIS smoking item banks are highly similar, but not entirely equivalent, to the original English versions. The parameters from these two-group calibrations can be used to generate comparable bank scores across the two language versions. In this study, we developed a Spanish version of the PROMIS smoking toolkit, which was originally designed and developed for English speakers. With the growing Spanish-speaking population, it is important to make the toolkit more accessible by translating the items and calibrating the Spanish version to be comparable with English-language scores. 
This study provided the translated item banks and short forms, comparable unbiased scores for Spanish speakers and evaluations of the psychometric properties of the new Spanish toolkit. © The Author 2016. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Methodology for Developing and Evaluating the PROMIS® Smoking Item Banks
Cai, Li; Stucky, Brian D.; Tucker, Joan S.; Shadel, William G.; Edelen, Maria Orlando
2014-01-01
Introduction: This article describes the procedures used in the PROMIS® Smoking Initiative for the development and evaluation of item banks, short forms (SFs), and computerized adaptive tests (CATs) for the assessment of 6 constructs related to cigarette smoking: nicotine dependence, coping expectancies, emotional and sensory expectancies, health expectancies, psychosocial expectancies, and social motivations for smoking. Methods: Analyses were conducted using response data from a large national sample of smokers. Items related to each construct were subjected to extensive item factor analyses and evaluation of differential item functioning (DIF). Final item banks were calibrated, and SF assessments were developed for each construct. The performance of the SFs and the potential use of the item banks for CAT administration were examined through simulation study. Results: Item selection based on dimensionality assessment and DIF analyses produced item banks that were essentially unidimensional in structure and free of bias. Simulation studies demonstrated that the constructs could be accurately measured with a relatively small number of carefully selected items, either through fixed SFs or CAT-based assessment. Illustrative results are presented, and subsequent articles provide detailed discussion of each item bank in turn. Conclusions: The development of the PROMIS smoking item banks provides researchers with new tools for measuring smoking-related constructs. The use of the calibrated item banks and suggested SF assessments will enhance the quality of score estimates, thus advancing smoking research. Moreover, the methods used in the current study, including innovative approaches to item selection and SF construction, may have general relevance to item bank development and evaluation. PMID:23943843
Lin, Chung-Ying; Griffiths, Mark D; Pakpour, Amir H
2018-03-01
Background and aims: Research examining problematic mobile phone use has increased markedly over the past 5 years and has been related to "no mobile phone phobia" (so-called nomophobia). The 20-item Nomophobia Questionnaire (NMP-Q) is the only instrument that assesses nomophobia with an underlying theoretical structure and robust psychometric testing. This study aimed to confirm the construct validity of the Persian NMP-Q using Rasch and confirmatory factor analysis (CFA) models. Methods: After ensuring the linguistic validity, Rasch models were used to examine the unidimensionality of each Persian NMP-Q factor among 3,216 Iranian adolescents and CFAs were used to confirm its four-factor structure. Differential item functioning (DIF) and multigroup CFA were used to examine whether males and females interpreted the NMP-Q similarly, including item content and NMP-Q structure. Results: Each factor was unidimensional according to the Rasch findings, and the four-factor structure was supported by CFA. Two items did not quite fit the Rasch models (Item 14: "I would be nervous because I could not know if someone had tried to get a hold of me;" Item 9: "If I could not check my smartphone for a while, I would feel a desire to check it"). No DIF items were found across gender and measurement invariance was supported in multigroup CFA across gender. Conclusions: Due to the satisfactory psychometric properties, it is concluded that the Persian NMP-Q can be used to assess nomophobia among adolescents. Moreover, NMP-Q users may compare its scores between genders in the knowledge that there are no score differences contributed by different understandings of NMP-Q items.
Ashley, Laura; Smith, Adam B; Keding, Ada; Jones, Helen; Velikova, Galina; Wright, Penny
2013-12-01
To provide new insights into the psychometrics of the revised Illness Perception Questionnaire (IPQ-R) in cancer patients. To undertake, for the first time using data from breast, colorectal and prostate cancer patients, a confirmatory factor analysis (CFA) to assess the validity of the IPQ-R's core seven-factor structure. Also, for the first time in any illness group, to undertake Rasch analysis to explore the extent to which the IPQ-R factors form unidimensional scales, with linear measurement properties and no Differential Item Functioning (DIF). Patients with potentially curable breast, colorectal or prostate cancer, within 6 months post-diagnosis, completed the IPQ-R online (N=531). CFA was conducted, including multi-sample analysis, and for each IPQ-R factor, fit to the Rasch model was assessed by examining, amongst other things, item fit, DIF and unidimensionality. The CFA showed a moderate fit of the data to the IPQ-R model, and stability across diagnosis, although fit was significantly improved following the removal of selected items. All seven factors achieved fit to the Rasch model, and exhibited unidimensionality and minimal DIF, although in most cases this was after some item rescoring and/or deletion. In both analyses, IPQ-R items 12, 18 and 24 were indicated as misfitting and removed. Given the rigorous standard of Rasch measurement, and the generic nature of the IPQ-R, it stood up well to the demands of the Rasch model in this study. Importantly, the results show that with some relatively minor, pragmatic modifications the IPQ-R could possess Rasch-standard measurement in cancer patients. © 2013.
The PU-PROM: A patient-reported outcome measure for peptic ulcer disease.
Liu, Na; Lv, Jing; Liu, Jinchun; Zhang, Yanbo
2017-12-01
Patient-reported outcome measures (PROMs), conceived to enable description of treatment-related effects from the patient perspective, have the potential to improve clinical research and to provide patients with accurate information. Therefore, the aim of this study was to develop a patient-centred peptic ulcer patient-reported outcome measure (PU-PROM) and evaluate its reliability, validity, differential item functioning (DIF) and feasibility. To develop a conceptual framework and item pool for the PU-PROM, we performed a literature review and consulted other measures created in China and other countries. Beyond that, we interviewed 10 patients with peptic ulcers, and consulted six key experts to ensure that all germane parameters were included. In the first item selection phase, classical test theory and item response theory were used to select and adjust items to shape the preliminary measure completed by 130 patients and 50 controls. In the next phase, the measure was evaluated using the same methods with 492 patients and 124 controls. Finally, we used the same population in the second item reselection to assess the reliability, validity, DIF and feasibility of the final measure. The final peptic ulcer PRO measure comprised four domains (physiology, psychology, society and treatment), with 11 subdomains, and 54 items. The Cronbach's α coefficient of each subdomain for the measure was >0.800. Confirmatory factor analysis indicated that the construct validity fulfilled expectations. Model fit indices, such as RMR, RMSEA, NFI, NNFI, CFI and IFI, showed acceptable fit. The measure showed a good response rate. The peptic ulcer PRO measure had good reliability, validity, DIF and feasibility, and can be used as a clinical research evaluation instrument with patients with peptic ulcers to assess their condition with a focus on treatment.
This measure may also be applied in other health areas, especially in clinical trials of new drugs, and may be helpful in clinical decision making. © 2017 The Authors Health Expectations Published by John Wiley & Sons Ltd.
Identifying Country-Specific Cultures of Physics Education: A differential item functioning approach
NASA Astrophysics Data System (ADS)
Mesic, Vanes
2012-11-01
In international large-scale assessments of educational outcomes, student achievement is often represented by unidimensional constructs. This approach allows for drawing general conclusions about country rankings with respect to the given achievement measure, but it typically does not provide specific diagnostic information which is necessary for systematic comparisons and improvements of educational systems. Useful information could be obtained by exploring the differences in national profiles of student achievement between low-achieving and high-achieving countries. In this study, we aimed to identify the relative weaknesses and strengths of eighth graders' physics achievement in Bosnia and Herzegovina in comparison to the achievement of their peers from Slovenia. For this purpose, we ran a secondary analysis of Trends in International Mathematics and Science Study (TIMSS) 2007 data. The student sample consisted of 4,220 students from Bosnia and Herzegovina and 4,043 students from Slovenia. After analysing the cognitive demands of TIMSS 2007 physics items, the corresponding differential item functioning (DIF)/differential group functioning contrasts were estimated. Approximately 40% of items exhibited large DIF contrasts, indicating significant differences between cultures of physics education in Bosnia and Herzegovina and Slovenia. The relative strength of students from Bosnia and Herzegovina was shown to be mainly associated with the topic area 'Electricity and magnetism'. Classes of items which required the knowledge of experimental method, counterintuitive thinking, proportional reasoning and/or the use of complex knowledge structures proved to be differentially easier for students from Slovenia. In the light of the presented results, the common practice of ranking countries with respect to universally established cognitive categories seems to be potentially misleading.
Rasch analysis of the patient-rated wrist evaluation questionnaire.
Esakki, Saravanan; MacDermid, Joy C; Vincent, Joshua I; Packham, Tara L; Walton, David; Grewal, Ruby
2018-01-01
The Patient-Rated Wrist Evaluation (PRWE) was developed as a wrist joint specific measure of pain and disability, and evidence of sound validity has been accumulated through classical psychometric methods. Rasch analysis (RA) has been endorsed as a newer method for analyzing the clinical measurement properties of self-report outcome measures. The purpose of this study was to evaluate the PRWE using Rasch modeling. We employed the Rasch model to assess overall fit, response scaling, individual item fit, differential item functioning (DIF), local dependency, unidimensionality and person separation index (PSI). A convenience sample of 382 patients with distal radius fracture was recruited from the hand and upper limb clinic at a large academic healthcare organization in London, Ontario, Canada; 6-month post-injury scores of the PRWE were used. RA was conducted on the 3 subscales (pain, specific activities, and usual activities) of the PRWE separately. The pain subscale adequately fit the Rasch model when item 4 "Pain - When it is at its worst" was deleted to eliminate non-uniform DIF by age group, and item 5 "How often do you have pain" was rescored by collapsing into 8 intervals to eliminate disordered thresholds. Uniform DIF for "Use my affected hand to push up from the chair" (by work status) and "Use bathroom tissue with my affected hand" (by injured hand) was addressed by splitting the items for analysis. After background rescoring of 2 items in the pain subscale, 2 items in specific activities and 3 items in usual activities, all three subscales of the PRWE were well targeted and had high reliability (PSI = 0.86). These changes provided a unidimensional, interval-level scaled measure. Like a previous analysis of the Patient-Rated Wrist and Hand Evaluation, this study found the PRWE could be fit to the Rasch model with rescoring of multiple items.
However, the modifications required to achieve fit were not the same across studies, and our fit statistics also suggested one of the pain items should be deleted. This study adds to the pool of evidence supporting the PRWE, but cannot confidently provide a Rasch-based scoring algorithm.
Calibration of the Dutch-Flemish PROMIS Pain Behavior item bank in patients with chronic pain.
Crins, M H P; Roorda, L D; Smits, N; de Vet, H C W; Westhovens, R; Cella, D; Cook, K F; Revicki, D; van Leeuwen, J; Boers, M; Dekker, J; Terwee, C B
2016-02-01
The aims of the current study were to calibrate the item parameters of the Dutch-Flemish PROMIS Pain Behavior item bank using a sample of Dutch patients with chronic pain and to evaluate cross-cultural validity between the Dutch-Flemish and the US PROMIS Pain Behavior item banks. Furthermore, reliability and construct validity of the Dutch-Flemish PROMIS Pain Behavior item bank were evaluated. The 39 items in the bank were completed by 1042 Dutch patients with chronic pain. To evaluate unidimensionality, a one-factor confirmatory factor analysis (CFA) was performed. A graded response model (GRM) was used to calibrate the items. To evaluate cross-cultural validity, Differential item functioning (DIF) for language (Dutch vs. English) was evaluated. Reliability of the item bank was also examined and construct validity was studied using several legacy instruments, e.g. the Roland Morris Disability Questionnaire. CFA supported the unidimensionality of the Dutch-Flemish PROMIS Pain Behavior item bank (CFI = 0.960, TLI = 0.958), the data also fit the GRM, and demonstrated good coverage across the pain behavior construct (threshold parameters range: -3.42 to 3.54). Analysis showed good cross-cultural validity (only six DIF items), reliability (Cronbach's α = 0.95) and construct validity (all correlations ≥0.53). The Dutch-Flemish PROMIS Pain Behavior item bank was found to have good cross-cultural validity, reliability and construct validity. The development of the Dutch-Flemish PROMIS Pain Behavior item bank will serve as the basis for Dutch-Flemish PROMIS short forms and computer adaptive testing (CAT). © 2015 European Pain Federation - EFIC®
Lundgren-Nilsson, Asa; Dencker, Anna; Jakobsson, Sofie; Taft, Charles; Tennant, Alan
2014-06-01
Fatigue is a common and distressing symptom in cancer patients due to both the disease and its treatments. The concept of fatigue is multidimensional and includes both physical and mental components. The 22-item Revised Piper Fatigue Scale (RPFS) is a multidimensional instrument developed to assess cancer-related fatigue. This study reports on the construct validity of the Swedish version of the RPFS from the perspective of Rasch measurement. The Swedish version of the RPFS was answered by 196 cancer patients fatigued after 4 to 5 weeks of curative radiation therapy. Data from the scale were fitted to the Rasch measurement model. This involved testing a series of assumptions, including the stochastic ordering of items, local response dependency, and unidimensionality. A series of fit statistics were computed, differential item functioning (DIF) was tested, and local response dependency was accommodated through testlets. The Behavioral, Affective and Sensory domains all satisfied the Rasch model expectations. No DIF was observed, and all domains were found to be unidimensional. The Mood/Cognitive scale failed to fit the model, and substantial multidimensionality was found. Splitting the scale between Mood and Cognitive items resolved fit to the Rasch model, and new domains were unidimensional without DIF. The current Rasch analyses add to the evidence of measurement properties of the scale and show that the RPFS has good psychometric properties and works well to measure fatigue. The original four-factor structure, however, was not supported. Copyright © 2014 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
Preti, Antonio; Vellante, Marcello; Petretto, Donatella R
2017-05-01
The "Reading the Mind in the Eyes" Test (hereafter: Eyes Test) is considered an advanced task of the Theory of Mind aimed at assessing the performance of the participant in perspective-taking, that is, the ability to sense or understand other people's cognitive and emotional states. In this study, the item response theory analysis was applied to the adult version of the Eyes Test. The Italian version of the Eyes Test was administered to 200 undergraduate students of both genders (males = 46%). Modified parallel analysis (MPA) was used to test unidimensionality. Marginal maximum likelihood estimation was used to fit the 1-, 2-, and 3-parameter logistic (PL) model to the data. Differential Item Functioning (DIF) due to gender was explored with five independent methods. MPA provided evidence in favour of unidimensionality. The Rasch model (1-PL) was superior to the other two models in explaining participants' responses to the Eyes Test. There was no robust evidence of gender-related DIF in the Eyes Test, although some differences may exist for some items as a reflection of real differences by group. The study results support a one-factor model of the Eyes Test. Performance on the Eyes Test is defined by the participant's ability in perspective-taking. Researchers should cease using arbitrarily selected subscores in assessing the performance of participants on the Eyes Test. Lack of gender-related DIF favours the use of the Eyes Test in the investigation of gender differences concerning empathy and social cognition.
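As a reading aid for the 1-, 2-, and 3-parameter logistic models named in the abstract above, the following is a minimal sketch of the standard item response function. The function name `irt_probability` and the parameter values are illustrative assumptions, not taken from the study, and the study's marginal maximum likelihood estimation is not reproduced here.

```python
import math


def irt_probability(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under the 3PL model.

    theta: person ability; b: item difficulty;
    a: item discrimination; c: lower asymptote (guessing).
    With a=1 and c=0 this reduces to the 1PL (Rasch) form;
    with c=0 only, to the 2PL form.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical parameter values, chosen only for illustration.
p_1pl = irt_probability(theta=0.5, b=0.0)                # difficulty only
p_2pl = irt_probability(theta=0.5, b=0.0, a=1.7)         # adds discrimination
p_3pl = irt_probability(theta=0.5, b=0.0, a=1.7, c=0.2)  # adds guessing
```

The three calls differ only in which parameters are freed, which is why the models are nested and can be compared for relative fit, as the abstract reports.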
Covic, Tanya; Pallant, Julie F; Conaghan, Philip G; Tennant, Alan
2007-01-01
Background: The aim of this study was to test the internal validity of the total Center for Epidemiologic Studies-Depression (CES-D) scale using Rasch analysis in a rheumatoid arthritis (RA) population. Methods: CES-D was administered to 157 patients with RA over three time points within a 12 month period. Rasch analysis was applied using RUMM2020 software to assess the overall fit of the model, the response scale used, individual item fit, differential item functioning (DIF) and person separation. Results: Pooled data across three time points was shown to fit the Rasch model with removal of seven items from the original 20-item CES-D scale. It was necessary to rescore the response format from four to three categories in order to improve the scale's fit. Two items demonstrated some DIF for age and gender but were retained within the 13-item CES-D scale. A new cut point for depression score of 9 was found to correspond to the original cut point score of 16 in the full CES-D scale. Conclusion: This Rasch analysis of the CES-D in a longstanding RA cohort resulted in the construction of a modified 13-item scale with good internal validity. Further validation of the modified scale is recommended particularly in relation to the new cut point for depression. PMID:17629902
The impact of gender on the assessment of body checking behavior.
Alfano, Lauren; Hildebrandt, Tom; Bannon, Katie; Walker, Catherine; Walton, Kate E
2011-01-01
Body checking includes any behavior aimed at global or specific evaluations of appearance characteristics. Men and women are believed to express these behaviors differently, possibly reflecting different socialization. However, there has been no empirical test of the impact of gender on body checking. A total of 1024 male and female college students completed two measures of body checking, the Body Checking Questionnaire and the Male Body Checking Questionnaire. Using multiple group confirmatory factor analysis, differential item functioning (DIF) was explored in a composite of these measures. Two global latent factors were identified (female and male body checking severity), and there were expected gender differences in these factors even after controlling for DIF. Ten items were found to be unbiased by gender and provide a suitable brief measure of body checking for mixed gender research. Practical applications for body checking assessment and theoretical implications are discussed. Copyright © 2010 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Chiu, Tina
This dissertation includes three studies that analyze a new set of assessment tasks developed by the Learning Progressions in Middle School Science (LPS) Project. These assessment tasks were designed to measure science content knowledge on the structure of matter domain and scientific argumentation, while following the goals from the Next Generation Science Standards (NGSS). The three studies focus on the evidence available for the success of this design and its implementation, generally labelled as "validity" evidence. I use explanatory item response models (EIRMs) as the overarching framework to investigate these assessment tasks. These models can be useful when gathering validity evidence for assessments as they can help explain student learning and group differences. In the first study, I explore the dimensionality of the LPS assessment by comparing the fit of unidimensional, between-item multidimensional, and Rasch testlet models to see which is most appropriate for this data. By applying multidimensional item response models, multiple relationships can be investigated, and in turn, allow for a more substantive look into the assessment tasks. The second study focuses on person predictors through latent regression and differential item functioning (DIF) models. Latent regression models show the influence of certain person characteristics on item responses, while DIF models test whether one group is differentially affected by specific assessment items, after conditioning on latent ability. Finally, the last study applies the linear logistic test model (LLTM) to investigate whether item features can help explain differences in item difficulties.
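The three model families named in the abstract above can be written compactly in textbook Rasch notation. The symbols below are assumptions introduced for illustration, not the dissertation's own notation: person ability $\theta_p$, item difficulty $\delta_i$, group indicator $G_p$, person covariates $Z_{pj}$, and item-feature indicators $q_{ik}$.

```latex
% DIF model: item i is shifted by \gamma_i for members of the focal group (G_p = 1),
% after conditioning on latent ability \theta_p
\operatorname{logit} P(X_{pi} = 1 \mid \theta_p) = \theta_p - \delta_i - \gamma_i G_p

% Latent regression: person characteristics Z_{pj} predict ability
\theta_p = \sum_{j=1}^{J} \beta_j Z_{pj} + \varepsilon_p

% LLTM: item difficulty decomposed into effects \eta_k of item features q_{ik}
\delta_i = \sum_{k=1}^{K} q_{ik} \, \eta_k
```

Under this sketch, $\gamma_i = 0$ corresponds to no DIF for item $i$, and the LLTM is a restricted Rasch model in which the item difficulties are fully explained by the item features.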
Development of the PROMIS coping expectancies of smoking item banks.
Shadel, William G; Edelen, Maria Orlando; Tucker, Joan S; Stucky, Brian D; Hansen, Mark; Cai, Li
2014-09-01
Smoking is a coping strategy for many smokers who then have difficulty finding new ways to cope with negative affect when they quit. This paper describes analyses conducted to develop and evaluate item banks for assessing the coping expectancies of smoking for daily and nondaily smokers. Using data from a large sample of daily (N = 4,201) and nondaily (N = 1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning (DIF) analyses (according to gender, age, and ethnicity) to arrive at a unidimensional set of items for daily and nondaily smokers. We also evaluated performance of short forms (SFs) and computer adaptive tests (CATs) for assessing coping expectancies of smoking. For both daily and nondaily smokers, the unidimensional Coping Expectancies item banks (21 items) are relatively DIF free and are highly reliable (0.96 and 0.97, respectively). A common 4-item SF for daily and nondaily smokers also showed good reliability (0.85). Adaptive tests required an average of 4.3 and 3.7 items for simulated daily and nondaily respondents, respectively, and achieved reliabilities of 0.91 for both when the maximum test length was 10 items. This research provides a new set of items that can be used to reliably assess coping expectancies of smoking, through a SF, CAT, or a tailored set selected for a specific research purpose. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Inchausti, Felix; Mole, Joe; Fonseca-Pedrero, Eduardo; Ortuño-Sierra, Javier
2015-01-01
The aim of this study was to analyse the psychometric properties of the Spanish NEO Five Factor Inventory–Revised (NEO-FFI-R) using Rasch analyses, in order to test its rating scale functioning, the reliability of scores, internal structure, and differential item functioning (DIF) by gender in a psychiatric sample. The NEO-FFI-R responses of 433 Spanish adults (154 males) with an anxiety disorder as primary diagnosis were analysed using the Rasch model for rating scales. Two intermediate categories of response (‘neutral’ and ‘agree’) malfunctioned in the Neuroticism and Conscientiousness scales. In addition, model reliabilities were lower than expected in Agreeableness and Neuroticism, and the item fit values indicated each scale had items that did not achieve moderate to high discrimination on its dimension, particularly in the Agreeableness scale. Concerning unidimensionality, the five NEO-FFI-R scales showed large first components of unexplained variance. Finally, DIF by gender was detected in many items. The results suggest that the scores of the Spanish NEO-FFI-R are unreliable in psychiatric samples and cannot be generalized between males and females, especially in the Openness, Conscientiousness, and Agreeableness scales. Future directions for testing and refinement should be developed before the NEO-FFI-R can be used reliably in clinical samples. PMID:25954224
Jamali, Jamshid; Ayatollahi, Seyyed Mohammad Taghi; Jafari, Peyman
2017-01-01
Evaluating measurement equivalence (also known as differential item functioning (DIF)) is an important part of the process of validating psychometric questionnaires. This study aimed at evaluating the multiple indicators multiple causes (MIMIC) model for DIF detection when latent construct distribution is nonnormal and the focal group sample size is small. In this simulation-based study, Type I error rates and power of the MIMIC model for detecting uniform-DIF were investigated under different combinations of reference to focal group sample size ratio, magnitude of the uniform-DIF effect, scale length, the number of response categories, and latent trait distribution. Moderate and high skewness in the latent trait distribution led to decreases of 0.33% and 0.47%, respectively, in the power of the MIMIC model for detecting uniform-DIF. The findings indicated that increasing the scale length, the number of response categories, and the magnitude of DIF improved the power of the MIMIC model by 3.47%, 4.83%, and 20.35%, respectively; it also decreased the Type I error of the MIMIC approach by 2.81%, 5.66%, and 0.04%, respectively. This study revealed that the power of the MIMIC model was at an acceptable level when latent trait distributions were skewed. However, the empirical Type I error rate was slightly greater than the nominal significance level. Consequently, the MIMIC was recommended for detection of uniform-DIF when latent construct distribution is nonnormal and the focal group sample size is small. PMID:28713828
Janevic, T; Gundersen, D; Stojanovski, K; Jankovic, J; Nikolic, Z; Kasapinov, B
2015-09-01
Scales used to assess discrimination in public health research have rarely been validated outside of high income countries. Our objective was to validate the Experiences of Discrimination (EOD) scale and the Everyday Discrimination Scale (EDS) among 410 Romani women in Macedonia and Serbia. Romani female interviewers conducted interviews in 2012-2013. We used a multiple indicator multiple cause approach to test a one-factor model for each scale and to assess differential item functioning (DIF) by age, wealth, country, and education. We also measured associations between the EOD and EDS with smoking in the past year and psychological distress. Three items of the EOD were conceptually irrelevant. Two items of the EDS were not conditionally independent. DIF was found by country for one item in each scale. After excluding these items, all scales exhibited good model fit and were associated with smoking (EOD beta = 0.40, 95% CI = 0.18, 0.63; EDS beta = 0.33, 95% CI = 0.12, 0.54) and psychological distress (EOD beta = 0.26, 95% CI = 0.15, 0.37; EDS beta = 0.26, 95% CI = 0.04, 0.47). Discrimination scales can be adapted for use among Romani women and are associated with both smoking and psychological distress.
Methodology for developing and evaluating the PROMIS smoking item banks.
Hansen, Mark; Cai, Li; Stucky, Brian D; Tucker, Joan S; Shadel, William G; Edelen, Maria Orlando
2014-09-01
This article describes the procedures used in the PROMIS Smoking Initiative for the development and evaluation of item banks, short forms (SFs), and computerized adaptive tests (CATs) for the assessment of 6 constructs related to cigarette smoking: nicotine dependence, coping expectancies, emotional and sensory expectancies, health expectancies, psychosocial expectancies, and social motivations for smoking. Analyses were conducted using response data from a large national sample of smokers. Items related to each construct were subjected to extensive item factor analyses and evaluation of differential item functioning (DIF). Final item banks were calibrated, and SF assessments were developed for each construct. The performance of the SFs and the potential use of the item banks for CAT administration were examined through simulation study. Item selection based on dimensionality assessment and DIF analyses produced item banks that were essentially unidimensional in structure and free of bias. Simulation studies demonstrated that the constructs could be accurately measured with a relatively small number of carefully selected items, either through fixed SFs or CAT-based assessment. Illustrative results are presented, and subsequent articles provide detailed discussion of each item bank in turn. The development of the PROMIS smoking item banks provides researchers with new tools for measuring smoking-related constructs. The use of the calibrated item banks and suggested SF assessments will enhance the quality of score estimates, thus advancing smoking research. Moreover, the methods used in the current study, including innovative approaches to item selection and SF construction, may have general relevance to item bank development and evaluation. © The Author 2013. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Ndosi, Mwidimi; Bremander, Ann; Hamnes, Bente; Horton, Mike; Kukkurainen, Marja Leena; Machado, Pedro; Marques, Andrea; Meesters, Jorit; Stamm, Tanja A; Tennant, Alan; de la Torre-Aboki, Jenny; Vliet Vlieland, Theodora P M; Zangi, Heidi A; Hill, Jackie
2014-12-01
To validate the educational needs assessment tool (ENAT) as a generic tool for assessing the educational needs of patients with rheumatic diseases in European countries. A convenience sample of patients from seven European countries was included, comprising the following diagnostic groups: ankylosing spondylitis, psoriatic arthritis, systemic sclerosis, systemic lupus erythematosus, osteoarthritis (OA) and fibromyalgia syndrome. Translated versions of the ENAT were completed through surveys in each country. Rasch analysis was used to assess the construct validity of the adapted ENATs, including differential item functioning by culture (cross-cultural DIF). Initially, the data from each country and diagnostic group were fitted to the Rasch model separately, and then the pooled data from each diagnostic group. The sample comprised 3015 patients; the majority, 1996 (66.2%), were women. Patient characteristics (stratified by diagnostic group) were comparable across countries except for educational background, which was variable. On most occasions, the 39-item ENAT deviated significantly from the Rasch model expectations (item-trait interaction χ² p<0.05). After correction for local dependency (grouping the items into seven domains and analysing them as 'testlets'), fit to the model was satisfied (item-trait interaction χ² p>0.18) in all pooled disease group datasets except OA (χ²=99.91; p=0.002). The internal consistency in each group was high (Person Separation Index above 0.90). There was no significant DIF by person characteristics. Cross-cultural DIF was found in some items, which required adjustments. Subsequently, interval-level scales were calibrated to enable transformation of ENAT scores when required. The adapted ENAT is a valid tool with high internal consistency providing accurate estimation of the educational needs of people with rheumatic diseases. Cross-cultural comparison of educational needs is now possible.
Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
The Effect of Missing Data Treatment on Mantel-Haenszel DIF Detection
ERIC Educational Resources Information Center
Emenogu, Barnabas C.; Falenchuk, Olesya; Childs, Ruth A.
2010-01-01
Most implementations of the Mantel-Haenszel differential item functioning procedure delete records with missing responses or replace missing responses with scores of 0. These treatments of missing data make strong assumptions about the causes of the missing data. Such assumptions may be particularly problematic when groups differ in their patterns…
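The two missing-data treatments named in this abstract can be compared on a toy example. The following is a minimal sketch, not the authors' procedure: the record layout `(group, matching_score, response)`, the group labels `"ref"` and `"foc"`, and the data are all hypothetical, and the matching score is taken as given even though in practice the missing-data treatment would also change it. The function computes the Mantel-Haenszel common odds ratio across score strata.

```python
from collections import defaultdict

def drop_missing(records):
    """Treatment 1: delete records whose item response is missing (None)."""
    return [r for r in records if r[2] is not None]

def missing_as_zero(records):
    """Treatment 2: score missing item responses as incorrect (0)."""
    return [(g, s, 0 if resp is None else resp) for g, s, resp in records]

def mh_odds_ratio(records):
    """Mantel-Haenszel common odds ratio over matching-score strata.

    Each record is (group, matching_score, response) with group "ref" or
    "foc" and response 1 (correct) or 0 (incorrect). Apply one of the
    missing-data treatments above before calling this function.
    """
    # Per stratum: [A, B, C, D] = ref correct, ref incorrect,
    #                             focal correct, focal incorrect
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, resp in records:
        cell = (0 if resp == 1 else 1) + (0 if group == "ref" else 2)
        tables[score][cell] += 1
    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

# Hypothetical single-stratum data: two focal-group responses are missing.
records = (
    [("ref", 5, 1)] * 6 + [("ref", 5, 0)] * 2
    + [("foc", 5, 1)] * 3 + [("foc", 5, 0)] * 3
    + [("foc", 5, None)] * 2
)

or_deleted = mh_odds_ratio(drop_missing(records))
or_zeroed = mh_odds_ratio(missing_as_zero(records))
print(or_deleted, or_zeroed)
```

In this toy case the odds ratio moves from 3.0 under deletion to 5.0 when missing responses are scored 0, which illustrates how strongly the apparent DIF can depend on the treatment chosen.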
ERIC Educational Resources Information Center
Nye, Christopher D.; Drasgow, Fritz
2011-01-01
Because of the practical, theoretical, and legal implications of differential item functioning (DIF) for organizational assessments, studies of measurement equivalence are a necessary first step before scores can be compared across individuals from different groups. However, commonly recommended criteria for evaluating results from these analyses…
Validity Evidence in Accommodations for English Language Learners and Students with Disabilities
ERIC Educational Resources Information Center
Camara, Wayne
2009-01-01
The five papers in this special issue of the "Journal of Applied Testing Technology" address fundamental issues of validity when tests are modified or accommodations are provided to English Language Learners (ELL) or students with disabilities. Three papers employed differential item functioning (DIF) and factor analysis and found the…
Crins, Martine H. P.; Roorda, Leo D.; Smits, Niels; de Vet, Henrica C. W.; Westhovens, Rene; Cella, David; Cook, Karon F.; Revicki, Dennis; van Leeuwen, Jaap; Boers, Maarten; Dekker, Joost; Terwee, Caroline B.
2015-01-01
The Dutch-Flemish PROMIS Group translated the adult PROMIS Pain Interference item bank into Dutch-Flemish. The aims of the current study were to calibrate the parameters of these items using an item response theory (IRT) model, to evaluate the cross-cultural validity of the Dutch-Flemish translations compared to the original English items, and to evaluate their reliability and construct validity. The 40 items in the bank were completed by 1085 Dutch chronic pain patients. Before calibrating the items, IRT model assumptions were evaluated using confirmatory factor analysis (CFA). Items were calibrated using the graded response model (GRM), an IRT model appropriate for items with more than two response options. To evaluate cross-cultural validity, differential item functioning (DIF) for language (Dutch vs. English) was examined. Reliability was evaluated based on standard errors and Cronbach’s alpha. To evaluate construct validity correlations with scores on legacy instruments (e.g., the Disabilities of the Arm, Shoulder and Hand Questionnaire) were calculated. Unidimensionality of the Dutch-Flemish PROMIS Pain Interference item bank was supported by CFA tests of model fit (CFI = 0.986, TLI = 0.986). Furthermore, the data fit the GRM and showed good coverage across the pain interference continuum (threshold-parameters range: -3.04 to 3.44). The Dutch-Flemish PROMIS Pain Interference item bank has good cross-cultural validity (only two out of 40 items showing DIF), good reliability (Cronbach’s alpha = 0.98), and good construct validity (Pearson correlations between 0.62 and 0.75). A computer adaptive test (CAT) and Dutch-Flemish PROMIS short forms of the Dutch-Flemish PROMIS Pain Interference item bank can now be developed. PMID:26214178
Twiss, J; McKenna, S P; Graham, J; Swetz, K; Sloan, J; Gomberg-Maitland, M
2016-04-09
Electronic formats of patient-reported outcome (PRO) measures are now routinely used in clinical research studies. When changing from a validated paper and pen to electronic administration it is necessary to establish their equivalence. This study reports on the value of Rasch analysis in this process. Three groups of US pulmonary hypertension (PH) patients participated. The first completed an electronic version of the CAMPHOR Activity Limitation scale (e-sample) and this was compared with two pen and paper administrated samples (pp1 and pp2). The three databases were combined and analysed for fit to the Rasch model. Equivalence was evaluated by differential item functioning (DIF) analyses. The three datasets were matched randomly in terms of sample size (n = 147). Mean age (years) and percentage of male respondents were as follows: e-sample (51.7, 16.0 %); pp1 (50.0, 14.0 %); pp2 (55.5, 40.4 %). The combined dataset achieved fit to the Rasch model. Two items showed evidence of borderline DIF. Further analyses showed the inclusion of these items had little impact on Rasch estimates indicating the DIF identified was unimportant. Differences between the performance of the electronic and pen and paper administrations of the CAMPHOR Activity Limitation scale were minor. The results were successful in showing how the Rasch model can be used to determine the equivalence of alternative formats of PRO measures.
Dikken, Jeroen; Hoogerduijn, Jita G; Kruitwagen, Cas; Schuurmans, Marieke J
2016-11-01
To assess the content validity and psychometric characteristics of the Knowledge about Older Patients Quiz (KOP-Q), which measures nurses' knowledge regarding older hospitalized adults and their certainty regarding this knowledge. Cross-sectional. Content validity: general hospitals. Psychometric characteristics: nursing school and general hospitals in the Netherlands. Content validity: 12 nurse specialists in geriatrics. Psychometric characteristics: 107 first-year and 78 final-year bachelor of nursing students, 148 registered nurses, and 20 nurse specialists in geriatrics. Content validity: The nurse specialists rated each item of the initial KOP-Q (52 items) on relevance. Ratings were used to calculate Item-Content Validity Index and average Scale-Content Validity Index (S-CVI/ave) scores. Items with insufficient content validity were removed. Psychometric characteristics: Ratings of students, nurses, and nurse specialists were used to test for differential item functioning (DIF) and unidimensionality before item characteristics (discrimination and difficulty) were examined using Item Response Theory. Finally, norm references were calculated and nomological validity was assessed. Content validity: Forty-three items remained after assessing content validity (S-CVI/ave = 0.90). Psychometric characteristics: Of the 43 items, two demonstrating ceiling effects and 11 distorting ability estimates (DIF) were subsequently excluded. Item characteristics were assessed for the remaining 30 items, all of which demonstrated good discrimination and difficulty parameters. Knowledge was positively correlated with certainty about this knowledge. The final 30-item KOP-Q is a valid, psychometrically sound, comprehensive instrument that can be used to assess the knowledge of nursing students, hospital nurses, and nurse specialists in geriatrics regarding older hospitalized adults.
It can identify knowledge and certainty deficits for research purposes or serve as a tool in educational or quality improvement programs. © 2016, Copyright the Authors Journal compilation © 2016, The American Geriatrics Society.
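The content validity indices mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the study's code: it assumes a 4-point relevance scale on which ratings of 3 or 4 count as "relevant" (a common convention that the abstract itself does not state), and the ratings shown are made up.

```python
def item_cvi(ratings, relevant_min=3):
    """Item-level CVI: share of experts rating the item as relevant.

    Assumes a 4-point relevance scale where ratings >= relevant_min
    count as relevant (an assumption, not taken from the abstract).
    """
    return sum(r >= relevant_min for r in ratings) / len(ratings)

def scale_cvi_ave(rating_matrix, relevant_min=3):
    """S-CVI/ave: the mean of the item-level CVIs."""
    cvis = [item_cvi(row, relevant_min) for row in rating_matrix]
    return sum(cvis) / len(cvis)

# Hypothetical ratings: one row per item, one column per expert.
ratings = [
    [4, 4, 3, 4],  # every expert rates the item relevant
    [3, 2, 4, 4],  # three of four experts rate it relevant
    [2, 1, 3, 2],  # one of four experts rates it relevant
]

print([item_cvi(row) for row in ratings])
print(scale_cvi_ave(ratings))
```

Items with a low item-level index (the third row here) would be the candidates for removal before the scale-level average is reported.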
Debast, Inge; Rossi, Gina; van Alphen, S P J
2018-04-01
The alternative model for personality disorders in the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) is considered an important step toward a possibly better conceptualization of personality pathology in older adulthood, by the introduction of levels of personality functioning (Criterion A) and trait dimensions (Criterion B). Our main aim was to examine age-neutrality of the Short Form of the Severity Indices of Personality Problems (SIPP-SF; Criterion A) and Personality Inventory for DSM-5-Brief Form (PID-5-BF; Criterion B). Differential item functioning (DIF) analyses and more specifically the impact on scale level through differential test functioning (DTF) analyses made clear that the SIPP-SF was more age-neutral (6% DIF, only one of four domains showed DTF) than the PID-5-BF (25% DIF, all four tested domains had DTF) in a community sample of older and younger adults. Age differences in convergent validity also point in the direction of differences in underlying constructs. Concurrent and criterion validity in geriatric psychiatry inpatients suggest that both the SIPP-SF scales measuring levels of personality functioning (especially self-functioning) and the PID-5-BF might be useful screening measures in older adults despite age-neutrality not being confirmed.
Djukanovic, Ingrid; Carlsson, Jörg; Årestedt, Kristofer
2017-10-04
The HADS (Hospital Anxiety and Depression Scale) aims to measure symptoms of anxiety (HADS Anxiety) and depression (HADS Depression). The HADS is widely used but has shown ambiguous results both regarding the factor structure and sex differences in the prevalence of depressive symptoms. There is also a lack of psychometric evaluations of the HADS in non-clinical samples of older people. The aim of the study was to evaluate the factor structure of the HADS in a general population 65-80 years old and to examine the possible presence of differential item functioning (DIF) with respect to sex. This study was based on data from a Swedish sample, randomized from the total population in the age group 65-80 years (n = 6659). Confirmatory factor analyses (CFA) were performed to examine the factor structure. Ordinal regression analyses were conducted to detect DIF for sex. Reliability was examined by both ordinal and traditional Cronbach's alpha. The CFA showed that a two-factor model with cross-loadings for two items (7 and 8) had excellent model fit. Internal consistency was good in both subscales, measured with ordinal and traditional alpha. Floor effects were present for all items. No indication of meaningful DIF regarding sex was found for any of the subscales. HADS Anxiety and HADS Depression are unidimensional measures with acceptable internal consistency and are invariant with regard to sex. Despite pronounced ceiling effects and cross-loadings for items 7 and 8, the hypothesized two-factor model of HADS can be recommended to assess psychological distress among a general population 65-80 years old.
Devine, J; Otto, C; Rose, M; Barthel, D; Fischer, F; Mühlan, H; Mülhan, H; Nolte, S; Schmidt, S; Ottova-Jordan, V; Ravens-Sieberer, U
2015-04-01
Assessing health-related quality of life (HRQoL) via Computerized Adaptive Tests (CAT) provides greater measurement precision coupled with a lower test burden compared to conventional tests. Currently, there are no European pediatric HRQoL CATs available. This manuscript aims at describing the development of a HRQoL CAT for children and adolescents: the Kids-CAT, which was developed based on the established KIDSCREEN-27 HRQoL domain structure. The Kids-CAT was developed combining classical test theory and item response theory methods and using large archival data of European KIDSCREEN norm studies (n = 10,577-19,580). Methods were applied in line with the US PROMIS project. Item bank development included the investigation of unidimensionality, local independence, exploration of Differential Item Functioning (DIF), evaluation of Item Response Curves (IRCs), estimation and norming of item parameters as well as first CAT simulations. The Kids-CAT was successfully built covering five item banks (with 26-46 items each) to measure physical well-being, psychological well-being, parent relations, social support and peers, and school well-being. The Kids-CAT item banks showed excellent psychometric properties: high content validity, unidimensionality, local independence, low DIF, and model-conforming IRCs. In CAT simulations, seven items were needed to achieve a measurement precision between .8 and .9 (reliability). It has a child-friendly design, is easily accessible online and gives immediate feedback reports of scores. The Kids-CAT has the potential to advance pediatric HRQoL measurement by making it less burdensome and enhancing patient-doctor communication.
ERIC Educational Resources Information Center
Lee, John Chi-Kin; Zhang, Zhonghua; Yin, Hongbiao
2010-01-01
This article used the multidimensional random coefficients multinomial logit model to examine the construct validity and detect the substantial differential item functioning (DIF) of the Chinese version of motivated strategies for learning questionnaire (MSLQ-CV). A total of 1,354 Hong Kong junior high school students were administered the…
Mitchell, Alex J; Smith, Adam B; Al-salihy, Zerak; Rahim, Twana A; Mahmud, Mahmud Q; Muhyaldin, Asma S
2011-10-01
We aimed to redefine the optimal self-report symptoms of depression suitable for creation of an item bank that could be used in computer adaptive testing or to develop a simplified screening tool for DSM-V. Four hundred subjects (200 patients with primary depression and 200 non-depressed subjects) living in Iraqi Kurdistan were interviewed. The Mini International Neuropsychiatric Interview (MINI) was used to define the presence of major depression (DSM-IV criteria). We examined symptoms of depression using four well-known scales delivered in Kurdish. The Partial Credit Model was applied to each instrument. Common-item equating was subsequently used to create an item bank, and differential item functioning (DIF) was explored for known subgroups. A symptom-level Rasch analysis reduced the original 45 items to 24 after the exclusion of 21 misfitting items. A further six items (CESD13 and CESD17, HADS-D4, HADS-D5 and HADS-D7, and CDSS3 and CDSS4) were removed due to misfit as the items were added together to form the item bank, and two items were subsequently removed following the DIF analysis by diagnosis (CESD20 and CDSS9, both of which were harder to endorse for women). Therefore the remaining optimal item bank consisted of 17 items and produced an area under the curve (AUC) of 0.987. Using a bank restricted to the optimal nine items revealed only minor loss of accuracy (AUC = 0.989, sensitivity 96%, specificity 95%). Finally, when restricted to only four items, accuracy was still high (AUC = 0.976; sensitivity 93%, specificity 96%). An item bank of 17 items may be useful in computer adaptive testing and nine or even four items may be used to develop a simplified screening tool for DSM-V major depressive disorder (MDD). Further examination of this item bank should be conducted in different cultural settings.
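The accuracy figures quoted in this abstract (AUC, sensitivity, specificity) can be illustrated with a small sketch. The scores, diagnoses, and cutoff below are hypothetical and are not the study's data; the AUC is computed as the probability that a randomly chosen case scores higher than a randomly chosen non-case, with ties counted as one half.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Sensitivity and specificity when scores >= cutoff screen positive."""
    tp = sum(s >= cutoff and y == 1 for s, y in zip(scores, labels))
    fn = sum(s < cutoff and y == 1 for s, y in zip(scores, labels))
    tn = sum(s < cutoff and y == 0 for s, y in zip(scores, labels))
    fp = sum(s >= cutoff and y == 0 for s, y in zip(scores, labels))
    return tp / (tp + fn), tn / (tn + fp)

def auc(scores, labels):
    """Area under the ROC curve via pairwise case/non-case comparisons."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical item-bank scores; label 1 = depressed, 0 = not depressed.
scores = [9, 7, 6, 4, 5, 3, 2, 1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(scores, labels, cutoff=5)
print(sens, spec, auc(scores, labels))
```

Sensitivity and specificity depend on the chosen cutoff, whereas the AUC summarizes discrimination over all possible cutoffs, which is why shortened item sets are often compared on AUC first.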
An analysis of the DuPage County Regional Office of Education physics exam
NASA Astrophysics Data System (ADS)
Muehsler, Hans
In 2009, the DuPage County Regional Office of Education (ROE) tasked volunteer physics teachers with creating a basic skills physics exam reflecting what the participants valued and shared in common across curricula. Mechanics, electricity & magnetism (E&M), and wave phenomena emerged as the primary constructs. The resulting exam was intended for first-exposure physics students. The most recently completed version was psychometrically assessed for unidimensionality within the constructs using a robust WLS structural equation model and for reliability. An item analysis using a 3-PL IRT model was performed on the mechanics items and a 2-PL IRT model was performed on the E&M and waves items; a distractor analysis was also performed on all items. Lastly, differential item functioning (DIF) and differential test functioning (DTF) analyses, using the Mantel-Haenszel procedure, were performed using gender, ethnicity, year in school, ELL, physics level, and math level as groupings.
Exploring the impact of disability on self-determination measurement.
Mumbardó-Adam, Cristina; Guàrdia-Olmos, Joan; Giné, Climent
2018-07-01
Self-determination is a psychological construct that applies to both the general population and to individuals with disabilities that can be self-determined with adequate accommodations and opportunities. As the relevance of self-determination-related skills in life has been recently acknowledged, researchers have created a measure to assess self-determination in adolescents and young adults with and without disabilities. The Self-Determination Inventory: Student Report (Spanish interim version) is being empirically validated in Spanish. As this scale is the first assessment addressed to all youth, further exploration of its psychometric properties is required to ensure the reliability of the self-determination measurement and gain further insight into the construct when applied to youth with and without disabilities. More than 600 participants were asked to complete the scale. The impact of disability on the item response distributions across the dimensions of self-determination was explored. Differential item functioning (DIF) was found in only 5 of the scale's 45 items. Differences primarily favored youth without disabilities. The weak presence of DIF across the items supports the instrument's psychometric robustness when measuring self-determination in youth with and without disabilities and provides further understanding of the self-determination construct. Implications and future research directions are also discussed. Copyright © 2018 Elsevier Ltd. All rights reserved.
A Non-Parametric Item Response Theory Evaluation of the CAGE Instrument Among Older Adults.
Abdin, Edimansyah; Sagayadevan, Vathsala; Vaingankar, Janhavi Ajit; Picco, Louisa; Chong, Siow Ann; Subramaniam, Mythily
2018-02-23
The validity of the CAGE using item response theory (IRT) has not yet been examined in the older adult population. This study aims to investigate the psychometric properties of the CAGE using both non-parametric and parametric IRT models, assess whether there is any differential item functioning (DIF) by age, gender and ethnicity, and examine the measurement precision at the cut-off scores. We used data from the Well-being of the Singapore Elderly study to conduct Mokken scaling analysis (MSA) and to fit dichotomous Rasch and 2-parameter logistic IRT models. The measurement precision at the cut-off scores was evaluated using classification accuracy (CA) and classification consistency (CC). The MSA showed the overall scalability H index was 0.459, indicating a medium performing instrument. All items were found to be homogeneous, measuring the same construct and able to discriminate well between respondents with high levels of the construct and those with lower levels. The item discrimination ranged from 1.07 to 6.73 while the item difficulty ranged from 0.33 to 2.80. Significant DIF was found for two items across ethnic groups. More than 90% (CC and CA ranged from 92.5% to 94.3%) of the respondents were consistently and accurately classified by the CAGE cut-off scores of 2 and 3. The current study provides new evidence on the validity of the CAGE from the IRT perspective. This study provides valuable information on each item in the assessment of the overall severity of alcohol problems and the precision of the cut-off scores in the older adult population.
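The scalability coefficient H reported in this abstract can be illustrated for dichotomous items. The sketch below is an assumption-laden toy, not the study's analysis: it uses Loevinger's definition of the scale H as the sum of observed inter-item covariances divided by the sum of the maximum covariances attainable given the item marginals, and the response patterns are hypothetical.

```python
from itertools import combinations

def scalability_h(data):
    """Loevinger's H for a set of dichotomous (0/1) items.

    data: list of respondents, each a list of 0/1 item responses.
    H = sum of observed inter-item covariances
        / sum of maximum covariances given the item proportions.
    """
    n = len(data)
    k = len(data[0])
    p = [sum(row[j] for row in data) / n for j in range(k)]
    cov_sum = cov_max_sum = 0.0
    for i, j in combinations(range(k), 2):
        p_both = sum(row[i] * row[j] for row in data) / n
        cov_sum += p_both - p[i] * p[j]
        # The joint proportion cannot exceed the smaller marginal.
        cov_max_sum += min(p[i], p[j]) - p[i] * p[j]
    return cov_sum / cov_max_sum

# Hypothetical perfect Guttman patterns: items ordered easy to hard.
guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
# Same data plus one respondent who endorses only the middle item.
with_error = guttman + [[0, 1, 0]]

print(scalability_h(guttman))
print(scalability_h(with_error))
```

A perfect Guttman scale gives H = 1, and each Guttman error pulls H down. By the conventional rule of thumb, values from 0.4 up to 0.5 are labelled medium, which matches the abstract's reading of 0.459.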
Munkholm, Anja; Bjorner, Jakob B; Petersen, Janne; Micali, Nadia; Olsen, Else Marie; Skovgaard, Anne Mette
2017-09-01
Previous research suggests that the Eating Pattern Inventory for Children (EPI-C) is best conceptualized as comprising four factors: dietary restraint, emotional, external eating and parental pressure to eat. This study aims to examine the psychometric properties of the EPI-C and to test gender and weight group differences. The population-based study sample comprised 1,939 children aged 11 to 12 years from the Copenhagen Child Cohort (CCC2000). Psychometric properties were evaluated using multigroup categorical data in confirmatory factor analysis (CFA) and differential item functioning (DIF) tests. CFA supported the four-factor solution for the EPI-C. Reliability estimates were satisfactory for three of the four scales. DIF with regard to weight was found for an item on weight loss intention. Girls reported higher restrained and emotional eating; overweight children reported higher restrained, emotional and external eating, while underweight children reported higher parental pressure to eat. The results support the use of EPI-C for measuring eating behaviors in preadolescence.
Differential Item Functioning of the Psychological Domain of the Menopause Rating Scale.
Monterrosa-Castro, Alvaro; Portela-Buelvas, Katherin; Oviedo, Heidi C; Herazo, Edwin; Campo-Arias, Adalberto
2016-01-01
Introduction. Quality of life could be quantified with the Menopause Rating Scale (MRS), which evaluates the severity of somatic, psychological, and urogenital symptoms in menopause. However, differential item functioning (DIF) analysis has not been applied previously. Objective. To establish the DIF of the psychological domain of the MRS in Colombian women. Methods. 4,009 women aged between 40 and 59 years, who participated in the CAVIMEC (Calidad de Vida en la Menopausia y Etnias Colombianas) project, were included. Average age was 49.0 ± 5.9 years. Women were classified as mestizo, Afro-Colombian, or indigenous. The results were presented as averages and standard deviation (X ± SD). A p value <0.001 was considered statistically significant. Results. In mestizo women, the highest X ± SD were obtained in physical and mental exhaustion (PME) (0.86 ± 0.93) and the lowest ones in anxiety (0.44 ± 0.79). In Afro-Colombian women, an average score of 0.99 ± 1.07 for PME and 0.63 ± 0.88 for anxiety was obtained. Indigenous women obtained an increased average score for PME (1.33 ± 0.93). The lowest score was evidenced in depressive mood (0.50 ± 0.81), which is different from other Colombian women (p < 0.001). Conclusions. The psychological items of the MRS show differential functioning according to the ethnic group, which may induce systematic error in the measurement of the construct. PMID:27847825
Caronni, Antonio; Zaina, Fabio; Negrini, Stefano
2014-04-01
The Scoliosis Research Society-22 (SRS-22) questionnaire was developed to evaluate health-related quality of life (HRQL) in adolescent idiopathic scoliosis (AIS) patients. Rasch analysis (RA) is a statistical procedure which turns questionnaire ordinal scores into interval measures. Measures from Rasch-compatible questionnaires can be used, similar to body temperature or blood pressure, to quantify disease severity progression and treatment efficacy. The purpose of the current work is to present an RA of the SRS-22 questionnaire and to develop an SRS-22 Rasch-approved short form. 300 SRS-22 questionnaires were randomly collected from 2447 consecutive IS adolescents at their first evaluation (229 females; 13.9 ± 1.9 years; 26.9 ± 14.7 Cobb°) in a scoliosis outpatient clinic. RA showed both disordered thresholds and overall misfit of the SRS-22. Sixteen items were re-scored and two misfitting items (6 and 14) removed to obtain a Rasch-compatible questionnaire. Participants' HRQL measured too high with the rearranged questionnaire, indicating a severe SRS-22 ceiling effect. RA also highlighted SRS-22 multidimensionality, with pain/function not merging with self-image/mental health items. Item 3 showed differential item functioning (DIF) for both curve and hump amplitude. A 7-item questionnaire (SRS-7) was prepared by selecting single items from the original SRS-22. SRS-7 showed fit to the model, unidimensionality and no DIF. Compared with the SRS-22, the short form scale shows better targeting of the participants' population. RA shows that SRS-22 has poor clinimetric properties; moreover, when used with AIS at first evaluation, SRS-22 is affected by a severe ceiling effect. SRS-7, an SRS-22 7-item short form questionnaire, provides an HRQL interval measure better tailored to these participants. Copyright © 2014 Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Wendler, Cathy; Feigenbaum, Miriam; Escandón, Mérida
2001-01-01
The SAT Program undertook two studies aimed at evaluating the impact of allowing students to indicate more than one ethnic/racial category. Results of this study indicated that there is little impact on DIF [differential item functioning] analyses when different definitions of ethnic/racial classifications are used compared to traditionally…
Cavanagh, Anna; Wilson, Coralie J; Caputi, Peter; Kavanagh, David J
2016-09-01
There is some evidence that, in contrast to depressed women, depressed men tend to report alternative symptoms that are not listed as standard diagnostic criteria. This may possibly lead to an under- or misdiagnosis of depression in men. This study aims to clarify whether depressed men and women report different symptoms. This study used data from the 2007 Australian National Survey of Mental Health and Wellbeing that was collected using the World Health Organization's Composite International Diagnostic Interview. Participants with a diagnosis of a depressive disorder with 12-month symptoms (n = 663) were identified and included in this study. Differential item functioning (DIF) was used to test whether depressed men and women endorse different features associated with their condition. Gender-related DIF was present for three symptoms associated with depression. Depressed women were more likely to report 'appetite/weight disturbance', whereas depressed men were more likely to report 'alcohol misuse' and 'substance misuse'. While the results may reflect a greater risk of co-occurring alcohol and substance misuse in men, inclusion of these features in assessments may improve the detection of depression in men, especially if standard depressive symptoms are under-reported. © The Author(s) 2016.
Detecting a Gender-Related DIF Using Logistic Regression and Transformed Item Difficulty
ERIC Educational Resources Information Center
Abedlaziz, Nabeel; Ismail, Wail; Hussin, Zaharah
2011-01-01
Test items are designed to provide information about the examinees. Difficult items are designed to be more demanding and easy items are less so. However, sometimes, test items carry with them demands other than those intended by the test developer (Scheuneman & Gerritz, 1990). When personal attributes such as gender systematically affect…
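The transformed item difficulty method named in this record's title rests on Angoff's delta transformation of the proportion correct. The following minimal sketch shows only that transformation, computed separately for two groups; the proportions are hypothetical, and the subsequent step of the method (flagging items that lie far from the major axis of the delta plot) is not implemented here.

```python
from statistics import NormalDist

def angoff_delta(p_correct):
    """Angoff's delta: 13 - 4 * z, where z is the normal deviate of the
    proportion correct. Harder items (lower p) receive larger deltas."""
    return 13.0 - 4.0 * NormalDist().inv_cdf(p_correct)

# Hypothetical proportions correct per item for two groups.
p_male = [0.80, 0.50, 0.30]
p_female = [0.78, 0.52, 0.15]

delta_male = [angoff_delta(p) for p in p_male]
delta_female = [angoff_delta(p) for p in p_female]

# Items whose delta pairs fall far from the common trend are DIF candidates.
for item, (dm, df) in enumerate(zip(delta_male, delta_female), start=1):
    print(f"item {item}: delta_male={dm:.2f} delta_female={df:.2f}")
```

An item answered correctly by half of a group has a delta of 13, so the scale is centred on items of middling difficulty.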
Forkmann, Thomas; Kroehne, Ulf; Wirtz, Markus; Norra, Christine; Baumeister, Harald; Gauggel, Siegfried; Elhan, Atilla Halil; Tennant, Alan; Boecker, Maren
2013-11-01
This study conducted a simulation study for computer-adaptive testing based on the Aachen Depression Item Bank (ADIB), which was developed for the assessment of depression in persons with somatic diseases. Prior to computer-adaptive test simulation, the ADIB was newly calibrated. Recalibration was performed in a sample of 161 patients treated for a depressive syndrome, 103 patients from cardiology, and 103 patients from otorhinolaryngology (mean age 44.1, SD=14.0; 44.7% female) and was cross-validated in a sample of 117 patients undergoing rehabilitation for cardiac diseases (mean age 58.4, SD=10.5; 24.8% women). Unidimensionality of the itembank was checked and a Rasch analysis was performed that evaluated local dependency (LD), differential item functioning (DIF), item fit and reliability. CAT-simulation was conducted with the total sample and additional simulated data. Recalibration resulted in a strictly unidimensional item bank with 36 items, showing good Rasch model fit (item fit residuals<|2.5|) and no DIF or LD. CAT simulation revealed that 13 items on average were necessary to estimate depression in the range of -2 and +2 logits when terminating at SE≤0.32 and 4 items if using SE≤0.50. Receiver Operating Characteristics analysis showed that θ estimates based on the CAT algorithm have good criterion validity with regard to depression diagnoses (Area Under the Curve≥.78 for all cut-off criteria). The recalibration of the ADIB succeeded and the simulation studies conducted suggest that it has good screening performance in the samples investigated and that it may reasonably add to the improvement of depression assessment. © 2013.
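The stopping rules quoted above (SE≤0.32 and SE≤0.50) can be illustrated with a minimal sketch. It makes several simplifying assumptions that are not taken from the study: items are dichotomous Rasch items (the ADIB items may well be polytomous, which carry more information per item), the respondent's true θ is known rather than re-estimated after each response, and the item bank values are hypothetical.

```python
import math

def rasch_prob(theta, b):
    """Probability of endorsing a dichotomous Rasch item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def standard_error(theta, difficulties):
    """SE of theta = 1 / sqrt(test information); Rasch item information is p * (1 - p)."""
    info = sum(rasch_prob(theta, b) * (1.0 - rasch_prob(theta, b))
               for b in difficulties)
    return 1.0 / math.sqrt(info)

def items_needed(theta, bank, se_stop):
    """Administer the most informative remaining item until SE <= se_stop.

    Returns the number of items used, or None if the bank is exhausted first.
    Simplification: theta is treated as known instead of being re-estimated.
    """
    remaining = list(bank)
    used = []
    while remaining:
        # For a Rasch item, information peaks where difficulty equals theta.
        best = min(remaining, key=lambda b: abs(b - theta))
        remaining.remove(best)
        used.append(best)
        if standard_error(theta, used) <= se_stop:
            return len(used)
    return None

# Hypothetical bank of 60 perfectly targeted dichotomous items.
bank = [0.0] * 60
print(items_needed(0.0, bank, 0.32), items_needed(0.0, bank, 0.50))
```

Because a dichotomous Rasch item contributes at most 0.25 units of information, this sketch needs far more items than the 13 and 4 reported in the abstract; the point is only to show how a looser SE threshold shortens the test.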
Erhart, M; Hagquist, C; Auquier, P; Rajmil, L; Power, M; Ravens-Sieberer, U
2010-07-01
This study compares item reduction analysis based on classical test theory (maximizing Cronbach's alpha - approach A), with analysis based on the Rasch Partial Credit Model item-fit (approach B), as applied to children and adolescents' health-related quality of life (HRQoL) items. The reliability and structural, cross-cultural and known-group validity of the measures were examined. Within the European KIDSCREEN project, 3019 children and adolescents (8-18 years) from seven European countries answered 19 HRQoL items of the Physical Well-being dimension of a preliminary KIDSCREEN instrument. The Cronbach's alpha and corrected item total correlation (approach A) were compared with infit mean squares and the Q-index item-fit derived according to a partial credit model (approach B). Cross-cultural differential item functioning (DIF ordinal logistic regression approach), structural validity (confirmatory factor analysis and residual correlation) and relative validity (RV) for socio-demographic and health-related factors were calculated for approaches (A) and (B). Approach (A) led to the retention of 13 items, compared with 11 items with approach (B). The item overlap was 69% for (A) and 78% for (B). The correlation coefficient of the summated ratings was 0.93. The Cronbach's alpha was similar for both versions [0.86 (A); 0.85 (B)]. Both approaches selected some items that are not strictly unidimensional and items displaying DIF. RV ratios favoured (A) with regard to socio-demographic aspects. Approach (B) was superior in RV with regard to health-related aspects. Both types of item reduction analysis should be accompanied by additional analyses. Neither of the two approaches was universally superior with regard to cultural, structural and known-group validity. However, the results support the usability of the Rasch method for developing new HRQoL measures for children and adolescents.
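The two statistics used in approach A, Cronbach's alpha and the corrected item-total correlation, can be sketched as follows. This is an illustrative implementation of the standard textbook formulas, not the authors' code, and the response matrix is hypothetical.

```python
from statistics import mean, variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)

def corrected_item_total(scores, item):
    """Pearson correlation between one item and the sum of the remaining items."""
    x = [row[item] for row in scores]
    y = [sum(row) - row[item] for row in scores]
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical responses: 5 respondents, 3 items scored 1-5.
responses = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5], [5, 5, 4]]
print(cronbach_alpha(responses), corrected_item_total(responses, 0))
```

An alpha-maximizing item reduction would repeatedly drop the item whose removal raises alpha the most; that loop is omitted here.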
Application of Item Response Theory to Tests of Substance-related Associative Memory
Shono, Yusuke; Grenard, Jerry L.; Ames, Susan L.; Stacy, Alan W.
2015-01-01
A substance-related word association test (WAT) is one of the commonly used indirect tests of substance-related implicit associative memory and has been shown to predict substance use. This study applied an item response theory (IRT) modeling approach to evaluate psychometric properties of the alcohol- and marijuana-related WATs and their items among 775 ethnically diverse at-risk adolescents. After examining the IRT assumptions, item fit, and differential item functioning (DIF) across gender and age groups, the original 18 WAT items were reduced to 14 and 15 items in the alcohol- and marijuana-related WATs, respectively. Thereafter, unidimensional one- and two-parameter logistic models (1PL and 2PL models) were fitted to the revised WAT items. The results demonstrated that both alcohol- and marijuana-related WATs have good psychometric properties. These results were discussed in light of the framework of a unified concept of construct validity (Messick, 1975, 1989, 1995). PMID:25134051
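For readers unfamiliar with the 1PL and 2PL models mentioned above, the following sketch shows their item response functions in the usual logistic form. The parameter values are hypothetical, and fixing the common 1PL discrimination at 1.0 is an assumption of this sketch (software may instead estimate a single shared value).

```python
import math

def irf_2pl(theta, a, b):
    """2PL: probability of a keyed response given trait level theta,
    item discrimination a, and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def irf_1pl(theta, b, a_common=1.0):
    """1PL: same form, but every item shares one discrimination value."""
    return irf_2pl(theta, a_common, b)

# Hypothetical item with difficulty 0.5, evaluated across trait levels.
for theta in (-2.0, 0.0, 0.5, 2.0):
    print(theta, round(irf_1pl(theta, 0.5), 3), round(irf_2pl(theta, 1.8, 0.5), 3))
```

In both models the probability is 0.5 when θ equals the item difficulty; the 2PL curve is steeper for items with larger discrimination.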
Tennant, Alan; Küçükdeveci, Ayse A; Kutlay, Sehim; Elhan, Atilla H
2006-03-23
The Middlesex Elderly Assessment of Mental State (MEAMS) was developed as a screening test to detect cognitive impairment in the elderly. It includes 12 subtests, each having a 'pass score'. A series of tasks were undertaken to adapt the measure for use in the adult population in Turkey and to determine the validity of existing cut points for passing subtests, given the wide range of educational level in the Turkish population. This study focuses on identifying and validating the scoring system of the MEAMS for Turkish adult population. After the translation procedure, 350 normal subjects and 158 acquired brain injury patients were assessed by the Turkish version of MEAMS. Initially, appropriate pass scores for the normal population were determined through ANOVA post-hoc tests according to age, gender and education. Rasch analysis was then used to test the internal construct validity of the scale and the validity of the cut points for pass scores on the pooled data by using Differential Item Functioning (DIF) analysis within the framework of the Rasch model. Data with the initially modified pass scores were analyzed. DIF was found for certain subtests by age and education, but not for gender. Following this, pass scores were further adjusted and data re-fitted to the model. All subtests were found to fit the Rasch model (mean item fit 0.184, SD 0.319; person fit -0.224, SD 0.557) and DIF was then found to be absent. Thus the final pass scores for all subtests were determined. The MEAMS offers a valid assessment of cognitive state for the adult Turkish population, and the revised cut points accommodate for age and education. Further studies are required to ascertain the validity in different diagnostic groups.
Gecht, Judith; Mainz, Verena; Boecker, Maren; Clusmann, Hans; Geiger, Matthias Florian; Tingart, Markus; Quack, Valentin; Gauggel, Siegfried; Heinemann, Allen W; Müller, Christian-Andreas
2017-10-10
Economic environmental factors represent important barriers to participation and have deleterious effects on quality of life (QOL) in persons with spinal diseases (SpD). While economic factors are anchored in the International Classification of Functioning, Disability and Health, their influence on QOL and participation from patients' perspectives is an infrequent focus of research. The aim of the present research is to calibrate a culturally adapted Rasch-based questionnaire assessing economic QOL in patients with SpD. The 11 items of the German economic-QOL-scale were answered by 325 patients with SpD on a four-point Likert scale. Fit to the Rasch measurement model was investigated by testing for stochastic ordering of the items, unidimensionality, local independence, and differential item functioning (DIF). After adjusting for local dependency, fit to the Rasch model was achieved with a non-significant item-trait interaction (chi-square (df = 20) = 34.8, p = 0.021). The person separation reliability equaled 0.88, the scale was free from age- or gender-related DIF, and unidimensionality could be verified. The Rasch-based German version of the economic-QOL-scale represents a suitable instrument to investigate the influences of economic factors on patients' QOL at a group and individual level. It can be easily applied in research and practice and may be administered quickly in combination with other instruments. The short test duration implies a low test burden for patients and a minimum of time expenditure by clinicians when evaluating the results.
Weinfurt, Kevin P; Lin, Li; Bruner, Deborah Watkins; Cyranowski, Jill M; Dombeck, Carrie B; Hahn, Elizabeth A; Jeffery, Diana D; Luecht, Richard M; Magasi, Susan; Porter, Laura S; Reese, Jennifer Barsky; Reeve, Bryce B; Shelby, Rebecca A; Smith, Ashley Wilder; Willse, John T; Flynn, Kathryn E
2015-09-01
The Patient-Reported Outcomes Measurement Information System (PROMIS®) Sexual Function and Satisfaction measure (SexFS) version 1.0 was developed with cancer populations. There is a need to expand the SexFS and provide evidence of its validity in diverse populations. The aim of this study was to describe the development of the SexFS v2.0 and present preliminary evidence for its validity. Development built on version 1.0, plus additional review of extant items, discussions with 15 clinical experts, 11 patient focus groups (including individuals with diabetes, heart disease, anxiety, or depression, and/or who are lesbian, gay, bisexual, or aged 65 or older), 48 cognitive interviews, and psychometric evaluation in a random sample of U.S. adults plus an oversample for specific sexual problems (2281 men, 1686 women). We examined differential item functioning (DIF) by gender and sexual activity. We examined convergent and known-groups validity. The final set of domains includes 11 scored scales (interest in sexual activity, lubrication, vaginal discomfort, clitoral discomfort, labial discomfort, erectile function, orgasm ability, orgasm pleasure, oral dryness, oral discomfort, satisfaction), and six nonscored item pools (screeners, sexual activities, anal discomfort, therapeutic aids, factors interfering with sexual satisfaction, bother). Domains from version 1.0 were reevaluated and improved. Domains considered applicable across gender and sexual activity status, namely interest, orgasm, and satisfaction, were found to have significant DIF. We identified subsets of items in each domain that provided consistent measurement across these important respondent groups. Convergent and known-groups validity was supported. The SexFS version 2.0 has several improvements and enhancements over version 1.0 and other extant measures, including expanded evidence for validity, scores centered around norms for sexually active U.S. adults, new domains, and a final set of items applicable for both men and women and those sexually active with a partner and without. The SexFS is customizable, allowing users to select relevant domains and items for their study. © 2015 International Society for Sexual Medicine.
ERIC Educational Resources Information Center
Wu, Pei-Chen
2010-01-01
The objectives of this study were (a) to investigate whether items of the Chinese version of Beck Depression Inventory II (BDI-II-C; "Chinese Behavioral Science Corporation" in "Manual for the Beck Depression Inventory-II" [in Chinese]. The Chinese Behavioral Science Corporation, Taiwan, 2000) exhibited DIF across adolescent…
ERIC Educational Resources Information Center
Lambert, Matthew C.; Garcia, Allen G.; January, Stacy-Ann A.; Epstein, Michael H.
2018-01-01
There have been significant changes in the racial/ethnic and linguistic background of students attending public schools in the United States. The number of public-school students who are English language learners (ELLs) participating in programs of language assistance has more than doubled over the past two decades. In 1993-1994, 5.1% of…
Medvedev, Oleg N; Turner-Stokes, Lynne; Ashford, Stephen; Siegert, Richard J
2018-02-28
To determine whether the UK Functional Assessment Measure (UK FIM+FAM) fits the Rasch model in stroke patients with complex disability and, if so, to derive a conversion table of Rasch-transformed interval level scores. The sample included a UK multicentre cohort of 1,318 patients admitted for specialist rehabilitation following a stroke. Rasch analysis was conducted for the 30-item scale including 3 domains of items measuring physical, communication and psychosocial functions. The fit of items to the Rasch model was examined using 3 different analytical approaches referred to as "pathways". The best fit was achieved in the pathway where responses from motor, communication and psychosocial domains were summarized into 3 super-items and where some items were split because of differential item functioning (DIF) relative to left and right hemisphere location (χ2 (10) = 14.48, p = 0.15). Re-scoring of items showing disordered thresholds did not significantly improve the overall model fit. The UK FIM+FAM with domain super-items satisfies expectations of the unidimensional Rasch model without the need for re-scoring. A conversion table was produced to convert the total scale scores into interval-level data based on person estimates of the Rasch model. The clinical benefits of interval-transformed scores require further evaluation.
Lin, Chung-Ying; Broström, Anders; Nilsen, Per; Griffiths, Mark D; Pakpour, Amir H
2017-12-01
Background and aims: The Bergen Social Media Addiction Scale (BSMAS) is a six-item self-report scale that serves as a brief and effective psychometric instrument for assessing at-risk social media addiction on the Internet. However, its psychometric properties in Persian have never been examined and no studies have applied Rasch analysis for the psychometric testing. This study aimed to verify the construct validity of the Persian BSMAS using confirmatory factor analysis (CFA) and Rasch models among 2,676 Iranian adolescents. Methods: In addition to construct validity, measurement invariance in CFA and differential item functioning (DIF) in Rasch analysis across gender were tested for in the Persian BSMAS. Results: Both CFA [comparative fit index (CFI) = 0.993; Tucker-Lewis index (TLI) = 0.989; root mean square error of approximation (RMSEA) = 0.057; standardized root mean square residual (SRMR) = 0.039] and Rasch (infit MnSq = 0.88-1.28; outfit MnSq = 0.86-1.22) confirmed the unidimensionality of the BSMAS. Moreover, measurement invariance was supported in multigroup CFA including metric invariance (ΔCFI = -0.001; ΔSRMR = 0.003; ΔRMSEA = -0.005) and scalar invariance (ΔCFI = -0.002; ΔSRMR = 0.005; ΔRMSEA = 0.001) across gender. No item displayed DIF (DIF contrast = -0.48 to 0.24) in Rasch across gender. Conclusions: Given that the Persian BSMAS was unidimensional, it is concluded that the instrument can be used to assess the degree to which an adolescent is addicted to social media on the Internet. Moreover, users of the instrument may comfortably compare the sum scores of the BSMAS across gender.
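The "DIF contrast" reported above is the difference between an item's Rasch difficulty estimated separately in two groups. The sketch below shows that calculation with hypothetical difficulty values. The 0.5-logit flagging threshold is an assumption of this sketch (a commonly used rule of thumb); the abstract does not state which cut-off the authors applied.

```python
def dif_contrasts(difficulty_group_a, difficulty_group_b):
    """Per-item DIF contrast: group A difficulty minus group B difficulty (logits)."""
    return {item: difficulty_group_a[item] - difficulty_group_b[item]
            for item in difficulty_group_a}

def flag_dif(contrasts, threshold=0.5):
    """Items whose absolute contrast exceeds the (assumed) threshold."""
    return [item for item, c in contrasts.items() if abs(c) > threshold]

# Hypothetical item difficulties (logits) estimated separately by gender.
boys = {"item1": 0.10, "item2": -0.35, "item3": 0.80}
girls = {"item1": 0.58, "item2": -0.59, "item3": 0.75}
contrasts = dif_contrasts(boys, girls)
print(contrasts)
print(flag_dif(contrasts))
```

With these made-up values every contrast stays within ±0.5 logits, so no item is flagged, which mirrors the pattern the abstract reports.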
Kasitanon, N; Wangkaew, S; Puntana, S; Sukitawut, W; Leong, K P; Louthrenoo, W
2013-03-01
The English version of the Systemic Lupus Erythematosus Quality of Life Questionnaire (SLEQOL) is a validated disease-specific quality of life instrument. The aim of this study was to evaluate the psychometric properties of the Thai version of the SLEQOL (SLEQOL-TH). Two independent translators translated the SLEQOL into Thai. The back translation of this version was performed by two other independent translators. The final version, SLEQOL-TH, was completed after resolving the discrepancies revealed by the back translation. One hundred and nine patients with SLE were enrolled to test the reliability, construct validity, floor and ceiling effects, and sensitivity to the changes of the SLEQOL-TH at six months. The differential item functioning (DIF) between the Thai and English versions was analyzed using the partial gamma. The internal consistency of the SLEQOL-TH was satisfactory with the overall Cronbach's alpha of 0.86. The test-retest reliability of the SLEQOL-TH was acceptable with the intra-class correlation coefficient of 0.86. Low correlations between the SLEQOL-TH and SLEDAI were observed. The total score of the SLEQOL-TH was moderately responsive to changes in quality of life, with a standardized response mean of 0.50. When comparing the SLEQOL-TH from Thai SLE patients with the original SLEQOL version obtained from Singapore SLE patients, 11 out of 40 items showed a moderate to large DIF. The SLEQOL-TH has acceptable psychometric properties and shows construct validity. In comparison with the English version of SLEQOL, there are some items that showed DIF. The applicability of the SLEQOL-TH in real-life clinical practice and clinical trials needs to be determined.
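The standardized response mean quoted above is conventionally defined as the mean change score divided by the standard deviation of the change scores. A minimal sketch of that definition follows; the baseline and follow-up scores are hypothetical and are not data from the study.

```python
from statistics import mean, stdev

def standardized_response_mean(baseline, follow_up):
    """SRM = mean of paired change scores / sample SD of the change scores."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)

# Hypothetical total scores for six patients at baseline and at six months.
baseline = [120, 95, 140, 110, 100, 130]
six_months = [112, 97, 125, 108, 92, 131]
print(standardized_response_mean(baseline, six_months))
```

The sign of the result depends on the direction of scoring, so it is the absolute value that is usually compared against responsiveness benchmarks.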
[Development of patient-reported outcome scale for myasthenia gravis: a psychometric test].
Chen, Xin-lin; Liu, Feng-bin; Guo, Li; Liu, Xiao-bin
2010-02-01
To investigate the scientific soundness of the patient-reported outcome (PRO) scale for myasthenia gravis (MG), which is used to evaluate the clinical effects of traditional Chinese and Western medicine treatment in MG patients, and to evaluate the psychometric performance of the MG-PRO scale. A total of 100 MG patients and 100 healthy people were interviewed face to face by well-trained investigators, and the data of the MG-PRO scale were collected. Classical test theory (CTT) and item response theory (IRT) methods were used to analyze psychometric performance, including validity, reliability, person separation index (PSI) and differential item functioning (DIF), of the MG-PRO scale. The results of the CTT analysis showed that the split-half reliabilities of the MG-PRO scale and each dimension were greater than 0.7. In the analysis of internal consistency of each dimension, the Cronbach's alpha was greater than 0.8. Each facet had greater correlation with its own dimension than with the other dimensions. Four principal components were extracted by exploratory factor analysis, which represented all dimensions of the scale, and the cumulative variance was 55.54%. Scores on each of the 8 facets differed between MG patients and healthy people (P<0.01). The results of IRT showed that the PSI of each model was greater than 0.8, and no item showed uniform or non-uniform DIF. The MG-PRO scale reflects the definition and connotation of quality of life and contains special issues of MG patients as well, and shows good reliability (split-half reliability, Cronbach's alpha) and validity (content validity, construct validity, discriminant validity) from the results of CTT, and good psychometric performance from the results of IRT.
Simpelaere, Ingeborg S; Van Nuffelen, Gwen; De Bodt, Marc; Vanderwegen, Jan; Hansen, Tina
2017-04-07
The Swallowing Quality-of-Life Questionnaire (SWAL-QoL) is considered the gold standard for assessing health-related QoL in oropharyngeal dysphagia. The Dutch translation (DSWAL-QoL) and its adjusted version (aDSWAL-QoL) have been validated using classical test theory (CTT). However, these scales have not been tested against the Rasch measurement model, which is required to establish the structural validity and objectivity of the total scale and subscale scores. Thus, the purpose of this study was to examine the psychometric properties of these scales using item analysis according to the Rasch model. Item analysis with the Rasch model was performed using RUMM2030 software with previously collected data from a validation study of 108 patients. The assessment included evaluations of overall model fit, reliability, unidimensionality, threshold ordering, individual item and person fits, differential item functioning (DIF), local item dependency (LID) and targeting. The analysis could not establish the psychometric properties of either of the scales or their subscales because they did not fit the Rasch model, and multidimensionality, disordered thresholds, DIF, and/or LID were found. The reliability and power of fit were high for the total scales (PSI = 0.93) but low for most of the subscales (PSI < 0.70). The targeting of persons and items was suboptimal. The main source of misfit was disordered thresholds for both the total scales and subscales. Based on the results of the analysis, adjustments to improve the scales were implemented as follows: disordered thresholds were rescaled, misfit items were removed and items were split for DIF. However, the multidimensionality and LID could not be resolved. The reliability and power of fit remained low for most of the subscales. This study represents the first analyses of the DSWAL-QoL and aDSWAL-QoL with the Rasch model. 
Relying on the DSWAL-QoL and aDSWAL-QoL total and subscale scores to make conclusions regarding dysphagia-related HRQoL should be treated with caution before the structural validity and objectivity of both scales have been established. A larger and well-targeted sample is recommended to derive definitive conclusions about the items and scales. Solutions for the psychometric weaknesses suggested by the model and practical implications are discussed.
Lin, Chung-Ying; Ku, Li-Jung Elizabeth; Pakpour, Amir H
2017-11-01
The Zarit Burden Interview (ZBI) is a commonly used self-report to assess caregiver burden. A 12-item short form of the ZBI has been developed; however, its measurement invariance has not been examined across some different demographics. It is unclear whether different genders and educational levels of a population interpret the ZBI items similarly. Therefore, this study aimed to examine the measurement invariance of the 12-item ZBI across gender and educational levels in a Taiwanese sample. Caregivers who had a family member with dementia (n = 270) completed the ZBI through telephone interviews. Three confirmatory factor analysis (CFA) models were conducted: Model 1 was the configural model, Model 2 constrained all factor loadings, Model 3 constrained all factor loadings and item intercepts. Multiple group CFAs and the differential item functioning (DIF) contrast under Rasch analyses were used to detect measurement invariance across males (n = 100) and females (n = 170) and across educational levels of junior high schools and below (n = 86) and senior high schools and above (n = 183). The fit index differences between models supported the measurement invariance across gender and across educational levels (∆ comparative fit index (CFI) = -0.010 and 0.003; ∆ root mean square error of approximation (RMSEA) = -0.006 to 0.004). No substantial DIF contrast was found across gender and educational levels (value = -0.36 to 0.29). The ZBI is appropriate for combined use and for comparisons in caregivers across gender and different educational levels in Taiwan.
Heinemann, Allen W; Lai, Jin-Shei; Wong, Alex; Dashner, Jessica; Magasi, Susan; Hahn, Elizabeth A; Carlozzi, Noelle E; Tulsky, David S; Jerousek, Sara; Semik, Patrick; Miskovic, Ana; Gray, David B
2016-11-01
To develop a measure of natural environment and human-made change features (Chapter 2 of the International Classification of Functioning, Disability, and Health) and evaluate the influence of perceived barriers on health-related quality of life. A sample of 570 adults with stroke, spinal cord injury, and traumatic brain injury residing in community settings reported their functioning in home, outdoor, and community settings (mean age = 47.0 years, SD = 16.1). They rated 18 items with a 5-point rating scale to describe the influence of barriers to moving around, seeing objects, hearing sounds, hearing conversations, feeling safe, and regulating temperature and indicated whether any difficulties were due to environmental features. We used Rasch analysis to identify misfitting items and evaluate differential item functioning (DIF) across impairment groups. We computed correlations between barriers and Patient-Reported Outcomes Measurement Information System (PROMIS) social domain measures and Community Participation Indicators (CPI) measures. The 18 items demonstrated person reliability of .70, discriminating nearly three levels of barriers. All items fit the Rasch model; impairment-related DIF was negligible. Ceiling effects were negligible, but 25% of the respondents were at the floor, indicating that they did not experience barriers that they attributed to the built and natural environment. As anticipated, barriers correlated moderately with PROMIS and CPI variables, suggesting that although this new item bank measures a construct that is related to participation and health-related quality of life, it also captures something unique. Known-groups validity was supported by wheelchair users reporting a higher level of barriers than did ambulatory respondents. Preliminary evidence supports the reliability and validity of this new measure of barriers to the built and natural environment. 
This measure allows investigators and clinicians to measure perceptions of the natural environment and human-made changes, providing information that can guide interventions to reduce barriers. Moderate relationships between barriers and PROMIS and CPI variables provide support for the measurement and theory of environmental influences on social health and participation.
Jerosch-Herold, Christina; Chester, Rachel; Shepstone, Lee; Vincent, Joshua I; MacDermid, Joy C
2018-02-01
The shoulder pain and disability index (SPADI) has been extensively evaluated for its psychometric properties using classical test theory (CTT). The purpose of this study was to evaluate its structural validity using Rasch model analysis. Responses to the SPADI from 1030 patients referred for physiotherapy with shoulder pain and enrolled in a prospective cohort study were available for Rasch model analysis. Overall fit, individual person and item fit, response format, dependence, unidimensionality, targeting, reliability and differential item functioning (DIF) were examined. The SPADI pain subscale initially demonstrated a misfit due to DIF by age and gender. After iterative analysis it showed good fit to the Rasch model with acceptable targeting and unidimensionality (overall fit Chi-square statistic 57.2, p = 0.1; mean item fit residual 0.19 (1.5) and mean person fit residual 0.44 (1.1); person separation index (PSI) of 0.83). The disability subscale, however, showed significant misfit due to uniform DIF even after iterative analyses were used to explore different solutions to the sources of misfit (overall fit Chi-square statistic 57.2, p = 0.1; mean item fit residual 0.54 (1.26) and mean person fit residual 0.38 (1.0); PSI 0.84). Rasch model analysis of the SPADI has identified some strengths and limitations not previously observed using CTT methods. The SPADI should be treated as two separate subscales. The SPADI is a widely used outcome measure in clinical practice and research; however, the scores derived from it must be interpreted with caution. The pain subscale fits the Rasch model expectations well. The disability subscale does not fit the Rasch model and its current format does not meet the criteria for true interval-level measurement required for use as a primary endpoint in clinical trials. 
Clinicians should therefore exercise caution when interpreting score changes on the disability subscale and attempt to compare their scores to age- and sex-stratified data.
Oude Voshaar, Martijn A H; Ten Klooster, Peter M; Vonkeman, Harald E; van de Laar, Mart A F J
2017-11-01
Traditional patient-reported physical function instruments often poorly differentiate patients with mild-to-moderate disability. We describe the development and psychometric evaluation of a generic item bank for measuring everyday activity limitations in outpatient populations. Seventy-two items generated from patient interviews and mapped to the International Classification of Functioning, Disability and Health (ICF) domestic life chapter were administered to 1128 adults representative of the Dutch population. The partial credit model was fitted to the item responses and evaluated with respect to its assumptions, model fit, and differential item functioning (DIF). Measurement performance of a computerized adaptive testing (CAT) algorithm was compared with the SF-36 physical functioning scale (PF-10). A final bank of 41 items was developed. All items demonstrated acceptable fit to the partial credit model and measurement invariance across age, sex, and educational level. Five- and ten-item CAT simulations were shown to have high measurement precision, which exceeded that of SF-36 physical functioning scale across the physical function continuum. Floor effects were absent for a 10-item empirical CAT simulation, and ceiling effects were low (13.5%) compared with SF-36 physical functioning (38.1%). CAT also discriminated better than SF-36 physical functioning between age groups, number of chronic conditions, and respondents with or without rheumatic conditions. The Rasch assessment of everyday activity limitations (REAL) item bank will hopefully prove a useful instrument for assessing everyday activity limitations. T-scores obtained using derived measures can be used to benchmark physical function outcomes against the general Dutch adult population.
Rasch Analysis of the Power as Knowing Participation in Change Tool--the Brazilian version.
Guedes, Erika de Souza; Orozco-Vargas, Luiz Carlos; Turrini, Ruth Natália Teresa; de Sousa, Regina Márcia Cardoso; dos Santos, Mariana Alvina; da Cruz, Diná de Almeida Lopes Monteiro
2013-01-01
The objective of this study was to evaluate the items contained in the Brazilian version of the Power as Knowing Participation in Change Tool (PKPCT). The psychometric properties of this questionnaire were investigated through Rasch analysis. The data from 952 nursing assistants and 627 baccalaureate nurses were analyzed (average age 44.1 (SD=9.5); 13.0% men). The subscales Choices, Awareness, Freedom and Involvement were tested separately and presented unidimensionality; the response categories of the items were collapsed from 7 to 3 levels and the items fit the model well, except for the following/leading item, in which the infit and outfit values were above 1.4; this item also presented Differential Item Functioning (DIF) according to the participant's role. The reliability of the items was 0.99 and the reliability of the participants ranged from 0.80 to 0.84 in the subscales. Items with extremely high levels of difficulty were not identified. The PKPCT should not be viewed as unidimensional, items with extremely high levels of difficulty in the scale need to be created and the differential functioning of some items has to be further investigated.
Hong, Ickpyo; Lee, Mi Jung; Kim, Moon Young; Park, Hae Yean
2017-10-01
The aim of this study is to investigate the psychometrics of the 12 items of an instrument assessing activities of daily living (ADL) using an item response theory model. A total of 648 adults with physical disabilities and having difficulties in ADLs were retrieved from the 2014 Korean National Survey on People with Disabilities. The psychometric testing included factor analysis, internal consistency, precision, and differential item functioning (DIF) across categories including sex, older age, marital status, and physical impairment area. The sample had a mean age of 69.7 years old (SD = 13.7). The majority of the sample had lower extremity impairments (62.0%) and had at least 2.1 chronic conditions. The instrument demonstrated unidimensional construct and good internal consistency (Cronbach's alpha = 0.95). The instrument precisely estimated person measures within a wide range of theta values (-2.22 logits < θ < 0.27 logits) with a reliability of 0.9. Only the changing position item demonstrated misfit (χ² = 36.6, df = 17, p = 0.0038), and the dressing item demonstrated DIF on the impairment type (upper extremity/others, McFadden's pseudo R² > 5.0%). Our findings indicate that the dressing item would need to be modified to improve its psychometrics. Overall, the ADL instrument demonstrates good psychometrics, and thus, it may be used as a standardized instrument for measuring disability in rehabilitation contexts. However, the findings are limited to adults with physical disabilities. Future studies should replicate psychometric testing for survey respondents with other disorders and for children.
Hendriks, Jacqueline; Fyfe, Sue; Styles, Irene; Skinner, S Rachel; Merriman, Gareth
2012-01-01
Measurement scales seeking to quantify latent traits like attitudes, are often developed using traditional psychometric approaches. Application of the Rasch unidimensional measurement model may complement or replace these techniques, as the model can be used to construct scales and check their psychometric properties. If data fit the model, then a scale with invariant measurement properties, including interval-level scores, will have been developed. This paper highlights the unique properties of the Rasch model. Items developed to measure adolescent attitudes towards abortion are used to exemplify the process. Ten attitude and intention items relating to abortion were answered by 406 adolescents aged 12 to 19 years, as part of the "Teen Relationships Study". The sampling framework captured a range of sexual and pregnancy experiences. Items were assessed for fit to the Rasch model including checks for Differential Item Functioning (DIF) by gender, sexual experience or pregnancy experience. Rasch analysis of the original dataset initially demonstrated that some items did not fit the model. Rescoring of one item (B5) and removal of another (L31) resulted in fit, as shown by a non-significant item-trait interaction total chi-square and a mean log residual fit statistic for items of -0.05 (SD=1.43). No DIF existed for the revised scale. However, items did not distinguish as well amongst persons with the most intense attitudes as they did for other persons. A person separation index of 0.82 indicated good reliability. Application of the Rasch model produced a valid and reliable scale measuring adolescent attitudes towards abortion, with stable measurement properties. The Rasch process provided an extensive range of diagnostic information concerning item and person fit, enabling changes to be made to scale items. This example shows the value of the Rasch model in developing scales for both social science and health disciplines.
Gibbons, Chris J; Thornton, Everard W; Ealing, John; Shaw, Pamela J; Talbot, Kevin; Tennant, Alan; Young, Carolyn A
2013-11-15
Social withdrawal is described as the condition in which an individual experiences a desire to make social contact, but is unable to satisfy that desire. It is an important issue for patients with motor neurone disease who are likely to experience severe physical impairment. This study aims to reassess the psychometric and scaling properties of the MND Social Withdrawal Scale (MND-SWS) domains and examine the feasibility of a summary scale, by applying scale data to the Rasch model. The MND Social Withdrawal Scale was administered to 298 patients with a diagnosis of MND, alongside the Hospital Anxiety and Depression Scale. The factor structure of the MND Social Withdrawal Scale was assessed using confirmatory factor analysis. Model fit, category threshold analysis, differential item functioning (DIF), dimensionality and local dependency were evaluated. Factor analysis confirmed the suitability of the four-factor solution suggested by the original authors. Mokken scale analysis suggested the removal of item five. Rasch analysis removed a further three items, from the Community (one item) and Emotional (two items) withdrawal subscales. Following item reduction, each scale exhibited excellent fit to the Rasch model. A 14-item Summary scale was shown to fit the Rasch model after subtesting the items into three subtests corresponding to the Community, Family and Emotional subscales, indicating that items from these three subscales could be summed together to create a total measure for social withdrawal. Removal of four items from the Social Withdrawal Scale led to a four-factor solution with a 14-item hierarchical Summary scale, all of which were unidimensional, free of DIF and well fitted to the Rasch model. The scale is reliable and allows clinicians and researchers to measure social withdrawal in MND along a unidimensional construct. © 2013. Published by Elsevier B.V. All rights reserved.
Development and Validation of the Spanish Numeracy Understanding in Medicine Instrument.
Jacobs, Elizabeth A; Walker, Cindy M; Miller, Tamara; Fletcher, Kathlyn E; Ganschow, Pamela S; Imbert, Diana; O'Connell, Maria; Neuner, Joan M; Schapira, Marilyn M
2016-11-01
The Spanish-speaking population in the U.S. is large and growing and is known to have lower health literacy than the English-speaking population. Less is known about the health numeracy of this population due to a lack of health numeracy measures in Spanish. We aimed to develop and validate a short and easy to use measure of health numeracy for Spanish-speaking adults: the Spanish Numeracy Understanding in Medicine Instrument (Spanish-NUMi). Items were generated based on qualitative studies in English- and Spanish-speaking adults and translated into Spanish using a group translation and consensus process. Candidate items for the Spanish NUMi were selected from an eight-item validated English Short NUMi. Differential Item Functioning (DIF) analysis was conducted to evaluate equivalence between English and Spanish items. Cronbach's alpha was computed as a measure of reliability and a Pearson's correlation was used to evaluate the association between test scores and the Spanish Test of Functional Health Literacy (S-TOFHLA) and education level. Two hundred and thirty-two Spanish-speaking Chicago residents were included in the study. The study population was diverse in age, gender, and level of education and 70% reported Mexico as their country of origin. Two items of the English eight-item Short NUMi demonstrated DIF and were dropped. The resulting six-item test had a Cronbach's alpha of 0.72, a range of difficulty using classical test statistics (percent correct: 0.48 to 0.86), and adequate discrimination (item-total score correlation: 0.34-0.49). Scores were positively correlated with print literacy as measured by the S-TOFHLA (r = 0.67; p < 0.001) and varied as predicted across grade level; mean scores for up to eighth grade, ninth through twelfth grade, and some college experience or more, respectively, were 2.48 (SD ± 1.64), 4.15 (SD ± 1.45), and 4.82 (SD ± 0.37). 
The Spanish NUMi is a reliable and valid measure of important numerical concepts used in communicating health information.
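The two classical statistics quoted in the abstract above, Cronbach's alpha and the item-total correlation, can be sketched as follows. This is a minimal illustration, not the authors' analysis: the `responses` matrix is hypothetical 0/1 data, the function names are made up for this sketch, and the item-total correlation is computed in its "corrected" form (item versus the sum of the remaining items), which the abstract does not specify.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the total (summed) score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson correlation; assumes neither x nor y is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(scores, j):
    """Correlation of item j with the sum of all other items."""
    item = [row[j] for row in scores]
    rest = [sum(row) - row[j] for row in scores]
    return pearson(item, rest)


# Hypothetical data: 6 respondents, 4 dichotomous items (1 = correct).
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]

print(round(cronbach_alpha(responses), 3))
print(round(corrected_item_total(responses, 0), 3))
```

Using the sample variance for both the items and the total keeps the ratio inside alpha consistent; the population variance would give the same value as long as it is used in both places.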
Peter, Claudio; Schulenberg, Stefan E; Buchanan, Erin M; Prodinger, Birgit; Geyh, Szilvia
2016-02-01
To evaluate the metric properties of distinct measures of psychological personal factors comprising feelings, beliefs, motives, and patterns of experience and behaviour assessed in the Swiss Spinal Cord Injury Cohort Study (SwiSCI), using Rasch methodology. SwiSCI Pathway 2 is a community-based, nationwide, cross-sectional survey for persons with spinal cord injury (SCI) (n = 511). The Rasch partial credit model was used for each subscale of the Positive Affect Negative Affect Scale (PANAS), Appraisal of Life Events Scale (ALE), Purpose in Life test - Short Form (PIL-SF), and the Big Five Inventory-K (BFI-K). The measures were unidimensional, with the exception of the positive affect items of the PANAS, where pairwise t-tests resulted in 10% significant cases, indicating multidimensionality. The BFI-K subscale agreeableness revealed low reliability (0.53). Other reliability estimates ranged between 0.61 and 0.89. Ceiling and floor effects were found for most measures. SCI-related differential item functioning (DIF) was rarely found. Language DIF was identified for several items of the BFI-K, PANAS and the ALE, but not for the PIL-SF. A majority of the measures satisfy the assumptions of the Rasch model, including unidimensionality. Invariance across language versions still represents a major challenge.
Measurement of Women's Agency in Egypt: A National Validation Study.
Yount, Kathryn M; VanderEnde, Kristin E; Dodell, Sylvie; Cheong, Yuk Fai
2016-09-01
Despite widespread assumptions about women's empowerment and agency in the Arab Middle East, psychometric research of these constructs is limited. Using national data from 6214 married women ages 16-49 who took part in the 2006 Egypt Labor Market Panel Survey, we applied factor analysis to explore and then to test the factor structure of women's agency. We then used multiple indicator multiple cause structural equations models to test for differential item functioning (DIF) by women's age at first marriage, a potential resource for women's agency. Our results confirm that women's agency in Egypt is multi-dimensional and comprised of their (1) influence in family decisions, including those reserved for men, (2) freedom of movement in public spaces, and (3) attitudes about gender, specifically violence against wives. These dimensions confirm those explored previously in selected rural areas of Egypt and South Asia. Yet, three items showed significant uniform DIF by women's categorical age at first marriage, with and without a control for women's age in years. Models adjusting for DIF and women's age in years showed that women's older age at first marriage was positively associated with the factor means for family decision-making and gender-violence attitudes, but not freedom of movement. Our findings reveal the value of our analytical strategy for research on the dimensions and determinants of women's agency. Our approach offers a promising model to discern "hierarchies of evidence" for social policies and programs to enhance women's empowerment.
Pilcher, June J; Switzer, Fred S; Munc, Alec; Donnelly, Janet; Jellen, Julia C; Lamm, Claus
2018-04-01
The purpose of this study is to examine the psychometric properties of the Epworth Sleepiness Scale (ESS) in two languages, German and English. Students from a university in Austria (N = 292; 55 males; mean age = 18.71 ± 1.71 years; 237 females; mean age = 18.24 ± 0.88 years) and a university in the US (N = 329; 128 males; mean age = 18.71 ± 0.88 years; 201 females; mean age = 21.59 ± 2.27 years) completed the ESS. An exploratory-factor analysis was completed to examine dimensionality of the ESS. Item response theory (IRT) analyses were used to provide information about the response rates on the items on the ESS and provide differential item functioning (DIF) analyses to examine whether the items were interpreted differently between the two languages. The factor analyses suggest that the ESS measures two distinct sleepiness constructs. These constructs indicate that the ESS is probing sleepiness in settings requiring active versus passive responding. The IRT analyses found that overall, the items on the ESS perform well as a measure of sleepiness. However, Item 8 and to a lesser extent Item 6 were being interpreted differently by respondents in comparison to the other items. In addition, the DIF analyses showed that the responses between German and English were very similar indicating that there are only minor measurement differences between the two language versions of the ESS. These findings suggest that the ESS provides a reliable measure of propensity to sleepiness; however, it does convey a two-factor approach to sleepiness. Researchers and clinicians can use the German and English versions of the ESS but may wish to exclude Item 8 when calculating a total sleepiness score.
Wong, Eric; Ungvari, Gabor S; Leung, Siu-Kau; Tang, Wai-Kwong
2007-01-01
Catatonic signs and symptoms are frequently observed in patients with chronic schizophrenia. Clinical surveys have suggested that the composition of catatonic syndrome occurring in chronic schizophrenia may be different from what is found in acute psychiatric disorders or medical conditions. Consequently, this patient population may need tailor-made rating instruments for catatonia. The aim of the present study was to examine the suitability and accuracy of using the Bush-Francis Catatonia Rating Scale (BFCRS) in chronic schizophrenia inpatients. The unidimensionality (optimal number of items; item fit), and the scoring scheme (the optimal number of scoring categories) of the BFCRS were determined in a random sample of 225 patients with chronic schizophrenia applying Rasch analysis. In addition, differential item functioning (DIF) analysis was also performed. The BFCRS proved to be unidimensional apart from three misfitting items and one marginally misfitting item. The three misfitting items were removed from the scale, thereby constructing a revised version called BFCRS-R. Since the original BFCRS (BFCRS-O) showed no increase across items across steep gradients (poor endorsability of step calibrations), in BFCRS-R a binary scale ('absent' versus 'present' choices only) was constructed instead of the scoring scheme of 0-3. The 20-item BFCRS-R showed improved psychometric properties in that it had a higher item separation index than BFCRS-O. BFCRS-R mean logit was closer to zero, indicating that the items on the scale and the subjects were better matched than in BFCRS-O. DIF analysis showed that certain items of both versions of BFCRS were influenced by the presence of negative symptoms. BFCRS-R is shorter and simpler than the original version and, having better psychometric properties, seems to be better suited for identifying and quantifying catatonia in chronic psychotic patients. Copyright (c) 2007 John Wiley & Sons, Ltd.
Vaingankar, Janhavi Ajit; Subramaniam, Mythily; Chong, Siow Ann; Abdin, Edimansyah; Orlando Edelen, Maria; Picco, Louisa; Lim, Yee Wei; Phua, Mei Yen; Chua, Boon Yiang; Tee, Joseph Y S; Sherbourne, Cathy
2011-10-31
Instruments to measure mental health and well-being are largely developed and often used within Western populations and this compromises their validity in other cultures. A previous qualitative study in Singapore demonstrated the relevance of spiritual and religious practices to mental health, a dimension currently not included in existing multi-dimensional measures. The objective of this study was to develop a self-administered measure that covers all key and culturally appropriate domains of mental health, which can be applied to compare levels of mental health across different age, gender and ethnic groups. We present the item reduction and validation of the Positive Mental Health (PMH) instrument in a community-based adult sample in Singapore. Surveys were conducted among adult (21-65 years) residents belonging to Chinese, Malay and Indian ethnicities. Exploratory and confirmatory factor analysis (EFA, CFA) were conducted and items were reduced using item response theory tests (IRT). The final version of the PMH instrument was tested for internal consistency and criterion validity. Items were tested for differential item functioning (DIF) to check if items functioned in the same way across all subgroups. EFA and CFA identified a first-order structure of six factors (General coping, Personal growth and autonomy, Spirituality, Interpersonal skills, Emotional support, and Global affect) under one higher-order dimension of Positive Mental Health (RMSEA=0.05, CFI=0.96, TLI=0.96). A 47-item self-administered multi-dimensional instrument with a six-point Likert response scale was constructed. The slope estimates and strength of the relation to theta for all items in each of the six PMH subscales were high (range: 1.39 to 5.69), suggesting good discrimination properties. The threshold estimates for the instrument ranged from -3.45 to 1.61, indicating that the instrument covers the entire spectrum of the six dimensions. 
The instrument demonstrated high internal consistency and had significant and expected correlations with other well-being measures. Results confirmed absence of DIF. The PMH instrument is a reliable and valid instrument that can be used to measure and compare level of mental health across different age, gender and ethnic groups in Singapore.
Age and gender differences in depression across adolescence: real or 'bias'?
van Beek, Yolanda; Hessen, David J; Hutteman, Roos; Verhulp, Esmée E; van Leuven, Mirande
2012-09-01
Since developmental psychologists are interested in explaining age and gender differences in depression across adolescence, it is important to investigate to what extent these observed differences can be attributed to measurement bias. Measurement bias may arise when the phenomenology of depression varies with age or gender, i.e., when younger versus older adolescents or girls versus boys differ in the way depression is experienced or expressed. The Children's Depression Inventory (CDI) was administered to a large school population (N = 4048) aged 8-17 years. A 4-factor model was selected by means of factor analyses for ordered categorical measures. For each of the four factor scales measurement invariance with respect to gender and age (late childhood, early and middle adolescence) was tested using item response theory analyses. Subsequently, to examine which items contributed to measurement bias, all items were studied for differential item functioning (DIF). Finally, it was investigated how developmental patterns changed if measurement biases were accounted for. For each of the factors Self-Deprecation, Dysphoria, School Problems, and Social Problems measurement bias with respect to both gender and age was found and many items showed DIF. Developmental patterns changed profoundly when measurement bias was taken into account. The CDI seemed to particularly overestimate depression in late childhood, and underestimate depression in middle adolescent boys. For scientific as well as clinical use of the CDI, measurement bias with respect to gender and age should be accounted for. © 2012 The Authors. Journal of Child Psychology and Psychiatry © 2012 Association for Child and Adolescent Mental Health.
Measurement invariance across Genders on the Childhood Illness Attitude Scales (CIAS).
Thorisdottir, Audur S; Villadsen, Anna; LeBouthillier, Daniel M; Rask, Charlotte Ulrikka; Wright, Kristi D; Walker, John R; Feldgaier, Steven; Asmundson, Gordon J G
2017-07-01
The Childhood Illness Attitude Scales (CIAS) were created as a developmentally appropriate measure for symptoms of health anxiety (HA) in school-aged children. Despite overall sound psychometric properties reported in previous studies, more comprehensive examination of the latent structure and potential response bias in the CIAS is needed. The purpose of the present study was to cross-validate the latent structure of the CIAS across genders and to examine gender-specific variations in CIAS scores. The sample comprised data from 602 Canadian and Danish school-aged children (mean age = 10.54, SD = 0.99; 52.5% girls). Confirmatory factor analyses were conducted to test 3-, modified 3-, and 4-factor models in both samples. Multigroup confirmatory factor analysis was performed to test factor structure invariance across boys and girls in a combined sample. Differential Item Functioning (DIF) was assessed using test characteristic curves. A modified 3-factor solution (i.e., fears = 11 items, help-seeking = 6 items, and symptom effects = 4 items) provided the best fit to the data (χ²(364, N = 602) = 681.7, p < 0.001; χ²/df = 1.803; RMSEA = 0.037; CFI = 0.926). The factor structure was stable, well-fitting, and indicated measurement invariance across groups. DIF analyses revealed no gender-based response bias at the scale level. Results support a revised 3-factor version of the CIAS that can be used with confidence to assess symptoms of HA in school-aged boys and girls. Copyright © 2017 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Rachmatullah, A.; Octavianda, R. P.; Ha, M.; Rustaman, N. Y.; Diana, S.
2017-02-01
Numerous instruments have been developed and used in science education research, and some of them have been translated into the local language of the country where they were used. Most researchers who used those translated instruments did not report their quality. One such instrument is the Scientific Literacy Assessment (SLA), which includes the Science Motivation and Beliefs (SLA-MB) component. In this study, the SLA-MB was translated into the Indonesian language (Bahasa). The purpose of this study is to investigate the translated Indonesian SLA-MB in terms of dimensionality, reliability, item quality and differential item functioning (DIF) based on IRT-Rasch analysis. We used Conquest and Winstep as the programs for IRT-Rasch analysis. We employed a quantitative research method with a school survey. The research subjects were 223 Indonesian middle school students (age 13-16), 64 boys and 159 girls. IRT-Rasch analysis of the Indonesian SLA-MB indicated that a three-dimensional model fit significantly better than a one-dimensional model, and the reliability of each dimension ranged from about 0.60 to 0.82. In addition, the fit values of all items were acceptable, and we found no DIF for any of the SLA-MB items. Overall, our study suggests that the Indonesian version of the SLA-MB is acceptable for use as a research instrument in Indonesia.
Development and validation of an energy-balance knowledge test for fourth- and fifth-grade students.
Chen, Senlin; Zhu, Xihe; Kang, Minsoo
2017-05-01
A valid test measuring children's energy-balance (EB) knowledge is lacking in research. This study developed and validated the energy-balance knowledge test (EBKT) for fourth and fifth grade students. The original EBKT contained 25 items but was reduced to 23 items based on pilot result and intensive expert panel discussion. De-identified data were collected from 468 fourth and fifth grade students enrolled in four schools to examine the psychometric properties of the EBKT items. The Rasch model analysis was conducted using the Winstep 3.65.0 software. Differential item functioning (DIF) analysis flagged 1 item (item #4) functioning differently between boys and girls, which was deleted. The final 22-item EBKT showed desirable model-data fit indices. The items had large variability ranging from -3.58 logit (item #10, the easiest) to 1.70 logit (item #3, the hardest). The average person ability on the test was 0.28 logit (SD = .78). Additional analyses supported known-group difference validity of the EBKT scores in capturing gender- and grade-based ability differences. The test was overall valid but could be further improved by expanding test items to discern various ability levels. For lack of a better test, researchers and practitioners may use the EBKT to assess fourth- and fifth-grade students' EB knowledge.
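To make the logit values in the abstract above concrete, the sketch below converts them to success probabilities. It assumes the dichotomous Rasch model, P(correct) = 1 / (1 + exp(-(θ - b))), which the abstract does not state explicitly; the function name is illustrative, and only the three logit values (0.28, -3.58, 1.70) come from the abstract.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct response under the dichotomous Rasch model.

    theta: person ability in logits; b: item difficulty in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Values reported in the abstract (logits).
mean_ability = 0.28
easiest_item = -3.58   # item #10
hardest_item = 1.70    # item #3

# Expected success rate of an average-ability student on each item.
print(round(rasch_probability(mean_ability, easiest_item), 3))
print(round(rasch_probability(mean_ability, hardest_item), 3))
```

Under this assumption a student of average ability would answer the easiest item correctly almost always and the hardest item only about one time in five, which is one way to read the abstract's remark about large variability in item difficulty.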
Linking Existing Instruments to Develop an Activity of Daily Living Item Bank.
Li, Chih-Ying; Romero, Sergio; Bonilha, Heather S; Simpson, Kit N; Simpson, Annie N; Hong, Ickpyo; Velozo, Craig A
2018-03-01
This study examined dimensionality and item-level psychometric properties of an item bank measuring activities of daily living (ADL) across inpatient rehabilitation facilities and community living centers. Common person equating method was used in the retrospective veterans data set. This study examined dimensionality, model fit, local independence, and monotonicity using factor analyses and fit statistics, principal component analysis (PCA), and differential item functioning (DIF) using Rasch analysis. Following the elimination of invalid data, 371 veterans who completed both the Functional Independence Measure (FIM) and minimum data set (MDS) within 6 days were retained. The FIM-MDS item bank demonstrated good internal consistency (Cronbach's α = .98) and met three rating scale diagnostic criteria and three of the four model fit statistics (comparative fit index/Tucker-Lewis index = 0.98, root mean square error of approximation = 0.14, and standardized root mean residual = 0.07). PCA of Rasch residuals showed the item bank explained 94.2% variance. The item bank covered the range of θ from -1.50 to 1.26 (item), -3.57 to 4.21 (person) with person strata of 6.3. The findings indicated the ADL physical function item bank constructed from FIM and MDS measured a single latent trait with overall acceptable item-level psychometric properties, suggesting that it is an appropriate source for developing efficient test forms such as short forms and computerized adaptive tests.
Rasch validation of the Arabic version of the lower extremity functional scale.
Alnahdi, Ali H
2018-02-01
The purpose of this study was to examine the internal construct validity of the Arabic version of the Lower Extremity Functional Scale (20-item Arabic LEFS) using Rasch analysis. Patients (n = 170) with lower extremity musculoskeletal dysfunction were recruited. Rasch analysis of the 20-item Arabic LEFS was performed. Once the initial Rasch analysis indicated that the 20-item Arabic LEFS did not fit the Rasch model, follow-up analyses were conducted to improve the fit of the scale to the Rasch measurement model. These modifications included removing misfitting individuals, changing item scoring structure, removing misfitting items, and addressing bias caused by response dependency between items and differential item functioning (DIF). Initial analysis indicated deviation of the 20-item Arabic LEFS from the Rasch model. Disordered thresholds in eight items and response dependency between six items were detected, and the scale as a whole did not meet the requirement of unidimensionality. Refinements led to a 15-item Arabic LEFS that demonstrated excellent internal consistency (person separation index [PSI] = 0.92) and satisfied all the requirements of the Rasch model. Rasch analysis did not support the 20-item Arabic LEFS as a unidimensional measure of lower extremity function. The refined 15-item Arabic LEFS met all the requirements of the Rasch model and hence is a valid objective measure of lower extremity function. The Rasch-validated 15-item Arabic LEFS needs to be further tested in an independent sample to confirm its fit to the Rasch measurement model. Implications for Rehabilitation: The validity of the 20-item Arabic Lower Extremity Functional Scale to measure lower extremity function is not supported. The 15-item Arabic version of the LEFS is a valid measure of lower extremity function and can be used to quantify lower extremity function in patients with lower extremity musculoskeletal disorders.
Michel, Pierre; Auquier, Pascal; Baumstarck, Karine; Pelletier, Jean; Loundou, Anderson; Ghattas, Badih; Boyer, Laurent
2015-09-01
Quality of life (QoL) measurements are considered important outcome measures both for research on multiple sclerosis (MS) and in clinical practice. Computerized adaptive testing (CAT) can improve the precision of measurements made using QoL instruments while reducing the burden of testing on patients. Moreover, a cross-cultural approach is also necessary to guarantee the wide applicability of CAT. The aim of this preliminary study was to develop a calibrated item bank that is available in multiple languages and measures QoL related to mental health by combining one generic (SF-36) and one disease-specific questionnaire (MusiQoL). Patients with MS were enrolled in this international, multicenter, cross-sectional study. The psychometric properties of the item bank were based on classical test and item response theories and approaches, including the evaluation of unidimensionality, item response theory model fitting, and analyses of differential item functioning (DIF). Convergent and discriminant validities of the item bank were examined according to socio-demographic, clinical, and QoL features. A total of 1992 patients with MS and from 15 countries were enrolled in this study to calibrate the 22-item bank developed in this study. The strict monotonicity of the Cronbach's alpha curve, the high eigenvalue ratio estimator (5.50), and the adequate CFA model fit (RMSEA = 0.07 and CFI = 0.95) indicated that a strong assumption of unidimensionality was warranted. The infit mean square statistic ranged from 0.76 to 1.27, indicating a satisfactory item fit. DIF analyses revealed no item biases across geographical areas, confirming the cross-cultural equivalence of the item bank. External validity testing revealed that the item bank scores correlated significantly with QoL scores but also showed discriminant validity for socio-demographic and clinical characteristics. 
This work demonstrated satisfactory psychometric characteristics for a QoL item bank for MS in multiple languages. This work may offer a common measure for the assessment of QoL in different cultural contexts and for international studies conducted on MS.
Evaluation of the Edinburgh Post Natal Depression Scale using Rasch analysis
Pallant, Julie F; Miller, Renée L; Tennant, Alan
2006-01-01
Background: The Edinburgh Postnatal Depression Scale (EPDS) is a 10 item self-rating post-natal depression scale which has seen widespread use in epidemiological and clinical studies. Concern has been raised over the validity of the EPDS as a single summed scale, with suggestions that it measures two separate aspects, one of depressive feelings, the other of anxiety. Methods: As part of a larger cross-sectional study conducted in Melbourne, Australia, a community sample (324 women, ranging in age from 18 to 44 years: mean = 32 yrs, SD = 4.6), was obtained by inviting primiparous women to participate voluntarily in this study. Data from the EPDS were fitted to the Rasch measurement model and tested for appropriate category ordering, for item bias through Differential Item Functioning (DIF) analysis, and for unidimensionality through tests of the assumption of local independence. Results: Rasch analysis of the data from the ten item scale initially demonstrated a lack of fit to the model with a significant Item-Trait Interaction total chi-square (χ² = 82.8, df = 40; p < .001). Removal of two items (items 7 and 8) resulted in a non-significant Item-Trait Interaction total chi-square with a residual mean value for items of -0.467 with a standard deviation of 0.850, showing fit to the model. No DIF existed in the final 8-item scale (EPDS-8) and all items showed fit to model expectations. Principal Components Analysis of the residuals supported the local independence assumption, and unidimensionality of the revised EPDS-8 scale. Revised cut points were identified for EPDS-8 to maintain the case identification of the original scale. Conclusion: The results of this study suggest that EPDS, in its original 10 item form, is not a viable scale for the unidimensional measurement of depression. Rasch analysis suggests that a revised eight item version (EPDS-8) would provide a more psychometrically robust scale. 
The revised cut points of 7/8 and 9/10 for the EPDS-8 show high levels of agreement with the original case identification for the EPDS-10. PMID:16768803
Measurement of Women’s Agency in Egypt: A National Validation Study
VanderEnde, Kristin E.; Dodell, Sylvie; Cheong, Yuk Fai
2015-01-01
Despite widespread assumptions about women’s empowerment and agency in the Arab Middle East, psychometric research of these constructs is limited. Using national data from 6214 married women ages 16–49 who took part in the 2006 Egypt Labor Market Panel Survey, we applied factor analysis to explore and then to test the factor structure of women’s agency. We then used multiple indicator multiple cause structural equations models to test for differential item functioning (DIF) by women’s age at first marriage, a potential resource for women’s agency. Our results confirm that women’s agency in Egypt is multi-dimensional and comprised of their (1) influence in family decisions, including those reserved for men, (2) freedom of movement in public spaces, and (3) attitudes about gender, specifically violence against wives. These dimensions confirm those explored previously in selected rural areas of Egypt and South Asia. Yet, three items showed significant uniform DIF by women’s categorical age at first marriage, with and without a control for women’s age in years. Models adjusting for DIF and women’s age in years showed that women’s older age at first marriage was positively associated with the factor means for family decision-making and gender-violence attitudes, but not freedom of movement. Our findings reveal the value of our analytical strategy for research on the dimensions and determinants of women’s agency. Our approach offers a promising model to discern “hierarchies of evidence” for social policies and programs to enhance women’s empowerment. PMID:27597801
Xu, Qian; Black, Wesley P.; Ward, Scott M.; Yang, Zhaomin
2005-01-01
Myxococcus xanthus fibril exopolysaccharide (EPS), essential for the social gliding motility and development of this bacterium, is regulated by the Dif chemotaxis-like pathway. DifA, an MCP homolog, is proposed to mediate signal input to the Dif pathway. However, DifA lacks a prominent periplasmic domain, which in classical chemoreceptors is responsible for signal perception and for initiating transmembrane signaling. To investigate the signaling properties of DifA, we constructed a NarX-DifA (NafA) chimera from the sensory module of Escherichia coli NarX and the signaling module of M. xanthus DifA. We report here the first functional chimeric signal transducer constructed using genes from organisms in two different phylogenetic subdivisions. When expressed in M. xanthus, NafA restored fruiting body formation, EPS production, and S-motility to difA mutants in the presence of nitrate. Studies with various double mutants indicate that NafA requires the downstream Dif proteins to function. We propose that signal inputs to the Dif pathway and transmembrane signaling by DifA are essential for the regulation of EPS production in M. xanthus. Despite the apparent structural differences, DifA appears to share similar transmembrane signaling mechanisms with enteric sensor kinases and chemoreceptors. PMID:16159775
Lin, Chung-Ying; Pakpour, Amir H
2017-02-01
The problems of mood disorders are critical in people with epilepsy. Therefore, there is a need to validate a useful tool for the population. The Hospital Anxiety and Depression Scale (HADS) has been used in this population and has been shown to be a satisfactory screening tool. However, more evidence on its construct validity is needed. A total of 1041 people with epilepsy were recruited in this study, and each completed the HADS. Confirmatory factor analysis (CFA) and Rasch analysis were used to understand the construct validity of the HADS. In addition, internal consistency was tested using Cronbach's α, person separation reliability, and item separation reliability. Ordering of the response descriptors and the differential item functioning (DIF) were examined using the Rasch models. Based on its cutoffs, the HADS indicated that 55.3% of our participants had anxiety and 56.0% had depression. CFA and Rasch analyses both showed the satisfactory construct validity of the HADS; the internal consistency was also acceptable (α=0.82 in anxiety and 0.79 in depression; person separation reliability=0.82 in anxiety and 0.73 in depression; item separation reliability=0.98 in anxiety and 0.91 in depression). The difficulties of the four-point Likert scale categories used in the HADS increased monotonically, which indicates no disordered response categories. No DIF items across male and female patients and across types of epilepsy were displayed in the HADS. The HADS has promising psychometric properties on construct validity in people with epilepsy. Moreover, the additive item score is supported for calculating the cutoff. Copyright © 2016 British Epilepsy Association. Published by Elsevier Ltd. All rights reserved.
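As an illustration of the internal-consistency statistic reported above, the following is a minimal sketch of Cronbach's α computed from a persons-by-items score table. The function name and the toy data are hypothetical; the study's own data and software are not reproduced here.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of the summed score.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Toy data: three respondents, two items (illustrative values only).
toy_scores = [[1, 2], [2, 1], [3, 3]]
print(cronbach_alpha(toy_scores))
```

The sketch needs at least two items and two respondents, and a total score with non-zero variance.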
Rasch validation of the PHQ-9 in people with visual impairment in South India.
Gothwal, Vijaya K; Bagga, Deepak K; Sumalini, Rebecca
2014-01-01
The Patient-Health Questionnaire (PHQ-9) is a widely used screening instrument for depression. Recently, its properties as a measure were investigated using Rasch analysis in an Australian population with visual impairment (VI) and it was demonstrated to possess excellent measurement properties, but the response scale required shortening (modified PHQ-9). However, further validation was recommended to substantiate its use with the growing population of VI. Therefore, we aimed to use Rasch analysis to evaluate the measurement properties of the modified PHQ-9 in an Indian population with VI. 303 patients with VI (mean age 40.2 years; 71% male) referred to Vision Rehabilitation Centres were administered the PHQ-9 by a trained interviewer. Rasch analysis was used to investigate the psychometric properties of the modified PHQ-9. Rasch analysis showed good fit to the model, no misfitting items and an acceptable person separation reliability (0.82). Dimensionality testing supported combining 9 items to create a total score. Targeting was sub-optimal (-1.30 logits); more difficult items are needed. One item ('trouble falling asleep') showed notable differential item functioning, DIF (1.18 logits) by duration of VI. The generalisability of these results might be restricted to patients with VI presenting to a tertiary eye care centre. Except for DIF, the performance of the modified PHQ-9 is consistent with that of the original, albeit in a different cultural context (Indian population with VI). Clinicians/researchers can readily use the modified PHQ-9 without formal training in Rasch procedures given the provision of ready-to-use spreadsheets that convert raw to Rasch-scaled scores. However, the conversions will apply only if the sample being tested is similar to that of the present study. Copyright © 2014 Elsevier B.V. All rights reserved.
Nielsen, Julie Bøjstrup; Kyvsgaard, Julie Nyholm; Sildorf, Stine Møller; Kreiner, Svend; Svensson, Jannet
2017-03-01
Type 1 Diabetes (T1D) has a negative impact on psychological and overall well-being. Screening for Health-related Quality of Life (HrQoL) and addressing HrQoL issues in the clinic leads to improved well-being and metabolic outcomes. The aim of this study was to translate the generic and diabetes-specific validated multinational DISABKIDS® questionnaires into Danish, and then determine their validity and reliability. The questionnaires were translated using a validated translation procedure and completed by 99 children and adolescents from our diabetes department, all diagnosed with T1D and aged between 8 and 18 years. The Rasch and the graphical log linear Rasch model (GLLRM) were used to determine validity. Monte Carlo methods and Cronbach's α were used to confirm reliability. The data did not fit a pure Rasch model but did fit a GLLRM when item six in the independence scale was excluded. The six subscales measure different aspects of HrQoL, indicating that all the subscales are necessary. The questionnaire shows local dependency between items and differential item functioning (DIF). Therefore age, gender, and glycated hemoglobin (HbA1c) levels must be taken into account when comparing HrQoL between groups. The Danish versions of the DISABKIDS® chronic-generic and diabetes-specific modules provide valid and objective measurements with adequate reliability. These Danish versions are useful tools for evaluating HrQoL in Danish patients with T1D. However, guidelines on how to manage DIF and local dependence will be required, and item six should be rephrased.
Item Screening in Graphical Loglinear Rasch Models
ERIC Educational Resources Information Center
Kreiner, Svend; Christensen, Karl Bang
2011-01-01
In behavioural sciences, local dependence and DIF are common, and purification procedures that eliminate items with these weaknesses often result in short scales with poor reliability. Graphical loglinear Rasch models (Kreiner & Christensen, in "Statistical Methods for Quality of Life Studies," ed. by M. Mesbah, F.C. Cole & M.T.…
Fidalgo, Angel M; Tenenbaum, Harriet R; Aznar, Ana
2018-01-01
This article examines whether there are gender differences in understanding the emotions evaluated by the Test of Emotion Comprehension (TEC). The TEC provides a global index of emotion comprehension in children 3-11 years of age, which is the sum of the nine components that constitute emotion comprehension: (1) recognition of facial expressions, (2) understanding of external causes of emotions, (3) understanding of desire-based emotions, (4) understanding of belief-based emotions, (5) understanding of the influence of a reminder on present emotional states, (6) understanding of the possibility to regulate emotional states, (7) understanding of the possibility of hiding emotional states, (8) understanding of mixed emotions, and (9) understanding of moral emotions. We used the answers to the TEC given by 172 English girls and 181 boys from 3 to 8 years of age. First, the nine components into which the TEC is subdivided were analysed for differential item functioning (DIF), taking gender as the grouping variable. To evaluate DIF, the Mantel-Haenszel method and logistic regression analysis were used applying the Educational Testing Service DIF classification criteria. The results show that the TEC did not display gender DIF. Second, when absence of DIF had been corroborated, it was analysed for differences between boys and girls in the total TEC score and its components controlling for age. Our data are compatible with the hypothesis of independence between gender and level of comprehension in 8 of the 9 components of the TEC. Several hypotheses are discussed that could explain the differences found between boys and girls in the belief component. Given that the Belief component is basically a false belief task, the differences found seem to support findings in the literature indicating that girls perform better on this task.
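The Mantel-Haenszel procedure named in the abstract above can be sketched as follows. The sketch computes the common odds ratio across matched score strata, converts it to the ETS delta scale (MH D-DIF = -2.35 ln α), and assigns an A/B/C label by magnitude only. The full ETS rules also involve significance tests, which are deliberately omitted here. All function names and counts are hypothetical and do not come from the study.

```python
import math

def mantel_haenszel_alpha(strata):
    """Common odds ratio over score strata.
    Each stratum is (a, b, c, d): reference correct, reference incorrect,
    focal correct, focal incorrect."""
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # empty strata carry no information
        num += a * d / n
        den += b * c / n
    return num / den  # undefined (division by zero) if den is 0

def mh_d_dif(alpha):
    """ETS delta metric; negative values indicate the item favours the reference group."""
    return -2.35 * math.log(alpha)

def ets_category_by_size(delta):
    """Simplified, magnitude-only labelling (assumption: significance tests ignored)."""
    size = abs(delta)
    if size < 1.0:
        return "A"  # negligible
    if size < 1.5:
        return "B"  # moderate
    return "C"      # large

# Hypothetical counts for two matched score levels.
example_strata = [(10, 10, 10, 10), (20, 5, 20, 5)]
alpha = mantel_haenszel_alpha(example_strata)
print(alpha, mh_d_dif(alpha), ets_category_by_size(mh_d_dif(alpha)))
```

In practice the strata are formed by matching the two groups on total test score, so the odds ratio compares boys and girls of similar overall performance.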
Osman, Augustine; Lamis, Dorian A; Bagge, Courtney L; Freedenthal, Stacey; Barnes, Sean M
2016-01-01
We examined the factor structure and psychometric properties of the Mindful Attention Awareness Scale (MAAS) in a sample of 810 undergraduate students. Using common exploratory factor analysis (EFA), we obtained evidence for a 1-factor solution (41.84% common variance). To confirm unidimensionality of the 15-item MAAS, we conducted a 1-factor confirmatory factor analysis (CFA). Results of the EFA and CFA, respectively, provided support for a unidimensional model. Using differential item functioning analysis methods within item response theory modeling (IRT-based DIF), we found that individuals with high and low levels of nonattachment responded similarly to the MAAS items. Following a detailed item analysis, we proposed a 5-item short version of the instrument and present descriptive statistics and composite score reliability for the short and full versions of the MAAS. Finally, correlation analyses showed that scores on the full and short versions of the MAAS were associated with measures assessing related constructs. The 5-item MAAS is as useful as the original MAAS in enhancing our understanding of the mindfulness construct.
Kwakkenbos, Linda; Arthurs, Erin; van den Hoogen, Frank H. J.; Hudson, Marie; van Lankveld, Wim G. J. M.; Baron, Murray; van den Ende, Cornelia H. M.; Thombs, Brett D.
2013-01-01
Objectives Increasingly, medical research involves patients who complete outcomes in different languages. This occurs in countries with more than one common language, such as Canada (French/English) or the United States (Spanish/English), as well as in international multi-centre collaborations, which are utilized frequently in rare diseases such as systemic sclerosis (SSc). In order to pool or compare outcomes, instruments should be measurement equivalent (invariant) across cultural or linguistic groups. This study provides an example of how to assess cross-language measurement equivalence by comparing the Center for Epidemiologic Studies Depression (CES-D) scale between English-speaking Canadian and Dutch SSc patients. Methods The CES-D was completed by 922 English-speaking Canadian and 213 Dutch SSc patients. Confirmatory factor analysis (CFA) was used to assess the factor structure in both samples. The Multiple-Indicator Multiple-Cause (MIMIC) model was utilized to assess the amount of differential item functioning (DIF). Results A two-factor model (positive and negative affect) showed excellent fit in both samples. Statistically significant, but small-magnitude, DIF was found for 3 of 20 items on the CES-D. The English-speaking Canadian sample endorsed more feeling-related symptoms, whereas the Dutch sample endorsed more somatic/retarded activity symptoms. The overall estimate in depression scores between English and Dutch was not influenced substantively by DIF. Conclusions CES-D scores from English-speaking Canadian and Dutch SSc patients can be compared and pooled without concern that measurement differences may substantively influence results. The importance of assessing cross-language measurement equivalence in rheumatology studies prior to pooling outcomes obtained in different languages should be emphasized. PMID:23326538
Rasch analysis of the Patient Rated Elbow Evaluation questionnaire.
Vincent, Joshua I; MacDermid, Joy C; King, Graham J W; Grewal, Ruby
2015-06-20
The Patient Rated Elbow Evaluation (PREE) was developed as an elbow joint specific measure of pain and disability and validated with classical psychometric methods. More recently, Rasch analysis has contributed new methods for analyzing the clinical measurement properties of self-report outcome measures. The objective of the study was to determine aspects of validity of the PREE using the Rasch model to assess the overall fit of the PREE data, the response scaling, individual item fit, differential item functioning (DIF), local dependency, unidimensionality and person separation index (PSI). A convenience sample of 236 patients (age range 21-79 years; M:F = 97:139) with elbow disorders was recruited from the Roth│McFarlane Hand and Upper Limb Centre, London, Ontario, Canada. The baseline scores of the PREE were used. Rasch analysis was conducted using RUMM 2030 software on the 3 subscales of the PREE separately. The 3 subscales initially showed misfit, with disordered thresholds (on 17 out of 20 items), uniform DIF for two items ("Carrying a 10lbs object" from the specific activities subscale for age group; and "household work" from the usual activities subscale for gender), multidimensionality and local dependency. The Pain subscale satisfied Rasch expectations when item 2 "Pain - At rest" was split for age group, while the usual activities subscale readily stood up to Rasch requirements when item 2 "household work" was split for gender. The specific activities subscale demonstrated fit to the Rasch model when subtest analysis accounted for local dependency. All three subscales of the PREE were well targeted and had high reliability (PSI >0.80). The three subscales of the PREE appear to be robust when tested against the Rasch model when subject to a few alterations. The value of changing the 0-10 format is questionable given its widespread use; further Rasch-based analysis of whether these findings are stable in other samples is warranted.
Kersten, Paula; Cardol, Mieke; George, Steve; Ward, Christopher; Sibley, Andrew; White, Barney
2007-10-15
To evaluate the cross-cultural validity of the five subscales of the Impact on Participation and Autonomy (IPA) measure and the full 31-item scale. Data from two validation studies (Dutch and English) were pooled (n = 106). Participants (aged 18-75), known to rehabilitation services or GP practices, had conditions ranging from minor ailments to significant disability. Validity of the five subscales and the total scale was examined using Rasch analysis (Partial Credit Model). P values smaller than 0.01 were employed to allow for multiple testing. A number of items in all the subscales except 'Outdoor Autonomy' needed rescoring. One 'Indoor Autonomy' item showed uniform DIF by country and was split by country. One 'Work and Education' item displayed uniform and non-uniform DIF by gender. All the subscales fitted the Rasch model and were invariant across country. A 30-item IPA also fitted the Rasch model. The IPA subscales and a 30-item scale are invariant across the two cultures and gender. The IPA can be used validly to assess participation and autonomy in these populations. Further analyses are required to examine whether the IPA is invariant across differing levels of disability and other disease groups not included in this study.
Baylor, Carolyn; Yorkston, Kathryn; Eadie, Tanya; Kim, Jiseon; Chung, Hyewon; Amtmann, Dagmar
2015-01-01
Purpose The purpose of this study was to calibrate the items for the Communicative Participation Item Bank (CPIB) using Item Response Theory (IRT). One overriding objective was to examine if the IRT item parameters would be consistent across different diagnostic groups, thereby allowing creation of a disorder-generic instrument. The intended outcomes were the final item bank and a short form ready for clinical and research applications. Methods Self-report data were collected from 701 individuals representing four diagnoses: multiple sclerosis, Parkinson’s disease, amyotrophic lateral sclerosis and head and neck cancer. Participants completed the CPIB and additional self-report questionnaires. CPIB data were analyzed using the IRT Graded Response Model (GRM). Results The initial set of 94 candidate CPIB items was reduced to an item bank of 46 items demonstrating unidimensionality, local independence, good item fit, and good measurement precision. Differential item functioning (DIF) analyses detected no meaningful differences across diagnostic groups. A 10-item, disorder-generic short form was generated. Conclusions The CPIB provides speech-language pathologists with a unidimensional, self-report outcomes measurement instrument dedicated to the construct of communicative participation. This instrument may be useful to clinicians and researchers wanting to implement measures of communicative participation in their work. PMID:23816661
McAlinden, Colm; Pesudovs, Konrad; Moore, Jonathan E
2010-11-01
To develop an instrument to measure subjective quality of vision: the Quality of Vision (QoV) questionnaire. A 30-item instrument was designed with 10 symptoms rated in each of three scales (frequency, severity, and bothersome). The QoV was completed by 900 subjects in groups of spectacle wearers, contact lens wearers, and those having had laser refractive surgery, intraocular refractive surgery, or eye disease and investigated with Rasch analysis and traditional statistics. Validity and reliability were assessed by Rasch fit statistics, principal components analysis (PCA), person separation, differential item functioning (DIF), item targeting, construct validity (correlation with visual acuity, contrast sensitivity, total root mean square [RMS] higher order aberrations [HOA]), and test-retest reliability (two-way random intraclass correlation coefficients [ICC] and 95% repeatability coefficients [R(c)]). Rasch analysis demonstrated good precision, reliability, and internal consistency for all three scales (mean square infit and outfit within 0.81-1.27; PCA >60% variance explained by the principal component; person separation 2.08, 2.10, and 2.01 respectively; and minimal DIF). Construct validity was indicated by strong correlations with visual acuity, contrast sensitivity and RMS HOA. Test-retest reliability was evidenced by a minimum ICC of 0.867 and a minimum 95% R(c) of 1.55 units. The QoV Questionnaire consists of a Rasch-tested, linear-scaled, 30-item instrument on three scales providing a QoV score in terms of symptom frequency, severity, and bothersome. It is suitable for measuring QoV in patients with all types of refractive correction, eye surgery, and eye disease that cause QoV problems.
Examining Gender DIF on a Multiple-Choice Test of Mathematics: A Confirmatory Approach.
ERIC Educational Resources Information Center
Ryan, Katherine E.; Fan, Meichu
1996-01-01
Results for 3,244 female and 3,033 male junior high school students from the Second International Mathematics Study show that applied items in algebra, geometry, and computation were easier for males but arithmetic items were differentially easier for females. Implications of these findings for assessment and instruction are discussed. (SLD)
Mohri, Kurato; Hata, Takashi; Kikuchi, Haruhisa; Oshima, Yoshiteru; Urushihara, Hideko
2014-05-29
Separation of somatic cells from germ-line cells is a crucial event for multicellular organisms, but how this step was achieved during evolution remains elusive. In Dictyostelium discoideum and many other dictyostelid species, solitary amoebae gather and form a multicellular fruiting body in which germ-line spores and somatic stalk cells differentiate, whereas in Acytostelium subglobosum, acellular stalks form and all aggregated amoebae become spores. In this study, because most D. discoideum genes known to be required for stalk cell differentiation have homologs in A. subglobosum, we inferred functional variations in these genes and examined conservation of the stalk cell specification cascade of D. discoideum mediated by the polyketide differentiation-inducing factor-1 (DIF-1) in A. subglobosum. Through heterologous expression of A. subglobosum orthologs of DIF-1 biosynthesis genes in D. discoideum, we confirmed that two of the three genes were functional equivalents, while DIF-methyltransferase (As-dmtA) involved at the final step of DIF-1 synthesis was not. In fact, DIF-1 activity was undetectable in A. subglobosum lysates and amoebae of this species were not responsive to DIF-1, suggesting a lack of DIF-1 production in this species. On the other hand, the molecular function of an A. subglobosum ortholog of DIF-1 responsive transcription factor was equivalent with that of D. discoideum and inhibition of polyketide synthesis caused developmental arrest in A. subglobosum, which could not be rescued by DIF-1 addition. These results suggest that non-DIF-1 polyketide cascades involving downstream transcription factors are required for fruiting body development of A. subglobosum. © 2014. Published by The Company of Biologists Ltd.
Xu, Qian; Black, Wesley P; Nascimi, Heidi M; Yang, Zhaomin
2011-02-01
DifA is a methyl-accepting chemotaxis protein (MCP)-like sensory transducer that regulates exopolysaccharide (EPS) production in Myxococcus xanthus. Here mutational analysis and molecular biology were used to probe the signaling mechanisms of DifA in EPS regulation. We first identified the start codon of DifA experimentally; this identification extended the N terminus of DifA for 45 amino acids (aa) from the previous bioinformatics prediction. This extension helped to address the outstanding question of how DifA receives input signals from type 4 pili without a prominent periplasmic domain. The results suggest that DifA uses its N-terminus extension to sense an upstream signal in EPS regulation. We suggest that the perception of the input signal by DifA is mediated by protein-protein interactions with upstream components. Subsequent signal transmission likely involves transmembrane signaling instead of direct intramolecular interactions between the input and the output modules in the cytoplasm. The basic functional unit of DifA for signal transduction is likely dimeric as mutational alteration of the predicted dimeric interface of DifA significantly affected EPS production. Deletions of 14-aa segments in the C terminus suggest that the newly defined flexible bundle subdomain in MCPs is likely critical for DifA function because shortening of this bundle can lead to constitutively active mutations.
Kacerja, Suela; Julie, Cyril; Hadjerrouit, Said
2013-01-01
This paper reports on an investigation on the real-life situations students in grades 8 and 9 in South Africa and Albania prefer to use in Mathematics. The functioning of the instrument used to assess the order of preference learners from both countries have for contextual situations is assessed using Rasch modeling techniques. For both the cohorts, the data fit the Rasch model. The differential item functioning (DIF) analysis rendered 3 items operating differentially for the two cohorts. Explanations for these differences are provided in terms of differences in experiences learners in the two countries have related to some of the contextual situations. Implications for interpretation of international comparative tests are offered, as are the possibilities for the cross-country development of curriculum materials related to contexts that learners prefer to use in Mathematics.
2011-01-01
Background Health-related quality of life (HRQoL) assessment, encompassing the adolescents' perceptions of their mental, physical, and social health and well-being is increasingly considered an important outcome to be used to identify population health needs and to provide targeted medical care. Although validated instruments are essential for accurately assessing HRQoL outcomes, there are few cross-culturally adapted tools for use in Brazil, and none designed exclusively for use among adolescents. The Vécu et Santé Perçue de l'Adolescent (VSP-A) is a generic, multidimensional self-reported instrument originally developed and validated in France that evaluates HRQoL of ill and healthy adolescents. Purpose To cross-culturally adapt and validate the Brazilian-Portuguese version of the VSP-A, a generic HRQoL measure for adolescents originally developed in France. Methods The VSP-A was translated following a well-validated forward-backward process leading to the Brazilian version. The psychometric evaluation was conducted in a sample of 446 adolescents (14-18 years) attending 2 public high schools of São Gonçalo City. The adolescents self-reported the Brazilian VSP-A, the validated Psychosomatic Symptom Checklist and socio-demographic information. A retest evaluation was carried out on a sub-sample (n = 195) at a two-week interval. The internal construct validity was assessed through confirmatory factor analysis (CFA), multi-trait scaling analyses, Rasch analysis evaluating unidimensionality of each scale and Cronbach's alpha coefficients. The reproducibility was evaluated by intra-class correlation coefficients (ICC). Zumbo's ordinal logistic regression analysis was used to detect differential item functioning (DIF) between the Brazilian and the French items. External construct validity was investigated testing expected differences between groups using one-way analysis of variance (ANOVA), Mann-Whitney tests and the univariate general regression linear model. 
Results CFA showed an acceptable fit (RMSEA=0.05; CFI=0.93); 94% of scaling success was found for item-internal consistency and 98% for item discriminant validity. The items showed good fit to the Rasch model except 3 items with an INFIT at the upper threshold. Cronbach's Alpha ranged from 0.60 to 0.85. Test-retest reliability was moderate to good (ICC=0.55-0.82). DIF was evidenced in 4 out of 36 items. Expected patterns of differences were confirmed with significantly lower physical, psychological well being and vitality reported by symptomatic adolescents. Conclusions Although DIF in few items and responsiveness must be further explored, the Brazilian version of VSP-A demonstrated an acceptable validity and reliability in adolescents attending school and might serve as a starting point for more specific clinical investigations. PMID:21272317
WU, LI-TZY; WOODY, GEORGE E.; YANG, CHONGMING; PAN, JENG-JONG; REEVE, BRYCE B.; BLAZER, DAN G.
2012-01-01
While item response theory (IRT) research shows a latent severity trait underlying response patterns of substance abuse and dependence symptoms, little is known about IRT-based severity estimates in relation to clinically relevant measures. In response to increased prevalences of marijuana-related treatment admissions, an elevated level of marijuana potency, and the debate on medical marijuana use, we applied dimensional approaches to understand IRT-based severity estimates for marijuana use disorders (MUDs) and their correlates while simultaneously considering gender- and race/ethnicity-related differential item functioning (DIF). Using adult data from the 2008 National Survey on Drug Use and Health (N=37,897), DSM-IV criteria for MUDs among past-year marijuana users were examined by IRT, logistic regression, and multiple indicators–multiple causes (MIMIC) approaches. Among 6,917 marijuana users, 15% met criteria for a MUD; another 24% exhibited subthreshold dependence. Abuse criteria were highly correlated with dependence criteria (correlation=0.90), indicating unidimensionality; item information curves revealed redundancy in multiple criteria. MIMIC analyses showed that MUD criteria were positively associated with weekly marijuana use, early marijuana use, other substance use disorders, substance abuse treatment, and serious psychological distress. African Americans and Hispanics showed higher levels of MUDs than whites, even after adjusting for race/ethnicity-related DIF. The redundancy in multiple criteria suggests an opportunity to improve efficiency in measuring symptom-level manifestations by removing low-informative criteria. Elevated rates of MUDs among African Americans and Hispanics require research to elucidate risk factors and improve assessments of MUDs for different racial/ethnic groups. PMID:22351489
Melguizo-Herrera, Estela; Álvarez-Romero, Yuleysi; Cabarcas-Mendoza, Mayerlin Vanessa; Calvo-Rodríguez, Rossy Stefanie; Flórez-Almanza, Jeomaidis; Moadie-Contreras, Olga Patricia; Campo-Arias, Adalberto
2015-01-01
There are many stereotypes and prejudices about the sexual lives of the elderly. However, there are no validated and reliable tools for measuring these in the Latin-American context. To determine the internal consistency, dimensionality, differential item functioning (DIF) by gender and stability of the Attitudes towards Sexuality in the Elderly Questionnaire (ASEQ) in adults over 60 years old in Cartagena, Colombia. A validation study was designed that included a sample of 130 participants without cognitive impairment attending a Life Center. The ages ranged between 60 and 90 years (mean, 73.7±8.0), and 61.5% were female. Internal consistency was calculated using Cronbach alpha and McDonald omega, exploratory factor analysis (EFA) (dimensionality), DIF by gender (item response theory) with Kendall correlation, and stability (reproducibility) with Pearson correlation and intraclass correlation coefficient (ICC). The ASEQ showed high internal consistency on the first application (α=.83 and ω=.87) and in the second one (α=.85 and ω=.89). EFA showed two salient factors (prejudices and limitations) that explained 42.6% of the total variance. The DIF analysis presented appropriate coefficients, with the exception of item 14, which showed a high value (τ=.37). ASEQ showed high stability (r=.82 and ICC=.89; 95% confidence interval, 0.83-0.92; P<.001). ASEQ is a two-dimensional and reliable scale in older adults attending a Life Center in Cartagena, Colombia. New studies are required to evaluate the performance in a representative sample. Copyright © 2014 Asociación Colombiana de Psiquiatría. Published by Elsevier España. All rights reserved.
Assessing DSM-IV symptoms of panic attack in the general population: an item response analysis.
Sunderland, Matthew; Hobbs, Megan J; Andrews, Gavin; Craske, Michelle G
2012-12-20
Unexpected panic attacks may represent a non-specific risk factor for future depression and anxiety disorders. The examination of panic symptoms and associated latent severity levels may lead to improvements in the identification, prevention, and treatment of panic attacks and subsequent psychopathology for 'at risk' individuals in the general population. The current study utilised item response theory to assess the DSM-IV symptoms of panic in relation to the latent severity level of the panic attack construct in a sample of 5913 respondents from the National Epidemiologic Survey on Alcohol and Related conditions. Additionally, differential item functioning (DIF) was assessed to determine if each symptom of panic targets the same level of latent severity between different sociodemographic groups (male/female, young/old). Symptoms indexing 'choking', 'fear of dying', and 'tingling/numbness' are some of the more severe symptoms of panic whilst 'heart racing', 'short of breath', 'tremble/shake', 'dizzy/faint', and 'perspire' are some of the least severe symptoms. Significant levels of DIF were detected in the 'perspire' symptom between males and females and the 'fear of dying' symptom between young and old respondents. The current study was limited to examining cross-sectional data from respondents who had experienced at least one panic attack across their lifetime. The findings of the current study provide additional information regarding panic symptoms in the general population that may enable researchers and clinicians to further refine the detection of 'at-risk' individuals who experience threshold and sub-threshold levels of panic. Copyright © 2012 Elsevier B.V. All rights reserved.
Wiebe, Alex; Kersting, Anette; Suslow, Thomas
2017-06-01
Alexithymia is a multidimensional personality construct including the components difficulties identifying feelings (DIF), difficulties describing feelings (DDF), and externally oriented thinking (EOT). Different features of alexithymia are thought to reflect specific deficits in the cognitive processing and regulation of emotions. The aim of the present study was to examine for the first time patterns of deployment of attention as a function of alexithymia components in healthy persons by using eye-tracking technology. It was assumed that EOT is linked to avoidance of negative images. 99 healthy adults viewed freely pictures consisting of anxiety-related, depression-related, positive, and neutral images while gaze behavior was registered. Alexithymia was assessed by the 20-Item Toronto Alexithymia Scale. Measures of anxiety, depression, and (visual-perceptual) intelligence were also administered. A main effect of emotion condition on dwell times was observed. Viewing time was lowest for neutral images, longer for depression-related and happy images, and longest for anxiety-related images. Gender and EOT had significant effects on dwell times. EOT correlated negatively with dwell time on depression-related (but not anxiety-related) images. There were no correlations of dwell times with depression, trait anxiety, intelligence, DIF, or DDF. Alexithymia was assessed exclusively by self-report. Our results show that EOT but not DIF or DDF influences attention deployment to simultaneously presented emotional pictures. EOT may reduce attention allocation to dysphoric information. This attentional characteristic of EOT individuals might have mood protecting effects but also detrimental impacts on social relationships and coping competencies. Copyright © 2016 Elsevier Ltd. All rights reserved.
Nanthakumar, Shenooka; Bucks, Romola S; Skinner, Timothy C; Starkstein, Sergio; Hillman, David; James, Alan; Hunter, Michael
2017-10-01
The assessment of depression in obstructive sleep apnea (OSA) is confounded by symptom overlap. The Depression, Anxiety, and Stress Scale-short form (DASS-21) is a commonly used measure of negative affect, but it is not known whether the DASS-21 is suitable for use in an OSA sample. This study compared the fit of Lovibond and Lovibond's (1995) correlated 3-factor structure of the DASS-21 and measurement invariance between a non-OSA and an OSA sample using confirmatory factor analysis. As measurement invariance was not found, differential item functioning (DIF) was examined using dMACS to determine the source of non-invariance. The correlated 3-factor structure (with correlated errors) of the DASS-21 was a better fit in the non-OSA sample. dMACS indicated that there was a degree of DIF for each of the subscales, especially for the Anxiety subscale, in which 2 symptoms (that are also physiological symptoms of OSA) produced lower severity scores in the OSA sample compared with the non-OSA sample. However, the degree of DIF for each of the subscales is not sufficient to cause concern when using the DASS-21; therefore, the total DASS-21 is suitable for use in an OSA sample. Interestingly, the impact of symptom overlap in anxiety symptoms may be reducing anxiety scores because of DIF, which contrasts with the proposed effect of symptom overlap in depression, where it leads to the inflation of depression scores in OSA. This deserves greater consideration in relation to OSA and other clinical disorders or chronic illness conditions with different patterns of overlapping symptoms. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Hadzibajramovic, Emina; Ahlborg, Gunnar; Grimby-Ekman, Anna; Lundgren-Nilsson, Åsa
2015-02-25
Psychosocial stress at work has been recognised as one of the most important factors behind the increase in sick leave due to stress-related mental disorders. It is therefore important to be able to measure perceived work stress in a way that is both valid and reliable. It has been suggested that the Stress-Energy Questionnaire (SEQ) could be a useful tool for measuring mood (stress and energy) at work and it has been used in many Scandinavian studies. The aim of the study is to examine the internal construct validity of the SEQ in a working population and to address measurement issues, such as the ordering of response categories and potential differences in how women and men use the scale - what is termed differential item functioning (DIF). The data used in the present study is baseline data from a longitudinal cohort study aimed at evaluating psychosocial working conditions, stress, health and well-being among employees in two human service organisations in Western Sweden. A modern psychometric approach for scale validations, the Rasch model, was used. Stress items showed a satisfactory fit to the model. Problems related to unidimensionality and local dependence were found when the six stress items were fitted to the model, but these could be resolved by using two testlets. As regards the energy scale, although the final analysis showed an acceptable fit to the model some scale problems were identified. The item dull had disordered thresholds and DIF for gender was detected for the item passive. The items were not well targeted to the persons, with skewness towards high energy. This might explain the scale problems that were detected but these problems need to be investigated in a group where the level of energy is spread across the trait, measured by the SEQ. The stress scale of the SEQ has good psychometric properties and provides a useful tool for assessing work-related stress, on both group and individual levels. 
However, the limitations of the energy scale make it suitable for group evaluations only. The energy scale needs to be evaluated further in different settings and populations.
Kersten, Paula; Vandal, Alain C; Elder, Hinemoa; McPherson, Kathryn M
2018-04-21
This observational study examines the internal construct validity, internal consistency and cross-informant reliability of the Strengths and Difficulties Questionnaire (SDQ) in a New Zealand preschool population across four ethnicity strata (New Zealand European, Māori, Pasifika, Asian). Rasch analysis was employed to examine internal validity on a subsample of 1000 children. Internal consistency (n=29 075) and cross-informant reliability (n=17 006) were examined using correlations, intraclass correlation coefficients and Cronbach's alpha on the sample available for such analyses. Data were used from a national SDQ database provided by the funder, pertaining to New Zealand domiciled children aged 4 and 5 and scored by their parents and teachers. The five subscales do not fit the Rasch model (as indicated by the overall fit statistics), contain items that are biased (differential item functioning (DIF)) by key variables, suffer from a floor and ceiling effect and have unacceptable internal consistency. After dealing with DIF, the Total Difficulty scale does fit the Rasch model and has good internal consistency. Parent/teacher inter-rater reliability was unacceptably low for all subscales. The five SDQ subscales are not valid and not suitable for use in their own right in New Zealand. We have provided a conversion table for the Total Difficulty scale, which takes account of bias by ethnic group. Clinicians should use this conversion table in order to reconcile DIF by culture in final scores. It is advisable to use both parents and teachers' feedback when considering children's needs for referral of further assessment. Future work should examine whether validity is impacted by different language versions used in the same country. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Chuang, I-Ching; Lin, Keh-Chung; Wu, Ching-Yi; Hsieh, Yu-Wei; Liu, Chien-Ting; Chen, Chia-Ling
2017-10-01
The Motor Activity Log (MAL) and Lower-Functioning MAL (LF-MAL) are used to assess the amount of use of the more impaired arm and the quality of movement during activities in real-life situations for patients with stroke. This study used Rasch analysis to examine the psychometric properties of the MAL and LF-MAL in patients with stroke. This is a methodological study. The MAL and LF-MAL include 2 scales: the amount of use (AOU) and the quality of movement (QOM). Rasch analysis was used to examine the unidimensionality, item difficulty hierarchy, targeting, reliability, and differential item functioning (DIF) of the MAL and LF-MAL. A total of 403 patients with mild or moderate stroke completed the MAL, and 134 patients with moderate/severe stroke finished the LF-MAL. Evidence of disordered thresholds and poor model fit were found both in the MAL and LF-MAL. After the rating categories were collapsed and misfit items were deleted, all items of the revised MAL and LF-MAL exhibited ordering and constituted unidimensional constructs. The person-item map showed that these assessments were difficult for our participants. The person reliability coefficients of these assessments ranged from .79 to .87. No items in the revised MAL and LF-MAL exhibited bias related to patients' characteristics. One limitation is the recruited patients, who have relatively high-functioning ability in the LF-MAL. The revised MAL and LF-MAL are unidimensional scales and have good reliability. The categories function well, and responses to all items in these assessments are not biased by patients' characteristics. However, the revised MAL and LF-MAL both showed floor effect. Further study might add easy items for assessing the performance of activity in real-life situations for patients with stroke. © 2017 American Physical Therapy Association
Caqueo-Urízar, Alejandra; Boyer, Laurent; Boucekine, Mohamed; Auquier, Pascal
2014-10-01
The aim of this study was to adapt the Schizophrenia - Quality of Life short-version questionnaire (SQoL18) for use in three middle-income countries in Latin America and to evaluate the factor structure, reliability, and external validity of this questionnaire. The SQoL18 was translated into Spanish using a well-validated forward-backward process. We evaluated the psychometric properties of the SQoL18 in a sample of 253 patients with schizophrenia attending outpatient mental health services in three Latin American countries. For participants in each country (Bolivia, N=83; Chile, N=85; Peru, N=85), psychometric properties were compared to those reported from the reference population (507 patients with schizophrenia) assessed in the validation study. In addition, differential item functioning (DIF) analyses were performed to see whether all items behave in the same way in each country. Factor analysis performed in the 3 countries showed that the questionnaire's structure adequately matched the initial structure of the SQoL18. The unidimensionality of the dimensions was preserved, and the internal/external validity indices were close to those of the reference population. However, one dimension of the SQoL18 (resilience) presented some unsatisfactory properties including low Cronbach's alpha coefficients, one INFIT value higher than 1.2, and one item showing DIF between the 3 countries. These results demonstrate the satisfactory acceptability and psychometric properties of the SQoL18, suggesting the relevance of this questionnaire among patients with schizophrenia in these 3 Latin American countries. Copyright © 2014 Elsevier B.V. All rights reserved.
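The abstract above flags one item with an INFIT value above 1.2. As a rough illustration of what that statistic measures, the sketch below computes the information-weighted (infit) mean square for a single item under the dichotomous Rasch model. This is a simplifying assumption: a rating-scale questionnaire would normally be analysed with a polytomous Rasch model, and the person measures and item difficulty used here are hypothetical inputs rather than estimates from the study.

```python
import numpy as np

def rasch_prob(theta, b):
    """Probability of a positive response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def infit_mnsq(responses, thetas, b):
    """Infit mean square for one item.

    infit = sum((x - p)^2) / sum(p * (1 - p))
    Values near 1 indicate data consistent with the model; values well
    above 1 (for example > 1.2) indicate more noise than the model expects.
    """
    x = np.asarray(responses, dtype=float)
    p = rasch_prob(np.asarray(thetas, dtype=float), b)
    return np.sum((x - p) ** 2) / np.sum(p * (1.0 - p))

# Hypothetical person measures (logits) and an item of difficulty 0.
thetas = [-2.0, -1.0, 1.0, 2.0]
orderly = infit_mnsq([0, 0, 1, 1], thetas, b=0.0)  # responses follow ability
erratic = infit_mnsq([1, 1, 0, 0], thetas, b=0.0)  # responses contradict ability
```

With these toy inputs the orderly pattern yields an infit below 1 and the reversed pattern an infit well above 1.2, which is the direction of misfit the abstract refers to.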
Karnoe, Astrid; Furstrand, Dorthe; Batterham, Roy; Christensen, Karl Bang; Elsworth, Gerald; Osborne, Richard H
2018-01-01
Background: For people to be able to access, understand, and benefit from the increasing digitalization of health services, it is critical that services are provided in a way that meets the user’s needs, resources, and competence. Objective: The objective of the study was to develop a questionnaire that captures the 7-dimensional eHealth Literacy Framework (eHLF). Methods: Draft items were created in parallel in English and Danish. The items were generated from 450 statements collected during the conceptual development of eHLF. In all, 57 items (7 to 9 items per scale) were generated and adjusted after cognitive testing. Items were tested in 475 people recruited from settings in which the scale was intended to be used (community and health care settings) and including people with a range of chronic conditions. Measurement properties were assessed using approaches from item response theory (IRT) and classical test theory (CTT) such as confirmatory factor analysis (CFA) and reliability using composite scale reliability (CSR); potential bias due to age and sex was evaluated using differential item functioning (DIF). Results: CFA confirmed the presence of the 7 a priori dimensions of eHLF. Following item analysis, a 35-item 7-scale questionnaire was constructed, covering (1) using technology to process health information (5 items, CSR=.84), (2) understanding of health concepts and language (5 items, CSR=.75), (3) ability to actively engage with digital services (5 items, CSR=.86), (4) feel safe and in control (5 items, CSR=.87), (5) motivated to engage with digital services (5 items, CSR=.84), (6) access to digital services that work (6 items, CSR=.77), and (7) digital services that suit individual needs (4 items, CSR=.85). A 7-factor CFA model, using small-variance priors for cross-loadings and residual correlations, had a satisfactory fit (posterior predictive P value: .27, 95% CI for the difference between the observed and replicated chi-square values: −63.7 to 133.8). 
The CFA showed that all items loaded strongly on their respective factors. The IRT analysis showed that no items were found to have disordered thresholds. For most scales, discriminant validity was acceptable; however, 2 pairs of dimensions were highly correlated; dimensions 1 and 5 (r=.95), and dimensions 6 and 7 (r=.96). All dimensions were retained because of strong content differentiation and potential causal relationships between these dimensions. There is no evidence of DIF. Conclusions: The eHealth Literacy Questionnaire (eHLQ) is a multidimensional tool based on a well-defined a priori eHLF framework with robust properties. It has satisfactory evidence of construct validity and reliable measurement across a broad range of concepts (using both CTT and IRT traditions) in various groups. It is designed to be used to understand and evaluate people’s interaction with digital health services. PMID:29434011
Satellite phage TLCφ enables toxigenic conversion by CTX phage through dif site alteration.
Hassan, Faizule; Kamruzzaman, M; Mekalanos, John J; Faruque, Shah M
2010-10-21
Bacterial chromosomes often carry integrated genetic elements (for example plasmids, transposons, prophages and islands) whose precise function and contribution to the evolutionary fitness of the host bacterium are unknown. The CTXφ prophage, which encodes cholera toxin in Vibrio cholerae, is known to be adjacent to a chromosomally integrated element of unknown function termed the toxin-linked cryptic (TLC). Here we report the characterization of a TLC-related element that corresponds to the genome of a satellite filamentous phage (TLC-Knφ1), which uses the morphogenesis genes of another filamentous phage (fs2φ) to form infectious TLC-Knφ1 phage particles. The TLC-Knφ1 phage genome carries a sequence similar to the dif recombination sequence, which functions in chromosome dimer resolution using XerC and XerD recombinases. The dif sequence is also exploited by lysogenic filamentous phages (for example CTXφ) for chromosomal integration of their genomes. Bacterial cells defective in the dimer resolution often show an aberrant filamentous cell morphology. We found that acquisition and chromosomal integration of the TLC-Knφ1 genome restored a perfect dif site and normal morphology to V. cholerae wild-type and mutant strains with dif(-) filamentation phenotypes. Furthermore, lysogeny of a dif(-) non-toxigenic V. cholerae with TLC-Knφ1 promoted its subsequent toxigenic conversion through integration of CTXφ into the restored dif site. These results reveal a remarkable level of cooperative interactions between multiple filamentous phages in the emergence of the bacterial pathogen that causes cholera.
Nishigami, Tomohiko; Mibu, Akira; Tanaka, Katsuyoshi; Yamashita, Yuh; Watanabe, Akihisa; Tanabe, Akihito
2017-03-01
The Pain Catastrophizing Scale (PCS) is a commonly used measure of pain catastrophizing. The scale comprises 13 items related to magnification, rumination, and helplessness. To facilitate quick screening and to reduce participant burden, four-item and six-item short forms of the English version of the PCS were developed. The purpose of the present study was to evaluate the psychometric properties of a Japanese version of the short forms of the PCS using a contemporary approach called Rasch analysis. A total of 216 patients with musculoskeletal disorders were recruited in this study. Participants completed study measures, which included pain intensity, the Pain Catastrophizing Scale (PCS), and the Tampa Scale of Kinesiophobia (TSK). Furthermore, the four-item (items 3, 6, 8, and 11) and six-item (items 4, 5, 6, 10, 11, and 13) short forms of the Japanese version of the PCS were measured. We used Rasch analysis to analyze the psychometric properties of the original, four-item, and six-item forms of the PCS. Rasch analysis showed that both short forms of the PCS had acceptable internal consistency, unidimensionality, and no notable DIF and were functional on the category rating scale. However, the four-item short form of the PCS had two misfit items. The six-item short form of the PCS has acceptable psychometric properties and is suitable for use in participants with musculoskeletal pain. Thus, the six-item short form can be used as a brief instrument to evaluate pain catastrophizing. Copyright © 2016 The Japanese Orthopaedic Association. Published by Elsevier B.V. All rights reserved.
Judgmental and Statistical DIF Analyses of the PISA-2003 Mathematics Literacy Items
ERIC Educational Resources Information Center
Yildirim, Huseyin Husnu; Berberoglu, Giray
2009-01-01
Comparisons of human characteristics across different language groups and cultures become more important in today's educational assessment practices as evidenced by the increasing interest in international comparative studies. Within this context, the fairness of the results across different language and cultural groups draws the attention of…
Morandini, P; Offer, J; Traynor, D; Nayler, O; Neuhaus, D; Taylor, G W; Kay, R R
1995-01-01
Stalk cell differentiation during development of the slime mould Dictyostelium is induced by a chlorinated alkyl phenone called differentiation-inducing factor-1 (DIF-1). Inactivation of DIF-1 is likely to be a key element in the DIF-1 signalling system, and we have shown previously that this is accomplished by a dedicated metabolic pathway involving up to 12 unidentified metabolites. We report here the structure of the first four metabolites produced from DIF-1, as deduced by m.s., n.m.r. and chemical synthesis. The structures of these compounds show that the first step in metabolism is a dechlorination of the phenolic ring, producing DIF metabolite 1 (DM1). DM1 is identical with the previously known minor DIF activity, DIF-3. DIF-3 is then metabolized by three successive oxidations of its aliphatic side chain: a hydroxylation at omega-2 to produce DM2, oxidation of the hydroxy group to a ketone group to produce DM3 and a further hydroxylation at omega-1 to produce DM4, a hydroxyketone of DIF-3. We have investigated the enzymology of DIF-1 metabolism. It is already known that the first step, to produce DIF-3, is catalysed by a novel dechlorinase. The enzyme activity responsible for the first side-chain oxidation (DIF-3 hydroxylase) was detected by incubating [3H]DIF-3 with cell-free extracts and resolving the reaction products by t.l.c. DIF-3 hydroxylase has many of the properties of a cytochrome P-450. It is membrane-bound and uses NADPH as co-substrate. It is also inhibited by CO, the classic cytochrome P-450 inhibitor, and by several other cytochrome P-450 inhibitors, as well as by diphenyliodonium chloride, an inhibitor of cytochrome P-450 reductase. DIF-3 hydroxylase is highly specific for DIF-3: other closely related compounds do not compete for the activity at 100-fold molar excess, with the exception of the DIF-3 analogue lacking the chlorine atom. The Km for DIF-3 of 47 nM is consistent with this enzyme being responsible for DIF-3 metabolism in vivo. 
The two further oxidations necessary to produce DM4 are also performed in vitro by similar enzyme activities. One of the inhibitors of DIF-3 hydroxylase, ancymidol (IC50 67 nM) is likely to be particularly suitable for probing the function of DIF metabolism during development. PMID:7702568
Kayser, Lars; Karnoe, Astrid; Furstrand, Dorthe; Batterham, Roy; Christensen, Karl Bang; Elsworth, Gerald; Osborne, Richard H
2018-02-12
For people to be able to access, understand, and benefit from the increasing digitalization of health services, it is critical that services are provided in a way that meets the user's needs, resources, and competence. The objective of the study was to develop a questionnaire that captures the 7-dimensional eHealth Literacy Framework (eHLF). Draft items were created in parallel in English and Danish. The items were generated from 450 statements collected during the conceptual development of eHLF. In all, 57 items (7 to 9 items per scale) were generated and adjusted after cognitive testing. Items were tested in 475 people recruited from settings in which the scale was intended to be used (community and health care settings) and including people with a range of chronic conditions. Measurement properties were assessed using approaches from item response theory (IRT) and classical test theory (CTT) such as confirmatory factor analysis (CFA) and reliability using composite scale reliability (CSR); potential bias due to age and sex was evaluated using differential item functioning (DIF). CFA confirmed the presence of the 7 a priori dimensions of eHLF. Following item analysis, a 35-item 7-scale questionnaire was constructed, covering (1) using technology to process health information (5 items, CSR=.84), (2) understanding of health concepts and language (5 items, CSR=.75), (3) ability to actively engage with digital services (5 items, CSR=.86), (4) feel safe and in control (5 items, CSR=.87), (5) motivated to engage with digital services (5 items, CSR=.84), (6) access to digital services that work (6 items, CSR=.77), and (7) digital services that suit individual needs (4 items, CSR=.85). A 7-factor CFA model, using small-variance priors for cross-loadings and residual correlations, had a satisfactory fit (posterior predictive P value: .27, 95% CI for the difference between the observed and replicated chi-square values: -63.7 to 133.8). 
The CFA showed that all items loaded strongly on their respective factors. The IRT analysis showed that no items were found to have disordered thresholds. For most scales, discriminant validity was acceptable; however, 2 pairs of dimensions were highly correlated; dimensions 1 and 5 (r=.95), and dimensions 6 and 7 (r=.96). All dimensions were retained because of strong content differentiation and potential causal relationships between these dimensions. There is no evidence of DIF. The eHealth Literacy Questionnaire (eHLQ) is a multidimensional tool based on a well-defined a priori eHLF framework with robust properties. It has satisfactory evidence of construct validity and reliable measurement across a broad range of concepts (using both CTT and IRT traditions) in various groups. It is designed to be used to understand and evaluate people's interaction with digital health services. ©Lars Kayser, Astrid Karnoe, Dorthe Furstrand, Roy Batterham, Karl Bang Christensen, Gerald Elsworth, Richard H Osborne. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 12.02.2018.
Development and validation of an item response theory-based Social Responsiveness Scale short form.
Sturm, Alexandra; Kuhfeld, Megan; Kasari, Connie; McCracken, James T
2017-09-01
Research and practice in autism spectrum disorder (ASD) rely on quantitative measures, such as the Social Responsiveness Scale (SRS), for characterization and diagnosis. Like many ASD diagnostic measures, SRS scores are influenced by factors unrelated to ASD core features. This study further interrogates the psychometric properties of the SRS using item response theory (IRT), and demonstrates a strategy to create a psychometrically sound short form by applying IRT results. Social Responsiveness Scale analyses were conducted on a large sample (N = 21,426) of youth from four ASD databases. Items were subjected to item factor analyses and evaluation of item bias by gender, age, expressive language level, behavior problems, and nonverbal IQ. Item selection based on item psychometric properties, DIF analyses, and substantive validity produced a reduced item SRS short form that was unidimensional in structure, highly reliable (α = .96), and free of gender, age, expressive language, behavior problems, and nonverbal IQ influence. The short form also showed strong relationships with established measures of autism symptom severity (ADOS, ADI-R, Vineland). Degree of association between all measures varied as a function of expressive language. Results identified specific SRS items that are more vulnerable to non-ASD-related traits. The resultant 16-item SRS short form may possess superior psychometric properties compared to the original scale and emerge as a more precise measure of ASD core symptom severity, facilitating research and practice. Future research using IRT is needed to further refine existing measures of autism symptomatology. © 2017 Association for Child and Adolescent Mental Health.
Pedersen, Eric R; Huang, Wenjing; Dvorak, Robert D; Prince, Mark A; Hummer, Justin F
2017-08-01
Given recent state legislation legalizing marijuana for recreational purposes and majority popular opinion favoring these laws, we developed the Protective Behavioral Strategies for Marijuana scale (PBSM) to identify strategies that may mitigate the harms related to marijuana use among those young people who choose to use the drug. In the current study, we expand on the initial exploratory study of the PBSM to further validate the measure with a large and geographically diverse sample (N = 2,117; 60% women, 30% non-White) of college students from 11 different universities across the United States. We sought to develop a psychometrically sound item bank for the PBSM and to create a short assessment form that minimizes respondent burden and time. Quantitative item analyses, including exploratory and confirmatory factor analyses with item response theory (IRT) and evaluation of differential item functioning (DIF), revealed an item bank of 36 items that was examined for unidimensionality and good content coverage, as well as a short form of 17 items that is free of bias in terms of gender (men vs. women), race (White vs. non-White), ethnicity (Hispanic vs. non-Hispanic), and recreational marijuana use legal status (state recreational marijuana was legal for 25.5% of participants). We also provide a scoring table for easy transformation from sum scores to IRT scale scores. The PBSM item bank and short form associated strongly and negatively with past month marijuana use and consequences. The measure may be useful to researchers and clinicians conducting intervention and prevention programs with young adults. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Dere, Jessica; Sun, Jiahong; Zhao, Yue; Persson, Tonje J; Zhu, Xiongzhao; Yao, Shuqiao; Bagby, R Michael; Ryder, Andrew G
2013-01-01
The finding that people of Chinese heritage tend to emphasize somatic rather than psychological symptoms of depression has frequently been discussed in the culture and mental health literature since the 1970s. Recent studies have confirmed that Chinese samples report more somatic and fewer psychological depression symptoms compared to "Western" samples. The question remains, however, as to whether or not these effects are attributable to variation in all the constituent symptoms or to a subset. If the latter, there is the additional possibility that some symptoms might show a divergent pattern. Such findings would have implications for how cultural variations in symptom presentation are interpreted, and would also inform the cultural study of affective experiences more broadly. The current study addressed these issues in Chinese (n = 175) and Euro-Canadian (n = 107) psychiatric outpatients originally described by Ryder et al. (2008). Differential item functioning (DIF) was used to examine whether specific somatic and psychological symptoms diverged from the overall patterns of cultural variation. Chi-square analyses were used to examine atypical somatic symptoms (e.g., hypersomnia), previously neglected in this literature. No DIF was observed for the typical somatic symptoms, but Euro-Canadians reported greater levels of atypical somatic symptoms, and showed higher rates of atypical depression. DIF was observed for psychological symptoms: the Chinese reported high levels of "suppressed emotions" and "depressed mood," relative to their overall psychological symptom reporting. Chinese outpatients also spontaneously reported "depressed mood" at similar levels as the Euro-Canadians, contrary to prevailing ideas about Chinese unwillingness to discuss depression. Overall, the findings provide a more nuanced picture of how culture shapes symptom presentation and point toward future studies designed to unpack cultural variation in narrower subsets of depressive symptoms.
2017-01-01
Background: The Center for Epidemiologic Studies Depression Scale (CES-D) is a measure of depressive symptomatology which is widely used internationally. Though previous attempts were made to shorten the CES-D scale, few have attempted to develop a Computerized Adaptive Test (CAT) version for the CES-D. Objective: The aim of this study was to provide evidence on the efficiency and accuracy of the CES-D when administered as a CAT in an American sample. Methods: We obtained a sample of 2060 responses to the CES-D from US participants using the myPersonality application. The average age of participants was 26 years (range 19-77). We randomly split the sample into two groups to evaluate and validate the psychometric models. We used evaluation group data (n=1018) to assess dimensionality with both confirmatory factor and Mokken analysis. We conducted further psychometric assessments using item response theory (IRT), including assessments of item and scale fit to Samejima’s graded response model (GRM), local dependency and differential item functioning. We subsequently conducted two CAT simulations to evaluate the CES-D CAT using the validation group (n=1042). Results: Initial CFA results indicated a poor fit to the model and Mokken analysis revealed 3 items which did not conform to the same dimension as the rest of the items. We removed the 3 items and fit the remaining 17 items to the GRM. We found no evidence of differential item functioning (DIF) between age and gender groups. Estimates of the CES-D trait score provided by the simulated CAT algorithm and the trait score derived from the original scale were highly correlated. The second CAT simulation, conducted using real participant data, demonstrated higher precision at the higher levels of the depression spectrum. Conclusions: Depression assessments using the CES-D CAT can be more accurate and efficient than those made using the fixed-length assessment. PMID:28931496
Strong, David R; Messer, Karen; Hartman, Sheri J; Conway, Kevin P; Hoffman, Allison C; Pharris-Ciurej, Nikolas; White, Martha; Green, Victoria R; Compton, Wilson M; Pierce, John
2015-07-01
Nicotine dependence (ND) is a key construct that organizes physiological and behavioral symptoms associated with persistent nicotine intake. Measurement of ND has focused primarily on cigarette smokers. Thus, validation of brief instruments that apply to a broad spectrum of tobacco product users is needed. We examined multiple domains of ND in a longitudinal national study of the United States population, the United States National Epidemiological Survey of Alcohol and Related Conditions (NESARC). We used methods based in item response theory to identify and validate increasingly brief measures of ND that included symptoms to assess ND similarly among cigarette, cigar, smokeless, and poly tobacco users. Confirmatory factor analytic models supported a single, primary dimension underlying symptoms of ND across tobacco use groups. Differential Item Functioning (DIF) analysis generated little support for systematic differences in response to symptoms of ND across tobacco use groups. We established significant concurrent and predictive validity of brief 3- and 5-symptom indices for measuring ND. Measuring ND across tobacco use groups with a common set of symptoms facilitates evaluation of tobacco use in an evolving marketplace of tobacco and nicotine products. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
An item response theory analysis of the Olweus Bullying scale.
Breivik, Kyrre; Olweus, Dan
2014-12-02
In the present article, we used IRT (graded response) modeling as a useful technology for a detailed and refined study of the psychometric properties of the various items of the Olweus Bullying scale and the scale itself. The sample consisted of a very large number of Norwegian 4th-10th grade students (n = 48 926). The IRT analyses revealed that the scale was essentially unidimensional and had excellent reliability in the upper ranges of the latent bullying tendency trait, as intended and desired. Gender DIF effects were identified with regard to girls' use of indirect bullying by social exclusion and boys' use of physical bullying by hitting and kicking but these effects were small and worked in opposite directions, having negligible effects at the scale level. Also scale scores adjusted for DIF effects differed very little from non-adjusted scores. In conclusion, the empirical data were well characterized by the chosen IRT model and the Olweus Bullying scale was considered well suited for the conduct of fair and reliable comparisons involving different gender-age groups. Information Aggr. Behav. 9999:XX-XX, 2014. © 2014 Wiley Periodicals, Inc.
An item response theory analysis of the Olweus Bullying scale.
Breivik, Kyrre; Olweus, Dan
2015-01-01
In the present article, we used IRT (graded response) modeling as a useful technology for a detailed and refined study of the psychometric properties of the various items of the Olweus Bullying scale and the scale itself. The sample consisted of a very large number of Norwegian 4th-10th grade students (n = 48 926). The IRT analyses revealed that the scale was essentially unidimensional and had excellent reliability in the upper ranges of the latent bullying tendency trait, as intended and desired. Gender DIF effects were identified with regard to girls' use of indirect bullying by social exclusion and boys' use of physical bullying by hitting and kicking but these effects were small and worked in opposite directions, having negligible effects at the scale level. Also scale scores adjusted for DIF effects differed very little from non-adjusted scores. In conclusion, the empirical data were well characterized by the chosen IRT model and the Olweus Bullying scale was considered well suited for the conduct of fair and reliable comparisons involving different gender-age groups. Information Aggr. Behav. 41:1-13, 2015. © 2014 Wiley Periodicals, Inc.
Psychometric Evaluation of a Cultural Competency Assessment Instrument for Health Professionals
Haywood, Sonja H.; Goode, Tawara; Gao, Yong; Smith, Kristyn; Bronheim, Suzanne; Flocke, Susan A; Zyzanski, Steve
2012-01-01
Background Few valid and reliable measures exist for health care professionals interested in determining their levels of cultural and linguistic competence. Objective To evaluate the measurement properties of the Cultural Competence Health Practitioner Assessment (CCHPA-129). Methods The CCHPA-129 is a 129-item web-based instrument, developed by the National Center for Cultural Competence (NCCC). Responses on the CCHPA-129 were examined using factor analysis; Rasch modeling; and Differential Item Functioning (DIF) across race, ethnicity, gender, and profession. Subjects 2504 practitioners, including 1864 nurses (RN/LPN/BSN); 341 clinicians (PA/NP); and 299 physicians (MD/DO), who completed the CCHPA-129 online between 2005 and 2008. Results Three factors representing domains of knowledge, adapting practice, and promoting health for culturally and linguistically diverse populations accounted for 46% of the variance. Among Knowledge factor items, 53% (23/43) fit the Rasch model, item difficulties ranged from −1.01 logits (least difficult) to +1.11 logits (most difficult), separation index (SI) 13.82, and Cronbach’s α 0.92. Forty-seven percent (21/44) of Adapting Practice factor items fit the model, item difficulties −0.07 to +1.11 logits, SI 11.59, Cronbach’s α 0.88; and 58% (23/39) of Promoting Health factor items fit the model, item difficulties −1.01 to +1.38 logits, SI 22.64, Cronbach’s α 0.92. Early evidence of validity was established by known groups having statistically different scores. Conclusion The 67-item CCHPA-67 is psychometrically sound. This shortened instrument can be used to establish associations between practitioners’ cultural and linguistic competence and health outcomes as well as to evaluate interventions to increase practitioners’ cultural and linguistic competence. PMID:22437625
Hagman, Brett T; Kuerbis, Alexis N; Morgenstern, Jon; Bux, Donald A; Parsons, Jeffrey T; Heidinger, Bram E
2009-11-01
The Short Inventory of Problems-Alcohol and Drugs (SIP-AD) is a 15-item measure that concurrently assesses negative consequences associated with alcohol and illicit drug use. Current psychometric evaluation has been limited to classical test theory (CTT) statistics, and it has not been validated among non-treatment seeking men-who-have-sex-with-men (MSM). Methods from Item Response Theory (IRT) can improve upon CTT by providing an in-depth analysis of how each item performs across the underlying latent trait that it is purported to measure. The present study examined the psychometric properties of the SIP-AD using methods from both IRT and CTT among a non-treatment seeking MSM sample (N=469). Participants were recruited from the New York City area and were asked to participate in a series of studies examining club drug use. Results indicated that five items on the SIP-AD demonstrated item misfit or significant differential item functioning (DIF) across race/ethnicity and HIV status. These five items were dropped and two-parameter IRT analyses were conducted on the remaining 10 items, which indicated a restricted range of item location parameters (-.15 to -.99) plotted at the lower end of the latent negative consequences severity continuum, and reasonably high discrimination parameters (1.30 to 2.22). Additional CTT statistics were compared between the original 15-item SIP-AD and the refined 10-item SIP-AD and suggest that the differences were negligible, with the refined 10-item SIP-AD indicating a high degree of reliability and validity. Findings suggest the SIP-AD can be shortened to 10 items and appears to be a non-biased, reliable and valid measure among non-treatment seeking MSM.
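The two-parameter analysis summarized in this abstract can be illustrated with a minimal sketch. It assumes the standard logistic two-parameter (2PL) form without the 1.7 scaling constant, which the abstract does not specify. The function names are hypothetical, and the two (discrimination, location) pairs simply combine the endpoints of the reported ranges; they are not fitted values for any actual SIP-AD item.

```python
import math

def p_endorse(theta, a, b):
    """2PL probability of endorsing an item at trait level theta.

    a is the discrimination parameter and b the location parameter.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)

if __name__ == "__main__":
    # Illustrative pairings of the range endpoints (an assumption).
    for a, b in [(1.30, -0.99), (2.22, -0.15)]:
        for theta in (-1.0, 0.0, 2.0):
            info = item_information(theta, a, b)
            print(f"a={a} b={b} theta={theta}: information={info:.3f}")
```

Under this form an item is most informative at theta equal to its location, so items located between -0.99 and -0.15 would contribute little information at the high end of the severity continuum.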
Scientific literacy: Factor structure and gender differences
NASA Astrophysics Data System (ADS)
Manhart, James Joseph
The purpose of this study was to investigate the factor structure of scientific literacy and to document any gender differences with respect to each factor. Participants included 1139 students (574 females, 565 males) in grades 9 through 12 who were taking a science class at one of four Midwestern high schools. Based on National Science Education Standards, a 100 item multiple-choice test was constructed to assess scientific literacy. Confirmatory factor analysis of item parcels suggested a three factor model was the best way to explain the data resulting from the administration of this test. The factors were labeled constructs of science, abilities necessary to do scientific inquiry, and social aspects of science. Gender differences with respect to these factors were examined using analysis of variance procedures. Because differential enrollment in science classes could cause gender differences in grades 11 and 12, parallel analyses were conducted on the grades 9 and 10 subsample and the grades 11 and 12 subsample. However, the results of the two analyses were similar. The most consistent gender difference observed was that females performed better than males on the social aspects of science factor. Males tended to perform better than females on the constructs of science factor, although no consistent gender difference was noted for items dealing with life science. With respect to the abilities necessary to do scientific inquiry factor, females tended to perform better than males in grades 9 and 10, while no consistent gender difference was observed in grades 11 and 12. Gender differences were also examined using the Mantel-Haenszel procedure to flag individual items that functioned differently for females and males of the same ability. Twelve items were flagged for grades 9 and 10 (8 in favor of females, 4 in favor of males). Fourteen items were flagged for grades 11 and 12 (7 in favor of females, 7 in favor of males). 
All of the flagged items exhibited only small to moderate differential item functioning (DIF). Only three items were similarly flagged in both subsamples, one item from each factor.
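As a rough illustration of the Mantel-Haenszel procedure mentioned in this abstract, the sketch below computes the common odds ratio across matched score levels and converts it to the ETS delta metric (delta = -2.35 ln alpha). The function name, the input layout, and the example counts are hypothetical; the abstract does not report the item-level tables, and the significance test usually run alongside the effect size is omitted here.

```python
import math

def mantel_haenszel_dif(strata):
    """Return (alpha_MH, delta_MH) for one item.

    strata: iterable of (a, b, c, d) counts, one tuple per matched
    total-score level, where
      a = reference group correct, b = reference group incorrect,
      c = focal group correct,     d = focal group incorrect.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ValueError("common odds ratio is undefined for these counts")
    alpha = num / den
    # ETS delta scale: negative values indicate the item favors the reference group.
    delta = -2.35 * math.log(alpha)
    return alpha, delta

if __name__ == "__main__":
    # Made-up counts for two score levels, for illustration only.
    example = [(40, 10, 30, 20), (30, 20, 20, 30)]
    alpha, delta = mantel_haenszel_dif(example)
    print(f"alpha_MH={alpha:.3f}, delta_MH={delta:.3f}")
```

An alpha of 1 (delta of 0) means that examinees of the same ability in the two groups have equal odds of answering the item correctly, which is the no-DIF case.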
Using item response theory to address vulnerabilities in FFQ.
Kazman, Josh B; Scott, Jonathan M; Deuster, Patricia A
2017-09-01
The limitations for self-reporting of dietary patterns are widely recognised as a major vulnerability of FFQ and the dietary screeners/scales derived from FFQ. Such instruments can yield inconsistent results to produce questionable interpretations. The present article discusses the value of psychometric approaches and standards in addressing these drawbacks for instruments used to estimate dietary habits and nutrient intake. We argue that a FFQ or screener that treats diet as a 'latent construct' can be optimised for both internal consistency and the value of the research results. Latent constructs, a foundation for item response theory (IRT)-based scales (e.g. Patient Reported Outcomes Measurement Information System) are typically introduced in the design stage of an instrument to elicit critical factors that cannot be observed or measured directly. We propose an iterative approach that uses such modelling to refine FFQ and similar instruments. To that end, we illustrate the benefits of psychometric modelling by using items and data from a sample of 12 370 Soldiers who completed the 2012 US Army Global Assessment Tool (GAT). We used factor analysis to build the scale incorporating five out of eleven survey items. An IRT-driven assessment of response category properties indicates likely problems in the ordering or wording of several response categories. Group comparisons, examined with differential item functioning (DIF), provided evidence of scale validity across each Army sub-population (sex, service component and officer status). Such an approach holds promise for future FFQ.
Chen, Cheng-Te; Chen, Yu-Lan; Lin, Yu-Ching; Hsieh, Ching-Lin; Tzeng, Jeng-Yi
2018-01-01
Objective The purpose of this study was to construct a computerized adaptive test (CAT) for measuring self-care performance (the CAT-SC) in children with developmental disabilities (DD) aged from 6 months to 12 years in a content-inclusive, precise, and efficient fashion. Methods The study was divided into 3 phases: (1) item bank development, (2) item testing, and (3) a simulation study to determine the stopping rules for the administration of the CAT-SC. A total of 215 caregivers of children with DD were interviewed with the 73-item CAT-SC item bank. An item response theory model was adopted for examining the construct validity to estimate item parameters after investigation of the unidimensionality, equality of slope parameters, item fitness, and differential item functioning (DIF). In the last phase, the reliability and concurrent validity of the CAT-SC were evaluated. Results The final CAT-SC item bank contained 56 items. The stopping rules suggested were (a) reliability coefficient greater than 0.9 or (b) 14 items administered. The results of simulation also showed that 85% of the estimated self-care performance scores would reach a reliability higher than 0.9 with a mean test length of 8.5 items, and the mean reliability for the rest was 0.86. Administering the CAT-SC could reduce the number of items administered by 75% to 84%. In addition, self-care performances estimated by the CAT-SC and the full item bank were very similar to each other (Pearson r = 0.98). Conclusion The newly developed CAT-SC can efficiently measure self-care performance in children with DD whose performances are comparable to those of TD children aged from 6 months to 12 years as precisely as the whole item bank. The item bank of the CAT-SC has good reliability and a unidimensional self-care construct, and the CAT can estimate self-care performance with less than 25% of the items in the item bank. 
Therefore, the CAT-SC could be useful for measuring self-care performance in children with DD in clinical and research settings. PMID:29561879
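The two stopping rules reported in this abstract (reliability above 0.9, or 14 items administered) can be sketched as follows. The conversion from the standard error of the trait estimate to reliability, 1 - SE^2 with the trait variance fixed at 1, is an assumption that is common in CAT work but is not stated in the abstract. The function names and default arguments are hypothetical.

```python
def reliability_from_se(se, trait_variance=1.0):
    """Approximate reliability from the standard error of the trait estimate.

    Assumes reliability = 1 - SE^2 / trait variance (variance 1 by default).
    """
    return 1.0 - (se ** 2) / trait_variance

def should_stop(se, items_administered, min_reliability=0.9, max_items=14):
    """Stop when reliability exceeds the threshold or the item cap is reached."""
    reliable_enough = reliability_from_se(se) > min_reliability
    cap_reached = items_administered >= max_items
    return reliable_enough or cap_reached

if __name__ == "__main__":
    # Hypothetical (SE, items administered) states during one CAT session.
    for se, n_items in [(0.45, 4), (0.35, 8), (0.30, 9), (0.40, 14)]:
        print(se, n_items, should_stop(se, n_items))
```

Under this assumption, a reliability above 0.9 corresponds to a standard error below roughly 0.316, so the item cap acts as a fallback for respondents whose estimates never become that precise.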
Chen, Cheng-Te; Chen, Yu-Lan; Lin, Yu-Ching; Hsieh, Ching-Lin; Tzeng, Jeng-Yi; Chen, Kuan-Lin
2018-01-01
The purpose of this study was to construct a computerized adaptive test (CAT) for measuring self-care performance (the CAT-SC) in children with developmental disabilities (DD) aged from 6 months to 12 years in a content-inclusive, precise, and efficient fashion. The study was divided into 3 phases: (1) item bank development, (2) item testing, and (3) a simulation study to determine the stopping rules for the administration of the CAT-SC. A total of 215 caregivers of children with DD were interviewed with the 73-item CAT-SC item bank. An item response theory model was adopted for examining the construct validity to estimate item parameters after investigation of the unidimensionality, equality of slope parameters, item fitness, and differential item functioning (DIF). In the last phase, the reliability and concurrent validity of the CAT-SC were evaluated. The final CAT-SC item bank contained 56 items. The stopping rules suggested were (a) reliability coefficient greater than 0.9 or (b) 14 items administered. The results of simulation also showed that 85% of the estimated self-care performance scores would reach a reliability higher than 0.9 with a mean test length of 8.5 items, and the mean reliability for the rest was 0.86. Administering the CAT-SC could reduce the number of items administered by 75% to 84%. In addition, self-care performances estimated by the CAT-SC and the full item bank were very similar to each other (Pearson r = 0.98). The newly developed CAT-SC can efficiently measure self-care performance in children with DD whose performances are comparable to those of TD children aged from 6 months to 12 years as precisely as the whole item bank. The item bank of the CAT-SC has good reliability and a unidimensional self-care construct, and the CAT can estimate self-care performance with less than 25% of the items in the item bank. Therefore, the CAT-SC could be useful for measuring self-care performance in children with DD in clinical and research settings.
The student resilience survey: psychometric validation and associations with mental health.
Lereya, Suzet Tanya; Humphrey, Neil; Patalay, Praveetha; Wolpert, Miranda; Böhnke, Jan R; Macdougall, Amy; Deighton, Jessica
2016-01-01
Policies designed to promote resilience, and research to understand the determinants and correlates of resilience, require reliable and valid measures to ensure data quality. The student resilience survey (SRS) covers a range of external supports and internal characteristics which can potentially be viewed as protective factors and can be crucial in exploring the mechanisms between protective factors and risk factors, and in designing intervention and prevention strategies. This study examines the validity of the SRS. 7663 children (aged 11-15 years) from 12 local areas across England completed the SRS, and questionnaires regarding mental and physical health. Psychometric properties of 10 subscales of the SRS (family connection, school connection, community connection, participation in home and school life, participation in community life, peer support, self-esteem, empathy, problem solving, and goals and aspirations) were investigated by confirmatory factor analysis (CFA), differential item functioning (DIF), differential test functioning (DTF), Cronbach's α and McDonald's ω. The associations between the SRS scales and mental and physical health outcomes were examined. The results supported the construct validity of the 10 factors of the scale and provided evidence for acceptable reliability of all the subscales. Our DIF analysis indicated differences between boys and girls, between primary and secondary school children, between children with or without special educational needs (SEN) and between children with or without English as an additional language (EAL) in terms of how they answered the peer support subscale of the SRS. Analyses did not indicate any DIF based on free school meals (FSM) eligibility. All subscales, except the peer support subscale, showed small DTF whereas the peer support subscale showed moderate DTF. 
Correlations showed that all the student resilience subscales were negatively associated with mental health difficulties, global subjective distress and impact on health. Random effects linear regression models showed that family connection, self-esteem, problem solving and peer support were negatively associated with all the mental health outcomes. The findings suggest that the SRS is a valid measure assessing these relevant protective factors, thereby serving as a valuable tool in resilience and mental health research.
Catquest-9SF questionnaire: validation of Malay and Chinese-language versions using Rasch analysis.
Adnan, Tassha Hilda; Mohamed Apandi, Mokhlisoh; Kamaruddin, Haireen; Salowi, Mohamad Aziz; Law, Kian Boon; Haniff, Jamaiyah; Goh, Pik Pin
2018-01-05
The Catquest questionnaire was originally developed in Swedish to measure patients' self-assessed visual function and to evaluate the benefit of cataract surgery. Rasch analysis led to the creation of the nine-item short form of Catquest (Catquest-9SF), which has been translated and validated in English. The aim of this study was to evaluate the Malay and Chinese (Mandarin) language versions of the Catquest-9SF questionnaire for measuring patient-reported visual function in the cataract population in Malaysia. The English version of the Catquest-9SF questionnaire was translated and back translated into Malay and Chinese. The Malay and Chinese translated versions were self-administered by 236 and 202 pre-operative patients, respectively, drawn from a cataract surgery waiting list. The translated Catquest-9SF data and its four response options were assessed for fit to the Rasch model. The Catquest-9SF performed well in the Malay and Chinese translated versions, fulfilling all criteria for valid measurement, as demonstrated by Rasch analysis. Both versions of the questionnaire had ordered response thresholds, with good person separation (Malay 2.84; Chinese 2.59) and patient separation reliability (Malay 0.89; Chinese 0.87). Targeting was 0.30 and -0.11 logits in the Malay and Chinese versions respectively, indicating that the item difficulty was well suited to the visual abilities of the patients. All items fit a single overall construct (Malay infit range 0.85-1.26, outfit range 0.73-1.13; Chinese infit range 0.80-1.51, outfit range 0.71-1.36), and the scale was unidimensional by principal components analysis and free of Differential Item Functioning (DIF). These results support the good overall functioning of the Catquest-9SF in patients with cataract. The Malay and Chinese-language versions of the questionnaire are reliable and valid in measuring visual disability outcomes in the Malaysian cataract population.
McFadden, Estelle; Horton, Mike C; Ford, Helen L; Gilworth, Gill; McFadden, Majella; Tennant, Alan
2012-06-01
Multiple sclerosis (MS) mainly presents amongst those of working age. Depending upon the type of MS, many people embark upon a long period of managing their day-to-day work-related needs in the face of intermittent and sometimes persistent disabling symptoms. The objective of this study was to explore the concept of work instability (WI) following the onset of MS and develop a Work Instability Scale (WIS) specific to this population. WI amongst those with MS in work was explored through qualitative interviews which were then used to generate items for a WIS. Rasch analysis was used to refine the scaling properties of the MS-WIS, which was then validated against expert vocational assessment by occupational health physiotherapists and ergonomists. The resulting measure is a 22-item, self-administered scale which can be scored in three bands indicating low, medium and high risk of WI (job retention) problems. The scale meets modern psychometric requirements for measurement, indicated by adequate fit to the Rasch model with absence of local dependency and differential item functioning (DIF) by age, gender and hours worked. The scale presents an opportunity in routine clinical practice to take positive action to reduce sickness absence and prevent job loss.
Constructing three emotion knowledge tests from the invariant measurement approach
Prieto, Gerardo; Burin, Debora I.
2017-01-01
Background Psychological constructionist models like the Conceptual Act Theory (CAT) postulate that complex states such as emotions are composed of basic psychological ingredients that are more clearly respected by the brain than basic emotions. The objective of this study was the construction and initial validation of Emotion Knowledge (EK) measures from the CAT frame by means of an invariant measurement approach, the Rasch Model (RM). Psychological distance theory was used to inform item generation. Methods Three EK tests, namely emotion vocabulary (EV), close emotional situations (CES) and far emotional situations (FES), were constructed and tested with the RM in a community sample of 100 females and 100 males (age range: 18–65), both separately and conjointly. Results It was corroborated that data-RM fit was sufficient. Then, the effect of type of test and emotion on Rasch-modelled item difficulty was tested. Significant effects of emotion on EK item difficulty were found, but the only statistically significant difference was that between “happiness” and the remaining emotions; neither type of test, nor interaction effects on EK item difficulty were statistically significant. The testing of gender differences was carried out after corroborating that differential item functioning (DIF) would not be a plausible alternative hypothesis for the results. No statistically significant sex-related differences were found in EV, CES, FES, or total EK. However, the sign of d indicates that female participants were consistently better than male ones, a result that will be of interest for future meta-analyses. Discussion The three EK tests are ready to be used as components of a higher-level measurement process. PMID:28929013
Kubohara, Yuzuru; Kikuchi, Haruhisa; Matsuo, Yusuke; Oshima, Yoshiteru; Homma, Yoshimi
2014-01-01
ABSTRACT Differentiation-inducing factor-3 (DIF-3), found in the cellular slime mold Dictyostelium discoideum, and its derivatives, such as butoxy-DIF-3 (Bu-DIF-3), are potent anti-tumor agents. To investigate the activity of DIF-like molecules in tumor cells, we recently synthesized a green fluorescent DIF-3 derivative, BODIPY-DIF-3G, and analyzed its bioactivity and cellular localization. In this study, we synthesized a red (orange) fluorescent DIF-3 derivative, BODIPY-DIF-3R, and compared the cellular localization and bioactivities of the two BODIPY-DIF-3s in HeLa human cervical cancer cells. Both fluorescent compounds penetrated the extracellular membrane within 0.5 h and localized mainly to the mitochondria. In formalin-fixed cells, the two BODIPY-DIF-3s also localized to the mitochondria, indicating that the BODIPY-DIF-3s were incorporated into mitochondria independently of the mitochondrial membrane potential. After treatment for 3 days, BODIPY-DIF-3G, but not BODIPY-DIF-3R, induced mitochondrial swelling and suppressed cell proliferation. Interestingly, the swollen mitochondria were stainable with BODIPY-DIF-3G but not with BODIPY-DIF-3R. When added to isolated mitochondria in vitro, BODIPY-DIF-3G increased dose-dependently the rate of O2 consumption, but BODIPY-DIF-3R did not. These results suggest that the bioactive BODIPY-DIF-3G suppresses cell proliferation, at least in part, by altering mitochondrial activity, whereas the non-bioactive BODIPY-DIF-3R localizes to the mitochondria but does not affect mitochondrial activity or cell proliferation. PMID:24682009
Kubohara, Yuzuru; Kikuchi, Haruhisa; Nguyen, Van Hai; Kuwayama, Hidekazu; Oshima, Yoshiteru
2017-06-15
Differentiation-inducing factor-1 [1-(3,5-dichloro-2,6-dihydroxy-4-methoxyphenyl)hexan-1-one (DIF-1)] is an important regulator of cell differentiation and chemotaxis in the development of the cellular slime mold Dictyostelium discoideum. However, the entire signaling pathways downstream of DIF-1 remain to be elucidated. To characterize DIF-1 and its potential receptor(s), we synthesized two fluorescent derivatives of DIF-1, boron-dipyrromethene (BODIPY)-conjugated DIF-1 (DIF-1-BODIPY) and nitrobenzoxadiazole (NBD)-conjugated DIF-1 (DIF-1-NBD), and investigated their biological activities and cellular localization. DIF-1-BODIPY (5 µM) and DIF-1 (2 nM) induced stalk cell differentiation in the DIF-deficient strain HM44 in the presence of cyclic adenosine monophosphate (cAMP), whereas DIF-1-NBD (5 µM) hardly induced stalk cell differentiation under the same conditions. Microscopic analyses revealed that the biologically active derivative, DIF-1-BODIPY, was incorporated by stalk cells at late stages of differentiation and was localized to mitochondria. The mitochondrial uncouplers carbonyl cyanide m-chlorophenylhydrazone (CCCP), at 25-50 nM, and dinitrophenol (DNP), at 2.5-5 µM, induced partial stalk cell differentiation in HM44 in the presence of cAMP. DIF-1-BODIPY (1-2 µM) and DIF-1 (10 nM), as well as CCCP and DNP, suppressed chemotaxis in the wild-type strain Ax2 in shallow cAMP gradients. These results suggest that DIF-1-BODIPY and DIF-1 induce stalk cell differentiation and modulate chemotaxis, at least in part, by disturbing mitochondrial activity. © 2017. Published by The Company of Biologists Ltd.
Kikuchi, Haruhisa; Nguyen, Van Hai; Kuwayama, Hidekazu; Oshima, Yoshiteru
2017-01-01
ABSTRACT Differentiation-inducing factor-1 [1-(3,5-dichloro-2,6-dihydroxy-4-methoxyphenyl)hexan-1-one (DIF-1)] is an important regulator of cell differentiation and chemotaxis in the development of the cellular slime mold Dictyostelium discoideum. However, the entire signaling pathways downstream of DIF-1 remain to be elucidated. To characterize DIF-1 and its potential receptor(s), we synthesized two fluorescent derivatives of DIF-1, boron-dipyrromethene (BODIPY)-conjugated DIF-1 (DIF-1-BODIPY) and nitrobenzoxadiazole (NBD)-conjugated DIF-1 (DIF-1-NBD), and investigated their biological activities and cellular localization. DIF-1-BODIPY (5 µM) and DIF-1 (2 nM) induced stalk cell differentiation in the DIF-deficient strain HM44 in the presence of cyclic adenosine monophosphate (cAMP), whereas DIF-1-NBD (5 µM) hardly induced stalk cell differentiation under the same conditions. Microscopic analyses revealed that the biologically active derivative, DIF-1-BODIPY, was incorporated by stalk cells at late stages of differentiation and was localized to mitochondria. The mitochondrial uncouplers carbonyl cyanide m-chlorophenylhydrazone (CCCP), at 25–50 nM, and dinitrophenol (DNP), at 2.5–5 µM, induced partial stalk cell differentiation in HM44 in the presence of cAMP. DIF-1-BODIPY (1–2 µM) and DIF-1 (10 nM), as well as CCCP and DNP, suppressed chemotaxis in the wild-type strain Ax2 in shallow cAMP gradients. These results suggest that DIF-1-BODIPY and DIF-1 induce stalk cell differentiation and modulate chemotaxis, at least in part, by disturbing mitochondrial activity. PMID:28619991
Mazefsky, Carla A; Yu, Lan; White, Susan W; Siegel, Matthew; Pilkonis, Paul A
2018-06-01
Individuals with autism spectrum disorder (ASD) often present with prominent emotion dysregulation that requires treatment but can be difficult to measure. The Emotion Dysregulation Inventory (EDI) was created using methods developed by the Patient-Reported Outcomes Measurement Information System (PROMIS®) to capture observable indicators of poor emotion regulation. Caregivers of 1,755 youth with ASD completed 66 candidate EDI items, and the final 30 items were selected based on classical test theory and item response theory (IRT) analyses. The analyses identified two factors: (a) Reactivity, characterized by intense, rapidly escalating, sustained, and poorly regulated negative emotional reactions, and (b) Dysphoria, characterized by anhedonia, sadness, and nervousness. The final items did not show differential item functioning (DIF) based on gender, age, intellectual ability, or verbal ability. Because the final items were calibrated using IRT, even a small number of items offers high precision, minimizing respondent burden. IRT co-calibration of the EDI with related measures demonstrated its superiority in assessing the severity of emotion dysregulation with as few as seven items. Validity of the EDI was supported by expert review, its association with related constructs (e.g., anxiety and depression symptoms, aggression), higher scores in psychiatric inpatients with ASD compared to a community ASD sample, and demonstration of test-retest stability and sensitivity to change. In sum, the EDI provides an efficient and sensitive method to measure emotion dysregulation for clinical assessment, monitoring, and research in youth with ASD of any level of cognitive or verbal ability. Autism Res 2018, 11: 928-941. © 2018 International Society for Autism Research, Wiley Periodicals, Inc. This paper describes a new measure of poor emotional control called the Emotion Dysregulation Inventory (EDI). 
Caregivers of 1,755 youth with ASD completed candidate items, and advanced statistical techniques were applied to identify the best final items. The EDI is unique because it captures common emotional problems in ASD and is appropriate for both nonverbal and verbal youth. It is an efficient and sensitive measure for use in clinical assessments, monitoring, and research with youth with ASD. © 2018 International Society for Autism Research, Wiley Periodicals, Inc.
Freitas, Sandra; Prieto, Gerardo; Simões, Mário R; Nogueira, Joana; Santana, Isabel; Martins, Cristina; Alves, Lara
2018-05-03
The present study aims to analyze the psychometric characteristics of the TeLPI (Irregular Words Reading Test), a Portuguese premorbid intelligence test, using the Rasch model for dichotomous items. The results reveal an overall adequacy and a good fit of values regarding both items and persons. A high variability of cognitive performance level and a good quality of the measurements were also found. The TeLPI has proved to be a unidimensional measure with reduced DIF effects. The present findings contribute to overcome an important gap in the psychometric validity of this instrument and provide good evidence of the overall psychometric validity of TeLPI results.
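For readers unfamiliar with the model named in the abstract above, the dichotomous Rasch model expresses the probability of a correct response as a logistic function of the difference between person ability and item difficulty. The sketch below is illustrative only: the function name `rasch_probability` and the example values are assumptions, not estimates from the TeLPI study.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model.

    Both arguments are on the same logit scale. The values used below are
    hypothetical and are not taken from the TeLPI study.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# A person whose ability equals the item difficulty succeeds half the time.
p_equal = rasch_probability(0.0, 0.0)
# A person one logit above the item difficulty succeeds more often than not.
p_above = rasch_probability(1.0, 0.0)
```

Because only the difference between ability and difficulty enters the formula, persons and items are placed on a single common scale, which is what makes the unidimensionality check mentioned in the abstract meaningful.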
GAMSOR: Gamma Source Preparation and DIF3D Flux Solution
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, M. A.; Lee, C. H.; Hill, R. N.
2017-06-28
Nuclear reactors that rely upon the fission reaction have two modes of thermal energy deposition in the reactor system: neutron absorption and gamma absorption. The gamma rays are typically generated by neutron capture reactions or during the fission process, which means the primary driver of energy production is of course the neutron interaction. In conventional reactor physics methods, the gamma heating component is ignored such that the gamma absorption is forced to occur at the gamma emission site. For experimental reactor systems like EBR-II and FFTF, the placement of structural pins and assemblies internal to the core leads to problems with power heating predictions because there is no fission power source internal to the assembly to dictate a spatial distribution of the power. As part of the EBR-II support work in the 1980s, the GAMSOR code was developed to assist analysts in calculating the gamma heating. The GAMSOR code is a modified version of DIF3D and actually functions within a sequence of DIF3D calculations. The gamma flux in a conventional fission reactor system does not perturb the neutron flux and thus the gamma flux calculation can be cast as a fixed source problem given a solution to the steady state neutron flux equation. This leads to a sequence of DIF3D calculations, called the GAMSOR sequence, which involves solving the neutron flux, then the gamma flux, and then combining the results to do a summary edit. In this manuscript, we go over the GAMSOR code and detail how it is put together and functions. We also discuss how to set up the GAMSOR sequence and input for each DIF3D calculation in the GAMSOR sequence.
2012-01-01
Background The mini-Mental Adjustment to Cancer Scale (mini-MAC) is a well-recognised, popular measure of coping in psycho-oncology and assesses five cancer-specific coping strategies. It has been suggested that these five subscales could be grouped to form the over-arching adaptive and maladaptive coping subscales to facilitate the interpretation and clinical application of the scale. Despite the popularity of the mini-MAC, few studies have examined its psychometric properties among long-term cancer survivors, and further validation of the mini-MAC is needed to substantiate its use with the growing population of survivors. Therefore, this study examined the psychometric properties and dimensionality of the mini-MAC in a sample of long-term cancer survivors using Rasch analysis. Methods RUMM 2030 was used to analyse the mini-MAC data (n=851). Separate Rasch analyses were conducted for each of the original mini-MAC subscales as well as the over-arching adaptive and maladaptive coping subscales to examine summary and individual model fit statistics, person separation index (PSI), response format, local dependency, targeting, item bias (or differential item functioning, DIF), and dimensionality. Results For the fighting spirit, fatalism, and helplessness-hopelessness subscales, a revised three-point response format seemed more optimal than the original four-point response. To achieve model fit, items were deleted from four of the five subscales: Anxious Preoccupation items 7, 25, and 29; Cognitive Avoidance items 11 and 17; Fighting Spirit item 18; and Helplessness-Hopelessness items 16 and 20. For those subscales with sufficient items, analyses supported unidimensionality. Combining items to form the adaptive and maladaptive subscales was partially supported. Conclusions The original five subscales required item deletion and/or rescaling to improve goodness of fit to the Rasch model. 
While evidence was found for overarching subscales of adaptive and maladaptive coping, extensive modifications were necessary to achieve this result. Further exploration and validation of over-arching subscales assessing adaptive and maladaptive coping is necessary with cancer survivors. PMID:22607052
Darzins, Susan; Imms, Christine; Di Stefano, Marilyn; Taylor, Nicholas F; Pallant, Julie F
2014-11-05
The Personal Care Participation Assessment and Resource Tool (PC-PART) is a 43-item, clinician-administered assessment, designed to identify patients' unmet needs (participation restrictions) in activities of daily living (ADL) required for community life. This information is important for identifying problems that need addressing to enable, for example, discharge from inpatient settings to community living. The objective of this study was to evaluate internal construct validity of the PC-PART using Rasch methods. Fit to the Rasch model was evaluated for 41 PC-PART items, assessing threshold ordering, overall model fit, individual item fit, person fit, internal consistency, Differential Item Functioning (DIF), targeting of items and dimensionality. Data used in this research were taken from admission data from a randomised controlled trial conducted at two publicly funded inpatient rehabilitation units in Melbourne, Australia, with 996 participants (63% women; mean age 74 years) and with various impairment types. PC-PART items assessed as one scale, and original PC-PART domains evaluated as separate scales, demonstrated poor fit to the Rasch model. Adequate fit to the Rasch model was achieved in two newly formed PC-PART scales: Self-Care (16 items) and Domestic Life (14 items). Both scales were unidimensional, had acceptable internal consistency (PSI = 0.85 and 0.76, respectively) and well-targeted items. Rasch analysis did not support conventional summation of all PC-PART item scores to create a total score. However, internal construct validity of the newly formed PC-PART scales, Self-Care and Domestic Life, was supported. Their Rasch-derived scores provided interval-level measurement enabling summation of scores to form a total score on each scale. These scales may assist clinicians, managers and researchers in rehabilitation settings to assess and measure changes in ADL participation restrictions relevant to community living. 
Data used in this research were gathered during a registered randomised controlled trial: Australian and New Zealand Clinical Trials Registry ACTRN12609000973213. Ethics committee approval was gained for secondary analysis of data for this study.
The patient satisfaction questionnaire of EUprimecare project: measurement properties.
Cimas, Marta; Ayala, Alba; García-Pérez, Sonia; Sarria-Santamera, Antonio; Forjaz, Maria João
2016-06-01
The measurement of patient satisfaction is considered an essential outcome indicator to evaluate health care quality. Patient satisfaction is considered a multi-dimensional construct, which would include a variety of domains. Although a large number of studies have proposed scales to measure patient satisfaction, there is a lack of psychometric information on them. This study aims to describe the psychometric properties of the Primary Care Satisfaction Scale (PCSS) of the EUprimecare project. A cross-sectional survey of patient satisfaction with primary care was carried out by telephone interview. Primary care services of Estonia, Finland, Germany, Hungary, Lithuania, Italy and Spain. A total of 3020 adult patients aged 18-65 years old attending primary care services. Classic psychometric properties were analysed and Rasch analysis was used to assess the following measurement properties: fit to the Rasch model; uni-dimensionality; reliability; differential item functioning (DIF) by gender, age, civil status, area of residency and country; local independency; adequacy of response scale; and scale targeting. To achieve good fit to the Rasch model, the original response scales of three items (1, 2 and 6) were rescored and Item 3 (waiting time in the room) was removed. The scale was uni-dimensional and Person Separation Index was 0.79, indicating a good reliability. All items were free from bias. PCSS linear measure displayed satisfactory convergent validity with overall satisfaction with primary care. PCSS, as a reliable and valid scale, could be used to measure patient satisfaction in primary care in Europe. © The Author 2016. Published by Oxford University Press in association with the International Society for Quality in Health Care; all rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kubohara, Yuzuru, E-mail: ykuboha@juntendo.ac.jp; Department of Health Science, Juntendo University Graduate School of Health and Sports Science, Inzai 270-1695; Komachi, Mayumi
Osteosarcoma is a common metastatic bone cancer that predominantly develops in children and adolescents. Metastatic osteosarcoma remains associated with a poor prognosis; therefore, more effective anti-metastatic drugs are needed. Differentiation-inducing factor-1 (DIF-1), -2, and -3 are novel lead anti-tumor agents that were originally isolated from the cellular slime mold Dictyostelium discoideum. Here we investigated the effects of a panel of DIF derivatives on lysophosphatidic acid (LPA)-induced migration of mouse osteosarcoma LM8 cells by using a Boyden chamber assay. Some DIF derivatives such as Br-DIF-1, DIF-3(+2), and Bu-DIF-3 (5–20 μM) dose-dependently suppressed LPA-induced cell migration with associated IC50 values of 5.5, 4.6, and 4.2 μM, respectively. On the other hand, the IC50 values of Br-DIF-1, DIF-3(+2), and Bu-DIF-3 versus cell proliferation were 18.5, 7.2, and 2.0 μM, respectively, in LM8 cells, and >20, 14.8, and 4.3 μM, respectively, in mouse 3T3-L1 fibroblasts (non-transformed). Together, our results demonstrate that Br-DIF-1 in particular may be a valuable tool for the analysis of cancer cell migration, and that DIF derivatives such as DIF-3(+2) and Bu-DIF-3 are promising lead anti-tumor agents for the development of therapies that suppress osteosarcoma cell proliferation, migration, and metastasis. Highlights: • LPA induces cell migration (invasion) in murine osteosarcoma LM8 cells. • DIFs are novel lead anti-tumor agents found in Dictyostelium discoideum. • We examined the effects of DIF derivatives on LPA-induced LM8 cell migration in vitro. • Some of the DIF derivatives inhibited LPA-induced LM8 cell migration.
Kuwayama, Hidekazu; Kikuchi, Haruhisa; Oshima, Yoshiteru; Kubohara, Yuzuru
2016-12-01
In the development of the cellular slime mold Dictyostelium discoideum, two chlorinated compounds, the differentiation-inducing factors DIF-1 and DIF-2, play important roles in the regulation of both cell differentiation and chemotactic cell movement. However, the receptors of DIFs and the components of DIF signaling systems have not previously been elucidated. To identify the receptors for DIF-1 and DIF-2, we here performed DIF-conjugated affinity gel chromatography and liquid chromatography-tandem mass spectrometry and identified the glutathione S-transferase GST4 as a major DIF-binding protein. Knockout and overexpression mutants of gst4 (gst4− and gst4OE, respectively) formed fruiting bodies, but the fruiting bodies of gst4− cells were smaller than those of wild-type Ax2 cells, and those of gst4OE cells were larger than those of Ax2 cells. Both chemotaxis regulation and in vitro stalk cell formation by DIFs in the gst4 mutants were similar to those of Ax2 cells. These results suggest that GST4 is a DIF-binding protein that regulates the sizes of cell aggregates and fruiting bodies in D. discoideum.
Tennant, Alan; Tyson, Sarah F.; Nordenskiöld, Ulla; Hawkins, Ruth; Prior, Yeliz
2015-01-01
Objectives. The Evaluation of Daily Activity Questionnaire (EDAQ) includes 138 items in 14 domains identified as important by people with RA. The aim of this study was to test the validity and reliability of the English EDAQ. Methods. A total of 502 participants completed two questionnaires 3 weeks apart. The first consisted of the EDAQ, HAQ, RA Quality of Life (RAQoL) and the Medical Outcomes Scale (MOS) 36-item Short-Form Health Survey (SF-36v2), and the second consisted of the EDAQ only. The 14 EDAQ domains were tested for: unidimensionality—using confirmatory factor analysis; fit, response dependency, invariance across groups (differential item functioning)—using Rasch analysis; internal consistency [Person Separation Index (PSI)]; concurrent validity—by correlations with the HAQ, SF-36v2 and RAQoL; and test–retest reliability (Spearman’s correlations). Results. Confirmatory factor analysis of the 14 EDAQ domains indicated unidimensionality, after adjustment for local dependency in each domain. All domains achieved a root mean square error of approximation <0.10 and satisfied Rasch model expectations for local dependency. DIF by age, gender and employment status was largely absent. The PSI was consistent with individual use (PSI = 0.94 for all 14 domains). For all domains, except Caring, concurrent validity was good: HAQ (rs = 0.72–0.91), RAQoL (rs = 0.67–0.82) and SF36v2 Physical Function scale (rs = −0.60 to −0.84) and test–retest reliability was good (rs = 0.70–0.89). Conclusion. Analysis supported a 14-domain, two-component structure (Self care and Mobility) of the EDAQ, where each domain, and both components, satisfied Rasch model requirements, and have robust reliability and validity. PMID:25863045
Kubohara, Yuzuru; Komachi, Mayumi; Homma, Yoshimi; Kikuchi, Haruhisa; Oshima, Yoshiteru
2015-08-07
Osteosarcoma is a common metastatic bone cancer that predominantly develops in children and adolescents. Metastatic osteosarcoma remains associated with a poor prognosis; therefore, more effective anti-metastatic drugs are needed. Differentiation-inducing factor-1 (DIF-1), -2, and -3 are novel lead anti-tumor agents that were originally isolated from the cellular slime mold Dictyostelium discoideum. Here we investigated the effects of a panel of DIF derivatives on lysophosphatidic acid (LPA)-induced migration of mouse osteosarcoma LM8 cells by using a Boyden chamber assay. Some DIF derivatives such as Br-DIF-1, DIF-3(+2), and Bu-DIF-3 (5-20 μM) dose-dependently suppressed LPA-induced cell migration with associated IC50 values of 5.5, 4.6, and 4.2 μM, respectively. On the other hand, the IC50 values of Br-DIF-1, DIF-3(+2), and Bu-DIF-3 versus cell proliferation were 18.5, 7.2, and 2.0 μM, respectively, in LM8 cells, and >20, 14.8, and 4.3 μM, respectively, in mouse 3T3-L1 fibroblasts (non-transformed). Together, our results demonstrate that Br-DIF-1 in particular may be a valuable tool for the analysis of cancer cell migration, and that DIF derivatives such as DIF-3(+2) and Bu-DIF-3 are promising lead anti-tumor agents for the development of therapies that suppress osteosarcoma cell proliferation, migration, and metastasis. Copyright © 2015 Elsevier Inc. All rights reserved.
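As a worked illustration of why the abstract above singles out Br-DIF-1 for migration studies, the sketch below divides each compound's IC50 for proliferation by its IC50 for migration in LM8 cells. The IC50 values are those reported in the abstract; the ratio itself and the function name `proliferation_to_migration_ratio` are assumptions introduced here for illustration and do not appear in the source.

```python
# IC50 values (micromolar) reported in the abstract above for LM8 cells.
ic50_migration = {"Br-DIF-1": 5.5, "DIF-3(+2)": 4.6, "Bu-DIF-3": 4.2}
ic50_proliferation = {"Br-DIF-1": 18.5, "DIF-3(+2)": 7.2, "Bu-DIF-3": 2.0}


def proliferation_to_migration_ratio(compound: str) -> float:
    """Ratio of proliferation IC50 to migration IC50 (illustrative metric only).

    A ratio above 1 means migration is inhibited at lower concentrations than
    proliferation. The abstract does not report this ratio; it is an
    assumption made for this example.
    """
    return ic50_proliferation[compound] / ic50_migration[compound]


ratios = {name: round(proliferation_to_migration_ratio(name), 2)
          for name in ic50_migration}
most_migration_selective = max(ratios, key=ratios.get)
```

Under this illustrative metric Br-DIF-1 has the largest ratio (about 3.4), which is consistent with the authors' suggestion that it can suppress migration at concentrations with comparatively little effect on proliferation, whereas Bu-DIF-3 has a ratio below 1.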
Bours, Ralph; van Zanten, Martijn; Pierik, Ronald; Bouwmeester, Harro; van der Krol, Alexander
2013-10-01
In the natural environment, days are generally warmer than the night, resulting in a positive day/night temperature difference (+DIF). Plants have adapted to these conditions, and when exposed to antiphase light and temperature cycles (cold photoperiod/warm night [-DIF]), most species exhibit reduced elongation growth. To study the physiological mechanism of how light and temperature cycles affect plant growth, we used infrared imaging to dissect growth dynamics under +DIF and -DIF in the model plant Arabidopsis (Arabidopsis thaliana). We found that -DIF altered leaf growth patterns, decreasing the amplitude and delaying the phase of leaf movement. Ethylene application restored leaf growth in -DIF conditions, and constitutive ethylene signaling mutants maintain robust leaf movement amplitudes under -DIF, indicating that ethylene signaling becomes limiting under these conditions. In response to -DIF, the phase of ethylene emission advanced 2 h, but total ethylene emission was not reduced. However, expression analysis on members of the 1-aminocyclopropane-1-carboxylic acid (ACC) synthase ethylene biosynthesis gene family showed that ACS2 activity is specifically suppressed in the petiole region under -DIF conditions. Indeed, petioles of plants under -DIF had reduced ACC content, and application of ACC to the petiole restored leaf growth patterns. Moreover, acs2 mutants displayed reduced leaf movement under +DIF, similar to wild-type plants under -DIF. In addition, we demonstrate that the photoreceptor PHYTOCHROME B restricts ethylene biosynthesis and constrains the -DIF-induced phase shift in rhythmic growth. Our findings provide a mechanistic insight into how fluctuating temperature cycles regulate plant growth.
Fayers, Peter M
2007-01-01
We review the papers presented at the NCI/DIA conference, to identify areas of controversy and uncertainty, and to highlight those aspects of item response theory (IRT) and computer adaptive testing (CAT) that require theoretical or empirical research in order to justify their application to patient reported outcomes (PROs). IRT and CAT offer exciting potential for the development of a new generation of PRO instruments. However, most of the research into these techniques has been in non-healthcare settings, notably in education. Educational tests are very different from PRO instruments, and consequently problematic issues arise when adapting IRT and CAT to healthcare research. Clinical scales differ appreciably from educational tests, and symptoms have characteristics distinctly different from examination questions. This affects the transferring of IRT technology. Particular areas of concern when applying IRT to PROs include inadequate software, difficulties in selecting models and communicating results, insufficient testing of local independence and other assumptions, and a need of guidelines for estimating sample size requirements. Similar concerns apply to differential item functioning (DIF), which is an important application of IRT. Multidimensional IRT is likely to be advantageous only for closely related PRO dimensions. Although IRT and CAT provide appreciable potential benefits, there is a need for circumspection. Not all PRO scales are necessarily appropriate targets for this methodology. Traditional psychometric methods, and especially qualitative methods, continue to have an important role alongside IRT. Research should be funded to address the specific concerns that have been identified.
La Porta, F; Giordano, A; Caselli, S; Foti, C; Franchignoni, F
2015-12-01
It is unclear whether the Berg Balance Scale (BBS) is an effective tool for the measurement of early postural control impairments in patients with Parkinson's disease (PD). The aim of this paper was to evaluate the BBS's content validity, internal construct validity, reliability and targeting in patients with PD within the Rasch analysis framework. Observational, cross-sectional study. Outpatient Rehabilitation Unit. A sample of 285 outpatients with PD. The content validity of the BBS was assessed using standard linking techniques. The BBS was administered by trained physiotherapists. The data collected then underwent Rasch analysis. Content validity analysis showed a lack of items assessing postural responses to tripping and slips and stability during walking. On Rasch analysis, the BBS failed the requirements of monotonicity, local independence, unidimensionality and invariance. After rescoring 7 items, grouping of locally dependent items into testlets, and deletion of the static sitting balance item because it was mistargeted and underdiscriminating, the Rasch-modified BBS for PD (BBS-PD) showed adequate internal construct validity (χ²(24) = 39.693; P = 0.023), including absence of differential item functioning (DIF) across gender and age, and was, as a whole, sufficiently precise for individual person measurement (PSI = 0.894). However, the scale was not well targeted to the sample in view of the prevalence of higher scores. This study demonstrated the internal construct validity and reliability of the BBS-PD as a measurement tool for patients with PD within the Rasch analysis framework. However, the lack of items critical to the assessment of postural control impairments typical of PD negatively affected the targeting, so that a significant percentage of patients was located in the higher ability range of the measurement continuum, where precision of measurement is reduced. 
These findings suggest that the BBS, even if modified, may not be an effective tool for the measurement of early postural control in patients with PD.
Implementing statistical equating for MRCP(UK) Parts 1 and 2.
McManus, I C; Chis, Liliana; Fox, Ray; Waller, Derek; Tang, Peter
2014-09-26
The MRCP(UK) exam, in 2008 and 2010, changed the standard-setting of its Part 1 and Part 2 examinations from a hybrid Angoff/Hofstee method to statistical equating using Item Response Theory, the reference group being UK graduates. The present paper considers the implementation of the change, the question of whether the pass rate increased amongst non-UK candidates, any possible role of Differential Item Functioning (DIF), and changes in examination predictive validity after the change. Data from the MRCP(UK) Part 1 exam from 2003 to 2013 and the Part 2 exam from 2005 to 2013 were analysed. Inspection suggested that Part 1 pass rates were stable after the introduction of statistical equating, but showed greater annual variation, probably due to stronger candidates taking the examination earlier. Pass rates seemed to have increased in non-UK graduates after equating was introduced, but this was not associated with any changes in DIF after statistical equating. Statistical modelling of the pass rates for non-UK graduates found that pass rates, in both Part 1 and Part 2, were increasing year on year, with the changes probably beginning before the introduction of equating. The predictive validity of Part 1 for Part 2 was higher with statistical equating than with the previous hybrid Angoff/Hofstee method, confirming the utility of IRT-based statistical equating. Statistical equating was successfully introduced into the MRCP(UK) Part 1 and Part 2 written examinations, resulting in higher predictive validity than the previous Angoff/Hofstee standard setting. Concerns about an artefactual increase in pass rates for non-UK candidates after equating were shown not to be well-founded. Most likely the changes resulted from a genuine increase in candidate ability, albeit for reasons which remain unclear, coupled with a cognitive illusion giving the impression of a step-change immediately after equating began. 
Statistical equating provides a robust standard-setting method, with a better theoretical foundation than judgemental techniques such as Angoff, and is more straightforward and requires far less examiner time to provide a more valid result. The present study provides a detailed case study of introducing statistical equating, and issues which may need to be considered with its introduction.
Hecimovich, Mark; Marais, Ida
2017-06-26
Awareness of sport-related concussion (SRC) is an essential step in increasing the number of athletes or parents who report on SRC. This awareness is important, as there is no established data on medical care at youth-level sports and may be limited to individuals with only first aid training. In this circumstance, aside from the coach, it is the players and their parents who need to be aware of possible signs and symptoms. The aim of this study was to examine the psychometric properties of a parent and player concussion survey intended for use before and after an education campaign regarding SRC. 1441 questionnaires were received from parents and 284 questionnaires from players. The responses to the sixteen-item section of the questionnaire's 'recognition of signs and symptoms' were submitted to psychometric analysis using the dichotomous and polytomous Rasch model via the Rasch Unidimensional Measurement Model software RUMM2030. The Rasch model of Modern Test Theory can be considered a refinement of, or advance on, traditional analyses of an instrument's psychometric properties. The main finding is that these sixteen items measure two factors: items that are symptoms of concussion and items that are not symptoms of concussion. Parents and athletes were able to identify most or all of the symptoms, but were not as good at distinguishing symptoms that are not symptoms of concussion. Analyzing these responses revealed differential item functioning for parents and athletes on non-symptom items. When the DIF was resolved a significant difference was found between parents and athletes. The main finding is that the items measure two 'dimensions' in concussion symptom recognition. The first dimension consists of those items that are symptoms of concussion and the second dimension of those items that are not symptoms of concussion. 
Parents and players were able to identify most or all of the symptoms of concussion, so one would not expect to pick up any positive change on these items after an education campaign. Parents and players were not as good at distinguishing symptoms that are not symptoms of concussion. It is on these items that one may possibly expect improvement to manifest, so to evaluate the effectiveness of an education campaign it would pay to look for improvement in distinguishing symptoms that are not symptoms of concussion.
Deng, Yan; Guo, Sheng-lan; Su, Hong-yue; Wang, Qian; Tan, Zhen; Wu, Ji; Zhang, Di
2015-02-01
This study evaluated the feasibility of assessing left atrium (LA) function and asynchrony in patients with rheumatic mitral stenosis (MS) before and immediately after percutaneous balloon mitral valvuloplasty (PBMV) by real time three-dimensional echocardiography (RT3DE). Thirty patients with rheumatic MS who underwent PBMV and 30 controls were enrolled. RT3DE was used to measure LA volume and function, the standard deviation of the time to minimal systolic volume across 16, 12, or 6 segments (Tmsv 16-SD, Tmsv 12-SD, Tmsv 6-SD), and the corresponding maximum differences (Tmsv 16-Dif, Tmsv 12-Dif, Tmsv 6-Dif). RT3DE-derived values in MS patients before and 2 days after PBMV were compared with those of normal controls. The associations between LA asynchrony and heart volume, function, mitral valve area (MVA), maximum mitral valve gradient (MVGmax), mean mitral valve gradient (MVGmean), and mean LA pressure (MLAP) were investigated. Left atrium asynchrony indexes were significantly larger, and LA function parameters were significantly lower, in the MS group than in the controls (P < 0.05 for all). Of all the LA asynchrony indexes, LA Tmsv 16-SD was most significantly correlated with the LA volume and function parameters, MVGmax, MVGmean, and MLAP (P < 0.05 for all). LA asynchrony indexes and LA volume significantly decreased, and LA function significantly increased, post-PBMV (P < 0.05). Real time three-dimensional echocardiography is a reliable and reproducible method to quantify LA function and asynchrony. RT3DE revealed a significant, early improvement in LA function and asynchrony in MS patients after PBMV. © 2014, Wiley Periodicals, Inc.
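The two families of asynchrony indexes named in the abstract above (the SD and the maximum difference of per-segment times to minimal systolic volume) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `asynchrony_indexes`, the use of the sample standard deviation, and the example times are all hypothetical, since the abstract does not give the exact computation or any raw data.

```python
import statistics


def asynchrony_indexes(times_ms: list[float]) -> tuple[float, float]:
    """Return (SD, Dif) for per-segment times to minimal systolic volume.

    SD is the sample standard deviation across segments, and Dif is the
    largest segment time minus the smallest. Whether the study used the
    sample or population SD is not stated in the abstract; the sample SD is
    an assumption here.
    """
    sd = statistics.stdev(times_ms)
    dif = max(times_ms) - min(times_ms)
    return sd, dif


# Hypothetical times (ms) for a 6-segment model; not data from the study.
segment_times = [300.0, 310.0, 320.0, 330.0, 340.0, 350.0]
tmsv6_sd, tmsv6_dif = asynchrony_indexes(segment_times)
```

Both indexes grow as the segments reach their minimal volume at more widely scattered times, which is why larger values are read as greater asynchrony.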
Kawaharada, Ritsuko; Nakamura, Akio; Takahashi, Katsunori; Kikuchi, Haruhisa; Oshima, Yoshiteru; Kubohara, Yuzuru
2016-06-15
Differentiation-inducing factor 1 (DIF-1), originally discovered in the cellular slime mold Dictyostelium discoideum, and its derivatives possess pharmacological activities, such as the promotion of glucose uptake in non-transformed mammalian cells in vitro. Accordingly, DIFs are considered promising lead candidates for novel anti-diabetic drugs. The aim of this study was to assess the anti-diabetic and toxic effects of DIF-1 in mouse 3T3-L1 fibroblast cells in vitro and in diabetic rats in vivo. Main methods We investigated the in vitro effects of DIF-1 and DIF-1(3M), a derivative of DIF-1, on glucose metabolism in 3T3-L1 cells by using capillary electrophoresis time-of-flight mass spectrometry (CE-TOF-MS). We also examined the effects of DIF-1 on blood glucose levels in streptozotocin (STZ)-induced rats. CE-TOF-MS revealed that 20μM DIF-1 and 20μM DIF-1(3M) promoted glucose uptake and metabolism in 3T3-L1 cells. Oral administration of DIF-1 (30mg/kg) significantly lowered basal blood glucose levels in STZ-treated rats and promoted a decrease in blood glucose levels after oral glucose loading (2.5g/kg) in the rats. In addition, daily oral administration of DIF-1 (30mg/kg/day) for 1wk significantly lowered the blood glucose levels in STZ-treated rats but did not affect their body weight and caused only minor alterations in the levels of other blood analytes. These results indicate that DIF-1 may be a good lead compound for the development of anti-diabetic drugs. Copyright © 2016 Elsevier Inc. All rights reserved.
GAMSOR: Gamma Source Preparation and DIF3D Flux Solution
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, M. A.; Lee, C. H.; Hill, R. N.
2016-12-15
Nuclear reactors that rely upon the fission reaction have two modes of thermal energy deposition in the reactor system: neutron absorption and gamma absorption. The gamma rays are typically generated by neutron absorption reactions or during the fission process, which means the primary driver of energy production is of course the neutron interaction. In conventional reactor physics methods, the gamma heating component is ignored such that the gamma absorption is forced to occur at the gamma emission site. For experimental reactor systems like EBR-II and FFTF, the placement of structural pins and assemblies internal to the core leads to problems with power heating predictions because there is no fission power source internal to the assembly to dictate a spatial distribution of the power. As part of the EBR-II support work in the 1980s, the GAMSOR code was developed to assist analysts in calculating the gamma heating. The GAMSOR code is a modified version of DIF3D and actually functions within a sequence of DIF3D calculations. The gamma flux in a conventional fission reactor system does not perturb the neutron flux and thus the gamma flux calculation can be cast as a fixed source problem given a solution to the steady state neutron flux equation. This leads to a sequence of DIF3D calculations, called the GAMSOR sequence, which involves solving the neutron flux, then the gamma flux, then combining the results to do a summary edit. In this manuscript, we go over the GAMSOR code and detail how it is put together and functions. We also discuss how to set up the GAMSOR sequence and input for each DIF3D calculation in the GAMSOR sequence. With the GAMSOR capability, users can take any valid steady state DIF3D calculation and compute the power distribution due to neutron and gamma heating. The MC2-3 code is the preferable companion code to use for generating neutron and gamma cross section data, but the GAMSOR code can accept cross section data from other sources. 
To further this aspect, an additional utility code was created which demonstrates how to merge the neutron and gamma cross section data together to carry out a simultaneous solve of the two systems.
Sierakowska, Matylda; Sierakowski, Stanisław; Sierakowska, Justyna; Horton, Mike; Ndosi, Mwidimi
2015-03-01
To undertake cross-cultural adaptation and validation of the educational needs assessment tool (ENAT) for use with people with rheumatoid arthritis (RA) and systemic sclerosis (SSc) in Poland. The study involved two main phases: (1) cross-cultural adaptation of the ENAT from English into Polish and (2) cross-cultural validation of the Polish Educational Needs Assessment Tool (Pol-ENAT). The first phase followed an established process of cross-cultural adaptation of self-report measures. The second phase involved completion of the Pol-ENAT by patients and subjecting the data to Rasch analysis to assess the construct validity, unidimensionality, internal consistency and cross-cultural invariance. An adequate conceptual equivalence was achieved following the adaptation process. The dataset for validation comprised a total of 278 patients, 237 (85.3%) of whom were female. In each disease group (145 RA and 133 SSc), the 7 domains of the Pol-ENAT were found to fit the Rasch model: χ²(14) = 16.953, p = 0.259 for RA and χ²(14) = 8.132, p = 0.882 for SSc. Internal consistency of the Pol-ENAT was high (patient separation index = 0.85 and 0.89 for SSc and RA, respectively), and unidimensionality was confirmed. Cross-cultural differential item functioning (DIF) was detected in some subscales, and DIF-adjusted conversion tables were calibrated to enable cross-cultural comparison of data between Poland and the UK. Using a standard process in cross-cultural adaptation, conceptual equivalence was achieved between the original (UK) ENAT and the adapted Pol-ENAT. Fit to the Rasch model confirmed that the construct validity, unidimensionality and internal consistency of the ENAT have been preserved.
Reliability and validity of the Haitian Creole PHQ-9.
Marc, Linda G; Henderson, Whitney R; Desrosiers, Astrid; Testa, Marcia A; Jean, Samuel E; Akom, Eniko Edit
2014-12-01
There is limited information on depression in Haitians and this is partly attributable to the absence of culturally and linguistically adapted measures for depression. To perform a psychometric evaluation of the Haitian-Creole version of the PHQ-9 administered to men who have sex with men (MSM) in the Republic of Haiti. This study uses a cross-sectional design and data are from the Integrated Behavioral and Biological HIV Survey (IBBS) for MSM in Haiti. Inclusion criteria required that participants be male, ≥ 18 years, report sexual relations with a male partner in the last 12 months, and lived in Haiti during the past 3 months. Respondent Driven Sampling was used for participant recruitment. A structured questionnaire was verbally administered in Haitian-Creole capturing information on sociodemographics, sexual behaviors, human immunodeficiency virus (HIV) status and depressive symptomatology using the PHQ-9. Psychometric analyses of the translated PHQ-9 assessed unidimensionality, factor structure, reliability, construct validity, and differential item functioning (DIF) across subgroups (age, educational level, sexual orientation and HIV status). In a study population of 1,028 MSM, the Haitian-Creole version of the PHQ-9 is unidimensional, has moderately high internal consistency reliability (α = 0.78), and shows evidence of construct validity where HIV-positive subjects have greater depression (p = 0.002). There is no evidence of DIF across age, education, sexual orientation or HIV status. HIV-positive MSM are twice as likely to screen positive for moderately severe and severe depressive symptoms compared to their HIV-negative counterparts. There is strong evidence for the psychometric adequacy of the translated PHQ-9 screening tool as a measure of depression with MSM in Haiti. Future research is necessary to examine the predictive validity of depression for subsequent health behaviors or clinical outcomes among Haitian MSM.
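The internal consistency figure reported above (α = 0.78) is Cronbach's alpha. As a minimal sketch of how that coefficient is computed from an item-response matrix, the function below uses only the Python standard library. The function name and the toy responses are illustrative assumptions and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    Sample variances (n - 1 denominator) are used throughout.
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != n_items for row in scores):
        raise ValueError("every respondent must answer every item")

    item_vars = [variance([row[j] for row in scores]) for j in range(n_items)]
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical responses to four items scored 0-3 (not study data).
    toy = [
        [0, 1, 0, 1],
        [2, 2, 1, 2],
        [3, 3, 2, 3],
        [1, 0, 1, 1],
        [2, 3, 3, 2],
    ]
    print(round(cronbach_alpha(toy), 3))
```

Alpha rises as items covary more strongly relative to their individual variances, which is why it is read as an index of internal consistency rather than of unidimensionality.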
Validation of the Headache Impact Test (HIT-6) in patients with chronic migraine.
Rendas-Baum, Regina; Yang, Min; Varon, Sepideh F; Bloudek, Lisa M; DeGryse, Ronald E; Kosinski, Mark
2014-08-01
The Headache Impact Test (HIT)-6 was developed and has been validated in patients with various types of headache. The objective of this study was to report the psychometric properties of the HIT-6 among patients with chronic migraine. Data came from two international, multicenter, randomized, double-blind, placebo-controlled clinical trials of chronic migraine patients (N = 1,384) undergoing prophylaxis therapy. Confirmatory factor analysis and differential item functioning (DIF) analysis were used to test the latent structure and cross-cultural comparability of the HIT-6. Reliability, construct validity, and responsiveness were assessed. Two sets of criterion groups were used: (1) 28-day headache frequency: <10, 10-14, and ≥15 days; (2) sample quartiles of the total cumulative hours of headache: <140, 140 to <280, 280 to <420, and ≥420 hours. Two sets of responsiveness categories were defined as reduction of <30%, 30% to <50%, or ≥50% in (1) number of headache days and (2) cumulative hours of headache. Measurement invariance tests supported the stability of the HIT-6 latent structure across studies. DIF analysis supported cross-cultural comparability. Good reliability was observed across studies (Cronbach's α: 0.75-0.92; intraclass correlation coefficient: 0.76-0.80). HIT-6 scores correlated strongly (-0.86 to -0.59) with scores of the Migraine-Specific Quality-of-Life Questionnaire. Analysis of variance indicated that HIT-6 scores discriminated across both types of criterion groups (P<0.001), across studies and time points. HIT-6 change scores were significantly higher in magnitude in groups experiencing greater improvement (P<0.001). All measurement properties were consistently verified across the two studies, supporting the validity of the HIT-6 among chronic migraine patients. NCT00156910 and NCT00168428 on www.ClinicalTrials.gov.
Hsu, Ya-Fen; Chen, Po-Fei; Lung, For-Wey
2013-05-01
There is substantial overlap between deliberate self-harm (DSH) and intention to suicide (ITS), although the psychopathologies and motivations behind these behaviors are distinctly different. The purpose of this study was to investigate (i) the pathway relationship among parental bonding, personality characteristics, and alexithymic traits, and (ii) the association of these features with ITS and DSH using structural equation modeling to determine the risks and protective factors for these behaviors. Sixty-nine first-time DSH and 36 first-time ITS patients without medical or psychiatric illnesses, and 66 controls were recruited. The Parental Bonding Inventory (PBI), Eysenck Personality Questionnaire (EPQ), 20-item Toronto Alexithymia Scale (TAS-20), and the Chinese Health Questionnaire (CHQ) were filled out by the participants. Our structural equation models showed that parental bonding had the greatest influence on the development of DSH behavior in patients. On the other hand, participants who were younger, less extraverted, with a greater extent of the alexithymic trait of difficulty identifying feeling (DIF), and a worse mental health condition, were more likely to develop ITS behavior. Males were more likely than females to develop the alexithymic trait of DIF. Although there are many covariates that affect both ITS and DSH behaviors, these covariates may have different functions in the development of these behaviors, thus revealing the psychopathological difference between DSH and ITS. Policymakers should consider these differences and build intervention and prevention programs for gender- and age-specific high-risk groups to target the differences, with a focus on family counseling to treat DSH and a focus on attempting to increase emotional awareness to treat ITS.
Rasch Analysis of the Malaysian Secondary School Student Leadership Inventory (M3SLI).
Ling, Mei-Teng
The importance of instilling leadership skills in students has long been a main subject of discussion in Malaysia. The Malaysian Secondary School Students Leadership Inventory (M3SLI) is an instrument that was pilot tested in 2013. The main purpose of this study is to examine and optimize the functioning of the rating scale categories in the M3SLI by investigating the rating scale category counts, average and expected rating scale category measures, and step calibrations. In detail, the study aimed to (1) identify whether the five-point rating scale was functioning as intended and (2) review the effect of a rating scale category revision on the psychometric characteristics of the M3SLI. The study was carried out on 2183 students aged 13 to 18 years, selected by stratified random sampling in 26 public schools in Sabah, Malaysia, with the results analysed using Winsteps. This study found that the rating scales of the Personality and Values constructs needed to be modified, while the scale for Leadership Skills was maintained. For future studies, other psychometric properties, such as differential item functioning (DIF) based on demographic variables such as gender, school location and form, should be examined prior to the use of the instrument.
NASA Technical Reports Server (NTRS)
Ritz, Scott
2018-01-01
A brief status update on NASA’s latest Global Change Master Directory (GCMD) keyword update, description of the differences between DIF-9 and DIF-10 formats in advance of the deprecation of DIF-9 support in Earth Observing System Data and Information System (EOSDIS) with specifics on the DIF-10.3 schema, transition schedule, and some usage metrics for the GCMD Southern Ocean Observing System (SOOS) Portal.
NASA Astrophysics Data System (ADS)
Fernandes, E. C.; Norbu, C.; Juizo, D.; Wangdi, T.; Richey, J. E.
2011-12-01
Landscapes, watersheds, and their downstream coastal and lacustrine zones are facing a series of challenges critical to their future, centered on the availability and distribution of water. Management options cover a range of issues, from bringing safe water to local villages for the rural poor, to developing adaptation strategies for both rural and urban populations and large infrastructure, to sustaining environmental flows and ecosystem services needed for natural and human-dominated ecosystems. These targets represent a very complex set of intersecting issues of scale, cross-sector science and technology, education, politics, and economics, and the desired sustainable development is closely linked to how the nominally responsible governmental Ministries respond to the information they have. In practice, such information and even perspectives are virtually absent in much of the developing world. A Dynamic Information Framework (DIF) is being designed as a knowledge platform whereby decision-makers in information-sparse regions can consider rigorous scenarios of alternative futures and obtain decision support for complex environmental and economic decisions. The DIF is a geospatial gateway, with functional components of base data layers; directed data layers focused on synthetic objectives; geospatially explicit, process-based, cross-sector simulation models (requiring data from the directed data layers); facilitated input/output (including visualizations); and decision support system and scenario testing capabilities. A fundamental aspect of a DIF is not only the convergence of multi-sector information, but how that information can be (a) integrated, (b) used for robust simulations and projections, and (c) conveyed to policymakers and stakeholders in the most compelling, and visual, manner. Examples are given of emerging applications.
The ZambeziDIF was used to establish baselines for agriculture, biodiversity, and water resources in the lower Zambezi valley of Mozambique. The DrukDIF for Bhutan is moving from a test-of-concept to an operational phase, with uses from extending local biodiversity to computing how much energy can be sold tomorrow, based on waterflows today. AralDIF is being developed to serve as a neutral and transparent platform, as a catalyst for open and transparent discussion on water and energy linkages, for central Asia. ImisoziDIF is now being ramped up in Rwanda, to help guide scaling up of agricultural practices and biodiversity from sites to the country. The Virtual Mekong Basin "tells the story" of the multiple issues facing the Mekong Basin.
Arioka, Masaki; Takahashi-Yanaga, Fumi; Kubo, Momoko; Igawa, Kazunobu; Tomooka, Katsuhiko; Sasaguri, Toshiyuki
2017-08-15
Differentiation-inducing factor-1 (DIF-1) isolated from Dictyostelium discoideum strongly inhibits the proliferation of various mammalian cells through the activation of glycogen synthase kinase-3 (GSK-3). To evaluate DIF-1 as a novel anti-cancer agent for malignant melanoma, we examined whether DIF-1 has anti-proliferative, anti-migratory, and anti-invasive effects on melanoma cells using in vitro and in vivo systems. DIF-1 reduced the expression levels of cyclin D1 and c-Myc by facilitating their degradation via GSK-3 in mouse (B16BL6) and human (A2058) malignant melanoma cells, and thereby strongly inhibited their proliferation. DIF-1 suppressed the canonical Wnt signaling pathway by lowering the expression levels of transcription factor 7-like 2 and β-catenin, key transcription factors in this pathway. DIF-1 also inhibited cell migration and invasion, reducing the expression of matrix metalloproteinase-2; however, this effect was not dependent on GSK-3 activity. In a mouse lung tumor formation model, repeated oral administration of DIF-1 markedly reduced melanoma colony formation in the lung. These results suggest that DIF-1 inhibits cell proliferation by a GSK-3-dependent mechanism and suppresses cell migration and invasion by a GSK-3-independent mechanism. Therefore, DIF-1 may have potential as a novel anti-cancer agent for the treatment of malignant melanoma. Copyright © 2017 Elsevier Inc. All rights reserved.
Graves, S W; Hopoate-Sitake, M; Johnston, A; Buckalew, V; Lam, G; Mason, L; Adair, D
2012-07-01
A double blinded placebo controlled clinical trial of a commercial digoxin immune Fab fragment (DIF) in preeclamptic (PE) women provided some benefit to treated subjects (1). In that study DIF, relative to placebo, prevented a decline in CrCl and lowered levels of endogenous digitalis-like factor (EDLF) activity as measured by sodium pump inhibition (SPI). However, some PE subjects had undetectable EDLF. The hypothesis tested was that only PE women with measurable EDLF would respond to DIF treatment and that analysis of EDLF positive women might reveal treatment effects masked by inclusion of EDLF negative, and hence non-responding, PE women. Accordingly, analyses of DIF effects in EDLF positive PE women were conducted. Patient characteristics and study design have been published (1). In these subanalyses, subjects were considered to be EDLF positive if their plasma inhibited red cell sodium pump mediated Rb uptake. All analyses were redone for the EDLF positive subgroup by Covance Inc as in the original trial. Continuous data were analyzed by ANCOVA. Categorical data were analyzed by Barnard's exact test. 45 subjects (23 DIF, 22 placebo) had baseline SPI evaluated. Of these, 22% had undetectable SPI. EDLF positive PE women showed greater and more significant reductions of SPI in response to DIF at each time point (12, 24, 48 hr treatment) than in the original analysis. Subjects with undetectable EDLF showed no significant change in response to DIF or placebo. For CrCl, EDLF positive PE women showed greater and more significant preservation of CrCl compared with the original analyses. Subjects without detectable EDLF showed deterioration of CrCl with or without DIF. Among EDLF positive PE women, DIF-treated women had significantly less maternal pulmonary edema (p=0.035) and significantly less intraventricular hemorrhage in their infants (p=0.015). There was the suggestion of reductions in the incidence of other maternal and neonatal abnormalities.
These data indicate that EDLF positive PE women are those that responded to DIF and also raise the possibility of extended benefits of DIF treatment in this group. Results support further research in this area. Copyright © 2012. Published by Elsevier B.V.
Cross-cultural validation of the 20-item Toronto Alexithymia Scale in Chinese adolescents.
Ling, Y; Zeng, Y; Yuan, H; Zhong, M
2016-04-01
WHAT IS KNOWN ON THE SUBJECT?: The TAS-20 is the most widely used self-reported questionnaire to assess the level of alexithymia in students and community and clinical samples. WHAT THIS PAPER ADDS TO EXISTING KNOWLEDGE?: The TAS-20-C exhibited high levels of reliability and validity, indicating that it is appropriate for the assessment of alexithymia in Chinese adolescents. WHAT ARE THE IMPLICATIONS FOR PRACTICE?: Screening adolescents who are at risk of alexithymia through the TAS-20 could help to perform necessary and effective precautions to decrease the adverse effects of alexithymia, such as the risks of developing depressive mood and behavioral problems. Purpose: The aim of this study was to examine the psychometric properties of the Chinese version of the 20-item Toronto Alexithymia Scale (TAS-20-C) in a sample of Chinese adolescents. Method: Adolescents (n = 1260) recruited from three schools in mainland China completed the TAS-20-C, the somatization subscale of the Symptom Checklist 90 (SCL-90) and Center for Epidemiological Studies Depression Scale (CES-D). Five different factorial models of the TAS-20 were tested using confirmatory factor analysis (CFA). Cronbach's α, mean inter-item correlations and predictive validity were also evaluated. Results: Among those five different factorial models, the four-factor structure model was suitable and invariant across gender and age in this sample. The TAS-20-C demonstrated adequate internal reliability. Gender and age accounted for insignificant amounts of variability in total TAS-20-C and factor scores. TAS-20-C total and subscale scores were correlated significantly with SCL-90 somatization subscale and CES-D. Girls scored higher than boys on difficulty identifying feelings (DIF) and pragmatic thinking (PR) subscales. DIF and lack of subjective significance or importance of emotions (IMs) subscale scores were higher among younger than among middle and older adolescents.
Implications for Practice: Validating the TAS-20 in adolescents is important for its use in evaluating adolescents' alexithymia and in screening those at risk of alexithymia. © 2016 John Wiley & Sons Ltd.
Zachariae, Robert; O'Connor, Maja; Lassesen, Berit; Olesen, Martin; Kjær, Louise Binow; Thygesen, Marianne; Mørcke, Anne Mette
2015-09-15
Patient-centered communication is a core competency in modern health care and associated with higher levels of patient satisfaction, improved patient health outcomes, and lower levels of burnout among physicians. The objective of the present study was to develop a questionnaire assessing medical student and physician self-efficacy in patient-centeredness (SEPCQ) and explore its psychometric properties. A preliminary 88-item questionnaire (SEPCQ-88) was developed based on a review of the literature and medical student portfolios and completed by 448 medical students from Aarhus University. Exploratory principal component analysis resulted in a 27-item version (SEPCQ-27) with three underlying self-efficacy factors: 1) Exploring the patient perspective, 2) Sharing information and power, and 3) Dealing with communicative challenges. The SEPCQ-27 was completed by an independent sample of 291 medical students from 2 medical schools and 101 hospital physicians. Internal consistencies of total and subscales were acceptable for both students and physicians (Cronbach's alpha (range): 0.74-0.95). There were no overall indications of gender-related differential item functioning (DIF), and a Confirmatory Factor Analysis (CFA) indicated good fit (CFI = 0.98; NNFI = 0.98; RMSEA = 0.05; SRMR = 0.07). Responsiveness was indicated by increases in SEPCQ scores after a course in communication and peer-supervision (Cohen's d (range): 0.21 to 0.73; p: 0.053 to 0.001). Furthermore, positive associations were found between increases in SEPCQ scores and course-related motivation to learn (medical students) and between SEPCQ scores and years of clinical experience (physicians). The final SEPCQ-27 showed satisfactory psychometric properties, and preliminary support was found for its construct validity, indicating that the SEPCQ-27 may be a valuable measure in future patient-centered communication training and research.
Reliability and Validity of the Visual, Musculoskeletal, and Balance Complaints Questionnaire.
Lundqvist, Lars-Olov; Zetterlund, Christina; Richter, Hans O
2016-09-01
To evaluate the reliability and validity of the 15-item Visual, Musculoskeletal, and Balance Complaints Questionnaire (VMB) for people with visual impairments, using confirmatory factor analysis (CFA) and with Rasch analysis for use as an outcome measure. Two studies evaluated the VMB. In Study 1, VMB data were collected from 1249 out of 3063 individuals between 18 and 104 years old who were registered at a low vision center. CFA evaluated VMB factor structure and Rasch analysis evaluated VMB scale properties. In Study 2, a subsample of 52 individuals between 27 and 67 years old with visual impairments underwent further measurements. Visual clinical assessments, neck/scapular pain, and balance assessments were collected to evaluate the convergent validity of the VMB (i.e. the domain relationship with other, theoretically predicted measures). CFA supported the a priori three-factor structure of the VMB. The factor loadings of the items on their respective domains were all statistically significant. Rasch analysis indicated disordered categories and the original 10-point scale was subsequently replaced with a 5-point scale. Each VMB domain fitted the Rasch model, showing good metric properties, including unidimensionality (explained variances ≥66% and eigenvalues <1.9), person separation (1.86 to 2.29), reliability (0.87 to 0.94), item fit (infit MnSq's >0.72 and outfit MnSq's <1.47), targeting (0.30 to 0.50 logits), and insignificant differential item functioning (all DIFs but one <0.50 logits) from gender, age, and visual status. The three VMB domains correlated significantly with relevant visual, musculoskeletal, and balance assessments, demonstrating adequate convergent validity of the VMB. 
The VMB is a simple, inexpensive, and quick yet reliable and valid way to screen and evaluate concurrent visual, musculoskeletal, and balance complaints, with contribution to epidemiological and intervention research and potential clinical implications for the field of health services and low vision rehabilitation.
Reliability and Validity of the Visual, Musculoskeletal, and Balance Complaints Questionnaire
Lundqvist, Lars-Olov; Zetterlund, Christina; Richter, Hans O.
2016-01-01
Purpose: To evaluate the reliability and validity of the 15-item Visual, Musculoskeletal, and Balance Complaints Questionnaire (VMB) for people with visual impairments, using confirmatory factor analysis (CFA) and with Rasch analysis for use as an outcome measure. Methods: Two studies evaluated the VMB. In Study 1, VMB data were collected from 1249 out of 3063 individuals between 18 and 104 years old who were registered at a low vision center. CFA evaluated VMB factor structure and Rasch analysis evaluated VMB scale properties. In Study 2, a subsample of 52 individuals between 27 and 67 years old with visual impairments underwent further measurements. Visual clinical assessments, neck/scapular pain, and balance assessments were collected to evaluate the convergent validity of the VMB (i.e. the domain relationship with other, theoretically predicted measures). Results: CFA supported the a priori three-factor structure of the VMB. The factor loadings of the items on their respective domains were all statistically significant. Rasch analysis indicated disordered categories and the original 10-point scale was subsequently replaced with a 5-point scale. Each VMB domain fitted the Rasch model, showing good metric properties, including unidimensionality (explained variances ≥66% and eigenvalues <1.9), person separation (1.86 to 2.29), reliability (0.87 to 0.94), item fit (infit MnSq’s >0.72 and outfit MnSq’s <1.47), targeting (0.30 to 0.50 logits), and insignificant differential item functioning (all DIFs but one <0.50 logits) from gender, age, and visual status. The three VMB domains correlated significantly with relevant visual, musculoskeletal, and balance assessments, demonstrating adequate convergent validity of the VMB.
Conclusions: The VMB is a simple, inexpensive, and quick yet reliable and valid way to screen and evaluate concurrent visual, musculoskeletal, and balance complaints, with contribution to epidemiological and intervention research and potential clinical implications for the field of health services and low vision rehabilitation. PMID: 27309524
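The DIF criterion quoted in this abstract ("all DIFs but one <0.50 logits") compares each item's Rasch difficulty across two groups. The short sketch below shows that comparison under stated assumptions: the 0.50-logit threshold comes from the abstract, while the function name, item labels, and difficulty values are hypothetical and do not come from the study.

```python
def dif_contrasts(ref_difficulty, focal_difficulty, threshold=0.50):
    """Return {item: (contrast_in_logits, flagged)} for each item.

    The contrast is focal minus reference difficulty; an item is flagged
    when the absolute contrast reaches the threshold.
    """
    results = {}
    for item, b_ref in ref_difficulty.items():
        contrast = focal_difficulty[item] - b_ref
        results[item] = (contrast, abs(contrast) >= threshold)
    return results


if __name__ == "__main__":
    # Hypothetical item difficulties in logits for two groups.
    reference = {"item_1": -0.40, "item_2": 0.10, "item_3": 0.85}
    focal = {"item_1": -0.25, "item_2": 0.75, "item_3": 0.80}
    for name, (contrast, flagged) in dif_contrasts(reference, focal).items():
        print(f"{name}: contrast = {contrast:+.2f} logits, flagged = {flagged}")
```

A size threshold like this says nothing about statistical significance, so in practice it is typically read alongside a significance test on the same contrast.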
Development of a Work Climate Scale in Emergency Health Services
Sanduvete-Chaves, Susana; Lozano-Lozano, José A.; Chacón-Moscoso, Salvador; Holgado-Tello, Francisco P.
2018-01-01
An adequate work climate fosters productivity in organizations and increases employee satisfaction. Workers in emergency health services (EHS) have an extremely high degree of responsibility and consequent stress. Therefore, it is essential to foster a good work climate in this context. Despite this, scales with a full study of their psychometric properties (i.e., validity evidence based on test content, internal structure and relations to other variables, and reliability) are not available to measure work climate in EHS specifically. For this reason, our objective was to develop a scale to measure the quality of work climates in EHS. We carried out three studies. In Study 1, we used a mixed-method approach to identify the latent conceptual structure of the construct work climate. Thus, we integrated the results found in (a) a previous study, where a content analysis of seven in-depth interviews obtained from EHS professionals in two hospitals in Gibraltar Countryside County was carried out; and (b) the factor analysis of the responses given by 113 EHS professionals from these same centers to 18 items that measured the work climate in health organizations. As a result, we obtained 56 items grouped into four factors (work satisfaction, productivity/achievement of aims, interpersonal relationships, and performance at work). In Study 2, we presented validity evidence based on test content through experts' judgment. Fourteen experts from the methodology and health fields evaluated the representativeness, utility, and feasibility of each of the 56 items with respect to their factor (theoretical dimension). Forty items met the inclusion criterion, which was to obtain an Osterlind index value greater than or equal to 0.5 in the three aspects assessed. In Study 3, 201 EHS professionals from the same centers completed the resulting 40-item scale. 
This new instrument produced validity evidence based on the internal structure in a second-order factor model with four components (RMSEA = 0.079, GFI = 0.97, AGFI = 0.97, CFI = 0.97; NFI = 0.95, and NNFI = 0.97); absence of Differential Item Functioning (DIF) in 80% of the items; reliability (α = 0.96); and validity evidence based on relations to other variables, specifically the test-criterion relationship (ρ = 0.680). Finally, we discuss further developments of the instrument and its possible implications for EHS workers. PMID:29403417
Development of a Work Climate Scale in Emergency Health Services.
Sanduvete-Chaves, Susana; Lozano-Lozano, José A; Chacón-Moscoso, Salvador; Holgado-Tello, Francisco P
2018-01-01
An adequate work climate fosters productivity in organizations and increases employee satisfaction. Workers in emergency health services (EHS) have an extremely high degree of responsibility and consequent stress. Therefore, it is essential to foster a good work climate in this context. Despite this, scales with a full study of their psychometric properties (i.e., validity evidence based on test content, internal structure and relations to other variables, and reliability) are not available to measure work climate in EHS specifically. For this reason, our objective was to develop a scale to measure the quality of work climates in EHS. We carried out three studies. In Study 1, we used a mixed-method approach to identify the latent conceptual structure of the construct work climate. Thus, we integrated the results found in (a) a previous study, where a content analysis of seven in-depth interviews obtained from EHS professionals in two hospitals in Gibraltar Countryside County was carried out; and (b) the factor analysis of the responses given by 113 EHS professionals from these same centers to 18 items that measured the work climate in health organizations. As a result, we obtained 56 items grouped into four factors (work satisfaction, productivity/achievement of aims, interpersonal relationships, and performance at work). In Study 2, we presented validity evidence based on test content through experts' judgment. Fourteen experts from the methodology and health fields evaluated the representativeness, utility, and feasibility of each of the 56 items with respect to their factor (theoretical dimension). Forty items met the inclusion criterion, which was to obtain an Osterlind index value greater than or equal to 0.5 in the three aspects assessed. In Study 3, 201 EHS professionals from the same centers completed the resulting 40-item scale.
This new instrument produced validity evidence based on the internal structure in a second-order factor model with four components (RMSEA = 0.079, GFI = 0.97, AGFI = 0.97, CFI = 0.97; NFI = 0.95, and NNFI = 0.97); absence of Differential Item Functioning (DIF) in 80% of the items; reliability (α = 0.96); and validity evidence based on relations to other variables, specifically the test-criterion relationship (ρ = 0.680). Finally, we discuss further developments of the instrument and its possible implications for EHS workers.
A novel, broad-range, CTXΦ-derived stable integrative expression vector for functional studies.
Das, Bhabatosh; Kumari, Reena; Pant, Archana; Sen Gupta, Sourav; Saxena, Shruti; Mehta, Ojasvi; Nair, Gopinath Balakrish
2014-12-01
CTXΦ, a filamentous vibriophage encoding cholera toxin, uses a unique strategy for its lysogeny. The single-stranded phage genome forms intramolecular base-pairing interactions between two inversely oriented XerC and XerD binding sites (XBS) and generates a functional phage attachment site, attP(+), for integration. The attP(+) structure is recognized by the host-encoded tyrosine recombinases XerC and XerD (XerCD), which enables irreversible integration of CTXΦ into the chromosome dimer resolution site (dif) of Vibrio cholerae. The dif site and the XerCD recombinases are widely conserved in bacteria. We took advantage of these conserved attributes to develop a broad-host-range integrative expression vector that could irreversibly integrate into the host chromosome using XerCD recombinases without altering the function of any known open reading frame (ORF). In this study, we engineered two different arabinose-inducible expression vectors, pBD62 and pBD66, using XBS of CTXΦ. pBD62 replicates conditionally and integrates efficiently into the dif of the bacterial chromosome by site-specific recombination using host-encoded XerCD recombinases. The expression level of the gene of interest could be controlled through the PBAD promoter by modulating the functions of the vector-encoded transcriptional factor AraC. We validated the irreversible integration of pBD62 into a wide range of pathogenic and nonpathogenic bacteria, such as V. cholerae, Vibrio fluvialis, Vibrio parahaemolyticus, Escherichia coli, Salmonella enterica, and Klebsiella pneumoniae. Gene expression from the PBAD promoter of integrated vectors was confirmed in V. cholerae using the well-studied reporter genes mCherry, eGFP, and lacZ. Copyright © 2014, American Society for Microbiology. All Rights Reserved.
Personality disparity in chronic regional and widespread pain.
Chang, Mei-Chung; Chen, Po-Fei; Lung, For-Wey
2017-08-01
Chronic pain has high comorbidity with psychiatric disorders; therefore, a better understanding of the relationship between chronic pain and mental illness is needed. This study aimed to investigate the pathway relationships among parental attachment, personality characteristics, alexithymic trait and mental health in patients with chronic widespread pain, those with chronic regional pain, and controls. Two hundred and thirty participants were recruited. The Parental Bonding Inventory, Eysenck Personality Inventory (EPI), 20-item Toronto Alexithymia Scale (TAS-20), Chinese Health Questionnaire, and Short-Form 36 were filled out. The pathway relationships revealed that patients whose mothers were more protective were more neurotic, had more difficulty identifying feelings (DIF), had worse mental health, and had a higher association with chronic widespread pain. No differences were found between patients with chronic regional pain and the controls. The predisposing factors for chronic widespread pain, when compared with chronic regional pain, may be more closely related to psychiatric disorders. The pathways to chronic regional pain and chronic widespread pain differ, with neuroticism and the alexithymic DIF trait being the main factors defining chronic widespread pain. Therefore, besides therapies targeting pain symptoms, psychiatric consultation, medication and psychotherapy are also recommended for those with chronic widespread pain to alleviate their mental health conditions. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
Collaborative emitter tracking using Rao-Blackwellized random exchange diffusion particle filtering
NASA Astrophysics Data System (ADS)
Bruno, Marcelo G. S.; Dias, Stiven S.
2014-12-01
We introduce in this paper the fully distributed, random exchange diffusion particle filter (ReDif-PF) to track a moving emitter using multiple received signal strength (RSS) sensors. We consider scenarios with both known and unknown sensor model parameters. In the unknown parameter case, a Rao-Blackwellized (RB) version of the random exchange diffusion particle filter, referred to as the RB ReDif-PF, is introduced. In a simulated scenario with a partially connected network, the proposed ReDif-PF outperformed a PF tracker that assimilates local neighboring measurements only and also outperformed a linearized random exchange distributed extended Kalman filter (ReDif-EKF). Furthermore, the novel ReDif-PF matched the tracking error performance of alternative suboptimal distributed PFs based respectively on iterative Markov chain move steps and selective average gossiping with an inter-node communication cost that is roughly two orders of magnitude lower than the corresponding cost for the Markov chain and selective gossip filters. Compared to a broadcast-based filter which exactly mimics the optimal centralized tracker or its equivalent (exact) consensus-based implementations, ReDif-PF showed a degradation in steady-state error performance. However, compared to the optimal consensus-based trackers, ReDif-PF is better suited for real-time applications since it does not require iterative inter-node communication between measurement arrivals.
A Multiple Indicators Multiple Causes (MIMIC) model of internal barriers to drug treatment in China.
Qi, Chang; Kelly, Brian C; Liao, Yanhui; He, Haoyu; Luo, Tao; Deng, Huiqiong; Liu, Tieqiao; Hao, Wei; Wang, Jichuan
2015-03-01
Although evidence exists for distinct barriers to drug abuse treatment (BDATs), investigations of their inter-relationships and the effect of individual characteristics on the barrier factors have been sparse, especially in China. A Multiple Indicators Multiple Causes (MIMIC) model is applied here for this purpose. A sample of 262 drug users was recruited from three drug rehabilitation centers in Hunan Province, China. We applied a MIMIC approach to investigate the effect of gender, age, marital status, education, primary substance use, duration of primary drug use, and drug treatment experience on the internal barrier factors: absence of problem (AP), negative social support (NSS), fear of treatment (FT), and privacy concerns (PC). Drug users with different characteristics were found to report different internal barrier factors. Younger participants were more likely to report NSS (-0.19, p=0.038) and PC (-0.31, p<0.001). Compared to other drug users, ice users were more likely to report AP (0.44, p<0.001) and NSS (0.25, p=0.010). Drug treatment experience was related to AP (0.20, p=0.012). In addition, differential item functioning (DIF) occurred in three items when participants came from groups that differed in duration of drug use, ice use, or marital status. Individual characteristics had significant effects on internal barriers to drug treatment. On this basis, the BDATs perceived by different individuals could be assessed before tactics are used to remove perceived barriers to drug treatment. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Audenaert, J; Vangansbeke, D; Verhoeven, R; De Clercq, P; Tirry, L; Gobin, B
2014-01-01
Predatory mites like Phytoseiulus persimilis Athias-Henriot, Neoseiulus californicus McGregor and N. fallacis (Garman) (Acari: Phytoseiidae) are essential in sustainable control strategies of the two-spotted spider mite Tetranychus urticae Koch (Acari: Tetranychidae) in warm greenhouse cultures to complement the limited available pesticides and to tackle emerging resistance. However, in response to high energy prices, greenhouse plant breeders have recently changed their greenhouse steering strategies, allowing more variation in temperature and humidity. The impact of these variations on biological control agents is poorly understood. Therefore, we constructed functional response models to demonstrate the impact of realistic climate variations on predation efficiency. First, two temperature regimes were compared at constant humidity (70%) and photoperiod (16L:8D): DIF0 (constant temperature) and DIF15 (variable temperature with day-night difference of 15°C). At mean temperatures of 25°C, DIF15 had a negative influence on the predation efficiency of P. persimilis and N. californicus, as compared to DIF0. At low mean temperatures of 15°C, however, DIF15 showed a higher predation efficiency for P. persimilis and N. californicus. For N. fallacis no difference was observed at either 15°C or 25°C. Secondly, two humidity regimes were compared, at a mean temperature of 25°C (DIF0) and constant photoperiod (16L:8D): RHCTE (constant 70% humidity) and RHALT (alternating 40% L:70%D humidity). For P. persimilis and N. fallacis RHCTE resulted in a higher predation efficiency than RHALT; for N. californicus the effect was the opposite. This shows that N. californicus is more adapted to dry climates as compared to the other predatory mites. We conclude that variable greenhouse climates clearly affect predation efficiency of P. persimilis, N. californicus and N. fallacis.
To obtain optimal control efficiency, the choice of predatory mites (including dose and application frequency) should be adapted to the actual greenhouse climate.
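The abstract above does not say which functional response model was fitted. As a minimal sketch, assuming a Holling type II (disc equation) form, predation efficiency under two climate regimes could be compared as follows. The function name, the parameter names and every numeric value are hypothetical and serve only as illustration; none of them come from the study.

```python
def holling_type_ii(prey_density, attack_rate, handling_time, total_time=1.0):
    """Expected prey eaten per predator under a Holling type II response.

    Assumed model (not confirmed by the abstract):
        eaten = a * N * T / (1 + a * Th * N)
    where a is the attack rate, Th the handling time per prey,
    N the prey density and T the exposure time.
    """
    return (attack_rate * prey_density * total_time) / (
        1.0 + attack_rate * handling_time * prey_density
    )


# Hypothetical parameters for two temperature regimes (illustrative only).
regimes = {
    "DIF0": {"attack_rate": 1.2, "handling_time": 0.05},
    "DIF15": {"attack_rate": 0.9, "handling_time": 0.08},
}

for name, params in regimes.items():
    # Predation at increasing prey densities; the curve saturates
    # towards total_time / handling_time.
    eaten = [round(holling_type_ii(n, **params), 2) for n in (5, 10, 20, 40)]
    print(name, eaten)
```

Under this assumed form, a lower attack rate or a longer handling time both reduce predation at every prey density, which is one way a climate regime could show up as reduced predation efficiency.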
Design of inquiry-oriented science labs: impacts on students' attitudes
NASA Astrophysics Data System (ADS)
Baseya, J. M.; Francis, C. D.
2011-11-01
Background: Changes in lab style can lead to differences in learning. Two inquiry-oriented lab styles are guided inquiry (GI) and problem-based (PB). Students' attitudes towards lab are important to consider when choosing between GI and PB styles during curriculum design. Purpose: We examined the degree to which lab experiences are explained by a GI or a PB lab style vs. students' attitudes towards specific aspects of the experience, reflected by perceived excitement (exc), difficulty (dif), time efficiency (eff) and association between lab and lecture material (help). Sample: Approximately 1000 students attending first-semester, college biology lab for science majors at the University of Colorado at Boulder, USA, participated in the study. Design and method: In 2007, two labs were run as GI and one as PB. Formats were switched in 2008. Attitudes were assessed with a post-semester survey. Results: Only the four attitude variables (not lab style) had a strong relationship with overall lab rating, which was most strongly related to exc, followed by dif and help/eff. Dif and eff had the greatest influence on attitudes for or against GI vs. PB labs, and help and exc had little influence on a GI vs. a PB lab. Also, when dif was low, students' attitudes were not significantly different between PB and GI labs, but when dif was high, students rated GI labs significantly higher than PB labs. Conclusions: Students' attitudes towards lab are more dependent on specific aspects of the experience than on lab style. Changes in GI vs. PB lab styles primarily influence dif and eff rather than exc and help. Dif may be an important factor to consider when implementing a lab in the PB vs. the GI format. A GI format may be preferable when dif is high and a PB format when dif is low.
Metadata and Service at the GFZ ISDC Portal
NASA Astrophysics Data System (ADS)
Ritschel, B.
2008-05-01
The online service portal of the GFZ Potsdam Information System and Data Center (ISDC) is an access point for all manner of geoscientific geodata, the corresponding metadata, scientific documentation and software tools. At present, almost 2000 national and international users and user groups can request Earth science data from a portfolio of 275 different product types and more than 20 million single data files with a total volume of approximately 12 TByte. The majority of the data and information that the portal currently offers to the public are global geomonitoring products such as satellite orbit and Earth gravity field data as well as geomagnetic and atmospheric data for the exploration. These products for Earth's changing system are provided via state-of-the-art retrieval techniques. The data product catalog system behind these techniques is based on the extensive use of standardized metadata, which describe the different geoscientific product types and data products in a uniform way. Whereas all ISDC product types are specified by NASA's Directory Interchange Format (DIF), Version 9.0 Parent XML DIF metadata files, the individual data files are described by extended DIF metadata documents. Depending on when the scientific project began, some data files are described by extended DIF, Version 6 metadata documents and the others are specified by data Child XML DIF metadata documents. Both the product-type-dependent parent DIF metadata documents and the data-file-dependent child DIF metadata documents are derived from a base-DIF.xsd XML schema file. The ISDC metadata philosophy defines a geoscientific product as a package consisting of usually one, or sometimes more than one, data file plus one extended DIF metadata file.
Because NASA's DIF metadata standard was developed to specify a collection of data only, the extension of the DIF standard consists of new and specific attributes, which are necessary for an explicit identification of single data files and the set-up of a comprehensive Earth science data catalog. The huge ISDC data catalog is realized by product-type-dependent tables filled with data-file-related metadata, which have relations to corresponding metadata tables. The parent DIF XML metadata documents that describe the product types are stored and managed in ORACLE's XML storage structures. In order to improve the interoperability of the ISDC service portal, the existing proprietary catalog system will be extended by an ISO 19115 based web catalog service. In addition to this development, there is ISDC-related work concerning a semantic network of different kinds of metadata resources, such as standardized and non-standardized metadata documents and literature, as well as Web 2.0 user-generated information derived from tagging activities and social navigation data.
Cross-cultural validity of four quality of life scales in persons with spinal cord injury
2010-01-01
Background Quality of life (QoL) in persons with spinal cord injury (SCI) has been found to differ across countries. However, comparability of measurement results between countries depends on the cross-cultural validity of the applied instruments. The study examined the metric quality and cross-cultural validity of the Satisfaction with Life Scale (SWLS), the Life Satisfaction Questionnaire (LISAT-9), the Personal Well-Being Index (PWI) and the 5-item World Health Organization Quality of Life Assessment (WHOQoL-5) across six countries in a sample of persons with spinal cord injury (SCI). Methods A cross-sectional multi-centre study was conducted and the data of 243 out-patients with SCI from study centers in Australia, Brazil, Canada, Israel, South Africa, and the United States were analyzed using Rasch-based methods. Results The analyses showed high reliability for all 4 instruments (person reliability index .78-.92). Unidimensionality of measurement was supported for the WHOQoL-5 (Chi2 = 16.43, df = 10, p = .088), partially supported for the PWI (Chi2 = 15.62, df = 16, p = .480), but rejected for the LISAT-9 (Chi2 = 50.60, df = 18, p = .000) and the SWLS (Chi2 = 78.54, df = 10, p = .000) based on overall and item-wise Chi2 tests, principal components analyses and independent t-tests. The response scales showed the expected ordering for the WHOQoL-5 and the PWI, but not for the other two instruments. Using differential item functioning (DIF) analyses, potential cross-country bias was found in two items of the SWLS and the WHOQoL-5, three items of the LISAT-9 and four items of the PWI. However, applying Rasch-based statistical methods, especially subtest analyses, it was possible to identify optimal strategies to enhance the metric properties and the cross-country equivalence of the instruments post-hoc. Following the post-hoc procedures, the WHOQoL-5 and the PWI worked in a consistent and expected way in all countries.
Conclusions QoL assessment using the summary scores of the WHOQoL-5 and the PWI appeared cross-culturally valid in persons with SCI. In contrast, summary scores of the LISAT-9 and the SWLS have to be interpreted with caution. The findings of the current study can be especially helpful to select instruments for international research projects in SCI. PMID:20815864
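The p-values reported above can be checked from the Chi2 statistics and their degrees of freedom. The sketch below is only a reader-side sanity check, not the authors' software. It uses the closed-form upper tail of the chi-square distribution, which holds when the degrees of freedom are even (true for all four tests reported here). The function name is an assumption introduced for illustration.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2k the tail has the closed form
        exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0   # the i = 0 term
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# (instrument, Chi2, df) exactly as reported in the abstract.
reported = [
    ("WHOQoL-5", 16.43, 10),
    ("PWI", 15.62, 16),
    ("LISAT-9", 50.60, 18),
    ("SWLS", 78.54, 10),
]

for name, chi2, df in reported:
    print(f"{name}: p = {chi2_sf_even_df(chi2, df):.3f}")
```

For the WHOQoL-5 and the PWI this gives values that round to the reported .088 and .480. The two rejected instruments give probabilities well below .0005, consistent with the reported p = .000.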
12 CFR 327.52 - Annual dividend determination.
Code of Federal Regulations, 2010 CFR
2010-01-01
... the DIF reserve ratio as of December 31st of 2008 or any later year equals or exceeds 1.35 percent... dividend based upon the reserve ratio of the DIF as of December 31st of the preceding year, and the amount... ratio of the DIF equals or exceeds 1.35 percent of estimated insured deposits and does not exceed 1.50...
Development of novel DIF-1 derivatives that selectively suppress innate immune responses.
Nguyen, Van Hai; Kikuchi, Haruhisa; Kubohara, Yuzuru; Takahashi, Katsunori; Katou, Yasuhiro; Oshima, Yoshiteru
2015-08-01
The multiple pharmacological activities of differentiation-inducing factor-1 (DIF-1) of the cellular slime mold Dictyostelium discoideum led us to examine the use of DIF-1 as a 'drug template' to develop promising seed compounds for drug discovery. DIF-1 and its derivatives were synthesized and evaluated for their regulatory activities in innate immune responses. We found two new derivatives (4d and 5e) with highly selective inhibitory activities against production of the antimicrobial peptide attacin in Drosophila S2 cells and against production of interleukin-2 in Jurkat cells. Copyright © 2015 Elsevier Ltd. All rights reserved.
Chao, Wan-Tien; Lin, Yuan-Yao; Peng, Jin-Long; Huang, Chen-Bin
2014-02-15
Adiabatic soliton spectral compression in a dispersion-increasing fiber (DIF) with a linear dispersion ramp is studied both numerically and experimentally. The anticipated maximum spectral compression ratio (SCR) would be limited by the ratio of the DIF output to the input dispersion values. However, our numerical analyses indicate that an SCR greater than the DIF dispersion ratio is feasible, provided the input pulse duration is shorter than a threshold value along with adequate pulse energy control. Experimentally, an SCR of 28.6 is achieved in a 1 km DIF with a dispersion ratio of 22.5.
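To make the experimental result concrete, the short calculation below compares the measured SCR with the fiber's dispersion ratio, using only the two figures quoted in the abstract. The function name is a hypothetical helper introduced for illustration.

```python
def scr_excess(measured_scr, dispersion_ratio):
    """Ratio of the measured spectral compression ratio to the
    output/input dispersion ratio of the fiber."""
    return measured_scr / dispersion_ratio


# Values quoted in the abstract: SCR of 28.6, dispersion ratio of 22.5.
excess = scr_excess(28.6, 22.5)
print(f"measured SCR / dispersion ratio = {excess:.2f}")  # prints 1.27
print(f"excess over the dispersion ratio = {(excess - 1) * 100:.0f}%")  # prints 27%
```

The measured compression therefore exceeds the dispersion ratio by roughly a quarter, in line with the numerical finding that the dispersion ratio is not a hard ceiling.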
ERIC Educational Resources Information Center
Tsaousis, Ioannis; Sideridis, Georgios; Al-Saawi, Fahad
2018-01-01
The aim of the present study was to examine Differential Distractor Functioning (DDF) as a means of improving the quality of a measure through understanding biased responses across groups. A DDF analysis could shed light on the potential sources of construct-irrelevant variance by examining whether the differential selection of incorrect choices…
Cannabis Problem Experiences Among Users of the Tobacco-Cannabis Combination Known As Blunts
Fairman, Brian J.
2015-01-01
Background In most of the world, cannabis smokers mix loose tobacco inside a joint, pipe, spliff, or cone. More recently, a ‘blunt’ formulation combines these two drugs by inserting cannabis into a hollowed-out cigar. Epidemiological research linking simultaneous use of these two drugs and the development of cannabis use disorders (CUD) remains unclear. This study estimates associations linking blunt smoking with levels and subtypes of cannabis problems. Methods Cross-sectional data on 27,767 past-year cannabis users were analyzed from the US National Survey on Drug Use and Health (NSDUH) conducted from 2009–2012. Ten self-reported items of DSM-IV CUD features elicited a single latent trait of cannabis problem (CP) severity, which was then regressed on past-year blunt smoking and past-month blunt frequency measures within the context of a conceptual model. Differential item functioning (DIF) analysis evaluated potential bias in CP feature response by blunt smoking history. Results Past-year blunt smoking was associated with higher CP severity compared to cannabis users who did not smoke blunts. Days of blunt smoking in the past month also predicted higher CP severity than less frequent blunt use. Those smoking blunts experienced more subjectively felt tolerance and having spent more time obtaining or using cannabis, but were less likely to experience other problems, even at the same level of CP severity. Conclusions These findings suggest smoking blunts might promote the development of problematic cannabis use. Responses to cannabis problems differed by history of blunt smoking, possibly implicating an influence of tobacco on measurement of cannabis use disorders. PMID:25746234
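The abstract does not state which DIF procedure was used. As an illustration of the general idea (comparing item endorsement between groups at the same level of the underlying trait), the sketch below uses the Mantel-Haenszel common odds ratio, a widely used DIF statistic that is substituted here purely as an example. The strata, the counts and the function names are all hypothetical.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across matched strata.

    Each stratum is (ref_yes, ref_no, focal_yes, focal_no): counts of
    reference- and focal-group members who do or do not endorse the item,
    within one level of the matching score (e.g. total problem severity).
    """
    numerator = 0.0
    denominator = 0.0
    for ref_yes, ref_no, focal_yes, focal_no in strata:
        n = ref_yes + ref_no + focal_yes + focal_no
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += ref_yes * focal_no / n
        denominator += ref_no * focal_yes / n
    if denominator == 0:
        raise ValueError("odds ratio undefined: denominator is zero")
    return numerator / denominator


def ets_delta(odds_ratio):
    """ETS delta-scale transform; values near 0 indicate negligible DIF."""
    return -2.35 * math.log(odds_ratio)


# Hypothetical counts for one item at three severity levels
# (reference = non-blunt users, focal = blunt smokers).
strata = [(20, 80, 30, 70), (45, 55, 60, 40), (70, 30, 85, 15)]
alpha = mantel_haenszel_odds_ratio(strata)
print(f"MH odds ratio = {alpha:.3f}, ETS delta = {ets_delta(alpha):.3f}")
```

An odds ratio of 1 means that both groups endorse the item at the same rate once severity is matched. An odds ratio below 1 in this layout means the focal group endorses the item more often at equal severity, which is the pattern the abstract describes for tolerance and time spent obtaining or using cannabis.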
Cannabis problem experiences among users of the tobacco-cannabis combination known as blunts.
Fairman, Brian J
2015-05-01
In most of the world, cannabis smokers mix loose tobacco inside a joint, pipe, spliff, or cone. More recently, a 'blunt' formulation combines these two drugs by inserting cannabis into a hollowed-out cigar. Epidemiological research linking simultaneous use of these two drugs and the development of cannabis use disorders (CUD) remains unclear. This study estimates associations linking blunt smoking with levels and subtypes of cannabis problems. Cross-sectional data on 27,767 past-year cannabis users were analyzed from the US National Survey on Drug Use and Health (NSDUH) conducted from 2009 to 2012. Ten self-reported items of DSM-IV CUD features elicited a single latent trait of cannabis problem (CP) severity, which was then regressed on past-year blunt smoking and past-month blunt frequency measures within the context of a conceptual model. Differential item functioning (DIF) analysis evaluated potential bias in CP feature response by blunt smoking history. Past-year blunt smoking was associated with higher CP severity compared to cannabis users who did not smoke blunts. Days of blunt smoking in the past month also predicted higher CP severity than less frequent blunt use. Those smoking blunts experienced more subjectively felt tolerance and having spent more time obtaining or using cannabis, but were less likely to experience other problems, even at the same level of CP severity. These findings suggest smoking blunts might promote the development of problematic cannabis use. Responses to cannabis problems differed by history of blunt smoking, possibly implicating an influence of tobacco on measurement of cannabis use disorders. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Li, Shengjie; Li, Yao; Shen, Li; Jin, Ping; Chen, Liming; Ma, Fei
2017-02-01
Drosophila melanogaster is widely used as a model system to study innate immunity and signaling pathways related to innate immunity, including the Toll signaling pathway. Although this pathway is well studied, the precise mechanisms of posttranscriptional regulation of key components of the Toll signaling pathway by microRNAs (miRNAs) remain obscure. In this study, we used an in silico strategy in combination with the Gal80ts-Gal4 driver system to identify microRNA-958 (miR-958) as a candidate Toll pathway-regulating miRNA in Drosophila. We report that overexpression of miR-958 significantly reduces the expression of Drosomycin, a key antimicrobial peptide involved in Toll signaling and the innate immune response. We further demonstrate in vitro and in vivo that miR-958 targets the Toll and Dif genes, key components of the Toll signaling pathway, to negatively regulate Drosomycin expression. In addition, a miR-958 sponge rescued the expression of Toll and Dif, resulting in increased expression of Drosomycin. These results not only revealed a novel function and modulation pattern of miR-958, but also provided new insight into the underlying molecular mechanisms of Toll signaling in regulation of innate immunity. Copyright © 2017 the American Physiological Society.
Durand, Adeline; Desfontaines, Jean-Michel; Iurchenko, Ielyzaveta; Auger, Hélène; Leach, David R. F.
2017-01-01
Marker frequency analysis of the Escherichia coli recB mutant chromosome has revealed a deficit of DNA in a specific zone of the terminus, centred on the dif/TerC region. Using fluorescence microscopy of a marked chromosomal site, we show that the dif region is lost after replication completion, at the time of cell division, in one daughter cell only, and that the phenomenon is transmitted to progeny. Analysis by marker frequency and microscopy shows that the position of DNA loss is not defined by the replication fork merging point since it still occurs in the dif/TerC region when the replication fork trap is displaced in strains harbouring ectopic Ter sites. Terminus DNA loss in the recB mutant is also independent of dimer resolution by XerCD at dif and of Topo IV action close to dif. It occurs in the terminus region, at the point of inversion of the GC skew, which is also the point of convergence of specific sequence motifs like KOPS and Chi sites, regardless of whether the convergence of GC skew is at dif (wild-type) or a newly created sequence. In the absence of FtsK-driven DNA translocation, terminus DNA loss is less precisely targeted to the KOPS convergence sequence, but occurs at a similar frequency and follows the same pattern as in FtsK+ cells. Importantly, using ftsIts, ftsAts division mutants and cephalexin treated cells, we show that DNA loss of the dif region in the recB mutant is decreased by the inactivation of cell division. We propose that it results from septum-induced chromosome breakage, and largely contributes to the low viability of the recB mutant. PMID:28968392
Lock, Jaclyn; Liu, Huinan
2011-01-01
Background Nanomaterials have unique advantages in controlling stem cell function due to their biomimetic characteristics and special biological and mechanical properties. Controlling adhesion and differentiation of stem cells is critical for tissue regeneration. Methods This in vitro study investigated the effects of nano-hydroxyapatite, nano-hydroxyapatite-polylactide-co-glycolide (PLGA) composites, and a bone morphogenetic protein (BMP-7)-derived short peptide (DIF-7c) on osteogenic differentiation of human mesenchymal stem cells (MSC). The peptide was chemically functionalized onto nano-hydroxyapatite, incorporated into a nanophase hydroxyapatite-PLGA composite or PLGA control, or directly injected into culture media. Results Unlike the PLGA control, the nano-hydroxyapatite-PLGA composites promoted adhesion of human MSC. Importantly, nano-hydroxyapatite and nano-hydroxyapatite-PLGA composites promoted osteogenic differentiation of human MSCs, comparable with direct injection of the DIF-7c peptide into culture media. Conclusion Nano-hydroxyapatite and nano-hydroxyapatite-PLGA composites provide a promising alternative in directing the adhesion and differentiation of human MSC. These nanocomposites should be studied further to clarify their effects on MSC functions and bone remodeling in vivo, eventually translating to clinical applications. PMID:22114505