A Quasi-Parametric Method for Fitting Flexible Item Response Functions
ERIC Educational Resources Information Center
Liang, Longjuan; Browne, Michael W.
2015-01-01
If standard two-parameter item response functions are employed in the analysis of a test with some newly constructed items, it can be expected that, for some items, the item response function (IRF) will not fit the data well. This lack of fit can also occur when standard IRFs are fitted to personality or psychopathology items. When investigating…
ERIC Educational Resources Information Center
Fukuhara, Hirotaka; Kamata, Akihito
2011-01-01
A differential item functioning (DIF) detection method for testlet-based data was proposed and evaluated in this study. The proposed DIF model is an extension of a bifactor multidimensional item response theory (MIRT) model for testlets. Unlike traditional item response theory (IRT) DIF models, the proposed model takes testlet effects into…
ERIC Educational Resources Information Center
Tay, Louis; Vermunt, Jeroen K.; Wang, Chun
2013-01-01
We evaluate the item response theory with covariates (IRT-C) procedure for assessing differential item functioning (DIF) without preknowledge of anchor items (Tay, Newman, & Vermunt, 2011). This procedure begins with a fully constrained baseline model, and candidate items are tested for uniform and/or nonuniform DIF using the Wald statistic.…
Teresi, Jeanne A; Ocepek-Welikson, Katja; Ramirez, Mildred; Kleinman, Marjorie; Ornstein, Katherine; Siu, Albert
2016-01-01
Background: The Family Satisfaction with End-of-Life Care is an internationally used measure of satisfaction with cancer care. However, the Family Satisfaction with End-of-Life Care has not been studied for equivalence of item endorsement across different socio-demographic groups using differential item functioning. Aims: The aims of this secondary data analysis were (1) to examine potential differential item functioning in the family satisfaction item set with respect to type of caregiver, race, and patient age, gender, and education and (2) to provide parameters and documentation of differential item functioning for an item bank. Design: A mixed qualitative and quantitative analysis was conducted. A priori hypotheses regarding potential group differences in item response were established. Item response theory and Wald tests were used for the analyses of differential item functioning, accompanied by magnitude and impact measures. Results: Very little significant differential item functioning was observed for patient's age and gender. For race, 13 items showed differential item functioning after multiple comparison adjustment, 10 with non-uniform differential item functioning. No items evidenced differential item functioning of high magnitude, and the impact was negligible. For education, 5 items evidenced uniform differential item functioning after adjustment, none of high magnitude. Differential item functioning impact was trivial. One item evidenced differential item functioning for the caregiver relationship variable. Conclusion: Differential item functioning was observed primarily for race and education. No differential item functioning of high magnitude was observed for any item, and the overall impact of differential item functioning was negligible. One item, satisfaction with “the patient's pain relief,” might be singled out for further study, given that this item was both hypothesized and observed to show differential item functioning for race and education.
PMID:25160692
Vegetable parenting practices scale. Item response modeling analyses
Chen, Tzu-An; O’Connor, Teresia; Hughes, Sheryl; Beltran, Alicia; Baranowski, Janice; Diep, Cassandra; Baranowski, Tom
2015-01-01
Objective: To evaluate the psychometric properties of a vegetable parenting practices scale using multidimensional polytomous item response modeling, which enables assessing item fit to latent variables and the distributional characteristics of the items in comparison to the respondents. We also tested for differences in the ways items function (called differential item functioning) across child’s gender, ethnicity, age, and household income groups. Method: Parents of 3–5 year old children completed a self-reported vegetable parenting practices scale online. Vegetable parenting practices consisted of 14 effective vegetable parenting practices and 12 ineffective vegetable parenting practices items, each with three subscales (responsiveness, structure, and control). Multidimensional polytomous item response modeling was conducted separately on effective vegetable parenting practices and ineffective vegetable parenting practices. Results: One effective vegetable parenting practice item did not fit the model well in the full sample or across demographic groups, and another was a misfit in differential item functioning analyses across child’s gender. Significant differential item functioning was detected across children’s age and ethnicity groups, and more among effective vegetable parenting practices than ineffective vegetable parenting practices items. Wright maps showed items only covered parts of the latent trait distribution. The harder- and easier-to-respond ends of the construct were not covered by items for effective vegetable parenting practices and ineffective vegetable parenting practices, respectively. Conclusions: Several effective vegetable parenting practices and ineffective vegetable parenting practices scale items functioned differently on the basis of child’s demographic characteristics; therefore, researchers should use these vegetable parenting practices scales with caution. Item response modeling should be incorporated in analyses of parenting practice questionnaires to better assess differences across demographic characteristics. PMID:25895694
Improving measurement of injection drug risk behavior using item response theory.
Janulis, Patrick
2014-03-01
Recent research highlights the multiple steps involved in preparing and injecting drugs and the resultant viral threats faced by drug users. This research suggests that more sensitive measurement of injection drug HIV risk behavior is required. In addition, growing evidence suggests there are gender differences in injection risk behavior. However, the potential for differential item functioning between genders has not been explored. This study explores item response theory as an improved measurement modeling technique that provides empirically justified scaling of injection risk behavior, and it examines potential gender-based differential item functioning. Data from three studies in the National Institute on Drug Abuse's Criminal Justice Drug Abuse Treatment Studies were used. A two-parameter item response theory model was used to scale injection risk behavior, and logistic regression was used to test for differential item functioning. Item fit statistics suggest that item response theory can be used to scale injection risk behavior and that these models can provide more sensitive estimates of risk behavior. Additionally, gender-based differential item functioning is present in the current data. Improved measurement of injection risk behavior using item response theory should be encouraged, as these models provide increased congruence between construct measurement and the complexity of injection-related HIV risk. Suggestions are made to further improve injection risk behavior measurement. Furthermore, results suggest that direct comparisons of composite scores between males and females may be misleading and that future work should account for differential item functioning before comparing levels of injection risk behavior.
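For readers unfamiliar with the two-parameter model named in this abstract, the following is a minimal sketch of a two-parameter logistic (2PL) item response function and its item information. The function names and the example parameter values are illustrative assumptions and are not taken from the study.

```python
import math


def irf_2pl(theta: float, a: float, b: float) -> float:
    """Probability of endorsing an item under the 2PL model.

    theta: latent trait level, a: discrimination, b: location (difficulty).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information_2pl(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = irf_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


if __name__ == "__main__":
    # Hypothetical item: discrimination 1.5, location 0.5.
    for theta in (-2.0, 0.0, 0.5, 2.0):
        print(theta, round(irf_2pl(theta, 1.5, 0.5), 3))
```

Under this form the endorsement probability is exactly 0.5 when theta equals the item location, and the item is most informative at that point.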
ERIC Educational Resources Information Center
Tay, Louis; Huang, Qiming; Vermunt, Jeroen K.
2016-01-01
In large-scale testing, the use of multigroup approaches is limited for assessing differential item functioning (DIF) across multiple variables as DIF is examined for each variable separately. In contrast, the item response theory with covariate (IRT-C) procedure can be used to examine DIF across multiple variables (covariates) simultaneously. To…
An NCME Instructional Module on Latent DIF Analysis Using Mixture Item Response Models
ERIC Educational Resources Information Center
Cho, Sun-Joo; Suh, Youngsuk; Lee, Woo-yeol
2016-01-01
The purpose of this ITEMS module is to provide an introduction to differential item functioning (DIF) analysis using mixture item response models. The mixture item response models for DIF analysis involve comparing item profiles across latent groups, instead of manifest groups. First, an overview of DIF analysis based on latent groups, called…
ERIC Educational Resources Information Center
Bilir, Mustafa Kuzey
2009-01-01
This study uses a new psychometric model (mixture item response theory-MIMIC model) that simultaneously estimates differential item functioning (DIF) across manifest groups and latent classes. Current DIF detection methods investigate DIF from only one side, either across manifest groups (e.g., gender, ethnicity, etc.), or across latent classes…
ERIC Educational Resources Information Center
Zheng, Yinggan; Gierl, Mark J.; Cui, Ying
2010-01-01
This study combined the kernel smoothing procedure and a nonparametric differential item functioning statistic--Cochran's Z--to statistically test the difference between the kernel-smoothed item response functions for reference and focal groups. Simulation studies were conducted to investigate the Type I error and power of the proposed…
Massof, Robert W
2014-10-01
A simple theoretical framework explains patient responses to items in rating scale questionnaires. Fixed latent variables position each patient and each item on the same linear scale. Item responses are governed by a set of fixed category thresholds, one for each ordinal response category. A patient's item responses are magnitude estimates of the difference between the patient variable and the patient's estimate of the item variable, relative to his/her personally defined response category thresholds. Differences between patients in their personal estimates of the item variable and in their personal choices of category thresholds are represented by random variables added to the corresponding fixed variables. Effects of intervention correspond to changes in the patient variable, the patient's response bias, and/or latent item variables for a subset of items. Intervention effects on patients' item responses were simulated by assuming the random variables are normally distributed with a constant scalar covariance matrix. Rasch analysis was used to estimate latent variables from the simulated responses. The simulations demonstrate that changes in the patient variable and changes in response bias produce indistinguishable effects on item responses and manifest as changes only in the estimated patient variable. Changes in a subset of item variables manifest as intervention-specific differential item functioning and as changes in the estimated person variable that equal the average of changes in the item variables. Simulations demonstrate that intervention-specific differential item functioning produces inefficiencies and inaccuracies in computer adaptive testing.
The Effect of Error in Item Parameter Estimates on the Test Response Function Method of Linking.
ERIC Educational Resources Information Center
Kaskowitz, Gary S.; De Ayala, R. J.
2001-01-01
Studied the effect of item parameter estimation for computation of linking coefficients for the test response function (TRF) linking/equating method. Simulation results showed that linking was more accurate when there was less error in the parameter estimates, and that 15 or 25 common items provided better results than 5 common items under both…
A Comparison of Linking and Concurrent Calibration under the Graded Response Model.
ERIC Educational Resources Information Center
Kim, Seock-Ho; Cohen, Allan S.
Applications of item response theory to practical testing problems including equating, differential item functioning, and computerized adaptive testing, require that item parameter estimates be placed onto a common metric. In this study, two methods for developing a common metric for the graded response model under item response theory were…
Marfeo, Elizabeth E; Ni, Pengsheng; Chan, Leighton; Rasch, Elizabeth K; Jette, Alan M
2014-07-01
The goal of this article was to investigate the optimal functioning of frequency vs. agreement rating scales in two subdomains of the newly developed Work Disability Functional Assessment Battery: the Mood & Emotions and Behavioral Control scales. A psychometric study was performed that compared rating scale performance, embedded in a cross-sectional survey used for developing a new instrument to measure behavioral health functioning among adults applying for disability benefits in the United States. Within the sample of 1,017 respondents, the range of response category endorsement was similar for both frequency and agreement item types for both scales. There were fewer missing values in the frequency items than the agreement items. Both frequency and agreement items showed acceptable reliability. The frequency items demonstrated optimal effectiveness around the mean ± 1-2 standard deviation score range; the agreement items performed better at the extreme score ranges. Findings suggest an optimal response format requires a mix of both agreement-based and frequency-based items. Frequency items perform better in the normal range of responses, capturing specific behaviors, reactions, or situations that may elicit a specific response. Agreement items do better for those whose scores are more extreme and capture subjective content related to general attitudes, behaviors, or feelings of work-related behavioral health functioning.
Better assessment of physical function: item improvement is neglected but essential
2009-01-01
Introduction: Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. Methods: The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classic test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. Results: We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important versus complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options. IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models having comparable model fits. Correlations between factors in the test data sets were > 0.90. Conclusions: Item improvement must underlie attempts to improve outcome assessment. The clear, personally important and relevant, ability-framed items in the PROMIS Physical Function item bank perform well in PRO assessment. They will benefit from further study and application in a wider variety of rheumatic diseases in diverse clinical groups, including those at the extremes of physical functioning, and in different administration modes. PMID:20015354
Better assessment of physical function: item improvement is neglected but essential.
Bruce, Bonnie; Fries, James F; Ambrosini, Debbie; Lingala, Bharathi; Gandek, Barbara; Rose, Matthias; Ware, John E
2009-01-01
Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classic test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important versus complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options. IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models having comparable model fits. Correlations between factors in the test data sets were > 0.90. Item improvement must underlie attempts to improve outcome assessment. The clear, personally important and relevant, ability-framed items in the PROMIS Physical Function item bank perform well in PRO assessment. They will benefit from further study and application in a wider variety of rheumatic diseases in diverse clinical groups, including those at the extremes of physical functioning, and in different administration modes.
Person Response Functions and the Definition of Units in the Social Sciences
ERIC Educational Resources Information Center
Engelhard, George, Jr.; Perkins, Aminah F.
2011-01-01
Humphry (this issue) has written a thought-provoking piece on the interpretation of item discrimination parameters as scale units in item response theory. One of the key features of his work is the description of an item response theory (IRT) model that he calls the logistic measurement function that combines aspects of two traditions in IRT that…
Detection of Differential Item Functioning Using the Lasso Approach
ERIC Educational Resources Information Center
Magis, David; Tuerlinckx, Francis; De Boeck, Paul
2015-01-01
This article proposes a novel approach to detect differential item functioning (DIF) among dichotomously scored items. Unlike standard DIF methods that perform an item-by-item analysis, we propose the "LR lasso DIF method": logistic regression (LR) model is formulated for all item responses. The model contains item-specific intercepts,…
A Model-Free Diagnostic for Single-Peakedness of Item Responses Using Ordered Conditional Means
ERIC Educational Resources Information Center
Polak, Marike; De Rooij, Mark; Heiser, Willem J.
2012-01-01
In this article we propose a model-free diagnostic for single-peakedness (unimodality) of item responses. Presuming a unidimensional unfolding scale and a given item ordering, we approximate item response functions of all items based on ordered conditional means (OCM). The proposed OCM methodology is based on Thurstone & Chave's (1929) "criterion…
Chansirinukor, Wunpen; Maher, Christopher G; Latimer, Jane; Hush, Julia
2005-01-01
This retrospective study compared the responsiveness and test-retest reliability of the Functional Rating Index and the 18-item version of the Roland-Morris Disability Questionnaire in detecting change in disability in patients with work-related low back pain. Many low back pain-specific disability questionnaires are available, including the Functional Rating Index and the 18-item version of the Roland-Morris Disability Questionnaire. No previous study has compared the responsiveness and reliability of these questionnaires. Files of patients who had been treated for work-related low back pain at a physical therapy clinic were reviewed, and those containing initial and follow-up Functional Rating Index and 18-item Roland-Morris Disability Questionnaires were selected. The responsiveness of both questionnaires was compared using two different methods. First, using the assumption that patients receiving treatment improve over time, various responsiveness coefficients were calculated. Second, using change in work status as an external criterion to identify improved and nonimproved patients, Spearman's rho and receiver operating characteristic curves were calculated. Reliability was estimated from the subset of patients who reported no change in their condition over this period and expressed with the intraclass correlation coefficient and the minimal detectable change. One hundred and forty-three patient files were retrieved. The responsiveness coefficients for the Functional Rating Index were greater than for the 18-item Roland-Morris Disability Questionnaire. The intraclass correlation coefficient values for both questionnaires calculated from 96 patient files were similar, but the minimal detectable change for the Functional Rating Index was less than for the 18-item Roland-Morris Disability Questionnaire. The Functional Rating Index seems preferable to the 18-item Roland-Morris Disability Questionnaire for use in clinical trials and clinical practice.
Semiparametric Item Response Functions in the Context of Guessing
ERIC Educational Resources Information Center
Falk, Carl F.; Cai, Li
2016-01-01
We present a logistic function of a monotonic polynomial with a lower asymptote, allowing additional flexibility beyond the three-parameter logistic model. We develop a maximum marginal likelihood-based approach to estimate the item parameters. The new item response model is demonstrated on math assessment data from a state, and a computationally…
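The general shape described in this abstract, a logistic function of a monotonic polynomial with a lower asymptote, can be illustrated with a minimal sketch. The cubic polynomial, its parameter names (`c`, `b1`, `b3`), and the constraint used to keep it monotonic are simplifying assumptions made here for illustration; they are not the parameterization used by the authors.

```python
import math


def monotonic_cubic(theta: float, c: float, b1: float, b3: float) -> float:
    """Hypothetical monotonic polynomial m(theta) = c + b1*theta + b3*theta^3.

    With b1 > 0 and b3 >= 0 the derivative b1 + 3*b3*theta^2 is positive,
    so m is strictly increasing in theta.
    """
    if b1 <= 0.0 or b3 < 0.0:
        raise ValueError("require b1 > 0 and b3 >= 0 for monotonicity")
    return c + b1 * theta + b3 * theta ** 3


def irf_lower_asymptote(theta: float, g: float, c: float,
                        b1: float, b3: float = 0.0) -> float:
    """P(theta) = g + (1 - g) * logistic(m(theta)), with lower asymptote g."""
    if not 0.0 <= g < 1.0:
        raise ValueError("lower asymptote g must lie in [0, 1)")
    m = monotonic_cubic(theta, c, b1, b3)
    return g + (1.0 - g) / (1.0 + math.exp(-m))


if __name__ == "__main__":
    # Hypothetical item with a guessing floor of 0.2.
    for theta in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(theta, round(irf_lower_asymptote(theta, 0.2, 0.0, 1.0, 0.5), 3))
```

Setting `b3 = 0` makes the polynomial linear, which recovers the three-parameter logistic model in slope-intercept form; a positive `b3` bends the curve while keeping it monotonic.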
A Bayesian Beta-Mixture Model for Nonparametric IRT (BBM-IRT)
ERIC Educational Resources Information Center
Arenson, Ethan A.; Karabatsos, George
2017-01-01
Item response models typically assume that the item characteristic (step) curves follow a logistic or normal cumulative distribution function, which are strictly monotone functions of person test ability. Such assumptions can be overly-restrictive for real item response data. We propose a simple and more flexible Bayesian nonparametric IRT model…
ERIC Educational Resources Information Center
Ferrando, Pere J.
2004-01-01
This study used kernel-smoothing procedures to estimate the item characteristic functions (ICFs) of a set of continuous personality items. The nonparametric ICFs were compared with the ICFs estimated (a) by the linear model and (b) by Samejima's continuous-response model. The study was based on a conditioned approach and used an error-in-variables…
A Multidimensional Ideal Point Item Response Theory Model for Binary Data
ERIC Educational Resources Information Center
Maydeu-Olivares, Albert; Hernandez, Adolfo; McDonald, Roderick P.
2006-01-01
We introduce a multidimensional item response theory (IRT) model for binary data based on a proximity response mechanism. Under the model, a respondent at the mode of the item response function (IRF) endorses the item with probability one. The mode of the IRF is the ideal point, or in the multidimensional case, an ideal hyperplane. The model…
ERIC Educational Resources Information Center
Hospers, J. Mirjam Boeschen; Smits, Niels; Smits, Cas; Stam, Mariska; Terwee, Caroline B.; Kramer, Sophia E.
2016-01-01
Purpose: We reevaluated the psychometric properties of the Amsterdam Inventory for Auditory Disability and Handicap (AIADH; Kramer, Kapteyn, Festen, & Tobi, 1995) using item response theory. Item response theory describes item functioning along an ability continuum. Method: Cross-sectional data from 2,352 adults with and without hearing…
ERIC Educational Resources Information Center
Wang, Wen-Chung
2004-01-01
Scale indeterminacy in analysis of differential item functioning (DIF) within the framework of item response theory can be resolved by imposing 3 anchor item methods: the equal-mean-difficulty method, the all-other anchor item method, and the constant anchor item method. In this article, applicability and limitations of these 3 methods are…
ERIC Educational Resources Information Center
Çokluk, Ömay; Gül, Emrah; Dogan-Gül, Çilem
2016-01-01
The study aims to examine whether differential item functioning is displayed in three different test forms whose items are ordered randomly or sequentially (easy-to-hard and hard-to-easy), based on Classical Test Theory (CTT) and Item Response Theory (IRT) methods and bearing item difficulty levels in mind. In the correlational research, the…
Johnson, Timothy R; Kuhn, Kristine M
2015-12-01
This paper introduces the ltbayes package for R. This package includes a suite of functions for investigating the posterior distribution of latent traits of item response models. These include functions for simulating realizations from the posterior distribution, profiling the posterior density or likelihood function, calculation of posterior modes or means, Fisher information functions and observed information, and profile likelihood confidence intervals. Inferences can be based on individual response patterns or sets of response patterns such as sum scores. Functions are included for several common binary and polytomous item response models, but the package can also be used with user-specified models. This paper introduces some background and motivation for the package, and includes several detailed examples of its use.
Optimal Linking Design for Response Model Parameters
ERIC Educational Resources Information Center
Barrett, Michelle D.; van der Linden, Wim J.
2017-01-01
Linking functions adjust for differences between identifiability restrictions used in different instances of the estimation of item response model parameters. These adjustments are necessary when results from those instances are to be compared. As linking functions are derived from estimated item response model parameters, parameter estimation…
Tokuda, Yasuharu; Okubo, Tomoya; Ohde, Sachiko; Jacobs, Joshua; Takahashi, Osamu; Omata, Fumio; Yanai, Haruo; Hinohara, Shigeaki; Fukui, Tsuguya
2009-06-01
The Short Form-8 (SF-8) questionnaire is a commonly used 8-item instrument of health-related quality of life (QOL) and provides a health profile of eight subdimensions. Our aim was to examine the psychometric properties of the Japanese version of the SF-8 instrument using methodology based on the nominal categories model. Using data from an adjusted random sample from a nationally representative panel, nominal categories modeling was applied to SF-8 items to characterize coverage of the latent trait (theta). Probabilities for response choices were described as functions of the latent trait. Information functions were generated based on the estimated item parameters. A total of 3344 participants (53% women; median age, 35 years) provided responses. One factor was retained (eigenvalue, 4.65; variance proportion of 0.58) and used as theta. All item response category characteristic curves satisfied the monotonicity assumption in accurate order with corresponding ordinal responses. Four items (general health, bodily pain, vitality, and mental health) cover most of the spectrum of theta, while the other four items (physical function, role physical [role limitations because of physical health], social functioning, and role emotional [role limitations because of emotional problems]) cover most of the negative range of theta. The information function for all items combined peaked at theta = -0.7 (information = 18.5) and decreased with increasing theta. The SF-8 instrument performs well among those with poor QOL across the continuum of the latent trait and thus can recognize more effectively persons with relatively poorer QOL than those with relatively better QOL.
Semi-Parametric Item Response Functions in the Context of Guessing. CRESST Report 844
ERIC Educational Resources Information Center
Falk, Carl F.; Cai, Li
2015-01-01
We present a logistic function of a monotonic polynomial with a lower asymptote, allowing additional flexibility beyond the three-parameter logistic model. We develop a maximum marginal likelihood based approach to estimate the item parameters. The new item response model is demonstrated on math assessment data from a state, and a computationally…
Item Response Theory Using Hierarchical Generalized Linear Models
ERIC Educational Resources Information Center
Ravand, Hamdollah
2015-01-01
Multilevel models (MLMs) are flexible in that they can be employed to obtain item and person parameters, test for differential item functioning (DIF) and capture both local item and person dependence. Papers on the MLM analysis of item response data have focused mostly on theoretical issues where applications have been add-ons to simulation…
Boeschen Hospers, J Mirjam; Smits, Niels; Smits, Cas; Stam, Mariska; Terwee, Caroline B; Kramer, Sophia E
2016-04-01
We reevaluated the psychometric properties of the Amsterdam Inventory for Auditory Disability and Handicap (AIADH; Kramer, Kapteyn, Festen, & Tobi, 1995) using item response theory. Item response theory describes item functioning along an ability continuum. Cross-sectional data from 2,352 adults with and without hearing impairment, ages 18-70 years, were analyzed. They completed the AIADH in the web-based prospective cohort study "Netherlands Longitudinal Study on Hearing." A graded response model was fitted to the AIADH data. Category response curves, item information curves, and the standard error as a function of self-reported hearing ability were plotted. The graded response model showed a good fit. Item information curves were most reliable for adults who reported having hearing disability and less reliable for adults with normal hearing. The standard error plot showed that self-reported hearing ability is most reliably measured for adults reporting mild up to moderate hearing disability. This is one of the few item response theory studies on audiological self-reports. All AIADH items could be hierarchically placed on the self-reported hearing ability continuum, meaning they measure the same construct. This provides a promising basis for developing a clinically useful computerized adaptive test, where item selection adapts to the hearing ability of individuals, resulting in efficient assessment of hearing disability.
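As background for the graded response model and the category response curves mentioned in this abstract, the following minimal sketch computes category probabilities under Samejima's graded response model for one item. The function name and the example discrimination and threshold values are illustrative assumptions, not estimates from the AIADH data.

```python
import math
from typing import List, Sequence


def _logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def grm_category_probs(theta: float, a: float,
                       thresholds: Sequence[float]) -> List[float]:
    """Category probabilities for one item under the graded response model.

    thresholds: strictly increasing b_1 < ... < b_{K-1} for K ordered categories.
    Returns a list of K probabilities that sum to 1.
    """
    if any(t2 <= t1 for t1, t2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities P(X >= k); P(X >= 0) = 1 and P(X >= K) = 0.
    cum = [1.0] + [_logistic(a * (theta - b)) for b in thresholds] + [0.0]
    # Each category probability is the difference of adjacent cumulatives.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


if __name__ == "__main__":
    # Hypothetical four-category item.
    print(grm_category_probs(0.0, 1.2, [-1.0, 0.0, 1.0]))
```

Plotting each returned probability against theta gives the category response curves for the item.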
Real and Artificial Differential Item Functioning in Polytomous Items
ERIC Educational Resources Information Center
Andrich, David; Hagquist, Curt
2015-01-01
Differential item functioning (DIF) for an item between two groups is present if, for the same person location on a variable, persons from different groups have different expected values for their responses. Applying only to dichotomously scored items in the popular Mantel-Haenszel (MH) method for detecting DIF in which persons are classified by…
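As a rough illustration of the Mantel-Haenszel approach for dichotomously scored items mentioned above, the sketch below computes the common odds ratio across score strata and the ETS delta transformation (delta = -2.35 ln alpha). It is a minimal example under stated assumptions, not the procedure evaluated in the article: the stratum counts are invented, and the function names are hypothetical.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across matched strata.

    Each stratum is (ref_right, ref_wrong, focal_right, focal_wrong),
    i.e. the 2x2 table of group by item score at one matching-score level.
    Strata are assumed to be non-empty.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, focal_right, focal_wrong in strata:
        n = ref_right + ref_wrong + focal_right + focal_wrong
        numerator += ref_right * focal_wrong / n
        denominator += ref_wrong * focal_right / n
    return numerator / denominator


def mh_delta(alpha):
    """ETS delta scale; negative values indicate the item favors the reference group."""
    return -2.35 * math.log(alpha)


# Invented counts for three total-score strata.
strata = [(30, 20, 22, 28), (45, 15, 35, 25), (50, 5, 44, 11)]
alpha = mantel_haenszel_odds_ratio(strata)
delta = mh_delta(alpha)
```

An odds ratio of 1 (delta of 0) corresponds to no DIF on this index; how far from 1 counts as meaningful is a separate classification decision that this sketch does not make.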
Paz, Sylvia H.; Jones, Loretta; Calderón, José L.; Hays, Ron D.
2016-01-01
Background: Depression and physical function are especially important health domains for the elderly. The Geriatric Depression Scale (GDS) and the Patient-Reported Outcomes Measurement Information System (PROMIS®) Physical Function Item Bank are two surveys commonly used to measure these domains. It is unclear if these two instruments adequately measure these aspects of health in minority elderly. Objective: To estimate the readability of the GDS and PROMIS® Physical Function items and to assess their comprehensibility by a sample of African American and Latino elderly. Methods: Readability was estimated using the Flesch-Kincaid (F-K) and Flesch-Reading-Ease (FRE) formulae for English versions, and a Spanish adaptation of the FRE formula for the Spanish versions. Comprehension of the GDS and PROMIS items by minority elderly was evaluated with 30 cognitive interviews. Results: Readability estimates of a number of items in English and Spanish of the GDS and PROMIS physical functioning items exceeded the recommended 5th grade level, or were rated as fairly difficult, difficult, or very difficult to read. Cognitive interviews revealed that many participants felt that more than the two (yes/no) GDS response options were needed to answer the questions. Wording of several PROMIS items was considered confusing and responses potentially uninterpretable because they were based on physical aids. Conclusions: Problems with item wording and response options of the GDS and PROMIS Physical Function items may negatively affect reliability and validity of measurement when used with minority elderly. PMID: 27599978
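For readers unfamiliar with the two English-language readability indices named in this abstract, the sketch below applies the standard published Flesch Reading Ease and Flesch-Kincaid grade-level formulas to word, sentence, and syllable counts. The abstract does not reproduce the formulas or say how syllables were counted, so the counts here are hypothetical inputs; the Spanish adaptation is omitted because the abstract does not specify which one was used.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level: approximate U.S. school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for a short questionnaire item.
words, sentences, syllables = 100, 10, 150
fre = flesch_reading_ease(words, sentences, syllables)
grade = flesch_kincaid_grade(words, sentences, syllables)
exceeds_fifth_grade = grade > 5.0
```

With these invented counts the grade estimate comes out slightly above 6, which is the kind of result the abstract reports as exceeding the recommended 5th grade level.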
Item response theory analyses of the Delis-Kaplan Executive Function System card sorting subtest.
Spencer, Mercedes; Cho, Sun-Joo; Cutting, Laurie E
2018-02-02
In the current study, we examined the dimensionality of the 16-item Card Sorting subtest of the Delis-Kaplan Executive Functioning System assessment in a sample of 264 native English-speaking children between the ages of 9 and 15 years. We also tested for measurement invariance for these items across age and gender groups using item response theory (IRT). Results of the exploratory factor analysis indicated that a two-factor model that distinguished between verbal and perceptual items provided the best fit to the data. Although the items demonstrated measurement invariance across age groups, measurement invariance was violated for gender groups, with two items demonstrating differential item functioning for males and females. Multigroup analysis using all 16 items indicated that the items were more effective for individuals whose IRT scale scores were relatively high. A single-group explanatory IRT model using 14 non-differential item functioning items showed that for perceptual ability, females scored higher than males and that scores increased with age for both males and females; for verbal ability, the observed increase in scores across age differed for males and females. The implications of these findings are discussed.
Marfeo, Elizabeth E.; Ni, Pengsheng; Bogusz, Kara; Meterko, Mark; McDonough, Christine M.; Chan, Leighton; Rasch, Elizabeth K.; Brandt, Diane E.; Jette, Alan M.
2014-01-01
Objectives: To use item response theory (IRT) data simulations to construct and perform initial psychometric testing of a newly developed instrument, the Social Security Administration Behavioral Health Function (SSA-BH) instrument, that aims to assess behavioral health functioning relevant to the context of work. Design: Cross-sectional survey followed by item response theory (IRT) calibration data simulations. Setting: Community. Participants: A sample of individuals applying for SSA disability benefits, claimants (N=1015), and a normative comparative sample of US adults (N=1000). Interventions: None. Main Outcome Measure: Social Security Administration Behavioral Health Function (SSA-BH) measurement instrument. Results: Item response theory analyses supported the unidimensionality of four SSA-BH scales: Mood and Emotions (35 items), Self-Efficacy (23 items), Social Interactions (6 items), and Behavioral Control (15 items). All SSA-BH scales demonstrated strong psychometric properties including reliability, accuracy, and breadth of coverage. High correlations of the simulated 5- or 10-item CATs with the full item bank indicated robust ability of the CAT approach to comprehensively characterize behavioral health function along four distinct dimensions. Conclusions: Initial testing and evaluation of the SSA-BH instrument demonstrated good accuracy, reliability, and content coverage along all four scales. Behavioral function profiles of SSA claimants were generated and compared to age and sex matched norms along four scales: Mood and Emotions, Behavioral Control, Social Interactions, and Self-Efficacy. Utilizing the CAT based approach offers the ability to collect standardized, comprehensive functional information about claimants in an efficient way, which may prove useful in the context of the SSA’s work disability programs. PMID: 23542404
Consequences of Ignoring Guessing when Estimating the Latent Density in Item Response Theory
ERIC Educational Resources Information Center
Woods, Carol M.
2008-01-01
In Ramsay-curve item response theory (RC-IRT), the latent variable distribution is estimated simultaneously with the item parameters. In extant Monte Carlo evaluations of RC-IRT, the item response function (IRF) used to fit the data is the same one used to generate the data. The present simulation study examines RC-IRT when the IRF is imperfectly…
Rasch Measurement and Item Banking: Theory and Practice.
ERIC Educational Resources Information Center
Nakamura, Yuji
The Rasch Model is a one-parameter item response theory model which states that the probability of a correct response on a test is a function of the difficulty of the item and the ability of the candidate. Item banking is useful for language testing. The Rasch Model provides estimates of item difficulties that are meaningful,…
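The relationship described in this record can be written as a one-line function. The sketch below assumes the usual logistic form of the Rasch model, with ability and difficulty on the same logit scale; the function name and example values are hypothetical.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# A candidate whose ability equals the item difficulty has a 50% chance of success.
p_equal = rasch_probability(0.7, 0.7)
p_easier_item = rasch_probability(0.7, -0.3)
p_harder_item = rasch_probability(0.7, 1.7)
```

Because the probability depends only on the difference between ability and difficulty, item difficulties estimated on this scale can be compared across items in a bank.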
Item Response Theory Models for Wording Effects in Mixed-Format Scales
ERIC Educational Resources Information Center
Wang, Wen-Chung; Chen, Hui-Fang; Jin, Kuan-Yu
2015-01-01
Many scales contain both positively and negatively worded items. Reverse recoding of negatively worded items might not be enough for them to function as positively worded items do. In this study, we commented on the drawbacks of existing approaches to wording effect in mixed-format scales and used bi-factor item response theory (IRT) models to…
Yau, David T W; Wong, May C M; Lam, K F; McGrath, Colman
2015-08-19
The four-factor structure of the two 8-item short forms of the Child Perceptions Questionnaire CPQ11-14 (RSF:8 and ISF:8) has been confirmed. However, the sum scores are typically reported in practice as a proxy of Oral health-related Quality of Life (OHRQoL), which implies a unidimensional structure. This study first assessed the unidimensionality of the 8-item short forms of CPQ11-14. Item response theory (IRT) was employed to offer an alternative and complementary approach of validation and to overcome the limitations of classical test theory assumptions. A random sample of 649 12-year-old school children in Hong Kong was analyzed. Unidimensionality of the scale was tested by confirmatory factor analysis (CFA), principal component analysis (PCA) and local dependency (LD) statistic. A graded response model was fitted to the data. Contribution of each item to the scale was assessed by item information function (IIF). Reliability of the scale was assessed by test information function (TIF). Differential item functioning (DIF) across gender was identified by Wald test and expected score functions. Both CPQ11-14 RSF:8 and ISF:8 did not deviate much from the unidimensionality assumption. Results from CFA indicated acceptable fit of the one-factor model. PCA indicated that the first principal component explained >30% of the total variation with high factor loadings for both RSF:8 and ISF:8. Almost all LD statistics were <10, indicating the absence of local dependency. Flat and low IIFs were observed in the oral symptoms items, suggesting little contribution of information to the scale, and item removal caused little practical impact. Comparing the TIFs, RSF:8 showed slightly better information than ISF:8. In addition to oral symptoms items, the item "Concerned with what other people think" demonstrated a uniform DIF (p < 0.001). The expected score functions were not much different between boys and girls. 
Items related to oral symptoms were not informative to OHRQoL and deletion of these items is suggested. The impact of DIF across gender on the overall score was minimal. CPQ11-14 RSF:8 performed slightly better than ISF:8 in measurement precision. The 6-item short forms suggested by IRT validation should be further investigated to ensure their robustness, responsiveness and discriminative performance.
The Dutch Identity: A New Tool for the Study of Item Response Models.
ERIC Educational Resources Information Center
Holland, Paul W.
1990-01-01
The Dutch Identity is presented as a useful tool for expressing the basic equations of item response models that relate the manifest probabilities to the item response functions and the latent trait distribution. Ways in which the identity may be exploited are suggested and illustrated. (SLD)
Effect of Differential Item Functioning on Test Equating
ERIC Educational Resources Information Center
Kabasakal, Kübra Atalay; Kelecioglu, Hülya
2015-01-01
This study examines the effect of differential item functioning (DIF) items on test equating through multilevel item response models (MIRMs) and traditional IRMs. The performances of three different equating models were investigated under 24 different simulation conditions, and the variables whose effects were examined included sample size, test…
Ramsay-Curve Differential Item Functioning
ERIC Educational Resources Information Center
Woods, Carol M.
2011-01-01
Differential item functioning (DIF) occurs when an item on a test, questionnaire, or interview has different measurement properties for one group of people versus another, irrespective of true group-mean differences on the constructs being measured. This article is focused on item response theory based likelihood ratio testing for DIF (IRT-LR or…
Paz, Sylvia H; Spritzer, Karen L; Morales, Leo S; Hays, Ron D
2013-03-29
To evaluate the equivalence of the PROMIS® wave 1 physical functioning item bank, by age (50 years or older versus 18-49). A total of 114 physical functioning items with 5 response choices were administered to English- (n=1504) and Spanish-language (n=640) adults. Item frequencies, means and standard deviations, item-scale correlations, and internal consistency reliability were estimated. Differential Item Functioning (DIF) by age was evaluated. Thirty of the 114 items were flagged for DIF based on an R-squared of 0.02 or above criterion. The expected total score was higher for those respondents who were 18-49 than those who were 50 or older. Those who were 50 years or older versus 18-49 years old with the same level of physical functioning responded differently to 30 of the 114 items in the PROMIS® physical functioning item bank. This study yields essential information about the equivalence of the physical functioning items in older versus younger individuals.
Paz, Sylvia H; Jones, Loretta; Calderón, José L; Hays, Ron D
2017-02-01
Depression and physical function are particularly important health domains for the elderly. The Geriatric Depression Scale (GDS) and the Patient-Reported Outcomes Measurement Information System (PROMIS ® ) physical function item bank are two surveys commonly used to measure these domains. It is unclear if these two instruments adequately measure these aspects of health in minority elderly. The aim of this study was to estimate the readability of the GDS and PROMIS ® physical function items and to assess their comprehensibility using a sample of African American and Latino elderly. Readability was estimated using the Flesch-Kincaid and Flesch Reading Ease (FRE) formulae for English versions, and a Spanish adaptation of the FRE formula for the Spanish versions. Comprehension of the GDS and PROMIS ® items by minority elderly was evaluated with 30 cognitive interviews. Readability estimates of a number of items in English and Spanish of the GDS and PROMIS ® physical functioning items exceed the U.S. recommended 5th-grade threshold for vulnerable populations, or were rated as 'fairly difficult', 'difficult', or 'very difficult' to read. Cognitive interviews revealed that many participants felt that more than the two (yes/no) GDS response options were needed to answer the questions. Wording of several PROMIS ® items was considered confusing, and interpreting responses was problematic because they were based on using physical aids. Problems with item wording and response options of the GDS and PROMIS ® physical function items may reduce reliability and validity of measurement when used with minority elderly.
Rasch validation of the Arabic version of the lower extremity functional scale.
Alnahdi, Ali H
2018-02-01
The purpose of this study was to examine the internal construct validity of the Arabic version of the Lower Extremity Functional Scale (20-item Arabic LEFS) using Rasch analysis. Patients (n = 170) with lower extremity musculoskeletal dysfunction were recruited. Rasch analysis of the 20-item Arabic LEFS was performed. Once the initial Rasch analysis indicated that the 20-item Arabic LEFS did not fit the Rasch model, follow-up analyses were conducted to improve the fit of the scale to the Rasch measurement model. These modifications included removing misfitting individuals, changing item scoring structure, removing misfitting items, and addressing bias caused by response dependency between items and differential item functioning (DIF). Initial analysis indicated deviation of the 20-item Arabic LEFS from the Rasch model. Disordered thresholds in eight items and response dependency between six items were detected, and the scale as a whole did not meet the requirement of unidimensionality. Refinements led to a 15-item Arabic LEFS that demonstrated excellent internal consistency (person separation index [PSI] = 0.92) and satisfied all the requirements of the Rasch model. Rasch analysis did not support the 20-item Arabic LEFS as a unidimensional measure of lower extremity function. The refined 15-item Arabic LEFS met all the requirements of the Rasch model and hence is a valid objective measure of lower extremity function. The Rasch-validated 15-item Arabic LEFS needs to be further tested in an independent sample to confirm its fit to the Rasch measurement model. Implications for Rehabilitation: The validity of the 20-item Arabic Lower Extremity Functional Scale to measure lower extremity function is not supported. The 15-item Arabic version of the LEFS is a valid measure of lower extremity function and can be used to quantify lower extremity function in patients with lower extremity musculoskeletal disorders.
Haberman, Shelby J; Sinharay, Sandip; Chon, Kyong Hee
2013-07-01
Residual analysis (e.g. Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large sample distribution of the residual is proved to be standardized normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing the item fit for unidimensional IRT models.
Bravini, Elisabetta; Giordano, Andrea; Sartorio, Francesco; Ferriero, Giorgio; Vercelli, Stefano
2017-04-01
To investigate dimensionality and the measurement properties of the Italian Lower Extremity Functional Scale using both classical test theory and Rasch analysis methods, and to provide insights for an improved version of the questionnaire. Rasch analysis of individual patient data. Rehabilitation centre. A total of 135 patients with musculoskeletal diseases of the lower limb. Patients were assessed with the Lower Extremity Functional Scale before and after the rehabilitation. Rasch analysis showed some problems related to rating scale category functioning, item fit, and item redundancy. After an iterative process, which resulted in the reduction of rating scale categories from 5 to 4, and in the deletion of 5 items, the psychometric properties of the Italian Lower Extremity Functional Scale improved. The retained 15 items with a 4-level response format fitted the Rasch model (internal construct validity), and demonstrated unidimensionality and good reliability indices (person-separation reliability 0.92; Cronbach's alpha 0.94). Then, the analysis showed differential item functioning for six of the retained items. The sensitivity to change of the Italian 15-item Lower Extremity Functional Scale was nearly equal to that of the original version (effect size: 0.93 and 0.98; standardized response mean: 1.20 and 1.28, respectively for the 15-item and 20-item versions). The Italian Lower Extremity Functional Scale had unsatisfactory measurement properties. However, removing five items and simplifying the scoring from 5 to 4 levels resulted in a more valid measure with good reliability and sensitivity to change.
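The two sensitivity-to-change indices reported in this abstract can be illustrated with a short calculation. The abstract does not define them, so the sketch assumes the common definitions: effect size as mean change divided by the baseline standard deviation, and standardized response mean as mean change divided by the standard deviation of the change scores. The scores below are invented and are not the study data.

```python
import statistics


def effect_size(baseline, followup):
    """Mean change divided by the standard deviation of baseline scores."""
    changes = [after - before for before, after in zip(baseline, followup)]
    return statistics.mean(changes) / statistics.stdev(baseline)


def standardized_response_mean(baseline, followup):
    """Mean change divided by the standard deviation of the change scores."""
    changes = [after - before for before, after in zip(baseline, followup)]
    return statistics.mean(changes) / statistics.stdev(changes)


# Invented before/after rehabilitation scores for four patients.
baseline = [10, 12, 14, 16]
followup = [13, 14, 18, 19]
es = effect_size(baseline, followup)
srm = standardized_response_mean(baseline, followup)
```

Comparing these two values between a shortened and an original version of a scale, computed on the same patients, is one way to check whether item deletion has cost sensitivity to change.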
Differential item functioning magnitude and impact measures from item response theory models.
Kleinman, Marjorie; Teresi, Jeanne A
2016-01-01
Measures of magnitude and impact of differential item functioning (DIF) at the item and scale level, respectively, are presented and reviewed in this paper. Most measures are based on item response theory models. Magnitude refers to item level effect sizes, whereas impact refers to differences between groups at the scale score level. Reviewed are magnitude measures based on group differences in the expected item scores and impact measures based on differences in the expected scale scores. The similarities among these indices are demonstrated. Various software packages are described that provide magnitude and impact measures, and new software is presented that computes all of the available statistics conveniently in one program with explanations of their relationships to one another.
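The distinction between item-level magnitude and scale-level impact can be made concrete with a small calculation. The sketch below is not the software described in the paper and does not reproduce any specific published index: it assumes dichotomous items following a two-parameter logistic model, group-specific (discrimination, difficulty) pairs, and a set of trait values to average over. All names and numbers are hypothetical.

```python
import math


def p_2pl(theta, a, b):
    """Expected score on a dichotomous item under a 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_magnitude(thetas, ref_item, focal_item):
    """Mean absolute group difference in expected item score."""
    diffs = [abs(p_2pl(t, *ref_item) - p_2pl(t, *focal_item)) for t in thetas]
    return sum(diffs) / len(diffs)


def scale_impact(thetas, ref_items, focal_items):
    """Mean group difference (reference minus focal) in expected scale score."""
    diffs = []
    for t in thetas:
        ref_total = sum(p_2pl(t, a, b) for a, b in ref_items)
        focal_total = sum(p_2pl(t, a, b) for a, b in focal_items)
        diffs.append(ref_total - focal_total)
    return sum(diffs) / len(diffs)


# Hypothetical parameters: only the first item differs between groups.
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
ref_items = [(1.0, 0.0), (1.5, -0.5), (0.8, 1.0)]
focal_items = [(1.0, 0.4), (1.5, -0.5), (0.8, 1.0)]

magnitudes = [item_magnitude(thetas, r, f) for r, f in zip(ref_items, focal_items)]
impact = scale_impact(thetas, ref_items, focal_items)
```

In this toy case one item shows a nonzero magnitude while the impact on the three-item expected scale score is the same size in raw-score units; with longer scales or DIF in opposing directions, impact can be much smaller than individual item magnitudes suggest.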
Differential Item Functioning: Its Consequences. Research Report. ETS RR-10-01
ERIC Educational Resources Information Center
Lee, Yi-Hsuan; Zhang, Jinming
2010-01-01
This report examines the consequences of differential item functioning (DIF) using simulated data. Its impact on total score, item response theory (IRT) ability estimate, and test reliability was evaluated in various testing scenarios created by manipulating the following four factors: test length, percentage of DIF items per form, sample sizes of…
Differential item functioning analysis of the Vanderbilt Expertise Test for cars.
Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W; Van Gulick, Ana Beth; Gauthier, Isabel
2015-01-01
The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge.
Assessing the Utility of Item Response Theory Models: Differential Item Functioning.
ERIC Educational Resources Information Center
Scheuneman, Janice Dowd
The current status of item response theory (IRT) is discussed. Several IRT methods exist for assessing whether an item is biased. Focus is on methods proposed by L. M. Rudner (1975), F. M. Lord (1977), D. Thissen et al. (1988) and R. L. Linn and D. Harnisch (1981). Rudner suggested a measure of the area lying between the two item characteristic…
Examining Differential Math Performance by Gender and Opportunity to Learn
ERIC Educational Resources Information Center
Albano, Anthony D.; Rodriguez, Michael C.
2013-01-01
Although a substantial amount of research has been conducted on differential item functioning in testing, studies have focused on detecting differential item functioning rather than on explaining how or why it may occur. Some recent work has explored sources of differential functioning using explanatory and multilevel item response models. This…
Oude Voshaar, Martijn A H; Ten Klooster, Peter M; Glas, Cees A W; Vonkeman, Harald E; Taal, Erik; Krishnan, Eswar; Bernelot Moens, Hein J; Boers, Maarten; Terwee, Caroline B; van Riel, Piet L C M; van de Laar, Mart A F J
2015-12-01
To evaluate the content validity and measurement properties of the Patient-Reported Outcome Measurement Information System (PROMIS) physical function item bank and a 20-item short form in patients with RA in comparison with the HAQ disability index (HAQ-DI) and 36-item Short Form Health Survey (SF-36) physical functioning scale (PF-10). The content validity of the instruments was evaluated by linking their items to the International Classification of Functioning, Disability and Health (ICF) core set for RA. The measures were administered to 690 RA patients enrolled in the Dutch Rheumatoid Arthritis Monitoring registry. Measurement precision was evaluated using item response theory methods and construct validity was evaluated by correlating physical function scores with other clinical and patient-reported outcome measures. All 207 health concepts identified in the physical function measures referred to activities that are featured in the ICF. Twenty-three of 26 ICF RA core set domains are featured in the full PROMIS physical function item bank compared with 13 and 8 for the HAQ-DI and PF-10, respectively. As hypothesized, all three physical function instruments were highly intercorrelated (r 0.74-0.84), moderately correlated with disease activity measures (r 0.44-0.63) and weakly correlated with age (rs 0.07-0.14). Item response theory-based analysis revealed that a 20-item PROMIS physical function short form covered a wider range of physical function levels than the HAQ-DI or PF-10. The PROMIS physical function item bank demonstrated excellent measurement properties in RA. A content-driven 20-item short form may be a useful tool for assessing physical function in RA. © The Author 2015. Published by Oxford University Press on behalf of the British Society for Rheumatology.
Item Response Theory analysis of Fagerström Test for Cigarette Dependence.
Svicher, Andrea; Cosci, Fiammetta; Giannini, Marco; Pistelli, Francesco; Fagerström, Karl
2018-02-01
The Fagerström Test for Cigarette Dependence (FTCD) and the Heaviness of Smoking Index (HSI) are the gold standard measures to assess cigarette dependence. However, FTCD reliability and factor structure have been questioned and HSI psychometric properties are in need of further investigations. The present study examined the psychometric properties of the FTCD and the HSI via Item Response Theory. The study was a secondary analysis of data collected in 862 Italian daily smokers. Confirmatory factor analysis was run to evaluate the dimensionality of FTCD. A Graded Response Model was applied to FTCD and HSI to verify the fit to the data. Both item and test functioning were analyzed and item statistics, Test Information Function, and scale reliabilities were calculated. Mokken Scale Analysis was applied to estimate homogeneity and Loevinger's coefficients were calculated. The FTCD showed unidimensionality and homogeneity for most of the items and for the total score. It also showed high sensitivity and good reliability from medium to high levels of cigarette dependence, although problems related to some items (i.e., items 3 and 5) were evident. HSI had good homogeneity, adequate item functioning, and high reliability from medium to high levels of cigarette dependence. Significant Differential Item Functioning was found for items 1, 4, 5 of the FTCD and for both items of HSI. HSI seems highly recommended in clinical settings addressed to heavy smokers while FTCD would be better used in smokers with a level of cigarette dependence ranging between low and high. Copyright © 2017 Elsevier Ltd. All rights reserved.
Hill, Bridget; Pallant, Julie; Williams, Gavin; Olver, John; Ferris, Scott; Bialocerkowski, Andrea
2016-12-01
To evaluate the internal construct validity and dimensionality of a new patient-reported outcome measure for people with traumatic brachial plexus injury (BPI) based on the International Classification of Functioning, Disability and Health definition of activity. Cross-sectional study. Outpatient clinics. Adults (age range, 18-82y) with a traumatic BPI (N=106). There were 106 people with BPI who completed a 51-item 5-response questionnaire. Responses were analyzed in 4 phases (missing responses, item correlations, exploratory factor analysis, and Rasch analysis) to evaluate the properties of fit to the Rasch model, threshold response, local dependency, dimensionality, differential item functioning, and targeting. Not applicable, as this study addresses the development of an outcome measure. Six items were deleted for missing responses, and 10 were deleted for high interitem correlations >.81. The remaining 35 items, while demonstrating fit to the Rasch model, showed evidence of local dependency and multidimensionality. Items were divided into 3 subscales: dressing and grooming (8 items), arm and hand (17 items), and no hand (6 items). All 3 subscales demonstrated fit to the model with no local dependency, minimal disordered thresholds, no unidimensionality or differential item functioning for age, time postinjury, or self-selected dominance. Subscales were combined into 3 subtests and demonstrated fit to the model, no misfit, and unidimensionality, allowing calculation of a summary score. This preliminary analysis supports the internal construct validity of the Brachial Assessment Tool, a unidimensional targeted 4-response patient-reported outcome measure designed to solely assess activity after traumatic BPI regardless of level of injury, age at recruitment, premorbid limb dominance, and time postinjury. Further examination is required to determine test-retest reliability and responsiveness. Copyright © 2016 American Congress of Rehabilitation Medicine. 
Published by Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Flowers, Claudia P.; Raju, Nambury S.; Oshima, T. C.
Current interest in the assessment of measurement equivalence emphasizes two methods of analysis: linear and nonlinear procedures. This study simulated data using the graded response model to examine the performance of linear (confirmatory factor analysis or CFA) and nonlinear (item-response-theory-based differential item function or IRT-Based…
LeBouthillier, Daniel M; Thibodeau, Michel A; Alberts, Nicole M; Hadjistavropoulos, Heather D; Asmundson, Gordon J G
2015-04-01
Individuals with medical conditions are likely to have elevated health anxiety; however, research has not demonstrated how medical status impacts response patterns on health anxiety measures. Measurement bias can undermine the validity of a questionnaire by overestimating or underestimating scores in groups of individuals. We investigated whether the Short Health Anxiety Inventory (SHAI), a widely-used measure of health anxiety, exhibits medical condition-based bias on item and subscale levels, and whether the SHAI subscales adequately assess the health anxiety continuum. Data were from 963 individuals with diabetes, breast cancer, or multiple sclerosis, and 372 healthy individuals. Mantel-Haenszel tests and item characteristic curves were used to classify the severity of item-level differential item functioning in all three medical groups compared to the healthy group. Test characteristic curves were used to assess scale-level differential item functioning and whether the SHAI subscales adequately assess the health anxiety continuum. Nine out of 14 items exhibited differential item functioning. Two items exhibited differential item functioning in all medical groups compared to the healthy group. In both Thought Intrusion and Fear of Illness subscales, differential item functioning was associated with mildly deflated scores in medical groups with very high levels of the latent traits. Fear of Illness items poorly discriminated between individuals with low and very low levels of the latent trait. While individuals with medical conditions may respond differentially to some items, clinicians and researchers can confidently use the SHAI with a variety of medical populations without concern of significant bias. Copyright © 2015 Elsevier Inc. All rights reserved.
Kramer, Jessica M; Schwartz, Ariel
2017-10-01
This study examined the item interpretability and rating scale use of the Pediatric Evaluation of Disability Inventory-Patient-Reported Outcome (PEDI-PRO) by young people with developmental disabilities. The PEDI-PRO assesses the functional performance of discrete functional tasks in the context of everyday life situations. A two-phase cognitive interview design was implemented with a convenience sample of 37 young people (mean age 19y, SD 2y 5mo; 13 males and 24 females; 68% with intellectual disability) with developmental disabilities. In phase I, 182 item candidates were each reviewed by an average of four young people. In phase II, 103 items were carried forward or revised and each reviewed by an average of seven additional young people. Two raters coded responses for intended item interpretation and performance quality; codes were analysed using descriptive statistics. Qualitative analysis explored young people's self-evaluation process. Items were interpreted as intended by most young people (mean 86%). Young people can use PEDI-PRO response categories appropriately to describe their performance: 94% of positive performance descriptions coincided with a positive response category choice; 73% of negative descriptions coincided with a negative response category choice. Young people interpreted items in a literal manner, and their self-evaluation incorporated the use of supports that facilitate functional performance. The PEDI-PRO's measurement framework appears to support the self-evaluation of functional performance of young people with developmental disabilities. © 2017 Mac Keith Press.
Kang, Hyeon-Ah; Su, Ya-Hui; Chang, Hua-Hua
2018-03-08
A monotone relationship between a true score (τ) and a latent trait level (θ) has been a key assumption for many psychometric applications. The monotonicity property in dichotomous response models is evident as a result of a transformation via a test characteristic curve. Monotonicity in polytomous models, in contrast, is not immediately obvious because item response functions are determined by a set of response category curves, which are conceivably non-monotonic in θ. The purpose of the present note is to demonstrate strict monotonicity in ordered polytomous item response models. Five models that are widely used in operational assessments are considered for proof: the generalized partial credit model (Muraki, 1992, Applied Psychological Measurement, 16, 159), the nominal model (Bock, 1972, Psychometrika, 37, 29), the partial credit model (Masters, 1982, Psychometrika, 47, 147), the rating scale model (Andrich, 1978, Psychometrika, 43, 561), and the graded response model (Samejima, 1972, A general model for free-response data (Psychometric Monograph no. 18). Psychometric Society, Richmond). The study asserts that the item response functions in these models strictly increase in θ and thus there exists strict monotonicity between τ and θ under certain specified conditions. This conclusion validates the practice of customarily using τ in place of θ in applied settings and provides theoretical grounds for one-to-one transformations between the two scales. © 2018 The British Psychological Society.
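The monotonicity claim can be checked numerically for one of the listed models. The sketch below assumes a graded response model with positive discrimination and ordered thresholds, and uses the identity that the expected item score equals the sum of the cumulative category probabilities. The item parameters and the grid are hypothetical, and a grid check is only a sanity check, not a substitute for the proof the note provides.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def grm_expected_score(theta, a, thresholds):
    """Expected score for an item scored 0..K under the graded response model.

    E[X | theta] = sum over k = 1..K of P(X >= k | theta),
    with P(X >= k | theta) = logistic(a * (theta - b_k)).
    """
    return sum(logistic(a * (theta - b)) for b in thresholds)

def true_score(theta, items):
    """Test characteristic curve: sum of expected item scores."""
    return sum(grm_expected_score(theta, a, bs) for a, bs in items)

# Hypothetical items: (discrimination, ordered thresholds).
items = [(1.2, [-1.0, 0.0, 1.5]), (0.8, [-0.5, 0.7])]
grid = [x / 10.0 for x in range(-40, 41)]
taus = [true_score(t, items) for t in grid]
is_increasing = all(t1 < t2 for t1, t2 in zip(taus, taus[1:]))
print(is_increasing)
```

Because each cumulative probability is a strictly increasing logistic function of θ when the discrimination is positive, their sum is strictly increasing as well, which is what the grid check reflects.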
ERIC Educational Resources Information Center
Grover, Raman K.; Ercikan, Kadriye
2017-01-01
In gender differential item functioning (DIF) research it is assumed that all members of a gender group have similar item response patterns and therefore generalizations from group level to subgroup and individual levels can be made accurately. However, DIF items do not necessarily disadvantage every member of a gender group to the same degree,…
Donati, Maria Anna; Chiesi, Francesca; Izzo, Viola A; Primi, Caterina
2017-01-01
As there is a lack of evidence attesting to equivalent item functioning across genders for the instruments most commonly used to measure pathological gambling in adolescence, the present study aimed to test the gender invariance of the Gambling Behavior Scale for Adolescents (GBS-A), a new measurement tool to assess the severity of Gambling Disorder (GD) in adolescents. The equivalence of the items across genders was assessed by analyzing Differential Item Functioning within an Item Response Theory framework. The GBS-A was administered to 1,723 adolescents, and the graded response model was employed. The results attested to the measurement equivalence of the GBS-A when administered to male and female adolescent gamblers. Overall, findings provided evidence that the GBS-A is an effective measurement tool of the severity of GD in male and female adolescents and that the scale is unbiased and able to reveal true gender differences. As such, the GBS-A can be profitably used in educational interventions and clinical treatments with young people.
Steca, Patrizia; Monzani, Dario; Greco, Andrea; Chiesi, Francesca; Primi, Caterina
2015-06-01
This study is aimed at testing the measurement properties of the Life Orientation Test-Revised (LOT-R) for the assessment of dispositional optimism by employing item response theory (IRT) analyses. The LOT-R was administered to a large sample of 2,862 Italian adults. First, confirmatory factor analyses demonstrated the theoretical conceptualization of the construct measured by the LOT-R as a single bipolar dimension. Subsequently, IRT analyses for polytomous, ordered response category data were applied to investigate the items' properties. The equivalence of the items across gender and age was assessed by analyzing differential item functioning. Discrimination and severity parameters indicated that all items were able to distinguish people with different levels of optimism and adequately covered the spectrum of the latent trait. Additionally, the LOT-R appears to be gender invariant and, with minor exceptions, age invariant. Results provided evidence that the LOT-R is a reliable and valid measure of dispositional optimism. © The Author(s) 2014.
Development of the Computer-Adaptive Version of the Late-Life Function and Disability Instrument
Tian, Feng; Kopits, Ilona M.; Moed, Richard; Pardasaney, Poonam K.; Jette, Alan M.
2012-01-01
Background. Having psychometrically strong disability measures that minimize response burden is important in assessing older adults. Methods. Using the original 48 items from the Late-Life Function and Disability Instrument and newly developed items, a 158-item Activity Limitation and a 62-item Participation Restriction item pool were developed. The item pools were administered to a convenience sample of 520 community-dwelling adults 60 years or older. Confirmatory factor analysis and item response theory were employed to identify content structure, calibrate items, and build the computer-adaptive tests (CATs). We evaluated real-data simulations of 10-item CAT subscales. We collected data from 102 older adults to validate the 10-item CATs against the Veteran’s Short Form-36 and assessed test–retest reliability in a subsample of 57 subjects. Results. Confirmatory factor analysis revealed a bifactor structure, and multi-dimensional item response theory was used to calibrate an overall Activity Limitation Scale (141 items) and an overall Participation Restriction Scale (55 items). Fit statistics were acceptable (Activity Limitation: comparative fit index = 0.95, Tucker Lewis Index = 0.95, root mean square error of approximation = 0.03; Participation Restriction: comparative fit index = 0.95, Tucker Lewis Index = 0.95, root mean square error of approximation = 0.05). Correlations of 10-item CATs with full item banks were substantial (Activity Limitation: r = .90; Participation Restriction: r = .95). Test–retest reliability estimates were high (Activity Limitation: r = .85; Participation Restriction: r = .80). Strength and pattern of correlations with Veteran’s Short Form-36 subscales were as hypothesized. Each CAT, on average, took 3.56 minutes to administer. Conclusions. The Late-Life Function and Disability Instrument CATs demonstrated strong reliability, validity, accuracy, and precision. 
The Late-Life Function and Disability Instrument CAT can achieve psychometrically sound disability assessment in older persons while reducing respondent burden. Further research is needed to assess their ability to measure change in older adults. PMID:22546960
Petrillo, Jennifer; Bressler, Neil M; Lamoureux, Ecosse; Ferreira, Alberto; Cano, Stefan
2017-08-14
The NEI VFQ-25 has undergone psychometric evaluation in patients with varying ocular conditions and the general population. However, important limitations which may affect the interpretation of clinical trial results have been previously identified, such as concerns with reliability and validity. The purpose of this study was to evaluate the National Eye Institute Visual Functioning Questionnaire (NEI VFQ-25) and make recommendations for a revised scoring structure, with a view to improving its psychometric performance and interpretability. Rasch Measurement Theory analyses were conducted in two stages using pooled baseline NEI VFQ-25 data for 2487 participants with retinal diseases enrolled in six clinical trials. In stage 1, we examined: scale-to-sample targeting; thresholds for item response options; item fit statistics; stability; local dependence; and reliability. In stage 2, a post-hoc revision of the scoring structure (VFQ-28R) was created and psychometrically re-evaluated. In stage 1, we found that the NEI VFQ-25 was mis-targeted to the sample, and had disordered response thresholds (15/25 items) and mis-fitting items (8/25 items). However, items appeared to be stable (differential item functioning for three items), have minimal item dependency (one pair of items) and good reliability (person-separation index, 0.93). In stage 2, the modified Rasch-scored NEI VFQ-28-R was assessed. It comprised two broad domains: Activity Limitation (19 items) and Socio-Emotional Functioning (nine items). The NEI VFQ-28-R demonstrated improved performance with fewer disordered response thresholds (no items), less item misfit (three items) and improved population targeting (reduced ceiling effect) compared with the NEI VFQ-25. 
Compared with the original version, the proposed NEI VFQ-28-R, with Rasch-based scoring and a two-domain structure, appears to offer improved psychometric performance and interpretability of the vision-related quality of life scale for the population analysed.
Modeling Answer Change Behavior: An Application of a Generalized Item Response Tree Model
ERIC Educational Resources Information Center
Jeon, Minjeong; De Boeck, Paul; van der Linden, Wim
2017-01-01
We present a novel application of a generalized item response tree model to investigate test takers' answer change behavior. The model allows us to simultaneously model the observed patterns of the initial and final responses after an answer change as a function of a set of latent traits and item parameters. The proposed application is illustrated…
ERIC Educational Resources Information Center
Drabinová, Adéla; Martinková, Patrícia
2017-01-01
In this article we present a general approach not relying on item response theory models (non-IRT) to detect differential item functioning (DIF) in dichotomous items in the presence of guessing. The proposed nonlinear regression (NLR) procedure for DIF detection is an extension of the method based on logistic regression. As a non-IRT approach, NLR can…
Item response theory, computerized adaptive testing, and PROMIS: assessment of physical function.
Fries, James F; Witter, James; Rose, Matthias; Cella, David; Khanna, Dinesh; Morgan-DeWitt, Esi
2014-01-01
Patient-reported outcome (PRO) questionnaires record health information directly from research participants because observers may not accurately represent the patient perspective. Patient-reported Outcomes Measurement Information System (PROMIS) is a US National Institutes of Health cooperative group charged with bringing PRO to a new level of precision and standardization across diseases by item development and use of item response theory (IRT). With IRT methods, improved items are calibrated on an underlying concept to form an item bank for a "domain" such as physical function (PF). The most informative items can be combined to construct efficient "instruments" such as 10-item or 20-item PF static forms. Each item is calibrated on the basis of the probability that a given person will respond at a given level, and the ability of the item to discriminate people from one another. Tailored forms may cover any desired level of the domain being measured. Computerized adaptive testing (CAT) selects the best items to sharpen the estimate of a person's functional ability, based on prior responses to earlier questions. PROMIS item banks have been improved with experience from several thousand items, and are calibrated on over 21,000 respondents. In areas tested to date, PROMIS PF instruments are superior or equal to Health Assessment Questionnaire and Medical Outcome Study Short Form-36 Survey legacy instruments in clarity, translatability, patient importance, reliability, and sensitivity to change. Precise measures, such as PROMIS, efficiently incorporate patient self-report of health into research, potentially reducing research cost by lowering sample size requirements. The advent of routine IRT applications has the potential to transform PRO measurement.
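The adaptive selection step described above can be sketched as follows. The abstract says only that CAT "selects the best items"; choosing the unadministered item with maximum Fisher information at the current ability estimate is one common criterion and is an assumption here, as is the two-parameter logistic simplification. The function names and the four-item bank are hypothetical and do not represent PROMIS items or software.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a positive response under a two-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def next_item(theta_hat, bank, administered):
    """Index of the unadministered item with maximum information at theta_hat."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: info_2pl(theta_hat, *bank[i]))

# Hypothetical bank of (discrimination, location) pairs.
bank = [(1.0, -2.0), (1.5, 0.0), (1.2, 1.0), (0.7, 0.2)]
first = next_item(0.0, bank, administered=set())
second = next_item(0.0, bank, administered={first})
print(first, second)
```

In a full CAT the ability estimate would be updated after each response before the next selection; that update step is omitted here to keep the sketch short.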
Differential item functioning analysis of the Vanderbilt Expertise Test for cars
Lee, Woo-Yeol; Cho, Sun-Joo; McGugin, Rankin W.; Van Gulick, Ana Beth; Gauthier, Isabel
2015-01-01
The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge. PMID:26418499
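For readers unfamiliar with the three-parameter logistic model mentioned above, a minimal sketch of its item response function follows. The parameter names a (discrimination), b (difficulty), and c (lower asymptote, often read as a guessing parameter) follow common convention; the numeric values are hypothetical and are not VETcar estimates.

```python
import math

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model:
    P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item evaluated at low, matching, and high ability.
probs = [p_3pl(t, a=1.3, b=0.5, c=0.25) for t in (-3.0, 0.5, 3.0)]
print([round(p, 3) for p in probs])
```

At theta equal to b the probability is halfway between c and 1, and for very low ability it approaches c rather than 0.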
Equal Area Logistic Estimation for Item Response Theory
NASA Astrophysics Data System (ADS)
Lo, Shih-Ching; Wang, Kuo-Chang; Chang, Hsin-Li
2009-08-01
Item response theory (IRT) models use logistic functions exclusively as item response functions (IRFs). Applications of IRT models require obtaining the set of values for logistic function parameters that best fit an empirical data set. However, success in obtaining such a set of values does not guarantee that the constructs they represent actually exist, for the adequacy of a model is not sustained by the possibility of estimating parameters. In this study, an equal-area-based two-parameter logistic model estimation algorithm is proposed. Two theorems are given to prove that the results of the algorithm are equivalent to the results of fitting data by a logistic model. Numerical results are presented to show the stability and accuracy of the algorithm.
Different Approaches to Covariate Inclusion in the Mixture Rasch Model
ERIC Educational Resources Information Center
Li, Tongyun; Jiao, Hong; Macready, George B.
2016-01-01
The present study investigates different approaches to adding covariates and the impact in fitting mixture item response theory models. Mixture item response theory models serve as an important methodology for tackling several psychometric issues in test development, including the detection of latent differential item functioning. A Monte Carlo…
The Usefulness of Differential Item Functioning Methodology in Longitudinal Intervention Studies
USDA-ARS's Scientific Manuscript database
Perceived self-efficacy (SE) for engaging in physical activity (PA) is a key variable mediating PA change in interventions. The purpose of this study is to demonstrate the usefulness of item response modeling-based (IRM) differential item functioning (DIF) in the investigation of group differences ...
ERIC Educational Resources Information Center
Magis, David; Raiche, Gilles; Beland, Sebastien; Gerard, Paul
2011-01-01
We present an extension of the logistic regression procedure to identify dichotomous differential item functioning (DIF) in the presence of more than two groups of respondents. Starting from the usual framework of a single focal group, we propose a general approach to estimate the item response functions in each group and to test for the presence…
Procedures to develop a computerized adaptive test to assess patient-reported physical functioning.
McCabe, Erin; Gross, Douglas P; Bulut, Okan
2018-06-07
The purpose of this paper is to demonstrate the procedures to develop and implement a computerized adaptive patient-reported outcome (PRO) measure using secondary analysis of a dataset and items from fixed-format legacy measures. We conducted secondary analysis of a dataset of responses from 1429 persons with work-related lower extremity impairment. We calibrated three measures of physical functioning on the same metric, based on item response theory (IRT). We evaluated efficiency and measurement precision of various computerized adaptive test (CAT) designs using computer simulations. IRT and confirmatory factor analyses support combining the items from the three scales for a CAT item bank of 31 items. The item parameters for IRT were calculated using the generalized partial credit model. CAT simulations show that reducing the test length from the full 31 items to a maximum test length of 8 items or 20 items is possible without a significant loss of information (95% and 99% correlation with legacy measure scores, respectively). We demonstrated feasibility and efficiency of using CAT for PRO measurement of physical functioning. The procedures we outlined are straightforward, and can be applied to other PRO measures. Additionally, we have included all the information necessary to implement the CAT of physical functioning in the electronic supplementary material of this paper.
Cross-Cultural Validation of the Quality of Life in Hand Eczema Questionnaire (QOLHEQ).
Ofenloch, Robert F; Oosterhaven, Jart A F; Susitaival, Päivikki; Svensson, Åke; Weisshaar, Elke; Minamoto, Keiko; Onder, Meltem; Schuttelaar, Marie Louise A; Bulbul Baskan, Emel; Diepgen, Thomas L; Apfelbacher, Christian
2017-07-01
The Quality of Life in Hand Eczema Questionnaire (QOLHEQ) is the only instrument assessing disease-specific health-related quality of life in patients with hand eczema. It is available in eight language versions. In this study we assessed if the items of different language versions of the QOLHEQ yield comparable values across countries. An international multicenter study was conducted with participating centers in Finland, Germany, Japan, The Netherlands, Sweden, and Turkey. Methods of item response theory were applied to each subscale to assess differential item functioning for items among countries. Overall, 662 hand eczema patients were recruited into the study. Single items were removed or split according to the item response theory model by country to resolve differential item functioning. After this adjustment, none of the four subscales of the QOLHEQ showed significant misfit to the item response theory model (P < 0.01), and a Person Separation Index of greater than 0.7 showed good internal consistency for each subscale. By adapting the scoring of the QOLHEQ using the methods of item response theory, it was possible to obtain QOLHEQ values that are comparable across countries. Cross-cultural variations in the interpretation of single items were resolved. The QOLHEQ is now ready to be used in international studies assessing the health-related quality of life impact of hand eczema. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Calibrating Item Families and Summarizing the Results Using Family Expected Response Functions
ERIC Educational Resources Information Center
Sinharay, Sandip; Johnson, Matthew S.; Williamson, David M.
2003-01-01
Item families, which are groups of related items, are becoming increasingly popular in complex educational assessments. For example, in automatic item generation (AIG) systems, a test may consist of multiple items generated from each of a number of item models. Item calibration or scoring for such an assessment requires fitting models that can…
ERIC Educational Resources Information Center
French, Brian F.; Gotch, Chad M.
2013-01-01
The Brigance Comprehensive Inventory of Basic Skills-II (CIBS-II) is a diagnostic battery intended for children in grades 1 through 6. The aim of this study was to test for item invariance, or differential item functioning (DIF), of the CIBS-II across sex in the standardization sample through the use of item response theory DIF detection…
Perera, Subashan; Nace, David A; Resnick, Neil M; Greenspan, Susan L
2017-04-11
The Nursing Home Physical Performance Test (NHPPT) was developed to measure function among nursing home residents using sit-to-stand, scooping applesauce, face washing, dialing phone, putting on sweater, and ambulating tasks. Using item response theory, we explore its measurement characteristics at item level and opportunities for improvements. We used data from long-term care women. We fitted a graded response model, estimated parameters, and constructed probability and information curves. We identified items to be targeted toward lower and higher functioning persons to increase the range of abilities to which the instrument is applicable. We revised the scoring by making sit-to-stand and sweater items harder and dialing phone easier. We examined changes to concurrent validity with activities of daily living (ADL), frailty, and cognitive function. Participants were 86 years old, had more than three comorbidities, and a NHPPT of 19.4. All items had high discrimination and were targeted toward the lower middle range of performance continuum. After revision, sit-to-stand and sweater items demonstrated greater discrimination among the higher functioning and/or greater spread of thresholds for response categories. The overall test showed discrimination over a wider range of individuals. Concurrent validity correlation improved from 0.60 to 0.68 for instrumental ADL and explained variability (R2) from 22% to 36% for frailty. NHPPT has good measurement characteristics at the item level. NHPPT can be improved, implemented in computerized adaptive testing, and combined with self-report for greater utility, but a definitive study is needed. © The Author 2017. Published by Oxford University Press on behalf of The Gerontological Society of America. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Morales, Leo S; Flowers, Claudia; Gutierrez, Peter; Kleinman, Marjorie; Teresi, Jeanne A
2006-11-01
To illustrate the application of the Differential Item and Test Functioning (DFIT) method using English and Spanish versions of the Mini-Mental State Examination (MMSE). Study participants were 65 years of age or older and lived in North Manhattan, New York. Of the 1578 study participants who were administered the MMSE, 665 completed it in Spanish. The MMSE contains 20 items that measure the degree of cognitive impairment in the areas of orientation, attention and calculation, registration, recall and language, as well as the ability to follow verbal and written commands. After assessing the dimensionality of the MMSE scale, item response theory person and item parameters were estimated separately for the English and Spanish sample using Samejima's 2-parameter graded response model. Then the DFIT framework was used to assess differential item functioning (DIF) and differential test functioning (DTF). Nine items were found to show DIF; these were items that ask the respondent to name the correct season, day of the month, city, state, and 2 nearby streets, recall 3 objects, repeat the phrase "no ifs, no ands, no buts," follow the command, "close your eyes," and the command, "take the paper in your right hand, fold the paper in half with both hands, and put the paper down in your lap." At the scale level, however, the MMSE did not show differential functioning. Respondents to the English and Spanish versions of the MMSE are comparable on the basis of scale scores. However, assessments based on individual MMSE items may be misleading.
A Bayesian Semiparametric Item Response Model with Dirichlet Process Priors
ERIC Educational Resources Information Center
Miyazaki, Kei; Hoshino, Takahiro
2009-01-01
In Item Response Theory (IRT), item characteristic curves (ICCs) are illustrated through logistic models or normal ogive models, and the probability that examinees give the correct answer is usually a monotonically increasing function of their ability parameters. However, since only limited patterns of shapes can be obtained from logistic models…
Using Data Augmentation and Markov Chain Monte Carlo for the Estimation of Unfolding Response Models
ERIC Educational Resources Information Center
Johnson, Matthew S.; Junker, Brian W.
2003-01-01
Unfolding response models, a class of item response theory (IRT) models that assume a unimodal item response function (IRF), are often used for the measurement of attitudes. Verhelst and Verstralen (1993) and Andrich and Luo (1993) independently developed unfolding response models by relating the observed responses to a more common monotone IRT…
Effect Size Measures for Differential Item Functioning in a Multidimensional IRT Model
ERIC Educational Resources Information Center
Suh, Youngsuk
2016-01-01
This study adapted an effect size measure used for studying differential item functioning (DIF) in unidimensional tests and extended the measure to multidimensional tests. Two effect size measures were considered in a multidimensional item response theory model: signed weighted P-difference and unsigned weighted P-difference. The performance of…
The Impact of Missing Data on the Detection of Nonuniform Differential Item Functioning
ERIC Educational Resources Information Center
Finch, W. Holmes
2011-01-01
Missing information is a ubiquitous aspect of data analysis, including responses to items on cognitive and affective instruments. Although the broader statistical literature describes missing data methods, relatively little work has focused on this issue in the context of differential item functioning (DIF) detection. Such prior research has…
A New Functional Health Literacy Scale for Japanese Young Adults Based on Item Response Theory.
Tsubakita, Takashi; Kawazoe, Nobuo; Kasano, Eri
2017-03-01
Health literacy predicts health outcomes. Despite concerns surrounding the health of Japanese young adults, to date there has been no objective assessment of health literacy in this population. This study aimed to develop a Functional Health Literacy Scale for Young Adults (funHLS-YA) based on item response theory. Each item in the scale requires participants to choose the most relevant term from 3 choices in relation to a target item, thus assessing objective rather than perceived health literacy. The 20-item scale was administered to 1816 university students and 1751 responded. Cronbach's α coefficient was .73. Difficulty and discrimination parameters of each item were estimated, resulting in the exclusion of 1 item. Some items showed different difficulty parameters for male and female participants, reflecting that some aspects of health literacy may differ by gender. The current 19-item version of funHLS-YA can reliably assess the objective health literacy of Japanese young adults.
NASA Astrophysics Data System (ADS)
Rahmani, B. D.
2018-01-01
The purpose of this paper is to evaluate Indonesian senior high school teachers' pedagogical content knowledge and their perception of curriculum change in West Java, Indonesia. The data used in this study were derived from a questionnaire survey conducted among teachers in Bandung, West Java. A total of 61 usable responses were collected. Differential Item Functioning (DIF) analysis was used to examine whether item responses differed by gender, educational background, or school location. The results showed no significant difference in item responses by gender or school location, but a significant difference by educational background. In conclusion, teachers' educational background influences their responses to the questionnaire. It is therefore suggested that future questionnaire items be constructed to accommodate differences among participants, particularly in educational background.
Shen, Minxue; Cui, Yuanwu; Hu, Ming; Xu, Linyong
2017-01-13
The study aimed to validate a scale to assess the severity of the "Yin deficiency, intestine heat" pattern of functional constipation based on modern test theory. Pooled longitudinal data of 237 patients with the "Yin deficiency, intestine heat" pattern of constipation from a prospective cohort study were used to validate the scale. Exploratory factor analysis was used to examine the common factors of items. A multidimensional item response model was used to assess the scale in the presence of multidimensionality. Cronbach's alpha ranged from 0.79 to 0.89, and the split-half reliability ranged from 0.67 to 0.79 at different measurements. Exploratory factor analysis identified two common factors, and all items had cross factor loadings. The bidimensional model had better goodness of fit than the unidimensional model. The multidimensional item response model showed that all items had moderate to high discrimination parameters. Parameters indicated that the first latent trait signified intestine heat, while the second trait characterized Yin deficiency. The information function showed that items demonstrated the highest discrimination power among patients with moderate to high levels of disease severity. Multidimensional item response theory provides a useful and rational approach in validating scales for assessing the severity of patterns in traditional Chinese medicine.
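Cronbach's alpha, reported above as an internal-consistency index, can be computed from a respondents-by-items score matrix as k/(k-1) times one minus the ratio of summed item variances to total-score variance. The sketch below uses only the Python standard library; the function name and the toy responses are assumptions for illustration and are unrelated to the study's data.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)),
    using sample variances (n - 1 denominator) throughout.
    Requires at least two items and two respondents.
    """
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)

# Hypothetical responses: 5 respondents, 3 items.
toy = [[1, 2, 2], [2, 2, 3], [3, 4, 3], [4, 4, 5], [5, 5, 4]]
alpha = cronbach_alpha(toy)
print(round(alpha, 3))
```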
ERIC Educational Resources Information Center
Ayodele, Alicia Nicole
2017-01-01
Within polytomous items, differential item functioning (DIF) can take on various forms due to the number of response categories. The lack of invariance at this level is referred to as differential step functioning (DSF). The most common DSF methods in the literature are the adjacent category log odds ratio (AC-LOR) estimator and cumulative…
Tian, Feng; Ni, Pengsheng; Mulcahey, M J; Hambleton, Ronald K; Tulsky, David; Haley, Stephen M; Jette, Alan M
2014-11-01
To use item response theory (IRT) methods to link scores from 2 recently developed contemporary functional outcome measures, the adult Spinal Cord Injury-Functional Index (SCI-FI) and the Pedi SCI (both the parent version and the child version). Secondary data analysis of the physical functioning items of the adult SCI-FI and the Pedi SCI instruments. We used a nonequivalent group design with items common to both instruments and the Stocking-Lord method for the linking. Linking was conducted so that the adult SCI-FI and Pedi SCI scaled scores could be compared. Community. This study included a total sample of 1558 participants. Pedi SCI items were administered to a sample of children (n=381) with SCI aged 8 to 21 years, and of parents/caregivers (n=322) of children with SCI aged 4 to 21 years. Adult SCI-FI items were administered to a sample of adults (n=855) with SCI aged 18 to 92 years. Not applicable. Five scales common to both instruments were included in the analysis: Wheelchair, Daily Routine/Self-care, Daily Routine/Fine Motor, Ambulation, and General Mobility functioning. Confirmatory factor analysis and exploratory factor analysis results indicated that the 5 scales are unidimensional. A graded response model was used to calibrate the items. Misfitting items were identified and removed from the item banks. Items that function differently between the adult and child samples (ie, exhibit differential item functioning) were identified and removed from the common items used for linking. Domain scores from the Pedi SCI instruments were transformed onto the adult SCI-FI metric. This IRT linking allowed estimation of adult SCI-FI scale scores based on Pedi SCI scale scores and vice versa; therefore, it provides clinicians with a means of tracking long-term functional data for children with an SCI across their entire lifespan. Copyright © 2014 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Global, Local, and Graphical Person-Fit Analysis Using Person-Response Functions
ERIC Educational Resources Information Center
Emons, Wilco H. M.; Sijtsma, Klaas; Meijer, Rob R.
2005-01-01
Person-fit statistics test whether the likelihood of a respondent's complete vector of item scores on a test is low given the hypothesized item response theory model. This binary information may be insufficient for diagnosing the cause of a misfitting item-score vector. The authors propose a comprehensive methodology for person-fit analysis in the…
Thibodeau, Michel A; Leonard, Rachel C; Abramowitz, Jonathan S; Riemann, Bradley C
2015-12-01
The Dimensional Obsessive-Compulsive Scale (DOCS) is a promising measure of obsessive-compulsive disorder (OCD) symptoms but has received minimal psychometric attention. We evaluated the utility and reliability of DOCS scores. The study included 832 students and 300 patients with OCD. Confirmatory factor analysis supported the originally proposed four-factor structure. DOCS total and subscale scores exhibited good to excellent internal consistency in both samples (α = .82 to α = .96). Patient DOCS total scores reduced substantially during treatment (t = 16.01, d = 1.02). DOCS total scores discriminated between students and patients (sensitivity = 0.76, 1 - specificity = 0.23). The measure did not exhibit gender-based differential item functioning as tested by Mantel-Haenszel chi-square tests. Expected response options for each item were plotted as a function of item response theory and demonstrated that DOCS scores incrementally discriminate OCD symptoms ranging from low to extremely high severity. Incremental differences in DOCS scores appear to represent unbiased and reliable differences in true OCD symptom severity. © The Author(s) 2014.
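As a rough illustration of the Mantel-Haenszel approach mentioned above, the sketch below computes the common odds ratio and the continuity-corrected chi-square for one dichotomized item across matched score strata. The function name, the table layout, and the toy counts in the usage note are assumptions made for illustration; none of them come from the study.

```python
def mantel_haenszel(strata):
    """Mantel-Haenszel DIF statistics for one dichotomous item.

    strata: list of (a, b, c, d) counts, one tuple per matched score level.
      a = reference group endorsing the item, b = reference group not endorsing,
      c = focal group endorsing,              d = focal group not endorsing.
    Returns (common_odds_ratio, chi_square_with_continuity_correction).
    Assumes at least one informative stratum, so the denominators are non-zero.
    """
    num = den = 0.0                  # numerator / denominator of the common odds ratio
    sum_a = sum_e = sum_v = 0.0      # observed, expected, and variance of cell a
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue                 # stratum carries no information
        num += a * d / n
        den += b * c / n
        n_ref, n_foc = a + b, c + d  # group sizes in this stratum
        m1, m0 = a + c, b + d        # endorsing / not endorsing totals
        sum_a += a
        sum_e += n_ref * m1 / n
        sum_v += n_ref * n_foc * m1 * m0 / (n * n * (n - 1))
    odds_ratio = num / den
    chi_sq = (abs(sum_a - sum_e) - 0.5) ** 2 / sum_v
    return odds_ratio, chi_sq
```

With a single balanced stratum such as `(10, 10, 10, 10)` the common odds ratio is 1, which is what an item without uniform DIF would be expected to show; the chi-square is referred to a chi-square distribution with one degree of freedom.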
Rose, Matthias; Bjorner, Jakob B; Gandek, Barbara; Bruce, Bonnie; Fries, James F; Ware, John E
2014-05-01
To document the development and psychometric evaluation of the Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function (PF) item bank and static instruments. The items were evaluated using qualitative and quantitative methods. A total of 16,065 adults answered item subsets (n>2,200/item) on the Internet, with oversampling of the chronically ill. Classical test and item response theory methods were used to evaluate 149 PROMIS PF items plus 10 Short Form-36 and 20 Health Assessment Questionnaire-Disability Index items. A graded response model was used to estimate item parameters, which were normed to a mean of 50 (standard deviation [SD]=10) in a US general population sample. The final bank consists of 124 PROMIS items covering upper, central, and lower extremity functions and instrumental activities of daily living. In simulations, a 10-item computerized adaptive test (CAT) eliminated floor and decreased ceiling effects, achieving higher measurement precision than any comparable length static tool across four SDs of the measurement range. Improved psychometric properties were transferred to the CAT's superior ability to identify differences between age and disease groups. The item bank provides a common metric and can improve the measurement of PF by facilitating the standardization of patient-reported outcome measures and implementation of CATs for more efficient PF assessments over a larger range. Copyright © 2014. Published by Elsevier Inc.
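The norming described above (mean of 50, SD of 10 in a reference sample) corresponds to a linear rescaling of the latent trait estimate. A minimal sketch, assuming the estimate `theta` is expressed on a scale whose reference-population mean and SD are passed in as `ref_mean` and `ref_sd` (hypothetical parameter names, not PROMIS software):

```python
def theta_to_t_score(theta, ref_mean=0.0, ref_sd=1.0):
    """Rescale a latent trait estimate to a T-score metric (mean 50, SD 10).

    ref_mean and ref_sd describe the reference population on the theta scale;
    the defaults assume theta is already standardized in that population.
    """
    return 50.0 + 10.0 * (theta - ref_mean) / ref_sd
```

Under these assumptions, a respondent 1.5 SDs above the reference mean would receive a score of 65.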
ERIC Educational Resources Information Center
Dimitrov, Dimiter M.
2017-01-01
This article offers an approach to examining differential item functioning (DIF) under its item response theory (IRT) treatment in the framework of confirmatory factor analysis (CFA). The approach is based on integrating IRT- and CFA-based testing of DIF and using bias-corrected bootstrap confidence intervals with a syntax code in Mplus.
ERIC Educational Resources Information Center
Gomez, Rapson
2012-01-01
Objective: Generalized partial credit model, which is based on item response theory (IRT), was used to test differential item functioning (DIF) for the "Diagnostic and Statistical Manual of Mental Disorders" (4th ed.), inattention (IA), and hyperactivity/impulsivity (HI) symptoms across boys and girls. Method: To accomplish this, parents completed…
Stepwise Analysis of Differential Item Functioning Based on Multiple-Group Partial Credit Model.
ERIC Educational Resources Information Center
Muraki, Eiji
1999-01-01
Extended an Item Response Theory (IRT) method for detection of differential item functioning to the partial credit model and applied the method to simulated data using a stepwise procedure. Then applied the stepwise DIF analysis based on the multiple-group partial credit model to writing trend data from the National Assessment of Educational…
Effects of Differential Item Functioning on Examinees' Test Performance and Reliability of Test
ERIC Educational Resources Information Center
Lee, Yi-Hsuan; Zhang, Jinming
2017-01-01
Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The…
Assessment of Differential Item Functioning under Cognitive Diagnosis Models: The DINA Model Example
ERIC Educational Resources Information Center
Li, Xiaomin; Wang, Wen-Chung
2015-01-01
The assessment of differential item functioning (DIF) is routinely conducted to ensure test fairness and validity. Although many DIF assessment methods have been developed in the context of classical test theory and item response theory, they are not applicable for cognitive diagnosis models (CDMs), as the underlying latent attributes of CDMs are…
Reise, Steven P.; Ventura, Joseph; Keefe, Richard S. E.; Baade, Lyle E.; Gold, James M.; Green, Michael F.; Kern, Robert S.; Mesholam-Gately, Raquelle; Nuechterlein, Keith H.; Seidman, Larry J.; Bilder, Robert
2011-01-01
We conducted psychometric analyses of two interview-based measures of cognitive deficits: the 21-item Clinical Global Impression of Cognition in Schizophrenia (CGI-CogS; Ventura et al., 2008), and the 20-item Schizophrenia Cognition Rating Scale (SCoRS; Keefe et al., 2006), which were administered on two occasions to a sample of people with schizophrenia. Traditional psychometrics, bifactor analysis, and item response theory (IRT) methods were used to explore item functioning, dimensionality, and to compare instruments. Despite containing similar item content, responses to the CGI-CogS demonstrated superior psychometric properties (e.g., higher item-intercorrelations, better spread of ratings across response categories), relative to the SCoRS. We argue that these differences arise mainly from the differential use of prompts and how the items are phrased and scored. Bifactor analysis demonstrated that although both measures capture a broad range of cognitive functioning (e.g., working memory, social cognition), the common variance on each is overwhelmingly explained by a single general factor. IRT analyses of the combined pool of 41 items showed that measurement precision is peaked in the mild to moderate range of cognitive impairment. Finally, simulated adaptive testing revealed that only about 10 to 12 items are necessary to achieve latent trait level estimates with reasonably small standard errors for most individuals. This suggests that these interview-based measures of cognitive deficits could be shortened without loss of measurement precision. PMID:21381848
A Model-Free Diagnostic for Single-Peakedness of Item Responses Using Ordered Conditional Means.
Polak, Marike; de Rooij, Mark; Heiser, Willem J
2012-09-01
In this article we propose a model-free diagnostic for single-peakedness (unimodality) of item responses. Presuming a unidimensional unfolding scale and a given item ordering, we approximate item response functions of all items based on ordered conditional means (OCM). The proposed OCM methodology is based on Thurstone & Chave's (1929) criterion of irrelevance, which is a graphical, exploratory method for evaluating the "relevance" of dichotomous attitude items. We generalized this criterion to graded response items and quantified the relevance by fitting a unimodal smoother. The resulting goodness-of-fit was used to determine item fit and aggregated scale fit. Based on a simulation procedure, cutoff values were proposed for the measures of item fit. These cutoff values showed high power rates and acceptable Type I error rates. We present 2 applications of the OCM method. First, we apply the OCM method to personality data from the Developmental Profile; second, we analyze attitude data collected by Roberts and Laughlin (1996) concerning opinions of capital punishment.
Are Teacher Course Evaluations Biased against Faculty That Teach Quantitative Methods Courses?
ERIC Educational Resources Information Center
Royal, Kenneth D.; Stockdale, Myrah R.
2015-01-01
The present study investigated graduate students' responses to teacher/course evaluations (TCE) to determine if students' responses were inherently biased against faculty who teach quantitative methods courses. Item response theory (IRT) and Differential Item Functioning (DIF) techniques were utilized for data analysis. Results indicate students…
Adjusting for cross-cultural differences in computer-adaptive tests of quality of life.
Gibbons, C J; Skevington, S M
2018-04-01
Previous studies using the WHOQOL measures have demonstrated that the relationship between individual items and the underlying quality of life (QoL) construct may differ between cultures. If unaccounted for, these differing relationships can lead to measurement bias which, in turn, can undermine the reliability of results. We used item response theory (IRT) to assess differential item functioning (DIF) in WHOQOL data from diverse language versions collected in the UK, Zimbabwe, Russia, and India (total N = 1332). Data were fitted to the partial credit 'Rasch' model. We used four item banks previously derived from the WHOQOL-100 measure, which provided excellent measurement for physical, psychological, social, and environmental quality of life domains (40 items overall). Cross-cultural differential item functioning was assessed using analysis of variance for item residuals and post hoc Tukey tests. Simulated computer-adaptive tests (CATs) were conducted to assess the efficiency and precision of the four item banks. Splitting item parameters by DIF resulted in four linked item banks without DIF or other breaches of IRT model assumptions. Simulated CATs were more precise and efficient than longer paper-based alternatives. Assessing differential item functioning using item response theory can identify a lack of measurement invariance between cultures which, if uncontrolled, may undermine accurate comparisons in computer-adaptive testing assessments of QoL. We demonstrate how compensating for DIF using item anchoring allowed data from all four countries to be compared on a common metric, thus facilitating assessments which were both sensitive to cultural nuance and comparable between countries.
ERIC Educational Resources Information Center
Finch, Holmes
2011-01-01
Methods of uniform differential item functioning (DIF) detection have been extensively studied in the complete data case. However, less work has been done examining the performance of these methods when missing item responses are present. Research that has been done in this regard appears to indicate that treating missing item responses as…
Tsang, Siny; Schmidt, Karen M.; Vincent, Gina M.; Salekin, Randall T.; Moretti, Marlene M.; Odgers, Candice L.
2014-01-01
This study used an item response theory (IRT) model and a large adolescent sample of justice involved youth (N = 1,007, 38% female) to examine the item functioning of the Psychopathy Checklist – Youth Version (PCL: YV). Items that were most discriminating (or most sensitive to changes) of the latent trait (thought to be psychopathy) among adolescents included “Glibness/superficial charm”, “Lack of remorse”, and “Need for stimulation”, whereas items that were least discriminating included “Pathological lying”, “Failure to accept responsibility”, and “Lacks goals.” The items “Impulsivity” and “Irresponsibility” were the most likely to be rated high among adolescents, whereas “Parasitic lifestyle”, and “Glibness/superficial charm” were the most likely to be rated low. Evidence of differential item functioning (DIF) on four of the 13 items was found between boys and girls. “Failure to accept responsibility” and “Impulsivity” were endorsed more frequently to describe adolescent girls than boys at similar levels of the latent trait, and vice versa for “Grandiose sense of self-worth” and “Lacks goals.” The DIF findings suggest that four PCL: YV items function differently between boys and girls. PMID:25580672
Rasch measurement: the Arm Activity measure (ArmA) passive function sub-scale.
Ashford, Stephen; Siegert, Richard J; Alexandrescu, Roxana
2016-01-01
To evaluate the conformity of the Arm Activity measure (ArmA) passive function sub-scale to the Rasch model. A consecutive cohort of patients (n = 92) undergoing rehabilitation, including upper limb rehabilitation and spasticity management, at two specialist rehabilitation units were included. Rasch analysis was used to examine scaling and conformity to the model. Responses were analysed using Rasch unidimensional measurement models (RUMM 2030). The following aspects were considered: overall model and individual item fit statistics and fit residuals, internal reliability, item response threshold ordering, item bias, local dependency and unidimensionality. ArmA contains both active and passive function sub-scales, but in this analysis only the passive function sub-scale was considered. Four of the seven items in the ArmA passive function sub-scale initially had disordered thresholds. These items were rescored to four response options, which resulted in ordered thresholds for all items. Once the items with disordered thresholds had been rescored, item bias was not identified for age, global disability level or diagnosis, but with a small difference in difficulty between males and females for one item of the scale. Local dependency was not observed and the unidimensionality of the sub-scale was supported and good fit to the Rasch model was identified. The person separation index (PSI) was 0.95 indicating that the scale is able to reliably differentiate at least two groups of patients. The ArmA passive function sub-scale was shown in this evaluation to conform to the Rasch model once disordered thresholds had been addressed. Using the logit scores produced by the Rasch model it was possible to convert this back to the original scale range. 
Implications for Rehabilitation The ArmA passive function sub-scale was shown, in this evaluation, to conform to the Rasch model once disordered thresholds had been addressed and therefore to be a clinically applicable and potentially useful hierarchical measure. Using Rasch logit scores it has been possible to convert back to the original ordinal scale range and provide an indication of real change to enable evaluation of clinical outcome of importance to patients and clinicians.
Deng, Nina; Anatchkova, Milena D; Waring, Molly E; Han, Kyung T; Ware, John E
2015-08-01
The Quality-of-life (QOL) Disease Impact Scale (QDIS®) standardizes the content and scoring of QOL impact attributed to different diseases using item response theory (IRT). This study examined the IRT invariance of the QDIS-standardized IRT parameters in an independent sample. The differential functioning of items and test (DFIT) of a static short-form (QDIS-7) was examined across two independent sources: patients hospitalized for acute coronary syndrome (ACS) in the TRACE-CORE study (N = 1,544) and chronically ill US adults in the QDIS standardization sample. "ACS-specific" IRT item parameters were calibrated and linearly transformed to compare to "standardized" IRT item parameters. Differences in IRT model-expected item, scale and theta scores were examined. The DFIT results were also compared in a standard logistic regression differential item functioning analysis. Item parameters estimated in the ACS sample showed lower discrimination parameters than the standardized discrimination parameters, but only small differences were found for threshold parameters. In DFIT, results on the non-compensatory differential item functioning index (range 0.005-0.074) were all below the threshold of 0.096. Item differences were further canceled out at the scale level. IRT-based theta scores for ACS patients using standardized and ACS-specific item parameters were highly correlated (r = 0.995, root-mean-square difference = 0.09). Using standardized item parameters, ACS patients scored one-half standard deviation higher (indicating greater QOL impact) compared to chronically ill adults in the standardization sample. The study showed sufficient IRT invariance to warrant the use of standardized IRT scoring of QDIS-7 for studies comparing the QOL impact attributed to acute coronary disease and other chronic conditions.
Item Response Theory and Health Outcomes Measurement in the 21st Century
Hays, Ron D.; Morales, Leo S.; Reise, Steve P.
2006-01-01
Item response theory (IRT) has a number of potential advantages over classical test theory in assessing self-reported health outcomes. IRT models yield invariant item and latent trait estimates (within a linear transformation), standard errors conditional on trait level, and trait estimates anchored to item content. IRT also facilitates evaluation of differential item functioning, inclusion of items with different response formats in the same scale, and assessment of person fit and is ideally suited for implementing computer adaptive testing. Finally, IRT methods can be helpful in developing better health outcome measures and in assessing change over time. These issues are reviewed, along with a discussion of some of the methodological and practical challenges in applying IRT methods. PMID:10982088
Sideridis, Georgios D.; Tsaousis, Ioannis; Al Harbi, Khaleel
2016-01-01
The purpose of the present study was to relate response strategy with person ability estimates. Two behavioral strategies were examined: (a) the strategy to skip items in order to save time on timed tests, and (b) the strategy to select two responses on an item, with the hope that one of them may be considered correct. Participants were 4,422 individuals who were administered a standardized achievement measure related to math, biology, chemistry, and physics. In the present evaluation, only the physics subscale was employed. Two analyses were conducted: (a) a person-based one to identify differences between groups and potential correlates of those differences, and (b) a measure-based analysis in order to identify the parts of the measure that were responsible for potential group differentiation. For (a), person abilities, the 2-PL model was employed and later the 3-PL and 4-PL models in order to estimate upper and lower asymptotes of person abilities. For (b), differential item functioning, differential test functioning, and differential distractor functioning were investigated. Results indicated that there were significant differences between groups, with completers having the highest ability compared to both non-attempters and dual responders. There were no significant differences between non-attempters and dual responders. The present findings have implications for response strategy efficacy and measure evaluation, revision, and construction. PMID:27790174
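The 2-PL, 3-PL, and 4-PL models named above are nested: in the usual textbook formulation, the 3-PL adds a lower asymptote and the 4-PL an upper asymptote to the 2-PL curve. The sketch below shows that general form only; the parameter names `a`, `b`, `c`, and `d` follow common convention and are assumptions here, not the study's estimation code.

```python
import math

def irf_4pl(theta, a, b, c=0.0, d=1.0):
    """Probability of a correct response under the four-parameter logistic model.

    theta: person ability; a: discrimination; b: difficulty;
    c: lower asymptote (c = 0 gives no guessing floor);
    d: upper asymptote (d = 1 gives no ceiling below certainty).
    With c = 0 and d = 1 this reduces to the 2-PL; with d = 1 only, to the 3-PL.
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))
```

At `theta == b` the curve sits halfway between the two asymptotes, so a 2-PL item gives 0.5 there, while an item with `c=0.2` and `d=0.9` gives 0.55.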
Samejima Items in Multiple-Choice Tests: Identification and Implications
ERIC Educational Resources Information Center
Rahman, Nazia
2013-01-01
Samejima hypothesized that non-monotonically increasing item response functions (IRFs) of ability might occur for multiple-choice items (referred to here as "Samejima items") if low ability test takers with some, though incomplete, knowledge or skill are drawn to a particularly attractive distractor, while very low ability test takers…
Holman, Rebecca; Glas, Cees AW; Lindeboom, Robert; Zwinderman, Aeilko H; de Haan, Rob J
2004-01-01
Background Whenever questionnaires are used to collect data on constructs, such as functional status or health related quality of life, it is unlikely that all respondents will respond to all items. This paper examines ways of dealing with responses in a 'not applicable' category to items included in the AMC Linear Disability Score (ALDS) project item bank. Methods The data examined in this paper come from the responses of 392 respondents to 32 items and form part of the calibration sample for the ALDS item bank. The data are analysed using the one-parameter logistic item response theory model. The four practical strategies for dealing with this type of response are: cold deck imputation; hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. Results The item and respondent population parameter estimates were very similar for the strategies involving hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. The estimates obtained using the cold deck imputation method were substantially different. Conclusions The cold deck imputation method was not considered suitable for use in the ALDS item bank. The other three methods described can be usefully implemented in the ALDS item bank, depending on the purpose of the data analysis to be carried out. These three methods may be useful for other data sets examining similar constructs, when item response theory based methods are used. PMID:15200681
Item Response Theory Applied to Factors Affecting the Patient Journey Towards Hearing Rehabilitation
Chenault, Michelene; Berger, Martijn; Kremer, Bernd; Anteunis, Lucien
2016-01-01
To develop a tool for use in hearing screening and to evaluate the patient journey towards hearing rehabilitation, responses to three hearing aid rehabilitation questionnaire scales (aid stigma, pressure, and aid unwanted, addressing respectively hearing aid stigma, experienced pressure from others, and perceived hearing aid benefit) were evaluated with item response theory. The sample comprised 212 persons aged 55 years or more: 63 were hearing aid users, 64 were persons with hearing impairment, and 85 were persons without hearing impairment according to guidelines for hearing aid reimbursement in the Netherlands. Bias was investigated relative to hearing aid use and hearing impairment within the differential test functioning framework. Items compromising model fit or demonstrating differential item functioning were dropped. The aid stigma scale was reduced from 6 to 4 items, the pressure scale from 7 to 4 items, and the aid unwanted scale from 5 to 4 items. This procedure resulted in bias-free scales ready for screening purposes and for application to further understand the help-seeking process of the hearing impaired. PMID:28028428
NASA Astrophysics Data System (ADS)
Ishimoto, Michi; Davenport, Glen; Wittmann, Michael C.
2017-12-01
Student views of force and motion reflect the personal experiences and physics education of the student. With a different language, culture, and educational system, we expect that Japanese students' views on force and motion might be different from those of American students. The Force and Motion Conceptual Evaluation (FMCE) is an instrument used to probe student views on force and motion. It was designed using research on American students, and, as such, the items might function differently for Japanese students. Preliminary results from a translated version indicated that Japanese students had similar misconceptions as those of American students. In this study, we used item response curves (IRCs) to make more detailed item-by-item comparisons. IRCs show the functioning of individual items across all levels of performance by plotting the proportion of each response as a function of the total score. Most of the IRCs showed very similar patterns on both correct and incorrect responses; however, a few of the plots indicate differences between the populations. The similar patterns indicate that students tend to interact with FMCE items similarly, despite differences in culture, language, and education. We speculate about the possible causes for the differences in some of the IRCs. This report is intended to show how IRCs can be used as a part of the validation process when making comparisons across languages and nationalities. Differences in IRCs can help to pinpoint artifacts of translation, contextual effects because of differences in culture, and perhaps intrinsic differences in student understanding of Newtonian motion.
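The item response curves described above (the proportion of each response plotted against total score) can be tabulated with a few lines of code. This is a minimal sketch under assumed inputs: `responses` holds the option each student chose on one item and `total_scores` holds the matching total test scores; both names and the data layout are hypothetical.

```python
from collections import Counter, defaultdict

def item_response_curve(responses, total_scores):
    """Proportion of each response option at each total score, for one item.

    responses: option chosen on the item by each student (e.g. "A", "B").
    total_scores: each student's total test score, in the same order.
    Returns {total_score: {option: proportion}} with scores in ascending order.
    """
    by_score = defaultdict(Counter)
    for option, score in zip(responses, total_scores):
        by_score[score][option] += 1
    curve = {}
    for score in sorted(by_score):
        counts = by_score[score]
        n = sum(counts.values())       # students who obtained this total score
        curve[score] = {opt: k / n for opt, k in counts.items()}
    return curve
```

Computing this table separately for each population and overlaying the resulting curves option by option is one way the item-by-item comparison could be carried out.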
Methodology for the development and calibration of the SCI-QOL item banks
Tulsky, David S.; Kisala, Pamela A.; Victorson, David; Choi, Seung W.; Gershon, Richard; Heinemann, Allen W.; Cella, David
2015-01-01
Objective To develop a comprehensive, psychometrically sound, and conceptually grounded patient reported outcomes (PRO) measurement system for individuals with spinal cord injury (SCI). Methods Individual interviews (n = 44) and focus groups (n = 65 individuals with SCI and n = 42 SCI clinicians) were used to select key domains for inclusion and to develop PRO items. Verbatim items from other cutting-edge measurement systems (i.e., PROMIS, Neuro-QOL) were included to facilitate linkage and cross-population comparison. Items were field tested in a large sample of individuals with traumatic SCI (n = 877). Dimensionality was assessed with confirmatory factor analysis. Local item dependence and differential item functioning were assessed, and items were calibrated using the item response theory (IRT) graded response model. Finally, computer adaptive tests (CATs) and short forms were administered in a new sample (n = 245) to assess test-retest reliability and stability. Participants and Procedures A calibration sample of 877 individuals with traumatic SCI across five SCI Model Systems sites and one Department of Veterans Affairs medical center completed SCI-QOL items in interview format. Results We developed 14 unidimensional calibrated item banks and 3 calibrated scales across physical, emotional, and social health domains. When combined with the five Spinal Cord Injury – Functional Index physical function banks, the final SCI-QOL system consists of 22 IRT-calibrated item banks/scales. Item banks may be administered as CATs or short forms. Scales may be administered in a fixed-length format only. Conclusions The SCI-QOL measurement system provides SCI researchers and clinicians with a comprehensive, relevant and psychometrically robust system for measurement of physical-medical, physical-functional, emotional, and social outcomes. All SCI-QOL instruments are freely available on Assessment Center℠. PMID:26010963
Thirty Years of Nonparametric Item Response Theory.
ERIC Educational Resources Information Center
Molenaar, Ivo W.
2001-01-01
Discusses relationships between a mathematical measurement model and its real-world applications. Makes a distinction between large-scale data matrices commonly found in educational measurement and smaller matrices found in attitude and personality measurement. Also evaluates nonparametric methods for estimating item response functions and…
Dual representation of item positions in verbal short-term memory: Evidence for two access modes.
Lange, Elke B; Verhaeghen, Paul; Cerella, John
Memory sets of N = 1 to 5 digits were exposed sequentially from left to right across the screen, followed by N recognition probes. Probes had to be compared to memory list items on identity only (Sternberg task) or conditional on list position. Positions were probed randomly or in left-to-right order. Search functions related probe response times to set size. Random probing led to ramped, "Sternbergian" functions whose intercepts were elevated by the location requirement. Sequential probing led to flat search functions, that is, fast responses unaffected by set size. These results suggested that items in STM could be accessed either by a slow search-on-identity followed by recovery of an associated location tag, or in a single step by following item-to-item links in study order. It is argued that this dual coding of location information occurs spontaneously at study, and that either code can be utilised at retrieval depending on test demands.
Jahn, Danielle R; Dressel, Jeffrey A; Gavett, Brandon E; O'Bryant, Sid E
2015-01-01
The Executive Interview (EXIT25) is an effective measure of executive dysfunction, but may be inefficient due to the time it takes to complete 25 interview-based items. The current study aimed to examine psychometric properties of the EXIT25, with a specific focus on determining whether a briefer version of the measure could comprehensively assess executive dysfunction. The current study applied a graded response model (a type of item response theory model for polytomous categorical data) to identify items that were most closely related to the underlying construct of executive functioning and best discriminated between varying levels of executive functioning. Participants were 660 adults ages 40 to 96 years living in West Texas, who were recruited through an ongoing epidemiological study of rural health and aging, called Project FRONTIER. The EXIT25 was the primary measure examined. Participants also completed the Trail Making Test and Controlled Oral Word Association Test, among other measures, to examine the convergent validity of a brief form of the EXIT25. Eight items were identified that provided the majority of the information about the underlying construct of executive functioning; total scores on these items were associated with total scores on other measures of executive functioning and were able to differentiate between cognitively healthy, mildly cognitively impaired, and demented participants. In addition, cutoff scores were recommended based on sensitivity and specificity of scores. A brief, eight-item version of the EXIT25 may be an effective and efficient screening for executive dysfunction among older adults.
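To make the graded response model mentioned above more concrete, the sketch below computes category probabilities for one polytomous item in Samejima's standard formulation. The function and parameter names are assumptions for illustration, and the numbers in the usage note are invented, not EXIT25 estimates.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the graded response model.

    theta: latent trait level; a: item discrimination;
    thresholds: ordered boundaries b_1 < ... < b_{K-1} for K response categories.
    Returns a list of K probabilities, one per category, that sum to 1.
    """
    # Cumulative curves P(X >= k), padded with 1 (lowest) and 0 (beyond highest)
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum.append(0.0)
    # Each category probability is the gap between adjacent cumulative curves
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]
```

For example, `grm_category_probs(0.0, 1.5, [-1.0, 0.0, 1.0])` returns four probabilities that are symmetric around the middle boundary. Items with larger `a` separate adjacent trait levels more sharply, which is the sense in which some items "best discriminate" between levels of the construct.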
Wilkerson, Keith; McGahan, Joseph R; Stevens, Rick; Williamson, David; Low, Jean
2009-12-01
The goal of this study was to determine whether differential response formats to covariation problems influence corresponding response latencies. The authors provided participants with 3 trials of 16 statements addressing positive and negative relations between freedom and responsibility. The authors framed half of the items around responsibility given freedom and the other half around freedom given responsibility. Response formats comprised true-false, agree-disagree, and yes-no answers as a between-participants factor. Results indicated that the manipulation of response format did not affect latencies. However, latencies differed according to the framing of the items. For items framed around freedom given responsibility, latencies were shorter. In addition, participants were more likely to report a positive relation between freedom and responsibility when items were framed around freedom given responsibility. The authors discuss implications relative to previous research in this area and give recommendations for future research.
Improving measures of work-related physical functioning.
McDonough, Christine M; Ni, Pengsheng; Peterik, Kara; Marfeo, Elizabeth E; Marino, Molly E; Meterko, Mark; Rasch, Elizabeth K; Brandt, Diane E; Jette, Alan M; Chan, Leighton
2017-03-01
To expand content of the physical function domain of the Work Disability Functional Assessment Battery (WD-FAB), developed for the US Social Security Administration's (SSA) disability determination process. Newly developed questions were administered to 3532 recent SSA applicants for work disability benefits and 2025 US adults. Factor analyses and item response theory (IRT) methods were used to calibrate and link the new items to the existing WD-FAB, and computer-adaptive test simulations were conducted. Factor and IRT analyses supported integration of 44 new items into three existing WD-FAB scales and the addition of a new 11-item scale (Community Mobility). The final physical function domain consisting of: Basic Mobility (56 items), Upper Body Function (34 items), Fine Motor Function (45 items), and Community Mobility (11 items) demonstrated acceptable psychometric properties. The WD-FAB offers an important tool for enhancement of work disability determination. The FAB could provide relevant information about work-related functioning for initial assessment of claimants; identifying denied applicants who may benefit from interventions to improve work and health outcomes; enhancing periodic review of work disability beneficiaries; and assessing outcomes for policies, programs and services targeting people with work disability.
Improving Measures of Work-Related Physical Functioning
McDonough, Christine M.; Ni, Pengsheng; Peterik, Kara; Marfeo, Elizabeth E.; Marino, Molly E.; Meterko, Mark; Rasch, Elizabeth K; Brandt, Diane E.; Jette, Alan M; Chan, Leighton
2016-01-01
Purpose To expand content of the physical function domain of the Work Disability Functional Assessment Battery (WD-FAB), developed for the US Social Security Administration’s (SSA) disability determination process. Methods Newly developed questions were administered to 3,532 recent SSA applicants for work disability benefits and 2,025 US adults. Factor analyses and item response theory (IRT) methods were used to calibrate and link the new items to existing WD-FAB, and computer-adaptive test simulations were conducted. Results Factor and IRT analyses supported integration of 44 new items into 3 existing WD-FAB scales and the addition of a new 11-item scale (Community Mobility). The final physical function domain consisting of: Basic Mobility (56 items), Upper Body Function (34 items), Fine Motor Function (45 items), and Community Mobility (11 items) demonstrated acceptable psychometric properties. Conclusions The WD-FAB offers an important tool for enhancement of work disability determination. The FAB could provide relevant information about work-related functioning for initial assessment of claimants, identifying denied applicants who may benefit from interventions to improve work and health outcomes; enhancing periodic review of work disability beneficiaries; and assessing outcomes for policies, programs and services targeting people with work disability. PMID:28005243
Item Response Theory Analysis of the Psychopathic Personality Inventory-Revised.
Eichenbaum, Alexander E; Marcus, David K; French, Brian F
2017-06-01
This study examined item and scale functioning in the Psychopathic Personality Inventory-Revised (PPI-R) using an item response theory analysis. PPI-R protocols from 1,052 college student participants (348 male, 704 female) were analyzed. Analyses were conducted on the 131 self-report items comprising the PPI-R's eight content scales, using a graded response model. Scales collected a majority of their information about respondents possessing higher than average levels of the traits being measured. Each scale contained at least some items that evidenced limited ability to differentiate between respondents with differing levels of the trait being measured. Moreover, 80 items (61.1%) yielded significantly different responses between men and women presumably possessing similar levels of the trait being measured. Item performance was also influenced by the scoring format (directly scored vs. reverse-scored) of the items. Overall, the results suggest that the PPI-R, despite identifying psychopathic personality traits in individuals possessing high levels of those traits, may not identify these traits equally well for men and women, and scores are likely influenced by the scoring format of the individual item and scale.
Wang, Jing-Jing; Chen, Tzu-An; Baranowski, Tom; Lau, Patrick W C
2017-09-16
This study aimed to evaluate the psychometric properties of four self-efficacy scales (i.e., self-efficacy for fruit (FSE), vegetable (VSE), and water (WSE) intakes, and physical activity (PASE)) and to investigate their differences in item functioning across sex, age, and body weight status groups using item response modeling (IRM) and differential item functioning (DIF). Four self-efficacy scales were administered to 763 Hong Kong Chinese children (55.2% boys) aged 8-13 years. Classical test theory (CTT) was used to examine the reliability and factorial validity of scales. IRM was conducted and DIF analyses were performed to assess the characteristics of item parameter estimates on the basis of children's sex, age and body weight status. All self-efficacy scales demonstrated adequate to excellent internal consistency reliability (Cronbach's α: 0.79-0.91). One FSE misfit item and one PASE misfit item were detected. Small DIF was found for all the scale items across children's age groups. Items with medium to large DIF were detected in different sex and body weight status groups, which will require modification. A Wright map revealed that items covered the range of the distribution of participants' self-efficacy for each scale except VSE. Several self-efficacy scales' items functioned differently by children's sex and body weight status. Additional research is required to modify the four self-efficacy scales to minimize these moderating influences for application.
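The internal consistency figures reported above are Cronbach's α values. A minimal sketch of that calculation, assuming complete data and using a made-up score matrix (not data from the study), might look like this:

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Assumes no missing responses; uses sample variances (n - 1 denominator).
    """
    k = len(scores[0])
    # Variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Variance of the total (sum) score across respondents.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical responses from five children to a three-item scale.
toy_scores = [[1, 2, 2], [2, 3, 3], [3, 3, 4], [4, 5, 5], [5, 5, 4]]
alpha = cronbach_alpha(toy_scores)
```

Alpha rises as the items covary more strongly relative to their individual variances, so it summarizes how consistently the items rank respondents.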
Item Response Theory Analyses of the Cambridge Face Memory Test (CFMT)
Cho, Sun-Joo; Wilmer, Jeremy; Herzmann, Grit; McGugin, Rankin; Fiset, Daniel; Van Gulick, Ana E.; Ryan, Katie; Gauthier, Isabel
2014-01-01
We evaluated the psychometric properties of the Cambridge face memory test (CFMT; Duchaine & Nakayama, 2006). First, we assessed the dimensionality of the test with a bi-factor exploratory factor analysis (EFA). This EFA analysis revealed a general factor and three specific factors clustered by targets of CFMT. However, the three specific factors appeared to be minor factors that can be ignored. Second, we fit a unidimensional item response model. This item response model showed that the CFMT items could discriminate individuals at different ability levels and covered a wide range of the ability continuum. We found the CFMT to be particularly precise for a wide range of ability levels. Third, we implemented item response theory (IRT) differential item functioning (DIF) analyses for each gender group and two age groups (Age ≤ 20 versus Age > 21). This DIF analysis suggested little evidence of consequential differential functioning on the CFMT for these groups, supporting the use of the test to compare older to younger, or male to female, individuals. Fourth, we tested for a gender difference on the latent facial recognition ability with an explanatory item response model. We found a significant but small gender difference on the latent ability for face recognition, which was higher for women than men by 0.184, at age mean 23.2, controlling for linear and quadratic age effects. Finally, we discuss the practical considerations of the use of total scores versus IRT scale scores in applications of the CFMT. PMID:25642930
Multidimensional Extension of Multiple Indicators Multiple Causes Models to Detect DIF
ERIC Educational Resources Information Center
Lee, Soo; Bulut, Okan; Suh, Youngsuk
2017-01-01
A number of studies have found multiple indicators multiple causes (MIMIC) models to be an effective tool in detecting uniform differential item functioning (DIF) for individual items and item bundles. A recently developed MIMIC-interaction model is capable of detecting both uniform and nonuniform DIF in the unidimensional item response theory…
Waller, Niels G; Feuerstahler, Leah
2017-01-01
In this study, we explored item and person parameter recovery of the four-parameter model (4PM) in over 24,000 real, realistic, and idealized data sets. In the first analyses, we fit the 4PM and three alternative models to data from three Minnesota Multiphasic Personality Inventory-Adolescent form factor scales using Bayesian modal estimation (BME). Our results indicated that the 4PM fits these scales better than simpler item response theory (IRT) models. Next, using the parameter estimates from these real data analyses, we estimated 4PM item parameters in 6,000 realistic data sets to establish minimum sample size requirements for accurate item and person parameter recovery. Using a factorial design that crossed discrete levels of item parameters, sample size, and test length, we also fit the 4PM to an additional 18,000 idealized data sets to extend our parameter recovery findings. Our combined results demonstrated that 4PM item parameters and parameter functions (e.g., item response functions) can be accurately estimated using BME in moderate to large samples (N ⩾ 5,000) and person parameters can be accurately estimated in smaller samples (N ⩾ 1,000). In the supplemental files, we report annotated [Formula: see text] code that shows how to estimate 4PM item and person parameters in [Formula: see text] (Chalmers, 2012).
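For readers unfamiliar with the four-parameter model, the sketch below evaluates its item response function in the common logistic form, with a lower asymptote c and an upper asymptote d added to the two-parameter model. This is an illustrative implementation with hypothetical parameter values, not the authors' supplemental code.

```python
import math

def irf_4pm(theta, a, b, c, d):
    """Probability of a keyed response under a four-parameter logistic model.

    a: discrimination, b: difficulty, c: lower asymptote, d: upper asymptote.
    Setting c = 0 and d = 1 recovers the two-parameter model.
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: at theta == b the curve sits halfway between c and d.
p_mid = irf_4pm(theta=0.5, a=1.2, b=0.5, c=0.1, d=0.9)
```

The upper asymptote below 1 lets the model allow for respondents with high trait levels who still fail to endorse an item, which is one reason it can fit personality data better than simpler models.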
Fries, J F; Bruce, B; Bjorner, J; Rose, M
2006-01-01
Objectives Patient reported outcomes (PROs) have become standard study endpoints. However, little attention has been given to using item improvement to advance PRO performance which could improve precision, clarity, patient relevance, and information content of “physical function/disability” items and thus the performance of resulting instruments. Methods The present study included 1860 physical function/disability items from 165 instruments. Item formulations were assessed by frequency of use, modified Delphi consensus, respondent judgement of clarity and importance, and item response theory (IRT). Data from 1100 rheumatoid arthritis, osteoarthritis, and normal ageing subjects, using qualitative item review, focus groups, cognitive interviews, and patient survey were used to achieve a unique item pool that was clear, reliable, sensitive to change, readily translatable, devoid of floor and ceiling limitations, contained unidimensional subdomains, and had maximal information content. Results A “present tense” time frame was used most frequently, better understood, more readily translated, and more directly estimated the latent trait of disability. Items in the “past tense” had 80–90% false negatives (p<0.001). The best items were brief, clear, and contained a single construct. Responses with four to five options were preferred by both experts and respondents. The term physical function may be preferable to the term disability because of fewer floor effects. IRT analyses of “disability” suggest four independent subdomains (mobility, dexterity, axial, and compound) with factor loadings of 0.81–0.99. Conclusions Major improvement in performance of items and instruments is possible, and may have the effect of substantially reducing sample size requirements for clinical trials. PMID:17038464
Crins, Martine H P; Terwee, Caroline B; Klausch, Thomas; Smits, Niels; de Vet, Henrica C W; Westhovens, Rene; Cella, David; Cook, Karon F; Revicki, Dennis A; van Leeuwen, Jaap; Boers, Maarten; Dekker, Joost; Roorda, Leo D
2017-07-01
The objective of this study was to assess the psychometric properties of the Dutch-Flemish Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank in Dutch patients with chronic pain. A bank of 121 items was administered to 1,247 Dutch patients with chronic pain. Unidimensionality was assessed by fitting a one-factor confirmatory factor analysis and evaluating resulting fit statistics. Items were calibrated with the graded response model and its fit was evaluated. Cross-cultural validity was assessed by testing items for differential item functioning (DIF) based on language (Dutch vs. English). Construct validity was evaluated by calculating correlations between scores on the Dutch-Flemish PROMIS Physical Function measure and scores on generic and disease-specific measures. Results supported the Dutch-Flemish PROMIS Physical Function item bank's unidimensionality (Comparative Fit Index = 0.976, Tucker Lewis Index = 0.976) and model fit. Item thresholds targeted a wide range of the physical function construct (threshold-parameters range: -4.2 to 5.6). Cross-cultural validity was good, as only four items showed DIF for language and their impact on item scores was minimal. Physical Function scores were strongly associated with scores on all other measures (all correlations ≤ -0.60 as expected). The Dutch-Flemish PROMIS Physical Function item bank exhibited good psychometric properties. Development of a computer adaptive test based on the large bank is warranted. Copyright © 2017 Elsevier Inc. All rights reserved.
Health and role functioning: the use of focus groups in the development of an item bank.
Anatchkova, Milena D; Bjorner, Jakob B
2010-02-01
Role functioning is an important part of health-related quality of life. However, assessment of role functioning is complicated by the wide definition of roles and by fluctuations in role participation across the life-span. The aim of this study is to explore variations in role functioning across the lifespan using qualitative approaches, to inform the development of a role functioning item bank and to pilot test sample items from the bank. Eight focus groups were conducted with a convenience sample of 38 English-speaking adults recruited in Rhode Island. Participants were stratified by gender and four age groups. Focus groups were taped, transcribed, and analyzed for thematic content. Participants of all ages identified family roles as the most important. There was age variation in the importance of social life roles, with younger and older adults rating them as more important. Occupational roles were identified as important by younger and middle-aged participants. The potential of health problems to affect role participation was recognized. Participants found the sample items easy to understand, response options identical in meaning and preferred five response choices. Participants identified key aspects of role functioning and provided insights on their perception of the impact of health on their role participation. These results will inform item bank generation.
Development and initial evaluation of the SCI-FI/AT
Jette, Alan M.; Slavin, Mary D.; Ni, Pengsheng; Kisala, Pamela A.; Tulsky, David S.; Heinemann, Allen W.; Charlifue, Susie; Tate, Denise G.; Fyffe, Denise; Morse, Leslie; Marino, Ralph; Smith, Ian; Williams, Steve
2015-01-01
Objectives To describe the domain structure and calibration of the Spinal Cord Injury Functional Index for samples using Assistive Technology (SCI-FI/AT) and report the initial psychometric properties of each domain. Design Cross sectional survey followed by computerized adaptive test (CAT) simulations. Setting Inpatient and community settings. Participants A sample of 460 adults with traumatic spinal cord injury (SCI) stratified by level of injury, completeness of injury, and time since injury. Interventions None. Main outcome measure SCI-FI/AT. Results Confirmatory factor analysis (CFA) and item response theory (IRT) analyses identified 4 unidimensional SCI-FI/AT domains: Basic Mobility (41 items), Self-care (71 items), Fine Motor Function (35 items), and Ambulation (29 items). High correlations of full item banks with 10-item simulated CATs indicated high accuracy of each CAT in estimating a person's function, and there was high measurement reliability for the simulated CAT scales compared with the full item bank. SCI-FI/AT item difficulties in the domains of Self-care, Fine Motor Function, and Ambulation were less difficult than the same items in the original SCI-FI item banks. Conclusion With the development of the SCI-FI/AT, clinicians and investigators have available multidimensional assessment scales that evaluate function for users of AT to complement the scales available in the original SCI-FI. PMID:26010975
Development and initial evaluation of the SCI-FI/AT.
Jette, Alan M; Slavin, Mary D; Ni, Pengsheng; Kisala, Pamela A; Tulsky, David S; Heinemann, Allen W; Charlifue, Susie; Tate, Denise G; Fyffe, Denise; Morse, Leslie; Marino, Ralph; Smith, Ian; Williams, Steve
2015-05-01
To describe the domain structure and calibration of the Spinal Cord Injury Functional Index for samples using Assistive Technology (SCI-FI/AT) and report the initial psychometric properties of each domain. Cross sectional survey followed by computerized adaptive test (CAT) simulations. Inpatient and community settings. A sample of 460 adults with traumatic spinal cord injury (SCI) stratified by level of injury, completeness of injury, and time since injury. None. SCI-FI/AT. Confirmatory factor analysis (CFA) and item response theory (IRT) analyses identified 4 unidimensional SCI-FI/AT domains: Basic Mobility (41 items), Self-care (71 items), Fine Motor Function (35 items), and Ambulation (29 items). High correlations of full item banks with 10-item simulated CATs indicated high accuracy of each CAT in estimating a person's function, and there was high measurement reliability for the simulated CAT scales compared with the full item bank. SCI-FI/AT item difficulties in the domains of Self-care, Fine Motor Function, and Ambulation were less difficult than the same items in the original SCI-FI item banks. With the development of the SCI-FI/AT, clinicians and investigators have available multidimensional assessment scales that evaluate function for users of AT to complement the scales available in the original SCI-FI.
Jafari, Peyman; Bagheri, Zahra; Ayatollahi, Seyyed Mohamad Taghi; Soltani, Zahra
2012-03-13
Item response theory (IRT) is extensively used to develop adaptive instruments of health-related quality of life (HRQoL). However, each IRT model has its own function to estimate item and category parameters, and hence different results may be found using the same response categories with different IRT models. The present study used the Rasch rating scale model (RSM) to examine and reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales. The PedsQL™ 4.0 Generic Core Scales was completed by 938 Iranian school children and their parents. Convergent, discriminant and construct validity of the instrument were assessed by classical test theory (CTT). The RSM was applied to investigate person and item reliability, item statistics and ordering of response categories. The CTT method showed that the scaling success rate for convergent and discriminant validity were 100% in all domains with the exception of physical health in the child self-report. Moreover, confirmatory factor analysis supported a four-factor model similar to its original version. The RSM showed that 22 out of 23 items had acceptable infit and outfit statistics (<1.4, >0.6), person reliabilities were low, item reliabilities were high, and item difficulty ranged from -1.01 to 0.71 and -0.68 to 0.43 for child self-report and parent proxy-report, respectively. Also the RSM showed that successive response categories for all items were not located in the expected order. This study revealed that, in all domains, the five response categories did not perform adequately. It is not known whether this problem is a function of the meaning of the response choices in the Persian language or an artifact of a mostly healthy population that did not use the full range of the response categories. The response categories should be evaluated in further validation studies, especially in large samples of chronically ill patients.
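The infit and outfit statistics cited above are mean-square summaries of model residuals. The sketch below shows the idea for the simpler dichotomous Rasch model rather than the rating scale model used in the study, which is a deliberate simplification; the person measures, item difficulty, and responses are invented for illustration.

```python
import math

def rasch_prob(theta, b):
    """Probability of a positive response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_fit(responses, thetas, b):
    """Return (infit, outfit) mean squares for one dichotomous item."""
    ps = [rasch_prob(t, b) for t in thetas]
    sq_resid = [(x - p) ** 2 for x, p in zip(responses, ps)]
    variances = [p * (1.0 - p) for p in ps]
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / v for r, v in zip(sq_resid, variances)) / len(responses)
    # Infit: information-weighted version, less sensitive to outlying persons.
    infit = sum(sq_resid) / sum(variances)
    return infit, outfit

thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical person measures
ordered = item_fit([0, 0, 1, 1, 1], thetas, b=0.0)    # responses follow ability
reversed_ = item_fit([1, 1, 0, 0, 0], thetas, b=0.0)  # responses contradict ability
```

Values near 1 indicate responses about as noisy as the model expects; the study's acceptance window of 0.6 to 1.4 would flag the second, contradictory pattern as misfitting.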
Psychometric properties of the Triarchic Psychopathy Measure: An item response theory approach.
Shou, Yiyun; Sellbom, Martin; Xu, Jing
2018-05-01
There is cumulative evidence for the cross-cultural validity of the Triarchic Psychopathy Measure (TriPM; Patrick, 2010) among non-Western populations. Recent studies using correlational and regression analyses show promising construct validity of the TriPM in Chinese samples. However, little is known about the efficiency of items in TriPM in assessing the proposed latent traits. The current study evaluated the psychometric properties of the Chinese TriPM at the item level using item response theory analyses. It also examined the measurement invariance of the TriPM between the Chinese and the U.S. student samples by applying differential item functioning analyses under the item response theory framework. The results supported the unidimensional nature of the Disinhibition and Meanness scales. Both scales had a greater level of precision in the respective underlying constructs at the positive ends. The two scales, however, had several items that were weakly associated with their respective latent traits in the Chinese student sample. Boldness, on the other hand, was found to be multidimensional, and reflected a more normally distributed range of variation. The examination of measurement bias via differential item functioning analyses revealed that a number of items of the TriPM were not equivalent across the Chinese and the U.S. Some modification and adaptation of items might be considered for improving the precision of the TriPM for Chinese participants. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
ERIC Educational Resources Information Center
Banerjee, Jayanti; Papageorgiou, Spiros
2016-01-01
The research reported in this article investigates differential item functioning (DIF) in a listening comprehension test. The study explores the relationship between test-taker age and the items' language domains across multiple test forms. The data comprise test-taker responses (N = 2,861) to a total of 133 unique items, 46 items of which were…
Solving the measurement invariance anchor item problem in item response theory.
Meade, Adam W; Wright, Natalie A
2012-09-01
The efficacy of tests of differential item functioning (measurement invariance) has been well established. It is clear that when properly implemented, these tests can successfully identify differentially functioning (DF) items when they exist. However, an assumption of these analyses is that the metric for different groups is linked using anchor items that are invariant. In practice, however, it is impossible to be certain which items are DF and which are invariant. This problem of anchor items, or referent indicators, has long plagued invariance research, and a multitude of suggested approaches have been put forth. Unfortunately, the relative efficacy of these approaches has not been tested. This study compares 11 variations on 5 qualitatively different approaches from recent literature for selecting optimal anchor items. A large-scale simulation study indicates that for nearly all conditions, an easily implemented 2-stage procedure recently put forth by Lopez Rivas, Stark, and Chernyshenko (2009) provided optimal power while maintaining nominal Type I error. With this approach, appropriate anchor items can be easily and quickly located, resulting in more efficacious invariance tests. Recommendations for invariance testing are illustrated using a pedagogical example of employee responses to an organizational culture measure.
Combining item response theory with multiple imputation to equate health assessment questionnaires.
Gu, Chenyang; Gutman, Roee
2017-09-01
The assessment of patients' functional status across the continuum of care requires a common patient assessment tool. However, assessment tools that are used in various health care settings differ and cannot be easily contrasted. For example, the Functional Independence Measure (FIM) is used to evaluate the functional status of patients who stay in inpatient rehabilitation facilities, the Minimum Data Set (MDS) is collected for all patients who stay in skilled nursing facilities, and the Outcome and Assessment Information Set (OASIS) is collected if they choose home health care provided by home health agencies. All three instruments or questionnaires include functional status items, but the specific items, rating scales, and instructions for scoring different activities vary between the different settings. We consider equating different health assessment questionnaires as a missing data problem, and propose a variant of predictive mean matching method that relies on Item Response Theory (IRT) models to impute unmeasured item responses. Using real data sets, we simulated missing measurements and compared our proposed approach to existing methods for missing data imputation. We show that, for all of the estimands considered, and in most of the experimental conditions that were examined, the proposed approach provides valid inferences, and generally has better coverages, relatively smaller biases, and shorter interval estimates. The proposed method is further illustrated using a real data set. © 2016, The International Biometric Society.
Wang, Zonghua; Zhou, Juan; Luo, Xingli; Xu, Yan; She, Xi; Chen, Ling; Yin, Honghua; Wang, Xianyuan
2015-01-01
The impact of strabismus on visual function, self-image, self-esteem, and social interactions decreases health-related quality of life (HRQoL). The purpose of this study was to evaluate and refine the adult strabismus quality of life questionnaire (AS-20) by using Rasch analysis among Chinese adult patients with strabismus. We evaluated the fit of the AS-20 to the Rasch model in a Chinese population by assessing unidimensionality, infit and outfit, person and item separation index and reliability, response ordering, targeting and differential item functioning (DIF). The overall AS-20 did not demonstrate unidimensionality; however, it was achieved separately in the two Rasch-revised subscales: the psychosocial subscale (11 items) and the function subscale (9 items). Good targeting, optimal item infit and outfit, and no notable local dependence were found for each of the subscales. The rating scale was appropriate for the psychosocial subscale but a reduction to four response categories was required for the function subscale. No significant DIF was revealed for any demographic and clinical factors (e.g., age, gender, and strabismus types). The AS-20 was demonstrated by Rasch analysis to be a rigorous instrument for measuring health-related quality of life in Chinese strabismus patients if some revisions were made regarding the subscale construct and response options.
Wan, Li-ping; He, Run-lian; Ai, Yong-mei; Zhang, Hui-min; Xing, Min; Yang, Lin; Song, Yan-long; Yu, Hong-mei
2013-07-01
To introduce the Item Function Analysis (IFA) of the Quality of Life-Alzheimer's Disease (QOL-AD) Chinese version and to explore the feasibility of its application to Chinese patients with AD. Two hundred AD patients were interviewed and assessed with the QOL-AD, using a stratified cluster sampling method. Multilog 7.03 was used for Item Function Analysis. The discrimination parameter (a), difficulty parameter (b), and Item Characteristic Curve (ICC) of each item of the QOL-AD were provided. The discrimination parameters of items 1 and 7 were below 0.6, while all the others were above 0.6. As for the ICCs, except for items 1 and 7, the first and last curves were monotonic and the two in between were inverted-V shaped, with very steep slopes. Results from the IFA showed that the QOL-AD is applicable to Chinese patients with AD.
Detecting DIF in Polytomous Items Using MACS, IRT and Ordinal Logistic Regression
ERIC Educational Resources Information Center
Elosua, Paula; Wells, Craig
2013-01-01
The purpose of the present study was to compare the Type I error rate and power of two model-based procedures, the mean and covariance structure model (MACS) and the item response theory (IRT), and an observed-score based procedure, ordinal logistic regression, for detecting differential item functioning (DIF) in polytomous items. A simulation…
Application of a Method of Estimating DIF for Polytomous Test Items.
ERIC Educational Resources Information Center
Camilli, Gregory; Congdon, Peter
1999-01-01
Demonstrates a method for studying differential item functioning (DIF) that can be used with dichotomous or polytomous items and that is valid for data that follow a partial credit Item Response Theory model. A simulation study shows that positively biased Type I error rates are in accord with results from previous studies. (SLD)
Lambert, Michael Canute; Ferguson, Gail M; Rowan, George T
2016-03-01
Cross-national study of adolescents' psychological adjustment requires measures that permit reliable and valid assessment across informants and nations, but such measures are virtually nonexistent. Item-response-theory-based linking is a promising yet underutilized methodological procedure that permits more accurate assessment across informants and nations. To demonstrate this procedure, the Resilience Scale of the Behavioral Assessment for Children of African Heritage (Lambert et al., 2005) was administered to 250 African American and 294 Jamaican nonreferred adolescents and their caregivers. Multiple items without significant differential item functioning emerged, allowing scale linking across informants and nations. Calibrating item parameters via item response theory linking can permit cross-informant cross-national assessment of youth. (c) 2016 APA, all rights reserved.
Forrest, Christopher B; Devine, Janine; Bevans, Katherine B; Becker, Brandon D; Carle, Adam C; Teneralli, Rachel E; Moon, JeanHee; Tucker, Carole A; Ravens-Sieberer, Ulrike
2018-01-01
To describe the psychometric evaluation and item response theory calibration of the PROMIS Pediatric Life Satisfaction item banks, child-report, and parent-proxy editions. A pool of 55 life satisfaction items was administered to 1992 children 8-17 years old and 964 parents of children 5-17 years old. Analyses included descriptive statistics, reliability, factor analysis, differential item functioning, and assessment of construct validity. Thirteen items were deleted because of poor psychometric performance. An 8-item short form was administered to a national sample of 996 children 8-17 years old, and 1294 parents of children 5-17 years old. The combined sample (2988 children and 2258 parents) was used in item response theory (IRT) calibration analyses. The final item banks were unidimensional, the items were locally independent, and the items were free from impactful differential item functioning. The 8-item and 4-item short form scales showed excellent reliability, convergent validity, and discriminant validity. Life satisfaction decreased with declining socio-economic status, presence of a special health care need, and increasing age for girls, but not boys. After IRT calibration, we found that 4- and 8-item short forms had a high degree of precision (reliability) across a wide range (>4 SD units) of the latent variable. The PROMIS Pediatric Life Satisfaction item banks and their short forms provide efficient, precise, and valid assessments of life satisfaction in children and youth.
Forrest, Christopher B; Ravens-Sieberer, Ulrike; Devine, Janine; Becker, Brandon D; Teneralli, Rachel; Moon, JeanHee; Carle, Adam; Tucker, Carole A; Bevans, Katherine B
2018-03-01
The purpose of this study is to describe the psychometric evaluation and item response theory calibration of the PROMIS Pediatric Positive Affect item bank, child-report and parent-proxy editions. The initial item pool comprising 53 items, previously developed using qualitative methods, was administered to 1,874 children 8-17 years old and 909 parents of children 5-17 years old. Analyses included descriptive statistics, reliability, factor analysis, differential item functioning, and construct validity. A total of 14 items were deleted, because of poor psychometric performance, and an 8-item short form constructed from the remaining 39 items was administered to a national sample of 1,004 children 8-17 years old, and 1,306 parents of children 5-17 years old. The combined sample was used in item response theory (IRT) calibration analyses. The final item bank appeared unidimensional, the items appeared locally independent, and the items were free from differential item functioning. The scales showed excellent reliability and convergent and discriminant validity. Positive affect decreased with children's age and was lower for those with a special health care need. After IRT calibration, we found that 4 and 8 item short forms had a high degree of precision (reliability) across a wide range of the latent trait (>4 SD units). The PROMIS Pediatric Positive Affect item bank and its short forms provide an efficient, precise, and valid assessment of positive affect in children and youth.
Murray, Aja Louise; Allison, Carrie; Smith, Paula L; Baron-Cohen, Simon; Booth, Tom; Auyeung, Bonnie
2017-05-01
Diagnostic bias is a concern in autism spectrum conditions (ASC) where prevalence and presentation differ by sex. To ensure that females with ASC are not under-identified, it is important that ASC screening tools do not systematically underestimate autistic traits in females relative to males. We evaluated whether the AQ-10, a brief screen for ASC recommended by the National Institute of Clinical Excellence in cases of suspected ASC, exhibits such a bias. Using an item response theory approach, we evaluated differential item functioning and differential test functioning. We found that although individual items showed some sex bias, these biases at times favored males and at other times favored females. Thus, at the level of test scores the item-level biases cancelled out to give an unbiased overall score. Results support the continued use of the AQ-10 sum score in its current form; however, they suggest that caution should be exercised when interpreting responses to individual items. The nature of the item-level biases could serve as a guide for future research into how ASC affects males and females differently. Autism Res 2017, 10: 790-800. © 2016 International Society for Autism Research, Wiley Periodicals, Inc.
Using the Rasch Measurement Model in Psychometric Analysis of the Family Effectiveness Measure
McCreary, Linda L.; Conrad, Karen M.; Conrad, Kendon J.; Scott, Christy K; Funk, Rodney R.; Dennis, Michael L.
2013-01-01
Background: Valid assessment of family functioning can play a vital role in optimizing client outcomes. Because family functioning is influenced by family structure, socioeconomic context, and culture, existing measures of family functioning--primarily developed with nuclear, middle class European American families--may not be valid assessments of families in diverse populations. The Family Effectiveness Measure was developed to address this limitation. Objectives: To test the Family Effectiveness Measure with data from a primarily low-income African American convenience sample, using the Rasch measurement model. Method: A sample of 607 adult women completed the measure. Rasch analysis was used to assess unidimensionality, response category functioning, item fit, person reliability, differential item functioning by race and parental status, and item hierarchy. Criterion-related validity was tested using correlations with five other variables related to family functioning. Results: The Family Effectiveness Measure measures two separate constructs: The effective family functioning construct was a psychometrically sound measure of the target construct that was more efficient due to the deletion of 22 items. The ineffective family functioning construct consisted of 16 of those deleted items but was not as strong psychometrically. Items in both constructs evidenced no differential item functioning by race. Criterion-related validity was supported for both. Discussion: In contrast to the prevailing conceptualization that family functioning is a single construct, assessed by positively and negatively worded items, use of the Rasch analysis suggested the existence of two constructs. While the effective family functioning is a strong and efficient measure of family functioning, the ineffective family functioning will require additional item development and psychometric testing. PMID:23636342
Montpetit, Kathleen; Haley, Stephen; Bilodeau, Nathalie; Ni, Pengsheng; Tian, Feng; Gorton, George; Mulcahey, M J
2011-02-01
This article reports on the content range and measurement precision of an upper extremity (UE) computer adaptive testing (CAT) platform of physical function in children with cerebral palsy. Upper extremity items representing skills of all abilities were administered to 305 parents. These responses were compared with two traditional standardized measures: Pediatric Outcomes Data Collection Instrument and Functional Independence Measure for Children. The UE CAT correlated strongly with the upper extremity component of these measures and had greater precision when describing individual functional ability. The UE item bank has wider range with items populating the lower end of the ability spectrum. This new UE item bank and CAT have the capability to quickly assess children of all ages and abilities with good precision and, most importantly, with items that are meaningful and appropriate for their age and level of physical function.
Oude Voshaar, Martijn A H; Ten Klooster, Peter M; Vonkeman, Harald E; van de Laar, Mart A F J
2017-11-01
Traditional patient-reported physical function instruments often poorly differentiate patients with mild-to-moderate disability. We describe the development and psychometric evaluation of a generic item bank for measuring everyday activity limitations in outpatient populations. Seventy-two items generated from patient interviews and mapped to the International Classification of Functioning, Disability and Health (ICF) domestic life chapter were administered to 1128 adults representative of the Dutch population. The partial credit model was fitted to the item responses and evaluated with respect to its assumptions, model fit, and differential item functioning (DIF). Measurement performance of a computerized adaptive testing (CAT) algorithm was compared with the SF-36 physical functioning scale (PF-10). A final bank of 41 items was developed. All items demonstrated acceptable fit to the partial credit model and measurement invariance across age, sex, and educational level. Five- and ten-item CAT simulations were shown to have high measurement precision, which exceeded that of SF-36 physical functioning scale across the physical function continuum. Floor effects were absent for a 10-item empirical CAT simulation, and ceiling effects were low (13.5%) compared with SF-36 physical functioning (38.1%). CAT also discriminated better than SF-36 physical functioning between age groups, number of chronic conditions, and respondents with or without rheumatic conditions. The Rasch assessment of everyday activity limitations (REAL) item bank will hopefully prove a useful instrument for assessing everyday activity limitations. T-scores obtained using derived measures can be used to benchmark physical function outcomes against the general Dutch adult population.
ERIC Educational Resources Information Center
Samejima, Fumiko; Changas, Paul S.
The methods and approaches for estimating the operating characteristics of the discrete item responses without assuming any mathematical form have been developed and expanded. It has been made possible that, even if the test information function of a given test is not constant for the interval of ability of interest, it is used as the Old Test.…
ERIC Educational Resources Information Center
Huynh, Huynh
By noting that a Rasch or two parameter logistic (2PL) item belongs to the exponential family of random variables and that the probability density function (pdf) of the correct response (X=1) and the incorrect response (X=0) are symmetric with respect to the vertical line at the item location, it is shown that the conjugate prior for ability is…
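The symmetry mentioned in this abstract can be made concrete with a short worked derivation. This is only a sketch under an assumed standard 2PL parameterization (discrimination a, location b, ability θ, scaling constant omitted); the abstract itself does not give the formula, and the symbol δ for the distance from the item location is introduced here for illustration.

```latex
\begin{align*}
P(X=1 \mid \theta) &= \frac{1}{1 + e^{-a(\theta - b)}}, &
P(X=0 \mid \theta) &= 1 - P(X=1 \mid \theta) = \frac{1}{1 + e^{a(\theta - b)}} \\[4pt]
P(X=1 \mid \theta = b + \delta) &= \frac{1}{1 + e^{-a\delta}}
  = P(X=0 \mid \theta = b - \delta) &&\text{for every } \delta .
\end{align*}
```

Under these assumptions, reflecting the curve for a correct response about the vertical line θ = b gives the curve for an incorrect response. Setting a = 1 gives the Rasch case.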
Bacci, Elizabeth D; Staniewska, Dorota; Coyne, Karin S; Boyer, Stacey; White, Leigh Ann; Zach, Neta; Cedarbaum, Jesse M
2016-01-01
Our objective was to examine dimensionality and item-level performance of the Amyotrophic Lateral Sclerosis Functional Rating Scale-Revised (ALSFRS-R) across time using classical and modern test theory approaches. Confirmatory factor analysis (CFA) and Item Response Theory (IRT) analyses were conducted using data from patients with amyotrophic lateral sclerosis (ALS) Pooled Resources Open-Access ALS Clinical Trials (PRO-ACT) database with complete ALSFRS-R data (n = 888) at three time-points (Time 0, Time 1 (6-months), Time 2 (1-year)). Results demonstrated that in this population of 888 patients, mean age was 54.6 years, 64.4% were male, and 93.7% were Caucasian. The CFA supported a 4 individual-domain structure (bulbar, gross motor, fine motor, and respiratory domains). IRT analysis within each domain revealed misfitting items and overlapping item response category thresholds at all time-points, particularly in the gross motor and respiratory domain items. Results indicate that many of the items of the ALSFRS-R may sub-optimally distinguish among varying levels of disability assessed by each domain, particularly in patients with less severe disability. Measure performance improved across time as patient disability severity increased. In conclusion, modifications to select ALSFRS-R items may improve the instrument's specificity to disability level and sensitivity to treatment effects.
Adaptation of the Practice Environment Scale for military nurses: a psychometric analysis.
Swiger, Pauline A; Raju, Dheeraj; Breckenridge-Sproat, Sara; Patrician, Patricia A
2017-09-01
The aim of this study was to confirm the psychometric properties of Practice Environment Scale of the Nursing Work Index in a military population. This study also demonstrates association rule analysis, a contemporary exploratory technique. One of the instruments most commonly used to evaluate the nursing practice environment is the Practice Environment Scale of the Nursing Work Index. Although the instrument has been widely used, the reliability, validity and individual item function are not commonly evaluated. Gaps exist with regard to confirmatory evaluation of the subscale factors, individual item analysis and evaluation in the outpatient setting and with non-registered nursing staff. This was a secondary data analysis of existing survey data. Multiple psychometric methods were used for this analysis using survey data collected in 2014. First, descriptive analyses were conducted, including exploration using association rules. Next, internal consistency was tested and confirmatory factor analysis was performed to test the factor structure. The specified factor structure did not hold; therefore, exploratory factor analysis was performed. Finally, item analysis was executed using item response theory. The differential item functioning technique allowed the comparison of responses by care setting and nurse type. The results of this study indicate that responses differ between groups and that several individual items could be removed without altering the psychometric properties of the instrument. The instrument functions moderately well in a military population; however, researchers may want to consider nurse type and care setting during analysis to identify any meaningful variation in responses. © 2017 John Wiley & Sons Ltd.
Development of an item bank and computer adaptive test for role functioning.
Anatchkova, Milena D; Rose, Matthias; Ware, John E; Bjorner, Jakob B
2012-11-01
Role functioning (RF) is a key component of health and well-being and an important outcome in health research. The aim of this study was to develop an item bank to measure impact of health on role functioning. A set of different instruments including 75 newly developed items asking about the impact of health on role functioning was completed by 2,500 participants. Established item response theory methods were used to develop an item bank based on the generalized partial credit model. Comparison of group mean bank scores of participants with different self-reported general health status and chronic conditions was used to test the external validity of the bank. After excluding items that did not meet established requirements, the final item bank consisted of a total of 64 items covering three areas of role functioning (family, social, and occupational). Slopes in the bank ranged between .93 and 4.37; the mean threshold range was -1.09 to -2.25. Item bank-based scores were significantly different for participants with and without chronic conditions and with different levels of self-reported general health. An item bank assessing health impact on RF across three content areas has been successfully developed. The bank can be used for development of short forms or computerized adaptive tests to be applied in the assessment of role functioning as one of the common denominators across applications of generic health assessment.
Shen, Minxue; Hu, Ming; Sun, Zhenqiu
2017-01-01
Objectives: To develop and validate brief scales to measure common emotional and behavioural problems among adolescents in the examination-oriented education system and collectivistic culture of China. Setting: Middle schools in Hunan province. Participants: 5442 middle school students aged 11–19 years were sampled. 4727 valid questionnaires were collected and used for validation of the scales. The final sample included 2408 boys and 2319 girls. Primary and secondary outcome measures: The tools were assessed by the item response theory, classical test theory (reliability and construct validity) and differential item functioning. Results: Four scales to measure anxiety, depression, study problem and sociality problem were established. Exploratory factor analysis showed that each scale had two solutions. Confirmatory factor analysis showed acceptable to good model fit for each scale. Internal consistency and test–retest reliability of all scales were above 0.7. Item response theory showed that all items had acceptable discrimination parameters and most items had appropriate difficulty parameters. 10 items demonstrated differential item functioning with respect to gender. Conclusions: Four brief scales were developed and validated among adolescents in middle schools of China. The scales have good psychometric properties with minor differential item functioning. They can be used in middle school settings, and will help school officials to assess the students’ emotional/behavioural problems. PMID:28062469
Psychological distress in cancer survivors: the further development of an item bank.
Smith, Adam B; Armes, Jo; Richardson, Alison; Stark, Dan P
2013-02-01
Assessment of psychological distress by patient report is necessary to meet patients' needs throughout the cancer journey. We have previously developed an item bank to assess psychological distress but not evaluated it for cancer survivors. Our first aim in this study was to test whether we could extend our item bank to include cancer survivors. The second aim was to examine whether the item bank could assess positive affect as a single construct alongside negative psychological symptoms. Responses from 1315 cancer survivors to the Hospital Anxiety and Depression Scale (HADS) and the Positive and Negative Affect Scale (PANAS) were considered for inclusion in a pre-existing item bank created from a heterogeneous sample of 4914 cancer patients. Differential item functioning (DIF) was used to assess whether HADS responses drawn from the two samples were equivalent. Common-item equating was used to anchor the shared (HADS) items, whilst the PANAS items were added. Item fit was evaluated at each stage, and misfitting items were removed. Unidimensionality was assessed with a principal components factor analysis. The DIF analysis did not reveal any differences between the HADS item locations from the two samples. Three misfitting PANAS items were removed, resulting in a final unidimensional bank of 80 items with good internal reliability (α = 0.85). The new item bank is valid for use across the cancer journey, including cancer survivors, and modestly improves the assessment of all levels of psychological distress and positive psychological function. Copyright © 2011 John Wiley & Sons, Ltd.
Tarrant, Marie; Ware, James; Mohammed, Ahmed M
2009-07-07
Four- or five-option multiple choice questions (MCQs) are the standard in health-science disciplines, both on certification-level examinations and on in-house developed tests. Previous research has shown, however, that few MCQs have three or four functioning distractors. The purpose of this study was to investigate non-functioning distractors in teacher-developed tests in one nursing program in an English-language university in Hong Kong. Using item-analysis data, we assessed the proportion of non-functioning distractors on a sample of seven test papers administered to undergraduate nursing students. A total of 514 items were reviewed, including 2056 options (1542 distractors and 514 correct responses). Non-functioning options were defined as ones that were chosen by fewer than 5% of examinees and those with a positive option discrimination statistic. The proportion of items containing 0, 1, 2, and 3 functioning distractors was 12.3%, 34.8%, 39.1%, and 13.8% respectively. Overall, items contained an average of 1.54 (SD = 0.88) functioning distractors. Only 52.2% (n = 805) of all distractors were functioning effectively and 10.2% (n = 158) had a choice frequency of 0. Items with more functioning distractors were more difficult and more discriminating. The low frequency of items with three functioning distractors in the four-option items in this study suggests that teachers have difficulty developing plausible distractors for most MCQs. Test items should consist of as many options as is feasible given the item content and the number of plausible distractors; in most cases this would be three. Item analysis results can be used to identify and remove non-functioning distractors from MCQs that have been used in previous tests.
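The classification rule described in this abstract (an option is non-functioning if fewer than 5% of examinees chose it or if its option discrimination is positive) can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the function name, the example item, and the handling of boundary cases (exactly 5%, discrimination exactly 0) are assumptions. The last lines recompute the mean number of functioning distractors from the percentages reported in the abstract.

```python
def functioning_distractors(choice_prop, discrimination, key, min_prop=0.05):
    """Return the distractors that count as functioning.

    Assumed rule, following the abstract: a distractor is non-functioning
    when fewer than `min_prop` of examinees chose it OR when its option
    discrimination statistic is positive.
    """
    functioning = []
    for option, prop in choice_prop.items():
        if option == key:
            continue  # the keyed (correct) option is not a distractor
        if prop >= min_prop and discrimination[option] <= 0:
            functioning.append(option)
    return functioning


# Hypothetical four-option item; "B" is the keyed answer.
item_props = {"A": 0.10, "B": 0.70, "C": 0.03, "D": 0.17}
item_disc = {"A": -0.20, "B": 0.35, "C": -0.05, "D": 0.08}
result = functioning_distractors(item_props, item_disc, key="B")
# "C" is chosen by under 5% and "D" has positive discrimination,
# so only "A" remains.

# Share of items with 0, 1, 2, 3 functioning distractors, as reported.
reported_share = {0: 0.123, 1: 0.348, 2: 0.391, 3: 0.138}
mean_functioning = sum(k * p for k, p in reported_share.items())
```

The recomputed mean (about 1.54) agrees with the average reported in the abstract.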
ERIC Educational Resources Information Center
Robitzsch, Alexander; Rupp, Andre A.
2009-01-01
This article describes the results of a simulation study to investigate the impact of missing data on the detection of differential item functioning (DIF). Specifically, it investigates how four methods for dealing with missing data (listwise deletion, zero imputation, two-way imputation, response function imputation) interact with two methods of…
Rose, M; Bjorner, J B; Becker, J; Fries, J F; Ware, J E
2008-01-01
The Patient-Reported Outcomes Measurement Information System (PROMIS) was initiated to improve precision, reduce respondent burden, and enhance the comparability of health outcomes measures. We used item response theory (IRT) to construct and evaluate a preliminary item bank for physical function assuming four subdomains. Data from seven samples (N=17,726) using 136 items from nine questionnaires were evaluated. A generalized partial credit model was used to estimate item parameters, which were normed to a mean of 50 (SD=10) in the US population. Item bank properties were evaluated through Computerized Adaptive Test (CAT) simulations. IRT requirements were fulfilled by 70 items covering activities of daily living, lower extremity, and central body functions. The original item context partly affected parameter stability. Items on upper body function, and need for aid or devices did not fit the IRT model. In simulations, a 10-item CAT eliminated floor and decreased ceiling effects, achieving a small standard error (< 2.2) across scores from 20 to 50 (reliability >0.95 for a representative US sample). This precision was not achieved over a similar range by any comparable fixed length item sets. The methods of the PROMIS project are likely to substantially improve measures of physical function and to increase the efficiency of their administration using CAT.
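The link between the reported standard error and the reported reliability can be checked with a one-line calculation. The abstract does not state the formula used; the sketch below assumes the conventional relation between score reliability, the standard error of measurement (SE), and the population standard deviation (SD = 10 on the normed metric).

```latex
\[
\text{reliability} \;=\; 1 - \frac{SE^{2}}{SD^{2}}
\;=\; 1 - \frac{2.2^{2}}{10^{2}}
\;=\; 1 - 0.0484 \;\approx\; 0.952
\]
```

Under that assumption, a standard error below 2.2 corresponds to a reliability above 0.95, which matches the figures quoted in the abstract.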
Item Selection and Pre-equating with Empirical Item Characteristic Curves.
ERIC Educational Resources Information Center
Livingston, Samuel A.
An empirical item characteristic curve shows the probability of a correct response as a function of the student's total test score. These curves can be estimated from large-scale pretest data. They enable test developers to select items that discriminate well in the score region where decisions are made. A similar set of curves can be used to…
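An empirical item characteristic curve as described here is the proportion of examinees answering the item correctly at each total test score. The following is a minimal sketch under that reading; the function name and the toy data are hypothetical, and operational use would typically smooth or group sparse score levels.

```python
from collections import defaultdict


def empirical_icc(total_scores, item_correct):
    """Proportion answering the item correctly at each total test score."""
    counts = defaultdict(lambda: [0, 0])  # score -> [n_correct, n_examinees]
    for score, correct in zip(total_scores, item_correct):
        counts[score][0] += int(correct)
        counts[score][1] += 1
    return {score: c / n for score, (c, n) in sorted(counts.items())}


# Hypothetical pretest data: total scores and 0/1 responses to one item.
totals = [1, 1, 2, 2, 2, 3, 3]
responses = [0, 0, 0, 1, 1, 1, 1]
curve = empirical_icc(totals, responses)
```

A curve that rises steeply in the score region where decisions are made indicates an item that discriminates well there.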
Anchor Selection Strategies for DIF Analysis: Review, Assessment, and New Approaches
ERIC Educational Resources Information Center
Kopf, Julia; Zeileis, Achim; Strobl, Carolin
2015-01-01
Differential item functioning (DIF) indicates the violation of the invariance assumption, for instance, in models based on item response theory (IRT). For item-wise DIF analysis using IRT, a common metric for the item parameters of the groups that are to be compared (e.g., for the reference and the focal group) is necessary. In the Rasch model,…
Calibration of an Item Bank for the Assessment of Basque Language Knowledge
ERIC Educational Resources Information Center
Lopez-Cuadrado, Javier; Perez, Tomas A.; Vadillo, Jose A.; Gutierrez, Julian
2010-01-01
The main requisite for a functional computerized adaptive testing system is a calibrated item bank. This text presents the tasks carried out during the calibration of an item bank for assessing knowledge of the Basque language. It has been done in terms of the 3-parameter logistic model provided by the item response theory. Besides, this…
Factor Structure and Reliability of Test Items for Saudi Teacher Licence Assessment
ERIC Educational Resources Information Center
Alsadaawi, Abdullah Saleh
2017-01-01
The Saudi National Assessment Centre administers the Computer Science Teacher Test for teacher certification. The aim of this study is to explore gender differences in candidates' scores, and investigate dimensionality, reliability, and differential item functioning using confirmatory factor analysis and item response theory. The confirmatory…
Likelihood-Ratio DIF Testing: Effects of Nonnormality
ERIC Educational Resources Information Center
Woods, Carol M.
2008-01-01
Differential item functioning (DIF) occurs when an item has different measurement properties for members of one group versus another. Likelihood-ratio (LR) tests for DIF based on item response theory (IRT) involve statistically comparing IRT models that vary with respect to their constraints. A simulation study evaluated how violation of the…
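The model comparison behind a likelihood-ratio DIF test can be sketched as follows. This is a generic illustration and not the procedure of the study: the function name and the log-likelihood values are hypothetical, fitting the IRT models is outside the sketch, and closed-form chi-square tail probabilities are provided only for 1 or 2 degrees of freedom.

```python
import math


def lr_dif_test(loglik_constrained, loglik_free, df):
    """Likelihood-ratio statistic G^2 and its chi-square p-value.

    loglik_constrained: log-likelihood with the studied item's parameters
        held equal across groups (no-DIF model).
    loglik_free: log-likelihood with those parameters freed across groups.
    df: number of freed parameters (only 1 or 2 supported in this sketch).
    """
    g2 = max(-2.0 * (loglik_constrained - loglik_free), 0.0)
    if df == 1:
        p_value = math.erfc(math.sqrt(g2 / 2.0))  # chi-square tail, 1 df
    elif df == 2:
        p_value = math.exp(-g2 / 2.0)  # chi-square tail, 2 df
    else:
        raise ValueError("this sketch supports df of 1 or 2 only")
    return g2, p_value


# Hypothetical log-likelihoods for a 2PL item (slope and location freed).
g2, p = lr_dif_test(loglik_constrained=-1005.0, loglik_free=-1002.0, df=2)
```

A small p-value suggests the freed model fits better, that is, the item's parameters differ between groups.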
Testing Manifest Monotonicity Using Order-Constrained Statistical Inference
ERIC Educational Resources Information Center
Tijmstra, Jesper; Hessen, David J.; van der Heijden, Peter G. M.; Sijtsma, Klaas
2013-01-01
Most dichotomous item response models share the assumption of latent monotonicity, which states that the probability of a positive response to an item is a nondecreasing function of a latent variable intended to be measured. Latent monotonicity cannot be evaluated directly, but it implies manifest monotonicity across a variety of observed scores,…
Mumbardó-Adam, C; Guàrdia-Olmos, J; Giné, C; Raley, S K; Shogren, K A
2018-04-01
A new measure of self-determination, the Self-Determination Inventory: Student Report (Spanish version), has recently been adapted and empirically validated in Spanish. As it is the first instrument intended to measure self-determination in youth with and without disabilities, there is a need to further explore and strengthen its psychometric analysis based on item response patterns. Through an item response theory approach, this study examined item observed distributions across the essential characteristics of self-determination. The results demonstrated satisfactory to excellent item functioning patterns across characteristics, particularly within agentic action domains. Increased variability across items was also found within action-control beliefs dimensions, specifically within the self-realisation subdomain. These findings further support the instrument's psychometric properties and outline future research directions. © 2017 MENCAP and International Association of the Scientific Study of Intellectual and Developmental Disabilities and John Wiley & Sons Ltd.
Kratz, Anna L; Schilling, Stephen G; Goesling, Jenna; Williams, David A
2015-06-01
Pain is often the focus of research and clinical care in fibromyalgia (FM); however, cognitive dysfunction is also a common, distressing, and disabling symptom in FM. Current efforts to address this problem are limited by the lack of a comprehensive, valid measure of subjective cognitive dysfunction in FM that is easily interpretable, accessible, and brief. The purpose of this study was to leverage cognitive functioning item banks that were developed as part of the Patient Reported Outcomes Measurement Information System (PROMIS) to devise a 10-item short form measure of cognitive functioning for use in FM. In study 1, a nationwide (U.S.) sample of 1,035 adults with FM (age range = 18-82, 95.2% female) completed 2 cognitive item pools. Factor analyses and item response theory analyses were used to identify dimensionality and optimally performing items. A recommended 10-item measure, called the Multidimensional Inventory of Subjective Cognitive Impairment (MISCI) was created. In study 2, 232 adults with FM completed the MISCI and a legacy measure of cognitive functioning that is used in FM clinical trials, the Multiple Ability Self-Report Questionnaire (MASQ). The MISCI showed excellent internal reliability, low ceiling/floor effects, and good convergent validity with the MASQ (r = -.82). This paper presents the MISCI, a 10-item measure of cognitive dysfunction in FM, developed through classical test theory and item response theory. This brief but comprehensive measure shows evidence of excellent construct validity through large correlations with a lengthy legacy measure of cognitive functioning. Copyright © 2015 American Pain Society. Published by Elsevier Inc. All rights reserved.
An introduction to Item Response Theory and Rasch Analysis of the Eating Assessment Tool (EAT-10).
Kean, Jacob; Brodke, Darrel S; Biber, Joshua; Gross, Paul
2018-03-01
Item response theory has its origins in educational measurement and is now commonly applied in health-related measurement of latent traits, such as function and symptoms. This application is due in large part to gains in the precision of measurement attributable to item response theory and corresponding decreases in response burden, study costs, and study duration. The purpose of this paper is twofold: introduce basic concepts of item response theory and demonstrate this analytic approach in a worked example, a Rasch model (1PL) analysis of the Eating Assessment Tool (EAT-10), a commonly used measure for oropharyngeal dysphagia. The results of the analysis were largely concordant with previous studies of the EAT-10 and illustrate for brain impairment clinicians and researchers how IRT analysis can yield greater precision of measurement.
ERIC Educational Resources Information Center
Molenaar, Dylan; Dolan, Conor V.; de Boeck, Paul
2012-01-01
The Graded Response Model (GRM; Samejima, "Estimation of ability using a response pattern of graded scores," Psychometric Monograph No. 17, Richmond, VA: The Psychometric Society, 1969) can be derived by assuming a linear regression of a continuous variable, Z, on the trait, [theta], to underlie the ordinal item scores (Takane & de Leeuw in…
Lawton IADL scale in dementia: can item response theory make it more informative?
McGrory, Sarah; Shenkin, Susan D; Austin, Elizabeth J; Starr, John M
2014-07-01
Impairment of functional abilities represents a crucial component of dementia diagnosis. Current functional measures rely on the traditional aggregate method of summing raw scores. While this summary score provides a quick representation of a person's ability, it disregards useful information on the item level. To use item response theory (IRT) methods to increase the interpretive power of the Lawton Instrumental Activities of Daily Living (IADL) scale by establishing a hierarchy of item 'difficulty' and 'discrimination'. This cross-sectional study applied IRT methods to the analysis of IADL outcomes. Participants were 202 members of the Scottish Dementia Research Interest Register (mean age = 76.39, range = 56-93, SD = 7.89 years) with complete itemised data available. A Mokken scale with good reliability (Molenaar Sijtsma statistic 0.79) was obtained, satisfying the IRT assumption that the items comprise a single unidimensional scale. The eight items in the scale could be placed on a hierarchy of 'difficulty' (H coefficient = 0.55), with 'Shopping' being the most 'difficult' item and 'Telephone use' being the least 'difficult' item. 'Shopping' was the most discriminatory item differentiating well between patients of different levels of ability. IRT methods are capable of providing more information about functional impairment than a summed score. 'Shopping' and 'Telephone use' were identified as items that reveal key information about a patient's level of ability, and could be useful screening questions for clinicians. © The Author 2013. Published by Oxford University Press on behalf of the British Geriatrics Society. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
A Comparison of Lord's Chi Square and Raju's Area Measures in Detection of DIF.
ERIC Educational Resources Information Center
Cohen, Allan S.; Kim, Seock-Ho
1993-01-01
The effectiveness of two statistical tests of the area between item response functions (exact signed area and exact unsigned area) estimated in different samples, a measure of differential item functioning (DIF), was compared with Lord's chi square. Lord's chi square was found the most effective in determining DIF. (SLD)
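The quantity underlying the area measures is the area between two item response functions estimated in different groups. The sketch below approximates the signed and unsigned areas by numerical integration for two-parameter logistic curves. It is not Raju's exact closed-form computation and includes no significance test or Lord's chi-square; the function names, integration limits, and parameter values are assumptions made for illustration.

```python
import math


def irf_2pl(theta, a, b):
    """Two-parameter logistic item response function."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def areas_between_irfs(a_ref, b_ref, a_foc, b_foc, lo=-30.0, hi=30.0, n=60000):
    """Midpoint-rule approximation of the signed and unsigned areas
    between a reference-group and a focal-group IRF."""
    h = (hi - lo) / n
    signed = 0.0
    unsigned = 0.0
    for i in range(n):
        theta = lo + (i + 0.5) * h
        diff = irf_2pl(theta, a_ref, b_ref) - irf_2pl(theta, a_foc, b_foc)
        signed += diff * h
        unsigned += abs(diff) * h
    return signed, unsigned


# Equal slopes, shifted location: the curves never cross.
uniform = areas_between_irfs(1.0, 0.0, 1.0, 0.5)
# Equal location, different slopes: the curves cross at theta = 0.
crossing = areas_between_irfs(1.0, 0.0, 2.0, 0.0)
```

When the curves do not cross, the signed and unsigned areas coincide (here both equal the location shift). When they cross, the signed area can cancel to zero while the unsigned area stays positive, which is why both measures are examined.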
Hays, Ron D; Spritzer, Karen L; Amtmann, Dagmar; Lai, Jin-Shei; Dewitt, Esi Morgan; Rothrock, Nan; Dewalt, Darren A; Riley, William T; Fries, James F; Krishnan, Eswar
2013-11-01
To create upper-extremity and mobility subdomain scores from the Patient-Reported Outcomes Measurement Information System (PROMIS) physical functioning adult item bank. Expert reviews were used to identify upper-extremity and mobility items from the PROMIS item bank. Psychometric analyses were conducted to assess empirical support for scoring upper-extremity and mobility subdomains. Data were collected from the U.S. general population and multiple disease groups via self-administered surveys. The sample (N=21,773) included 21,133 English-speaking adults who participated in the PROMIS wave 1 data collection and 640 Spanish-speaking Latino adults recruited separately. Not applicable. We used English- and Spanish-language data and existing PROMIS item parameters for the physical functioning item bank to estimate upper-extremity and mobility scores. In addition, we fit graded response models to calibrate the upper-extremity items and mobility items separately, compare separate to combined calibrations, and produce subdomain scores. After eliminating items because of local dependency, 16 items remained to assess upper extremity and 17 items to assess mobility. The estimated correlation between upper extremity and mobility was .59 using existing PROMIS physical functioning item parameters (r=.60 using parameters calibrated separately for upper-extremity and mobility items). Upper-extremity and mobility subdomains shared about 35% of the variance in common, and produced comparable scores whether calibrated separately or together. The identification of the subset of items tapping these 2 aspects of physical functioning and scored using the existing PROMIS parameters provides the option of scoring these subdomains in addition to the overall physical functioning score. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Development of a Brief Questionnaire to Assess Contraceptive Intent
Raine-Bennett, Tina R; Rocca, Corinne H
2015-01-01
Objective: We sought to develop and validate an instrument that can enable providers to identify young women who may be at risk of contraceptive non-adherence. Methods: Item response theory based methods were used to evaluate the psychometric properties of the Contraceptive Intent Questionnaire, a 15-item self-administered questionnaire, based on theory and prior qualitative and quantitative research. The questionnaire was administered to 200 women aged 15–24 years who were initiating contraceptives. We assessed item fit to the item response model, internal consistency, internal structure validity, and differential item functioning. Results: All items fit a one-dimensional model. The separation reliability coefficient was 0.73. Participants’ overall scores covered the full range of the scale (0–15), and items appropriately matched the range of participants’ contraceptive intent. Items met the criteria for internal structure validity and most items functioned similarly between groups of women. Conclusion: The Contraceptive Intent Questionnaire appears to be a reliable and valid tool. Future testing is needed to assess predictive ability and clinical utility. Practice Implications: The Contraceptive Intent Questionnaire may serve as a valid tool to help providers identify women who may have problems with contraceptive adherence, as well as to pinpoint areas in which counseling may be directed. PMID:26104994
Development of a brief questionnaire to assess contraceptive intent.
Raine-Bennett, Tina R; Rocca, Corinne H
2015-11-01
We sought to develop and validate an instrument that can enable providers to identify young women who may be at risk of contraceptive non-adherence. Item response theory based methods were used to evaluate the psychometric properties of the Contraceptive Intent Questionnaire, a 15-item self-administered questionnaire, based on theory and prior qualitative and quantitative research. The questionnaire was administered to 200 women aged 15-24 years who were initiating contraceptives. We assessed item fit to the item response model, internal consistency, internal structure validity, and differential item functioning. All items fit a one-dimensional model. The separation reliability coefficient was 0.73. Participants' overall scores covered the full range of the scale (0-15), and items appropriately matched the range of participants' contraceptive intent. Items met the criteria for internal structure validity and most items functioned similarly between groups of women. The Contraceptive Intent Questionnaire appears to be a reliable and valid tool. Future testing is needed to assess predictive ability and clinical utility. The Contraceptive Intent Questionnaire may serve as a valid tool to help providers identify women who may have problems with contraceptive adherence, as well as to pinpoint areas in which counseling may be directed. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Unsworth, Nash; Brewer, Gene A; Spillers, Gregory J
2011-09-01
In three experiments search termination decisions were examined as a function of response type (correct vs. incorrect) and confidence. It was found that the time between the last retrieved item and the decision to terminate search (exit latency) was related to the type of response and confidence in the last item retrieved. Participants were willing to search longer when the last retrieved item was a correct item vs. an incorrect item and when the confidence was high in the last retrieved item. It was also found that the number of errors retrieved during the recall period was related to search termination decisions such that the more errors retrieved, the more likely participants were to terminate the search. Finally, it was found that knowledge of overall search set size influenced the time needed to search for items, but did not influence search termination decisions. Copyright © 2011 Elsevier B.V. All rights reserved.
ERIC Educational Resources Information Center
Michaelides, Michalis P.; Haertel, Edward H.
2014-01-01
The standard error of equating quantifies the variability in the estimation of an equating function. Because common items for deriving equated scores are treated as fixed, the only source of variability typically considered arises from the estimation of common-item parameters from responses of samples of examinees. Use of alternative, equally…
Lix, Lisa M; Wu, Xiuyun; Hopman, Wilma; Mayo, Nancy; Sajobi, Tolulope T; Liu, Juxin; Prior, Jerilynn C; Papaioannou, Alexandra; Josse, Robert G; Towheed, Tanveer E; Davison, K Shawn; Sawatzky, Richard
2016-01-01
Self-reported health status measures, like the Short Form 36-item Health Survey (SF-36), can provide rich information about the overall health of a population and its components, such as physical, mental, and social health. However, differential item functioning (DIF), which arises when population sub-groups with the same underlying (i.e., latent) level of health have different measured item response probabilities, may compromise the comparability of these measures. The purpose of this study was to test for DIF on the SF-36 physical functioning (PF) and mental health (MH) sub-scale items in a Canadian population-based sample. Study data were from the prospective Canadian Multicentre Osteoporosis Study (CaMos), which collected baseline data in 1996-1997. DIF was tested using a multiple indicators multiple causes (MIMIC) method. Confirmatory factor analysis defined the latent variable measurement model for the item responses and latent variable regression with demographic and health status covariates (i.e., sex, age group, body weight, self-perceived general health) produced estimates of the magnitude of DIF effects. The CaMos cohort consisted of 9423 respondents; 69.4% were female and 51.7% were less than 65 years. Eight of 10 items on the PF sub-scale and four of five items on the MH sub-scale exhibited DIF. Large DIF effects were observed on PF sub-scale items about vigorous and moderate activities, lifting and carrying groceries, walking one block, and bathing or dressing. On the MH sub-scale items, all DIF effects were small or moderate in size. SF-36 PF and MH sub-scale scores were not comparable across population sub-groups defined by demographic and health status variables due to the effects of DIF, although the magnitude of this bias was not large for most items. We recommend testing and adjusting for DIF to ensure comparability of the SF-36 in population-based investigations.
Paz, Sylvia H; Spritzer, Karen L; Reise, Steven P; Hays, Ron D
2017-06-01
About 70% of Latinos, 5 years old or older, in the United States speak Spanish at home. Measurement equivalence of the PROMIS® pain interference (PI) item bank by language of administration (English versus Spanish) has not been evaluated. A sample of 527 adult Spanish-speaking Latinos completed the Spanish version of the 41-item PROMIS® pain interference item bank. We evaluate dimensionality, monotonicity and local independence of the Spanish-language items. Then we evaluate differential item functioning (DIF) using ordinal logistic regression with item response theory scores estimated from DIF-free "anchor" items. One of the 41 items in the Spanish version of the PROMIS® PI item bank was identified as having significant uniform DIF. English- and Spanish-speaking subjects with the same level of pain interference responded differently to 1 of the 41 items in the PROMIS® PI item bank. This item was not retained due to proprietary issues. The original English language item parameters can be used when estimating PROMIS® PI scores.
Jones, Richard N
2006-11-01
Knowledge of the extent to which measurement of adult cognitive functioning differs between Spanish and English language administrations of the Mini-Mental State Examination (MMSE) is critical for inclusive, representative, and valid research of older adults in the United States. We sought to demonstrate the use of an item response theory (IRT) based structural equation model, that is, the MIMIC model (multiple indicators, multiple causes), to evaluate MMSE responses for evidence of differential item functioning (DIF) attributable to language of administration. We studied participants in a dementia case registry study (n = 1546), 42% of whom were examined with the Spanish language MMSE. Twelve of 21 items were identified as having significant uniform DIF. The 4 most discrepant included orientation to season, orientation to state, repeat phrase, and follow command. DIF accounted for two-thirds of the observed difference in underlying level of cognitive functioning between Spanish- and English-language administration groups. Failing to account for measurement differences may lead to spurious inferences regarding language group differences in underlying level of cognitive functioning. The MIMIC model can be used to detect and adjust for such measurement differences in substantive research.
Cluster Analysis for Cognitive Diagnosis: Theory and Applications
ERIC Educational Resources Information Center
Chiu, Chia-Yi; Douglas, Jeffrey A.; Li, Xiaodong
2009-01-01
Latent class models for cognitive diagnosis often begin with specification of a matrix that indicates which attributes or skills are needed for each item. Then by imposing restrictions that take this into account, along with a theory governing how subjects interact with items, parametric formulations of item response functions are derived and…
IRT-ZIP Modeling for Multivariate Zero-Inflated Count Data
ERIC Educational Resources Information Center
Wang, Lijuan
2010-01-01
This study introduces an item response theory-zero-inflated Poisson (IRT-ZIP) model to investigate psychometric properties of multiple items and predict individuals' latent trait scores for multivariate zero-inflated count data. In the model, two link functions are used to capture two processes of the zero-inflated count data. Item parameters are…
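The abstract above mentions two link functions for the two processes behind zero-inflated counts. As a minimal sketch of a plain zero-inflated Poisson probability mass function, the code below assumes a logit link for the structural-zero probability and a log link for the Poisson rate. These are the conventional choices, not links confirmed by the truncated abstract, and the function name `zip_pmf` and its arguments are hypothetical. The latent-trait part of the IRT-ZIP model is not shown.

```python
import math


def zip_pmf(y: int, logit_pi: float, log_lambda: float) -> float:
    """P(Y = y) under a zero-inflated Poisson distribution.

    logit_pi   : logit of the structural-zero probability (assumed link)
    log_lambda : log of the Poisson rate (assumed link)
    """
    pi = 1.0 / (1.0 + math.exp(-logit_pi))  # inverse logit
    lam = math.exp(log_lambda)              # inverse log
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        # A zero can come from the structural-zero process or from the Poisson process.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson
```

In a full IRT-ZIP model, both linear predictors would presumably depend on a person's latent trait and on item parameters; here they are passed in directly to keep the sketch self-contained.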
2012-01-01
Background: Item response theory (IRT) is extensively used to develop adaptive instruments of health-related quality of life (HRQoL). However, each IRT model has its own function to estimate item and category parameters, and hence different results may be found using the same response categories with different IRT models. The present study used the Rasch rating scale model (RSM) to examine and reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales. Methods: The PedsQL™ 4.0 Generic Core Scales was completed by 938 Iranian school children and their parents. Convergent, discriminant and construct validity of the instrument were assessed by classical test theory (CTT). The RSM was applied to investigate person and item reliability, item statistics and ordering of response categories. Results: The CTT method showed that the scaling success rates for convergent and discriminant validity were 100% in all domains with the exception of physical health in the child self-report. Moreover, confirmatory factor analysis supported a four-factor model similar to its original version. The RSM showed that 22 out of 23 items had acceptable infit and outfit statistics (<1.4, >0.6), person reliabilities were low, item reliabilities were high, and item difficulty ranged from -1.01 to 0.71 and -0.68 to 0.43 for child self-report and parent proxy-report, respectively. Also the RSM showed that successive response categories for all items were not located in the expected order. Conclusions: This study revealed that, in all domains, the five response categories did not perform adequately. It is not known whether this problem is a function of the meaning of the response choices in the Persian language or an artifact of a mostly healthy population that did not use the full range of the response categories. The response categories should be evaluated in further validation studies, especially in large samples of chronically ill patients. PMID:22414135
Application of Item Response Theory to Tests of Substance-related Associative Memory
Shono, Yusuke; Grenard, Jerry L.; Ames, Susan L.; Stacy, Alan W.
2015-01-01
A substance-related word association test (WAT) is one of the commonly used indirect tests of substance-related implicit associative memory and has been shown to predict substance use. This study applied an item response theory (IRT) modeling approach to evaluate psychometric properties of the alcohol- and marijuana-related WATs and their items among 775 ethnically diverse at-risk adolescents. After examining the IRT assumptions, item fit, and differential item functioning (DIF) across gender and age groups, the original 18 WAT items were reduced to 14 and 15 items in the alcohol- and marijuana-related WAT, respectively. Thereafter, unidimensional one- and two-parameter logistic models (1PL and 2PL models) were fitted to the revised WAT items. The results demonstrated that both alcohol- and marijuana-related WATs have good psychometric properties. These results were discussed in light of the framework of a unified concept of construct validity (Messick, 1975, 1989, 1995). PMID:25134051
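For readers unfamiliar with the 1PL and 2PL models named in the abstract above, the item response functions can be sketched as follows. This is a generic textbook form, not the authors' code. The names `irf_2pl` and `irf_1pl` are hypothetical, and fixing the common 1PL discrimination at 1 is an assumption (some software estimates a single shared value instead).

```python
import math


def irf_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a keyed response under the two-parameter logistic model.

    theta : latent trait level of the respondent
    a     : item discrimination
    b     : item difficulty (location)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def irf_1pl(theta: float, b: float) -> float:
    """1PL as the special case of the 2PL with discrimination fixed at 1 (assumed)."""
    return irf_2pl(theta, 1.0, b)
```

When the trait level equals the item difficulty, both functions return 0.5. A larger `a` makes the curve steeper around `b`, which is what "more discriminating" means for an item.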
Langer, Michelle M.; Hill, Cheryl D.; Thissen, David; Burwinkle, Tasha M.; Varni, James W.; DeWalt, Darren A.
2008-01-01
Objective: To demonstrate the value of item response theory (IRT) and differential item functioning (DIF) methods in examining a health-related quality of life (HRQOL) measure in children and adolescents. Study Design and Setting: This illustration uses data from 5,429 children using the four subscales of the PedsQL™ 4.0 Generic Core Scales. The IRT model-based likelihood ratio test was used to detect and evaluate DIF between healthy children and children with a chronic condition. Results: DIF was detected for a majority of items but cancelled out at the total test score level due to opposing directions of DIF. Post-hoc analysis indicated that this pattern of results may be due to multidimensionality. We discuss issues in detecting and handling DIF. Conclusion: This paper describes how to perform DIF analyses in validating a questionnaire to ensure that scores have equivalent meaning across subgroups. It offers insight into ways information gained through the analysis can be used to evaluate an existing scale. PMID:18226750
ERIC Educational Resources Information Center
Woods, Carol M.; Thissen, David
2006-01-01
The purpose of this paper is to introduce a new method for fitting item response theory models with the latent population distribution estimated from the data using splines. A spline-based density estimation system provides a flexible alternative to existing procedures that use a normal distribution, or a different functional form, for the…
Item Type and Gender Differences on the Mental Rotations Test
ERIC Educational Resources Information Center
Voyer, Daniel; Doyle, Randi A.
2010-01-01
This study investigated gender differences on the Mental Rotations Test (MRT) as a function of item and response types. Accordingly, 86 male and 109 female undergraduate students completed the MRT without time limits. Responses were coded as reflecting two correct (CC), one correct and one wrong (CW), two wrong (WW), one correct and one blank…
ERIC Educational Resources Information Center
Dai, Yunyun
2013-01-01
Mixtures of item response theory (IRT) models have been proposed as a technique to explore response patterns in test data related to cognitive strategies, instructional sensitivity, and differential item functioning (DIF). Estimation proves challenging due to difficulties in identification and questions of effect size needed to recover underlying…
Kılıç, Aslı; Hoyer, William J; Howard, Marc W
2013-01-01
BACKGROUND/STUDY CONTEXT: Older adults exhibit an age-related deficit in item memory as a function of the length of the retention interval, but older adults and young adults usually show roughly equivalent benefits due to the spacing of item repetitions in continuous memory tasks. The current experiment investigates the seemingly paradoxical effects of retention interval and spacing in young and older adults using a continuous recognition memory procedure. Fifty young adults and 52 older adults gave memory confidence ratings to words that were presented once (P1), twice (P2), or three times (P3), and the effects of the lag length and retention interval were assessed at P2 and at P3, respectively. Response times at P2 were disproportionately longer for older adults than for younger adults as a function of the number of items occurring between P1 and P2, suggestive of age-related loss in item memory. Ratings of confidence in memory responses revealed that older adults remembered fewer items at P2 with a high degree of certainty. Confidence ratings given at P3 suggested that young and older adults derived equivalent benefits from the spacing between P1 and P2. Findings of this study support theoretical accounts that suggest that recursive reminding and/or item retrieval difficulty promote item retention in older adults.
Item selection via Bayesian IRT models.
Arima, Serena
2015-02-10
With reference to a questionnaire that aimed to assess the quality of life for dysarthric speakers, we investigate the usefulness of a model-based procedure for reducing the number of items. We propose a mixed cumulative logit model, which is known in the psychometrics literature as the graded response model: responses to different items are modelled as a function of individual latent traits and as a function of item characteristics, such as their difficulty and their discrimination power. We jointly model the discrimination and the difficulty parameters by using a k-component mixture of normal distributions. Mixture components correspond to disjoint groups of items. Items that belong to the same groups can be considered equivalent in terms of both difficulty and discrimination power. According to decision criteria, we select a subset of items such that the reduced questionnaire is able to provide the same information that the complete questionnaire provides. The model is estimated by using a Bayesian approach, and the choice of the number of mixture components is justified according to information criteria. We illustrate the proposed approach on the basis of data that are collected for 104 dysarthric patients by local health authorities in Lecce and in Milan. Copyright © 2014 John Wiley & Sons, Ltd.
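The graded response model referred to in the abstract above treats each ordered response as a set of cumulative logits. The sketch below computes category probabilities for one item using the common parameterization P(Y >= k) = logistic(a(theta - b_k)). The paper may parameterize the model differently, the name `grm_category_probs` is hypothetical, and the Bayesian mixture prior on the item parameters is not shown.

```python
import math


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def grm_category_probs(theta: float, a: float, thresholds: list[float]) -> list[float]:
    """Category probabilities for one item under a graded response model.

    theta      : latent trait level
    a          : item discrimination
    thresholds : increasing category thresholds b_1 < ... < b_{K-1}
    Returns K probabilities, one per ordered response category.
    """
    # Cumulative probabilities P(Y >= k), padded with P(Y >= 0) = 1 and P(Y >= K) = 0.
    cum = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    # Each category probability is the difference of adjacent cumulative probabilities.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]
```

Because the thresholds are increasing, the cumulative probabilities are decreasing and every category probability is non-negative. Two items with similar `a` and `thresholds` yield similar probabilities at every trait level, which is the sense in which the abstract calls items within one mixture component equivalent.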
Chan, Kitty S; Gross, Alden L; Pezzin, Liliana E; Brandt, Jason; Kasper, Judith D
2015-12-01
To harmonize measures of cognitive performance using item response theory (IRT) across two international aging studies. Data for persons ≥65 years from the Health and Retirement Study (HRS, N = 9,471) and the English Longitudinal Study of Aging (ELSA, N = 5,444). Cognitive performance measures varied (HRS fielded 25, ELSA 13); 9 were in common. Measurement precision was examined for IRT scores based on (a) common items, (b) common items adjusted for differential item functioning (DIF), and (c) DIF-adjusted all items. Three common items (day of date, immediate word recall, and delayed word recall) demonstrated DIF by survey. Adding survey-specific items improved precision but mainly for HRS respondents at lower cognitive levels. IRT offers a feasible strategy for harmonizing cognitive performance measures across other surveys and for other multi-item constructs of interest in studies of aging. Practical implications depend on sample distribution and the difficulty mix of in-common and survey-specific items. © The Author(s) 2015.
Buck, Harleah G; Harkness, Karen; Ali, Muhammad Usman; Carroll, Sandra L; Kryworuchko, Jennifer; McGillion, Michael
2017-04-01
Caregivers (CGs) contribute important assistance with heart failure (HF) self-care, including daily maintenance, symptom monitoring, and management. Until CGs' contributions to self-care can be quantified, it is impossible to characterize it, account for its impact on patient outcomes, or perform meaningful cost analyses. The purpose of this study was to conduct psychometric testing and item reduction on the recently developed 34-item Caregiver Contribution to Heart Failure Self-care (CACHS) instrument using classical and item response theory methods. Fifty CGs (mean age 63 years ±12.84; 70% female) recruited from a HF clinic completed the CACHS in 2014, and the results were evaluated using classical test theory and item response theory. Items would be deleted for low (<.05) or high (>.95) endorsement, low (<.3) or high (>.7) corrected item-total correlations, significant pairwise correlation coefficients, floor or ceiling effects, relatively low latent trait and item information function levels (<1.5 and p > .5), and differential item functioning. After analysis, 14 items were excluded, resulting in a 20-item instrument (self-care maintenance eight items; monitoring seven items; and management five items). Most items demonstrated moderate to high discrimination (median 2.13, minimum .77, maximum 5.05), and appropriate item difficulty (-2.7 to 1.4). Internal consistency reliability was excellent (Cronbach α = .94, average inter-item correlation = .41) with no ceiling effects. The newly developed 20-item version of the CACHS is supported by rigorous instrument development and represents a novel instrument to measure CGs' contribution to HF self-care. © 2016 Wiley Periodicals, Inc.
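Two of the classical test theory quantities named in the abstract above, the corrected item-total correlation and Cronbach's alpha, can be sketched in a few lines. This is an illustration of the standard formulas, not the authors' analysis code. The function names and the row-per-respondent data layout are assumptions, and only the .3/.7 correlation cutoffs are taken from the abstract.

```python
import math
from statistics import mean, variance


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(data: list[list[float]], j: int) -> float:
    """Correlate item j with the total score of the remaining items."""
    item = [row[j] for row in data]
    rest = [sum(row) - row[j] for row in data]
    return pearson(item, rest)


def cronbach_alpha(data: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = len(data[0])
    item_vars = [variance([row[j] for row in data]) for j in range(k)]
    total_var = variance([sum(row) for row in data])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


def flag_item(r_it: float, low: float = 0.3, high: float = 0.7) -> bool:
    """Flag an item whose corrected item-total correlation falls outside the cutoffs."""
    return r_it < low or r_it > high
```

The upper cutoff reflects a concern with redundancy: an item that correlates very highly with the rest of the scale adds little information beyond the other items.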
NASA Astrophysics Data System (ADS)
Dong, Sunghee; Jeong, Jichai
2018-02-01
Objective. Memory is formed by the interaction of various brain functions at the item and task level. Revealing individual and combined effects of item- and task-related processes on retrieving episodic memory is an unsolved problem because of limitations in existing neuroimaging techniques. To investigate these issues, we analyze fast and slow optical signals measured from a custom-built continuous wave functional near-infrared spectroscopy (CW-fNIRS) system. Approach. In our work, we visually encode the words to the subjects and let them recall the words after a short rest. The hemodynamic responses evoked by the episodic memory are compared with those evoked by the semantic memory in retrieval blocks. In the fast optical signal, we compare the effects of old and new items (previously seen and not seen) to investigate the item-related process in episodic memory. The Kalman filter is simultaneously applied to slow and fast optical signals in different time windows. Main results. A significant task-related HbR decrease was observed in the episodic memory retrieval blocks. Mean amplitude and peak latency of a fast optical signal are dependent upon item types and reaction time, respectively. Moreover, task-related hemodynamic and item-related fast optical responses are correlated in the right prefrontal cortex. Significance. We demonstrate that episodic memory is retrieved from the right frontal area by a functional connectivity between the maintained mental state through retrieval and item-related transient activity. To the best of our knowledge, this demonstration of functional NIRS research is the first to examine the relationship between item- and task-related memory processes in the prefrontal area using single modality.
Dong, Sunghee; Jeong, Jichai
2018-02-01
Memory is formed by the interaction of various brain functions at the item and task level. Revealing individual and combined effects of item- and task-related processes on retrieving episodic memory is an unsolved problem because of limitations in existing neuroimaging techniques. To investigate these issues, we analyze fast and slow optical signals measured from a custom-built continuous wave functional near-infrared spectroscopy (CW-fNIRS) system. In our work, we visually encode the words to the subjects and let them recall the words after a short rest. The hemodynamic responses evoked by the episodic memory are compared with those evoked by the semantic memory in retrieval blocks. In the fast optical signal, we compare the effects of old and new items (previously seen and not seen) to investigate the item-related process in episodic memory. The Kalman filter is simultaneously applied to slow and fast optical signals in different time windows. A significant task-related HbR decrease was observed in the episodic memory retrieval blocks. Mean amplitude and peak latency of a fast optical signal are dependent upon item types and reaction time, respectively. Moreover, task-related hemodynamic and item-related fast optical responses are correlated in the right prefrontal cortex. We demonstrate that episodic memory is retrieved from the right frontal area by a functional connectivity between the maintained mental state through retrieval and item-related transient activity. To the best of our knowledge, this demonstration of functional NIRS research is the first to examine the relationship between item- and task-related memory processes in the prefrontal area using single modality.
A Procedure to Detect Item Bias Present Simultaneously in Several Items
1991-04-25
exhibit a coherent and major biasing influence at the test level. In particular, this can be true even if each individual item displays only a minor...response functions (IRFs) without the use of item parameter estimation algorithms when the sample size is too small for their use. Thissen, Steinberg...convention). A random sample of examinees is drawn from each group, and a test of N items is administered to them. Typically it is suspected that a
ERIC Educational Resources Information Center
Suh, Youngsuk; Talley, Anna E.
2015-01-01
This study compared and illustrated four differential distractor functioning (DDF) detection methods for analyzing multiple-choice items. The log-linear approach, two item response theory-model-based approaches with likelihood ratio tests, and the odds ratio approach were compared to examine the congruence among the four DDF detection methods.…
IRT-LR-DIF with Estimation of the Focal-Group Density as an Empirical Histogram
ERIC Educational Resources Information Center
Woods, Carol M.
2008-01-01
Item response theory-likelihood ratio-differential item functioning (IRT-LR-DIF) is used to evaluate the degree to which items on a test or questionnaire have different measurement properties for one group of people versus another, irrespective of group-mean differences on the construct. Usually, the latent distribution is presumed normal for both…
Item Analysis and Differential Item Functioning of a Brief Conduct Problem Screen
ERIC Educational Resources Information Center
Wu, Johnny; King, Kevin M.; Witkiewitz, Katie; Racz, Sarah Jensen; McMahon, Robert J.
2012-01-01
Research has shown that boys display higher levels of childhood conduct problems than girls, and Black children display higher levels than White children, but few studies have tested for scalar equivalence of conduct problems across gender and race. The authors conducted a 2-parameter item response theory (IRT) model to examine item…
Interest Inventory Items as Reinforcing Stimuli: A Test of the A-R-D Theory.
ERIC Educational Resources Information Center
Staats, Arthur W.; And Others
An experiment was conducted to test the hypothesis that interest inventory items would function as reinforcing stimuli in a visual discrimination task. When previously rated liked and disliked items from the Strong Vocational Interest Blank were differentially presented following one of two responses, subjects learned to respond to the stimulus…
Female Sexual Function Index Short Version: A MsFLASH Item Response Analysis.
Carpenter, Janet S; Jones, Salene M W; Studts, Christina R; Heiman, Julia R; Reed, Susan D; Newton, Katherine M; Guthrie, Katherine A; Larson, Joseph C; Cohen, Lee S; Freeman, Ellen W; Jane Lau, R; Learman, Lee A; Shifren, Jan L
2016-11-01
The Female Sexual Function Index (FSFI) is a psychometrically sound and popular 19-item self-report measure, but its length may preclude its use in studies with multiple outcome measures, especially when sexual function is not a primary endpoint. Only one attempt has been made to create a shorter scale, resulting in the Italian FSFI-6, later translated into Spanish and Korean without further psychometric analysis. Our study evaluated whether a subset of items on the 19-item English-language FSFI would perform as well as the full-length FSFI in peri- and postmenopausal women. We used baseline data from 898 peri- and postmenopausal women recruited from multiple communities, ages 42-62 years, and enrolled in randomized controlled trials for vasomotor symptom management. Goals were to (1) create a psychometrically sound, shorter version of the FSFI for use in peri- and postmenopausal women as a continuous measure and (2) compare it to the Italian FSFI-6. Results indicated that a 9-item scale provided more information than the FSFI-6 across a spectrum of sexual functioning, was able to capture sample variability, and showed sufficient range without floor or ceiling effects. All but one of the items from the Italian 6-item version were included in the 9-item version. Most omitted FSFI items focused on frequency of events or experiences. When assessment of sexual function is a secondary endpoint and subject burden related to questionnaire length is a priority, the 9-item FSFI may provide important information about sexual function in English-speaking peri- and postmenopausal women.
Lo, Barbara Chuen Yee; Zhao, Yue; Kwok, Alice Wai Yee; Chan, Wai; Chan, Calais Kin Yuen
2017-07-01
The present study applied item response theory to examine the psychometric properties of the Asian Adolescent Depression Scale and to construct a short form among 1,084 teenagers recruited from secondary schools in Hong Kong. Findings suggested that some items of the full form reflected higher levels of severity and were more discriminating than others, and the Asian Adolescent Depression Scale was useful in measuring a broad range of depressive severity in community youths. Differential item functioning emerged in several items where females reported higher depressive severity than males. In the short form construction, preliminary validation suggested that, relative to the 20-item full form, our derived short form offered significantly greater diagnostic performance and stronger discriminatory ability in differentiating depressed and nondepressed groups, and simultaneously maintained adequate measurement precision with a reduced response burden in assessing depression in the Asian adolescents. Cultural variance in depressive symptomatology and clinical implications are discussed.
Jordan, Pascal; Shedden-Mora, Meike C; Löwe, Bernd
2017-01-01
The Generalized Anxiety Disorder scale (GAD-7) is one of the most frequently used diagnostic self-report scales for screening, diagnosis and severity assessment of anxiety disorder. Its psychometric properties from the view of the Item Response Theory paradigm have rarely been investigated. We aimed to close this gap by analyzing the GAD-7 within a large sample of primary care patients with respect to its psychometric properties and its implications for scoring using Item Response Theory. Robust, nonparametric statistics were used to check unidimensionality of the GAD-7. A graded response model was fitted using a Bayesian approach. The model fit was evaluated using posterior predictive p-values, item information functions were derived and optimal predictions of anxiety were calculated. The sample included N = 3404 primary care patients (60% female; mean age, 52.2; standard deviation, 19.2). The analysis indicated no deviations of the GAD-7 scale from unidimensionality and a decent fit of a graded response model. The commonly suggested ultra-brief measure consisting of the first two items, the GAD-2, was supported by item information analysis. The first four items discriminated better than the last three items with respect to latent anxiety. The information provided by the first four items should be weighted more heavily. Moreover, estimates corresponding to low to moderate levels of anxiety show greater variability. The psychometric validity of the GAD-2 was supported by our analysis.
Shedden-Mora, Meike C.; Löwe, Bernd
2017-01-01
Objective The Generalized Anxiety Disorder scale (GAD-7) is one of the most frequently used diagnostic self-report scales for screening, diagnosis and severity assessment of anxiety disorder. Its psychometric properties from the view of the Item Response Theory paradigm have rarely been investigated. We aimed to close this gap by analyzing the GAD-7 within a large sample of primary care patients with respect to its psychometric properties and its implications for scoring using Item Response Theory. Methods Robust, nonparametric statistics were used to check unidimensionality of the GAD-7. A graded response model was fitted using a Bayesian approach. The model fit was evaluated using posterior predictive p-values, item information functions were derived and optimal predictions of anxiety were calculated. Results The sample included N = 3404 primary care patients (60% female; mean age, 52,2; standard deviation 19.2) The analysis indicated no deviations of the GAD-7 scale from unidimensionality and a decent fit of a graded response model. The commonly suggested ultra-brief measure consisting of the first two items, the GAD-2, was supported by item information analysis. The first four items discriminated better than the last three items with respect to latent anxiety. Conclusion The information provided by the first four items should be weighted more heavily. Moreover, estimates corresponding to low to moderate levels of anxiety show greater variability. The psychometric validity of the GAD-2 was supported by our analysis. PMID:28771530
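As a rough illustration of the graded response model mentioned in this abstract, the sketch below computes category probabilities for a single four-category item (the GAD-7 uses four response options per item). It assumes the standard logistic form of Samejima's model; the function name and the parameter values are hypothetical and are not estimates from the study.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under a logistic graded response model.

    theta      -- latent trait level (e.g. latent anxiety)
    a          -- item discrimination (hypothetical value, not from the study)
    thresholds -- ordered category thresholds b_1 < ... < b_{K-1}
    """
    # Cumulative probabilities P(X >= k), bracketed by 1 (lowest category
    # or above) and 0 (above the highest category).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative += [0.0]
    # The probability of category k is the difference of adjacent cumulatives.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Illustrative (made-up) parameters for one four-category item.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.5])
```

A larger discrimination `a` makes the category probabilities change more sharply with `theta`, which is the sense in which some items "discriminate better" than others.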
The Psychometric Properties of Classroom Response System Data: A Case Study
NASA Astrophysics Data System (ADS)
Kortemeyer, Gerd
2016-08-01
Classroom response systems (often referred to as "clickers") have slowly gained adoption over the recent decade; however, critics frequently doubt their pedagogical value starting with the validity of the gathered responses: There is concern that students simply "click" random answers. This case study looks at different measures of response reliability, starting from a global look at correlations between formative clicker responses and summative examination performance to how clicker questions are used in context. It was found that clicker performance is a moderate indicator of course performance as a whole, and that while the psychometric properties of clicker items are more erratic than those of examination data, they still have acceptable internal consistency and include items with high discrimination. It was also found that clicker responses and item properties do provide highly meaningful feedback within a lecture context, i.e., when their position and function within lecture sessions are taken into consideration. Within this framework, conceptual questions provide measurably more meaningful feedback than items that require calculations.
Setodji, Claude M; Elliott, Marc N; Abel, Gary; Burt, Jenni; Roland, Martin; Campbell, John
2015-09-01
To evaluate two 5-item patient experience scales from the English General Practice (GP) Patient Survey for evidence of differential item functioning (DIF) given prior evidence of substantially worse reported health care experiences for South Asian compared with white British respondents. A national survey of English patients' primary care experiences. We used classic test and item response theory analysis to examine the possibility of DIF by patient ethnicity (South Asian, white British) after controlling for age, sex, health status, and quality of life in the English GP Patient Survey conducted in 2011/2012. Data were available for 873,051 respondents (818,219 white British/54,832 South Asian from 7795 English practices) who answered items relating to experiences of GP or nurses' care. Internal consistency reliability was high and similar for South Asian and white British patients. White British patients reported better average experiences than South Asians, but there was no evidence of DIF or different item response curves for white British and South Asian respondents, even in sensitivity analyses using matched samples. All communication items in the English GP Patient Survey showed similar South Asian versus white British differences, with no evidence of DIF. In contrast, differences due to scale use or expectations are typically variable rather than constant across scales. While other possibilities remain, these findings increase the likelihood that the observed negative responses of South Asian patients to this national survey reflect true differences in their experiences of care.
Meikle, Mary B; Henry, James A; Griest, Susan E; Stewart, Barbara J; Abrams, Harvey B; McArdle, Rachel; Myers, Paula J; Newman, Craig W; Sandridge, Sharon; Turk, Dennis C; Folmer, Robert L; Frederick, Eric J; House, John W; Jacobson, Gary P; Kinney, Sam E; Martin, William H; Nagler, Stephen M; Reich, Gloria E; Searchfield, Grant; Sweetow, Robert; Vernon, Jack A
2012-01-01
Chronic subjective tinnitus is a prevalent condition that causes significant distress to millions of Americans. Effective tinnitus treatments are urgently needed, but evaluating them is hampered by the lack of standardized measures that are validated for both intake assessment and evaluation of treatment outcomes. This work was designed to develop a new self-report questionnaire, the Tinnitus Functional Index (TFI), that would have documented validity both for scaling the severity and negative impact of tinnitus for use in intake assessment and for measuring treatment-related changes in tinnitus (responsiveness) and that would provide comprehensive coverage of multiple tinnitus severity domains. To use preexisting knowledge concerning tinnitus-related problems, an Item Selection Panel (17 expert judges) surveyed the content (175 items) of nine widely used tinnitus questionnaires. From those items, the Panel identified 13 separate domains of tinnitus distress and selected 70 items most likely to be responsive to treatment effects. Eliminating redundant items while retaining good content validity and adding new items to achieve the recommended minimum of 3 to 4 items per domain yielded 43 items, which were then used for constructing TFI Prototype 1. Prototype 1 was tested at five clinics. The 326 participants included consecutive patients receiving tinnitus treatment who provided informed consent, constituting a convenience sample. Construct validity of Prototype 1 as an outcome measure was evaluated by measuring responsiveness of the overall scale and its individual items at 3 and 6 mo follow-up with 65 and 42 participants, respectively. Using a predetermined list of criteria, the 30 best-functioning items were selected for constructing TFI Prototype 2. Prototype 2 was tested at four clinics with 347 participants, including 155 and 86 who provided 3 and 6 mo follow-up data, respectively. Analyses were the same as for Prototype 1.
Results were used to select the 25 best-functioning items for the final TFI. Both prototypes and the final TFI displayed strong measurement properties, with few missing data, high validity for scaling of tinnitus severity, and good reliability. All TFI versions exhibited the same eight factors characterizing tinnitus severity and negative impact. Responsiveness, evaluated by computing effect sizes for responses at follow-up, was satisfactory in all TFI versions. In the final TFI, Cronbach's alpha was 0.97 and test-retest reliability 0.78. Convergent validity (r = 0.86 with Tinnitus Handicap Inventory [THI]; r = 0.75 with Visual Analog Scale [VAS]) and discriminant validity (r = 0.56 with Beck Depression Inventory-Primary Care [BDI-PC]) were good. The final TFI was successful at detecting improvement from the initial clinic visit to 3 mo with moderate to large effect sizes and from initial to 6 mo with large effect sizes. Effect sizes for the TFI were generally larger than those obtained for the VAS and THI. After careful evaluation, a 13-point reduction was considered a preliminary criterion for meaningful reduction in TFI outcome scores. The TFI should be useful in both clinical and research settings because of its responsiveness to treatment-related change, validity for scaling the overall severity of tinnitus, and comprehensive coverage of multiple domains of tinnitus severity.
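The internal-consistency figure reported above (Cronbach's alpha) can be illustrated with a minimal sketch. It assumes the usual formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); the function name and the response matrix are invented for illustration and are not TFI data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents x 4 items (not data from the study).
responses = [
    [3, 4, 3, 5],
    [1, 2, 1, 2],
    [4, 4, 5, 5],
    [2, 1, 2, 3],
    [5, 5, 4, 4],
]
alpha = cronbach_alpha(responses)
```

Alpha approaches 1 as items covary more strongly, so a value such as 0.97 indicates that the items move closely together across respondents.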
Tadić, Valerija; Cooper, Andrew; Cumberland, Phillippa; Lewando-Hundt, Gillian; Rahi, Jugnoo S
2013-12-01
To develop a novel age-appropriate measure of functional vision (FV) for self-reporting by visually impaired (VI) children and young people. Questionnaire development. A representative patient sample of VI children and young people aged 10 to 15 years, visual acuity of the logarithm of the minimum angle of resolution (logMAR) worse than 0.48, and a school-based (nonrandom) expert group sample of VI students aged 12 to 17 years. A total of 32 qualitative semistructured interviews supplemented by narrative feedback from 15 eligible VI children and young people were used to generate draft instrument items. Seventeen VI students were consulted individually on item relevance and comprehensibility, instrument instructions, format, and administration methods. The resulting draft instrument was piloted with 101 VI children and young people comprising a nationally representative sample, drawn from 21 hospitals in the United Kingdom. Initial item reduction was informed by presence of missing data and individual item response pattern. Exploratory factor analysis (FA) and parallel analysis (PA), and Rasch analysis (RA) were applied to test the instrument's psychometric properties. Psychometric indices and validity assessment of the Functional Vision Questionnaire for Children and Young People (FVQ_CYP). A total of 712 qualitative statements became a 56-item draft scale, capturing the level of difficulty in performing vision-dependent activities. After piloting, items were removed iteratively as follows: 11 for high percentage of missing data, 4 for skewness, and 1 for inadequate item infit and outfit values in RA, 3 having shown differential item functioning across age groups and 1 across gender in RA. The remaining 36 items showed item fit values within acceptable limits, good measurement precision and targeting, and ordered response categories. 
The reduced scale has a clear unidimensional structure, with all items having a high factor loading on the single factor in FA and PA. The summary scores correlated significantly with visual acuity. We have developed a novel, psychometrically robust self-report questionnaire for children and young people, the FVQ_CYP, that captures the functional impact of visual disability from their perspective. The 36-item, 4-point unidimensional scale has potential as a complementary adjunct to objective clinical assessments in routine pediatric ophthalmology practice and in research. Copyright © 2013 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.
Development and validation of an item response theory-based Social Responsiveness Scale short form.
Sturm, Alexandra; Kuhfeld, Megan; Kasari, Connie; McCracken, James T
2017-09-01
Research and practice in autism spectrum disorder (ASD) rely on quantitative measures, such as the Social Responsiveness Scale (SRS), for characterization and diagnosis. Like many ASD diagnostic measures, SRS scores are influenced by factors unrelated to ASD core features. This study further interrogates the psychometric properties of the SRS using item response theory (IRT), and demonstrates a strategy to create a psychometrically sound short form by applying IRT results. Social Responsiveness Scale analyses were conducted on a large sample (N = 21,426) of youth from four ASD databases. Items were subjected to item factor analyses and evaluation of item bias by gender, age, expressive language level, behavior problems, and nonverbal IQ. Item selection based on item psychometric properties, DIF analyses, and substantive validity produced a reduced item SRS short form that was unidimensional in structure, highly reliable (α = .96), and free of gender, age, expressive language, behavior problems, and nonverbal IQ influence. The short form also showed strong relationships with established measures of autism symptom severity (ADOS, ADI-R, Vineland). Degree of association between all measures varied as a function of expressive language. Results identified specific SRS items that are more vulnerable to non-ASD-related traits. The resultant 16-item SRS short form may possess superior psychometric properties compared to the original scale and emerge as a more precise measure of ASD core symptom severity, facilitating research and practice. Future research using IRT is needed to further refine existing measures of autism symptomatology. © 2017 Association for Child and Adolescent Mental Health.
Haroz, E E; Bolton, P; Gross, A; Chan, K S; Michalopoulos, L; Bass, J
2016-07-01
Prevalence estimates of depression vary between countries, possibly due to differential functioning of items between settings. This study compared the performance of the widely used Hopkins symptom checklist 15-item depression scale (HSCL-15) across multiple settings using item response theory analyses. Data came from adult populations in the low and middle income countries (LMIC) of Colombia, Indonesia, Kurdistan Iraq, Rwanda, Iraq, Thailand (Burmese refugees), and Uganda (N = 4732). Item parameters based on a graded response model were compared across LMIC settings. Differential item functioning (DIF) by setting was evaluated using multiple indicators multiple causes (MIMIC) models. Most items performed well across settings except items related to suicidal ideation and "loss of sexual interest or pleasure," which had low discrimination parameters (suicide: a = 0.31 in Thailand to a = 2.49 in Indonesia; sexual interest: a = 0.74 in Rwanda to a = 1.26 in one region of Kurdistan). Most items showed some degree of DIF, but DIF only impacted aggregate scale-level scores in Indonesia. Thirteen of the 15 HSCL depression items performed well across diverse settings, with most items showing a strong relationship to the underlying trait of depression. The results support the cross-cultural applicability of most of these depression symptoms across LMIC settings. DIF impacted aggregate depression scores in one setting illustrating a possible source of measurement invariance in prevalence estimates.
Tadić, Valerija; Cooper, Andrew; Cumberland, Phillippa; Lewando-Hundt, Gillian; Rahi, Jugnoo S
2016-01-01
To report piloting and initial validation of the VQoL_CYP, a novel age-appropriate vision-related quality of life (VQoL) instrument for self-reporting by children with visual impairment (VI). Participants were a random patient sample of children with VI aged 10-15 years. 69 patients, drawn from patient databases at Great Ormond Street Hospital and Moorfields Eye Hospital, United Kingdom, participated in piloting of the draft 47-item VQoL instrument, which enabled preliminary item reduction. Subsequent administration of the instrument, alongside functional vision (FV) and generic health-related quality of life (HRQoL) self-report measures, to 101 children with VI comprising a nationally representative sample enabled further item reduction and evaluation of psychometric properties using Rasch analysis. Construct validity was assessed through Pearson correlation coefficients. Item reduction through piloting (8 items removed for skewness and individual item response pattern) and validation (1 item removed for skewness and 3 for misfit in Rasch) produced a 35-item scale, with fit values within acceptable limits, no notable differential item functioning, good measurement precision, ordered response categories and acceptable targeting in Rasch. The VQoL_CYP showed good construct validity, correlating strongly with HRQoL scores, moderately with FV scores but not with acuity. Robust child-appropriate self-report VQoL measures for children with VI are necessary for understanding the broader impacts of living with a visual disability, distinguishing these from limited functioning per se. Future planned use in larger patient samples will allow further psychometric development of the VQoL_CYP as an adjunct to objective outcomes assessment.
Medvedev, Oleg N; Turner-Stokes, Lynne; Ashford, Stephen; Siegert, Richard J
2018-02-28
To determine whether the UK Functional Assessment Measure (UK FIM+FAM) fits the Rasch model in stroke patients with complex disability and, if so, to derive a conversion table of Rasch-transformed interval level scores. The sample included a UK multicentre cohort of 1,318 patients admitted for specialist rehabilitation following a stroke. Rasch analysis was conducted for the 30-item scale including 3 domains of items measuring physical, communication and psychosocial functions. The fit of items to the Rasch model was examined using 3 different analytical approaches referred to as "pathways". The best fit was achieved in the pathway where responses from motor, communication and psychosocial domains were summarized into 3 super-items and where some items were split because of differential item functioning (DIF) relative to left and right hemisphere location (χ²(10) = 14.48, p = 0.15). Re-scoring of items showing disordered thresholds did not significantly improve the overall model fit. The UK FIM+FAM with domain super-items satisfies expectations of the unidimensional Rasch model without the need for re-scoring. A conversion table was produced to convert the total scale scores into interval-level data based on person estimates of the Rasch model. The clinical benefits of interval-transformed scores require further evaluation.
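The idea of a raw-score-to-interval conversion table can be sketched in a deliberately simplified setting. The example assumes a dichotomous Rasch model, in which the raw score is a sufficient statistic and the maximum-likelihood person estimate solves "expected score = raw score". The UK FIM+FAM items are polytomous and the study worked with super-items, so this is not the authors' procedure; the item difficulties and function name are hypothetical.

```python
import math


def rasch_theta_for_raw_score(raw, difficulties, lo=-10.0, hi=10.0, tol=1e-8):
    """Person estimate (logits) whose expected score equals the raw score.

    Assumes a dichotomous Rasch model; extreme scores (0 or all items)
    have no finite maximum-likelihood estimate and are rejected.
    """
    if not 0 < raw < len(difficulties):
        raise ValueError("raw score must lie strictly between 0 and the item count")

    def expected_score(theta):
        return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties)

    # The expected score increases monotonically in theta, so bisection is safe.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_score(mid) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical item difficulties in logits (not from the study).
difficulties = [-1.5, -0.5, 0.5, 1.5]
conversion_table = {
    raw: rasch_theta_for_raw_score(raw, difficulties)
    for raw in range(1, len(difficulties))
}
```

Each raw score maps to one logit value, so equal raw-score differences need not correspond to equal differences on the interval scale.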
Evaluating Statistical Targets for Assembling Parallel Mixed-Format Test Forms
ERIC Educational Resources Information Center
Debeer, Dries; Ali, Usama S.; van Rijn, Peter W.
2017-01-01
Test assembly is the process of selecting items from an item pool to form one or more new test forms. Often new test forms are constructed to be parallel with an existing (or an ideal) test. Within the context of item response theory, the test information function (TIF) or the test characteristic curve (TCC) are commonly used as statistical…
ERIC Educational Resources Information Center
Immekus, Jason C.; Maller, Susan J.
2009-01-01
The Kaufman Adolescent and Adult Intelligence Test (KAIT[TM]) is an individually administered test of intelligence for individuals ranging in age from 11 to 85+ years. The item response theory-likelihood ratio procedure, based on the two-parameter logistic model, was used to detect differential item functioning (DIF) in the KAIT across males and…
ERIC Educational Resources Information Center
Yi, Yeon-Sook
2017-01-01
The present study examines the relative importance of attributes within and across items by applying four cognitive diagnostic assessment models. The current study utilizes the function of the models that can indicate inter-attribute relationships that reflect the response behaviors of examinees to analyze scored test-taker responses to four forms…
Levis, Alexander W; Harel, Daphna; Kwakkenbos, Linda; Carrier, Marie-Eve; Mouthon, Luc; Poiraudeau, Serge; Bartlett, Susan J; Khanna, Dinesh; Malcarne, Vanessa L; Sauve, Maureen; van den Ende, Cornelia H M; Poole, Janet L; Schouffoer, Anne A; Welling, Joep; Thombs, Brett D
2016-11-01
To develop and validate a short form of the Cochin Hand Function Scale (CHFS), which measures hand disability, for use in systemic sclerosis, using objective criteria and reproducible techniques. Responses on the 18-item CHFS were obtained from English-speaking patients enrolled in the Scleroderma Patient-Centered Intervention Network Cohort. CHFS unidimensionality was verified using confirmatory factor analysis, and an item response theory model was fit to CHFS items. Optimal test assembly (OTA) methods identified a maximally precise short form for each possible form length between 1 and 17 items. The final short form selected was the form with the least number of items that maintained statistically equivalent convergent validity, compared to the full-length CHFS, with the Health Assessment Questionnaire (HAQ) disability index (DI) and the physical function domain of the 29-item Patient-Reported Outcomes Measurement Information System (PROMIS-29). There were 601 patients included. A 6-item short form of the CHFS (CHFS-6) was selected. The CHFS-6 had a Cronbach's alpha of 0.93. Correlations of the CHFS-6 summed score with HAQ DI (r = 0.79) and PROMIS-29 physical function (r = -0.54) were statistically equivalent to the CHFS (r = 0.81 and r = -0.56). The correlation with the full CHFS was high (r = 0.98). The OTA procedure generated a valid short form of the CHFS with minimal loss of information compared to the full-length form. The OTA method used was based on objective, prespecified criteria, but should be further studied for viability as a general procedure for shortening patient-reported outcome measures in health research. © 2016, American College of Rheumatology.
Bost, James E; Williams, Brian A; Bottegal, Matthew T; Dang, Qianyu; Rubio, Doris M
2007-12-01
We evaluated the validity and responsiveness of three instruments: the numeric rating scale (NRS) pain score, the 8-item Short-Form Health Survey (SF-8), and the 40-item Quality of Recovery from Anesthesia (QoR) Survey in 154 outpatients undergoing anterior cruciate ligament reconstruction (ACLR). The objective was to provide a robust psychometric basis for outcome survey selection for surgical outpatients undergoing regional anesthesia without general anesthesia. Patients undergoing ACLR with a standardized spinal anesthesia plan were randomized to receive a perineural catheter with either placebo injection-infusion, or injection-infusion with levobupivacaine. Patients completed the NRS, SF-8, and QoR instruments for four postoperative days to evaluate pain, physical function, and mental function. Regarding pain, neither the NRS nor the QoR offered advantages over the SF-8. Regarding physical function, the QoR physical independence composite offered no advantage over the SF-8 physical component summary. The QoR physical comfort composite assessed short-term changes in treatment-related side effects, and thus provided information not covered by the SF-8. Regarding mental function, the SF-8 mental component summary and QoR emotional state composite showed little change over the four days, although the latter measure showed higher responsiveness to change. For ACLR outpatients receiving regional anesthesia, the SF-8 is sufficient to assess postoperative pain and physical function. Adding the QoR physical comfort composite will help assess short-term side effects.
NASA Astrophysics Data System (ADS)
Greenberg, Ariela Caren
Differential item functioning (DIF) and differential distractor functioning (DDF) are methods used to screen for item bias (Camilli & Shepard, 1994; Penfield, 2008). Using an applied empirical example, this mixed-methods study examined the congruency and relationship of DIF and DDF methods in screening multiple-choice items. Data for Study I were drawn from item responses of 271 female and 236 male low-income children on a preschool science assessment. Item analyses employed a common statistical approach of the Mantel-Haenszel log-odds ratio (MH-LOR) to detect DIF in dichotomously scored items (Holland & Thayer, 1988), and extended the approach to identify DDF (Penfield, 2008). Findings demonstrated that using MH-LOR to detect DIF and DDF supported the theoretical relationship that the magnitude and form of DIF are dependent on the DDF effects, and demonstrated the advantages of studying DIF and DDF in multiple-choice items. A total of 4 items with DIF and DDF and 5 items with only DDF were detected. Study II incorporated an item content review, an important but often overlooked and under-published step of DIF and DDF studies (Camilli & Shepard). Interviews with 25 female and 22 male low-income preschool children and an expert review helped to interpret the DIF and DDF results and their comparison, and determined that a content review process of studied items can reveal reasons for potential item bias that are often congruent with the statistical results. Patterns emerged and are discussed in detail. The quantitative and qualitative analyses were conducted in an applied framework of examining the validity of the preschool science assessment scores for evaluating science programs serving low-income children; however, the techniques can be generalized for use with measures across various disciplines of research.
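The Mantel-Haenszel log-odds ratio named in this abstract can be sketched as follows. The sketch assumes the standard common odds ratio estimator computed over score-matched strata, each a 2x2 table of group by correct/incorrect; the function name, the stratum layout, and the counts are hypothetical and are not taken from the study.

```python
import math


def mh_log_odds_ratio(strata):
    """Mantel-Haenszel log-odds ratio for one dichotomous item.

    strata -- one (A, B, C, D) tuple per matched score level, where
              A = reference group correct, B = reference group incorrect,
              C = focal group correct,     D = focal group incorrect.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    return math.log(numerator / denominator)


# Hypothetical counts for three matched score levels (not data from the study).
strata = [
    (20, 30, 15, 35),
    (40, 20, 30, 30),
    (50, 10, 45, 15),
]
mh_lor = mh_log_odds_ratio(strata)
```

With this table layout, a value near zero suggests no uniform DIF, while a positive value means the item favors the reference group after matching on total score.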
Cohen, Matthew L; Kisala, Pamela A; Dyson-Hudson, Trevor A; Tulsky, David S
2018-05-01
To develop modern patient-reported outcome measures that assess pain interference and pain behavior after spinal cord injury (SCI). Grounded-theory based qualitative item development; large-scale item calibration field-testing; confirmatory factor analyses; graded response model item response theory analyses; statistical linking techniques to transform scores to the Patient Reported Outcome Measurement Information System (PROMIS) metric. Five SCI Model Systems centers and one Department of Veterans Affairs medical center in the United States. Adults with traumatic SCI. N/A. Spinal Cord Injury - Quality of Life (SCI-QOL) Pain Interference item bank, SCI-QOL Pain Interference short form, and SCI-QOL Pain Behavior scale. Seven hundred fifty-seven individuals with traumatic SCI completed 58 items addressing various aspects of pain. Items were then separated by whether they assessed pain interference or pain behavior, and poorly functioning items were removed. Confirmatory factor analyses confirmed that each set of items was unidimensional, and item response theory analyses were used to estimate slopes and thresholds for the items. Ultimately, 7 items (4 from PROMIS) comprised the Pain Behavior scale and 25 items (18 from PROMIS) comprised the Pain Interference item bank. Ten of these 25 items were selected to form the Pain Interference short form. The SCI-QOL Pain Interference item bank and the SCI-QOL Pain Behavior scale demonstrated robust psychometric properties. The Pain Interference item bank is available as a computer adaptive test or short form for research and clinical applications, and scores are transformed to the PROMIS metric.
ERIC Educational Resources Information Center
Hou, Likun; de la Torre, Jimmy; Nandakumar, Ratna
2014-01-01
Analyzing examinees' responses using cognitive diagnostic models (CDMs) has the advantage of providing diagnostic information. To ensure the validity of the results from these models, differential item functioning (DIF) in CDMs needs to be investigated. In this article, the Wald test is proposed to examine DIF in the context of CDMs. This study…
Mokkink, Lidwine Brigitta; Galindo-Garre, Francisca; Uitdehaag, Bernard Mj
2016-12-01
The Multiple Sclerosis Walking Scale-12 (MSWS-12) measures walking ability from the patients' perspective. We examined the quality of the MSWS-12 using an item response theory model, the graded response model (GRM). A total of 625 unique Dutch multiple sclerosis (MS) patients were included. After testing for unidimensionality, monotonicity, and absence of local dependence, a GRM was fit and item characteristics were assessed. Differential item functioning (DIF) for the variables gender, age, duration of MS, type of MS and severity of MS, reliability, total test information, and standard error of the trait level (θ) were investigated. Confirmatory factor analysis showed a unidimensional structure of the 12 items of the scale, explaining 88% of the variance. Item 2 did not fit into the GRM model. Reliability was 0.93. Items 8 and 9 (of the 11 and 12 item version respectively) showed DIF on the variable severity, based on the Expanded Disability Status Scale (EDSS). However, the EDSS is strongly related to the content of both items. Our results confirm the good quality of the MSWS-12. The trait level (θ) scores and item parameters of both the 12- and 11-item versions were highly comparable, although we do not suggest changing the content of the MSWS-12. © The Author(s), 2016.
Bayes Factor Covariance Testing in Item Response Models.
Fox, Jean-Paul; Mulder, Joris; Sinharay, Sandip
2017-12-01
Two marginal one-parameter item response theory models are introduced, by integrating out the latent variable or random item parameter. It is shown that both marginal response models are multivariate (probit) models with a compound symmetry covariance structure. Several common hypotheses concerning the underlying covariance structure are evaluated using (fractional) Bayes factor tests. The support for a unidimensional factor (i.e., assumption of local independence) and differential item functioning are evaluated by testing the covariance components. The posterior distribution of common covariance components is obtained in closed form by transforming latent responses with an orthogonal (Helmert) matrix. This posterior distribution is defined as a shifted-inverse-gamma, thereby introducing a default prior and a balanced prior distribution. Based on that, an MCMC algorithm is described to estimate all model parameters and to compute (fractional) Bayes factor tests. Simulation studies are used to show that the (fractional) Bayes factor tests have good properties for testing the underlying covariance structure of binary response data. The method is illustrated with two real data studies.
Examining Multiple Sources of Differential Item Functioning on the Clinician & Group CAHPS® Survey
Rodriguez, Hector P; Crane, Paul K
2011-01-01
Objective To evaluate psychometric properties of a widely used patient experience survey. Data Sources English-language responses to the Clinician & Group Consumer Assessment of Healthcare Providers and Systems (CG-CAHPS®) survey (n = 12,244) from a 2008 quality improvement initiative involving eight southern California medical groups. Methods We used an iterative hybrid ordinal logistic regression/item response theory differential item functioning (DIF) algorithm to identify items with DIF related to patient sociodemographic characteristics, duration of the physician–patient relationship, number of physician visits, and self-rated physical and mental health. We accounted for all sources of DIF and determined its cumulative impact. Principal Findings The upper end of the CG-CAHPS® performance range is measured with low precision. With sensitive settings, some items were found to have DIF. However, overall DIF impact was negligible, as 0.14 percent of participants had salient DIF impact. Latinos who spoke predominantly English at home had the highest prevalence of salient DIF impact at 0.26 percent. Conclusions The CG-CAHPS® functions similarly across commercially insured respondents from diverse backgrounds. Consequently, previously documented racial and ethnic group differences likely reflect true differences rather than measurement bias. The impact of low precision at the upper end of the scale should be clarified. PMID:22092021
Ni, Pengsheng; McDonough, Christine M; Jette, Alan M; Bogusz, Kara; Marfeo, Elizabeth E; Rasch, Elizabeth K; Brandt, Diane E; Meterko, Mark; Haley, Stephen M; Chan, Leighton
2013-09-01
To develop and test an instrument to assess physical function for Social Security Administration (SSA) disability programs, the SSA-Physical Function (SSA-PF) instrument. Item response theory (IRT) analyses were used to (1) create a calibrated item bank for each of the factors identified in prior factor analyses, (2) assess the fit of the items within each scale, (3) develop separate computer-adaptive testing (CAT) instruments for each scale, and (4) conduct initial psychometric testing. Cross-sectional data collection; IRT analyses; CAT simulation. Telephone and Internet survey. Two samples: SSA claimants (n=1017) and adults from the U.S. general population (n=999). None. Model fit statistics, correlation, and reliability coefficients. IRT analyses resulted in 5 unidimensional SSA-PF scales: Changing & Maintaining Body Position, Whole Body Mobility, Upper Body Function, Upper Extremity Fine Motor, and Wheelchair Mobility for a total of 102 items. High CAT accuracy was demonstrated by strong correlations between simulated CAT scores and those from the full item banks. On comparing the simulated CATs with the full item banks, very little loss of reliability or precision was noted, except at the lower and upper ranges of each scale. No difference in response patterns by age or sex was noted. The distributions of claimant scores were shifted to the lower end of each scale compared with those of a sample of U.S. adults. The SSA-PF instrument contributes important new methodology for measuring the physical function of adults applying to the SSA disability programs. Initial evaluation revealed that the SSA-PF instrument achieved considerable breadth of coverage in each content domain and demonstrated noteworthy psychometric properties. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
EXTENDING THE FLOOR AND THE CEILING FOR ASSESSMENT OF PHYSICAL FUNCTION
Fries, James F.; Lingala, Bharathi; Siemons, Liseth; Glas, Cees A. W.; Cella, David; Hussain, Yusra N; Bruce, Bonnie; Krishnan, Eswar
2014-01-01
Objective: The objective of the current study was to improve the assessment of physical function by improving the precision of assessment at the floor (extremely poor function) and at the ceiling (extremely good health) of the health continuum. Methods: Under the NIH PROMIS program, we developed new physical function floor and ceiling items to supplement the existing item bank. Using item response theory (IRT) and the standard PROMIS methodology, we developed 30 floor items and 26 ceiling items and administered them during a 12-month prospective observational study of 737 individuals at the extremes of health status. Change over time was compared across anchor instruments and across items by means of effect sizes. Using the observed changes in scores, we back-calculated sample size requirements for the new and comparison measures. Results: We studied 444 subjects with chronic illness and/or extreme age, and 293 generally fit subjects including athletes in training. IRT analyses confirmed that the new floor and ceiling items outperformed reference items (p<0.001). The estimated post-hoc sample size requirements were reduced by a factor of two to four at the floor and a factor of two at the ceiling. Conclusion: Extending the range of physical function measurement can substantially improve measurement quality, can reduce sample size requirements and improve research efficiency. The paradigm shift from Disability to Physical Function includes the entire spectrum of physical function, signals improvement in the conceptual base of outcome assessment, and may be transformative as medical goals more closely approach societal goals for health. PMID:24782194
Fitting measurement models to vocational interest data: are dominance models ideal?
Tay, Louis; Drasgow, Fritz; Rounds, James; Williams, Bruce A
2009-09-01
In this study, the authors examined the item response process underlying 3 vocational interest inventories: the Occupational Preference Inventory (C.-P. Deng, P. I. Armstrong, & J. Rounds, 2007), the Interest Profiler (J. Rounds, T. Smith, L. Hubert, P. Lewis, & D. Rivkin, 1999; J. Rounds, C. M. Walker, et al., 1999), and the Interest Finder (J. E. Wall & H. E. Baker, 1997; J. E. Wall, L. L. Wise, & H. E. Baker, 1996). Item response theory (IRT) dominance models, such as the 2-parameter and 3-parameter logistic models, assume that item response functions (IRFs) are monotonically increasing as the latent trait increases. In contrast, IRT ideal point models, such as the generalized graded unfolding model, have IRFs that peak where the latent trait matches the item. Ideal point models are expected to fit better because vocational interest inventories ask about typical behavior, as opposed to requiring maximal performance. Results show that across all 3 interest inventories, the ideal point model provided better descriptions of the response process. The importance of specifying the correct item response model for precise measurement is discussed. In particular, scores computed by a dominance model were shown to be sometimes illogical: individuals endorsing mostly realistic or mostly social items were given similar scores, whereas scores based on an ideal point model were sensitive to which type of items respondents endorsed.
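The contrast between dominance and ideal point models described above can be sketched numerically. The following is a minimal illustration only: the 2PL function is the standard logistic form, while `irf_ideal_point` is a toy squared-distance curve standing in for a single-peaked IRF (it is not the generalized graded unfolding model), and all parameter values are hypothetical.

```python
import math

def irf_2pl(theta, a, b):
    """Dominance (2PL) IRF: endorsement probability rises monotonically with theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def irf_ideal_point(theta, a, delta):
    """Toy single-peaked IRF: endorsement is most likely when theta equals delta.

    This squared-distance form is an illustrative stand-in, not the GGUM.
    """
    return math.exp(-a * (theta - delta) ** 2)

# Hypothetical trait levels and item parameters, chosen only for illustration.
thetas = [-3, -2, -1, 0, 1, 2, 3]
dominance = [irf_2pl(t, a=1.5, b=0.0) for t in thetas]
ideal = [irf_ideal_point(t, a=0.5, delta=0.0) for t in thetas]

for t, d, i in zip(thetas, dominance, ideal):
    print(f"theta={t:+d}  2PL={d:.3f}  ideal-point={i:.3f}")
```

Under the dominance curve, respondents far above the item location are the most likely to endorse it; under the single-peaked curve, they are as unlikely to endorse it as respondents far below.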
Developing an African youth psychosocial assessment: an application of item response theory.
Betancourt, Theresa S; Yang, Frances; Bolton, Paul; Normand, Sharon-Lise
2014-06-01
This study aimed to refine a dimensional scale for measuring psychosocial adjustment in African youth using item response theory (IRT). A 60-item scale derived from qualitative data was administered to 667 war-affected adolescents (55% female). Exploratory factor analysis (EFA) determined the dimensionality of items based on goodness-of-fit indices. Items with loadings less than 0.4 were dropped. Confirmatory factor analysis (CFA) was used to confirm the scale's dimensionality found under the EFA. Item discrimination and difficulty were estimated using a graded response model for each subscale using weighted least squares means and variances. Predictive validity was examined through correlations between IRT scores (θ) for each subscale and ratings of functional impairment. All models were assessed using goodness-of-fit and comparative fit indices. Fisher's Information curves examined item precision at different underlying ranges of each trait. Original scale items were optimized and reconfigured into an empirically-robust 41-item scale, the African Youth Psychosocial Assessment (AYPA). Refined subscales assess internalizing and externalizing problems, prosocial attitudes/behaviors and somatic complaints without medical cause. The AYPA is a refined dimensional assessment of emotional and behavioral problems in African youth with good psychometric properties. Validation studies in other cultures are recommended. Copyright © 2014 John Wiley & Sons, Ltd.
Developing an African youth psychosocial assessment: an application of item response theory
BETANCOURT, THERESA S.; YANG, FRANCES; BOLTON, PAUL; NORMAND, SHARON-LISE
2014-01-01
This study aimed to refine a dimensional scale for measuring psychosocial adjustment in African youth using item response theory (IRT). A 60-item scale derived from qualitative data was administered to 667 war-affected adolescents (55% female). Exploratory factor analysis (EFA) determined the dimensionality of items based on goodness-of-fit indices. Items with loadings less than 0.4 were dropped. Confirmatory factor analysis (CFA) was used to confirm the scale's dimensionality found under the EFA. Item discrimination and difficulty were estimated using a graded response model for each subscale using weighted least squares means and variances. Predictive validity was examined through correlations between IRT scores (θ) for each subscale and ratings of functional impairment. All models were assessed using goodness-of-fit and comparative fit indices. Fisher's Information curves examined item precision at different underlying ranges of each trait. Original scale items were optimized and reconfigured into an empirically-robust 41-item scale, the African Youth Psychosocial Assessment (AYPA). Refined subscales assess internalizing and externalizing problems, prosocial attitudes/behaviors and somatic complaints without medical cause. The AYPA is a refined dimensional assessment of emotional and behavioral problems in African youth with good psychometric properties. Validation studies in other cultures are recommended. PMID:24478113
Prisciandaro, James J; Tolliver, Bryan K
2016-11-15
The Young Mania Rating Scale (YMRS) and Montgomery-Asberg Depression Rating Scale (MADRS) are among the most widely used outcome measures for clinical trials of medications for Bipolar Disorder (BD). Nonetheless, very few studies have examined the measurement characteristics of the YMRS and MADRS in individuals with BD using modern psychometric methods. The present study evaluated the YMRS and MADRS in the Systematic Treatment Enhancement Program for BD (STEP-BD) study using Item Response Theory (IRT). Baseline data from 3716 STEP-BD participants were available for the present analysis. The Graded Response Model (GRM) was fit separately to YMRS and MADRS item responses. Differential item functioning (DIF) was examined by regressing a variety of clinically relevant covariates (e.g., sex, substance dependence) on all test items and on the latent symptom severity dimension, within each scale. Both scales: 1) contained several items that provided little or no psychometric information, 2) were inefficient, in that the majority of item response categories did not provide incremental psychometric information, 3) poorly measured participants outside of a narrow band of severity, 4) evidenced DIF for nearly all items, suggesting that item responses were, in part, determined by factors other than symptom severity. Limited to outpatients; DIF analysis only sensitive to certain forms of DIF. The present study provides evidence for significant measurement problems involving the YMRS and MADRS. More work is needed to refine these measures and/or develop suitable alternative measures of BD symptomatology for clinical trials research. Copyright © 2016 Elsevier B.V. All rights reserved.
RhinAsthma patient perspective: A Rasch validation study.
Molinengo, Giorgia; Baiardini, Ilaria; Braido, Fulvio; Loera, Barbara
2018-02-01
In daily practice, Health-Related Quality of Life (HRQoL) tools are useful for supplementing clinical data with the patient's perspective. To encourage their use by clinicians, the availability of tools that can quickly provide valid results is crucial. A new HRQoL tool has been proposed for patients with asthma and rhinitis: the RhinAsthma Patient Perspective (RAPP). The aim of this study was to evaluate the psychometric robustness of the RAPP using the Item Response Theory (IRT) approach, to evaluate the scalability of items, and to test whether patients use the item response scale correctly. A total of 155 patients (53.5% women, mean age 39.1, range 16-76) were recruited during a multicenter study. RAPP metric properties were investigated using IRT models. Differential item functioning (DIF) was examined for gender, age, and asthma control test (ACT). The RAPP adequately fitted the Rating Scale model, demonstrating the equality of the rating scale structure for all items. All statistics on items were satisfactory. The RAPP had adequate internal reliability and showed good ability to discriminate among different groups of participants. DIF analysis indicated that there were no differential item functioning issues for gender. One item showed DIF by age and four items by ACT. The psychometric evaluation performed using IRT models demonstrated that the RAPP met all the criteria to be considered a reliable and valid method of measurement. From a clinical perspective, this will allow physicians to confidently interpret scores as good indicators of Quality of Life of patients with asthma.
Hong, Quan Nha; Coutu, Marie-France; Berbiche, Djamal
2017-01-01
The Work Role Functioning Questionnaire (WRFQ) was developed to assess workers' perceived ability to perform job demands and is used to monitor presenteeism. Few studies of its validity can be found in the literature. The purpose of this study was to assess the items and factorial composition of the Canadian French version of the WRFQ (WRFQ-CF). Two measurement approaches were used to test the WRFQ-CF: Classical Test Theory (CTT) and non-parametric Item Response Theory (IRT). A total of 352 completed questionnaires were analyzed. Four-factor and three-factor models were tested and showed good fit with 14 items (Root Mean Square Error of Approximation (RMSEA) = 0.06, Standardized Root Mean Square Residual (SRMR) = 0.04, Bentler Comparative Fit Index (CFI) = 0.98) and with 17 items (RMSEA = 0.059, SRMR = 0.048, CFI = 0.98), respectively. Using IRT, 13 problematic items were identified, 9 of which were also identified using CTT. This study tested different models, with fewer problematic items found in the three-factor model. Using non-parametric IRT and CTT for item purification gave complementary results. IRT is still rarely used and can be an interesting alternative method to enhance the quality of a measurement instrument. More studies are needed on the WRFQ-CF to refine its items and factorial composition.
Rocca, Corinne H.; Krishnan, Suneeta; Barrett, Geraldine; Wilson, Mark
2010-01-01
We evaluated the psychometric properties of the London Measure of Unplanned Pregnancy among Indian women using classical methods and Item Response Modeling. The scale exhibited good internal consistency and internal structure, with overall scores correlating well with each item’s response categories. Items performed similarly for pregnant and non-pregnant women, and scores decreased with increasing parity, providing evidence for validity. Analyses also detected limitations, including infrequent selection of middle response categories and some evidence of differential item functioning by parity. We conclude that the LMUP represents an improvement over existing measures but recommend steps for enhancing scale performance for this cultural context. PMID:21170147
Cordier, Reinie; Speyer, Renée; Schindler, Antonio; Michou, Emilia; Heijnen, Bas Joris; Baijens, Laura; Karaduman, Ayşe; Swan, Katina; Clavé, Pere; Joosten, Annette Veronica
2018-02-01
The Swallowing Quality of Life questionnaire (SWAL-QOL) is widely used clinically and in research to evaluate quality of life related to swallowing difficulties. It has been described as a valid and reliable tool, but was developed and tested using classic test theory. This study describes the reliability and validity of the SWAL-QOL using item response theory (IRT; Rasch analysis). SWAL-QOL data were gathered from 507 participants at risk of oropharyngeal dysphagia (OD) across four European countries. OD was confirmed in 75.7% of participants via videofluoroscopy and/or fiberoptic endoscopic evaluation, or a clinical diagnosis based on meeting selected criteria. Patients with esophageal dysphagia were excluded. Data were analysed using Rasch analysis. Item and person reliability was good for all the items combined. However, person reliability was poor for 8 subscales and item reliability was poor for one subscale. Eight subscales exhibited poor person separation and two exhibited poor item separation. Overall item and person fit statistics were acceptable. However, at an individual item fit level results indicated unpredictable item responses for 28 items, and item redundancy for 10 items. The item-person dimensionality map confirmed these findings. Results from the overall Rasch model fit and Principal Component Analysis were suggestive of a second dimension. For all the items combined, none of the item categories were 'category', 'threshold' or 'step' disordered; however, all subscales demonstrated category disordered functioning. Findings suggest an urgent need to further investigate the underlying structure of the SWAL-QOL and its psychometric characteristics using IRT.
Cheng, Su-Fen; Lee-Hsieh, Jane; Turton, Michael A; Lin, Kuan-Chia
2014-06-01
Little research has investigated the establishment of norms for nursing students' self-directed learning (SDL) ability, recognized as an important capability for professional nurses. An item response theory (IRT) approach was used to establish norms for SDL abilities valid for the different nursing programs in Taiwan. The purposes of this study were (a) to use IRT with a graded response model to reexamine the SDL instrument, or the SDLI, originally developed by this research team using confirmatory factor analysis and (b) to establish SDL ability norms for the four different nursing education programs in Taiwan. Stratified random sampling with probability proportional to size was used. A minimum of 15% of students from the four different nursing education degree programs across Taiwan was selected. A total of 7,879 nursing students from 13 schools were recruited. The research instrument was the 20-item SDLI developed by Cheng, Kuo, Lin, and Lee-Hsieh (2010). IRT with the graded response model was used with a two-parameter logistic model (discrimination and difficulty) for the data analysis, calculated using MULTILOG. Norms were established using percentile rank. Analysis of item information and test information functions revealed that 18 items exhibited very high discrimination and two items had high discrimination. The test information function was higher in this range of scores, indicating greater precision in the estimate of nursing student SDL. Reliability fell between .80 and .94 for each domain and the SDLI as a whole. The total information function shows that the SDLI is appropriate for all nursing students, except for the top 2.5%. SDL ability norms were established for each nursing education program and for the nation as a whole. IRT is shown to be a potent and useful methodology for scale evaluation. The norms for SDL established in this research will provide practical standards for nursing educators and students in Taiwan.
Assessing psychological well-being: self-report instruments for the NIH Toolbox.
Salsman, John M; Lai, Jin-Shei; Hendrie, Hugh C; Butt, Zeeshan; Zill, Nicholas; Pilkonis, Paul A; Peterson, Christopher; Stoney, Catherine M; Brouwers, Pim; Cella, David
2014-02-01
Psychological well-being (PWB) has a significant relationship with physical and mental health. As a part of the NIH Toolbox for the Assessment of Neurological and Behavioral Function, we developed self-report item banks and short forms to assess PWB. Expert feedback and literature review informed the selection of PWB concepts and the development of item pools for positive affect, life satisfaction, and meaning and purpose. Items were tested with a community-dwelling US Internet panel sample of adults aged 18 and above (N = 552). Classical and item response theory (IRT) approaches were used to evaluate unidimensionality, fit of items to the overall measure, and calibrations of those items, including differential item function (DIF). IRT-calibrated item banks were produced for positive affect (34 items), life satisfaction (16 items), and meaning and purpose (18 items). Their psychometric properties were supported based on the results of factor analysis, fit statistics, and DIF evaluation. All banks measured the concepts precisely (reliability ≥0.90) for more than 98% of participants. These adult scales and item banks for PWB provide the flexibility, efficiency, and precision necessary to promote future epidemiological, observational, and intervention research on the relationship of PWB with physical and mental health.
ERIC Educational Resources Information Center
Harwell, Michael; Moreno, Mario; Phillips, Alison; Guzey, S. Selcen; Moore, Tamara J.; Roehrig, Gillian H.
2015-01-01
The purpose of this study was to develop, scale, and validate assessments in engineering, science, and mathematics with grade appropriate items that were sensitive to the curriculum developed by teachers. The use of item response theory to assess item functioning was a focus of the study. The work is part of a larger project focused on increasing…
The e-MSWS-12: improving the multiple sclerosis walking scale using item response theory.
Engelhard, Matthew M; Schmidt, Karen M; Engel, Casey E; Brenton, J Nicholas; Patek, Stephen D; Goldman, Myla D
2016-12-01
The Multiple Sclerosis Walking Scale (MSWS-12) is the predominant patient-reported measure of multiple sclerosis (MS)-related walking ability, yet it had not been analyzed using item response theory (IRT), the emerging standard for patient-reported outcome (PRO) validation. This study aims to reduce MSWS-12 measurement error and facilitate computerized adaptive testing by creating an IRT model of the MSWS-12 and distributing it online. MSWS-12 responses from 284 subjects with MS were collected by mail and used to fit and compare several IRT models. Following model selection and assessment, subpopulations based on age and sex were tested for differential item functioning (DIF). Model comparison favored a one-dimensional graded response model (GRM). This model met fit criteria and explained 87% of response variance. The performance of each MSWS-12 item was characterized using category response curves (CRCs) and item information. IRT-based MSWS-12 scores correlated with traditional MSWS-12 scores (r = 0.99) and timed 25-foot walk (T25FW) speed (r = -0.70). Item 2 showed DIF based on age (χ² = 19.02, df = 5, p < 0.01), and Item 11 showed DIF based on sex (χ² = 13.76, df = 5, p = 0.02). MSWS-12 measurement error depends on walking ability, but could be lowered by improving or replacing items with low information or DIF. The e-MSWS-12 includes IRT-based scoring, error checking, and an estimated T25FW derived from MSWS-12 responses. It is available at https://ms-irt.shinyapps.io/e-MSWS-12.
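The category response curves mentioned above come from the graded response model, in which each category probability is the difference between two adjacent cumulative logistic curves. The sketch below assumes a hypothetical five-category item with made-up discrimination and threshold values; it is not the fitted e-MSWS-12 model.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def grm_category_probs(theta, a, thresholds):
    """Category probabilities under the graded response model.

    thresholds must be increasing (b_1 < ... < b_{K-1}); returns K probabilities.
    """
    # Cumulative probabilities P(X >= k), padded with 1 (lowest) and 0 (beyond highest).
    cum = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Hypothetical item: discrimination 2.0 and four thresholds (five categories).
a_item = 2.0
b_item = [-1.5, -0.5, 0.5, 1.5]

for theta in (-3.0, 0.0, 3.0):
    probs = grm_category_probs(theta, a_item, b_item)
    print(theta, [round(p, 3) for p in probs])
```

Evaluating `grm_category_probs` over a grid of trait values and plotting one line per category would trace out the category response curves for the item.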
2013-01-01
Background: Despite the widespread use of multiple-choice assessments in medical education assessment, current practice and published advice concerning the number of response options remains equivocal. This article describes an empirical study contrasting the quality of three 60 item multiple-choice test forms within the Royal Australian and New Zealand College of Obstetricians and Gynaecologists (RANZCOG) Fetal Surveillance Education Program (FSEP). The three forms are described below. Methods: The first form featured four response options per item. The second form featured three response options, having removed the least functioning option from each item in the four-option counterpart. The third test form was constructed by retaining the best performing version of each item from the first two test forms. It contained both three and four option items. Results: Psychometric and educational factors were taken into account in formulating an approach to test construction for the FSEP. The four-option test performed better than the three-option test overall, but some items were improved by the removal of options. The mixed-option test demonstrated better measurement properties than the fixed-option tests, and has become the preferred test format in the FSEP program. The criteria used were reliability, errors of measurement and fit to the item response model. Conclusions: The position taken is that decisions about the number of response options be made at the item level, with plausible options being added to complete each item on both psychometric and educational grounds rather than complying with a uniform policy. The point is to construct the better performing item in providing the best psychometric and educational information. PMID:23453056
Zoanetti, Nathan; Beaves, Mark; Griffin, Patrick; Wallace, Euan M
2013-03-04
Despite the widespread use of multiple-choice assessments in medical education assessment, current practice and published advice concerning the number of response options remains equivocal. This article describes an empirical study contrasting the quality of three 60 item multiple-choice test forms within the Royal Australian and New Zealand College of Obstetricians and Gynaecologists (RANZCOG) Fetal Surveillance Education Program (FSEP). The three forms are described below. The first form featured four response options per item. The second form featured three response options, having removed the least functioning option from each item in the four-option counterpart. The third test form was constructed by retaining the best performing version of each item from the first two test forms. It contained both three and four option items. Psychometric and educational factors were taken into account in formulating an approach to test construction for the FSEP. The four-option test performed better than the three-option test overall, but some items were improved by the removal of options. The mixed-option test demonstrated better measurement properties than the fixed-option tests, and has become the preferred test format in the FSEP program. The criteria used were reliability, errors of measurement and fit to the item response model. The position taken is that decisions about the number of response options be made at the item level, with plausible options being added to complete each item on both psychometric and educational grounds rather than complying with a uniform policy. The point is to construct the better performing item in providing the best psychometric and educational information.
NASA Astrophysics Data System (ADS)
Chiu, Tina
This dissertation includes three studies that analyze a new set of assessment tasks developed by the Learning Progressions in Middle School Science (LPS) Project. These assessment tasks were designed to measure science content knowledge on the structure of matter domain and scientific argumentation, while following the goals from the Next Generation Science Standards (NGSS). The three studies focus on the evidence available for the success of this design and its implementation, generally labelled as "validity" evidence. I use explanatory item response models (EIRMs) as the overarching framework to investigate these assessment tasks. These models can be useful when gathering validity evidence for assessments as they can help explain student learning and group differences. In the first study, I explore the dimensionality of the LPS assessment by comparing the fit of unidimensional, between-item multidimensional, and Rasch testlet models to see which is most appropriate for this data. By applying multidimensional item response models, multiple relationships can be investigated, and in turn, allow for a more substantive look into the assessment tasks. The second study focuses on person predictors through latent regression and differential item functioning (DIF) models. Latent regression models show the influence of certain person characteristics on item responses, while DIF models test whether one group is differentially affected by specific assessment items, after conditioning on latent ability. Finally, the last study applies the linear logistic test model (LLTM) to investigate whether item features can help explain differences in item difficulties.
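The idea behind the linear logistic test model in the last study can be sketched as follows: each item's Rasch difficulty is constrained to be a weighted sum of its coded features. The feature names, the coding matrix, and the weights below are all hypothetical and are not taken from the LPS assessment.

```python
import math

def lltm_difficulty(q_row, eta):
    """LLTM constraint: item difficulty is a weighted sum of the item's features."""
    return sum(q * e for q, e in zip(q_row, eta))

def rasch_prob(theta, b):
    """Rasch probability of a correct response for ability theta and difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Hypothetical feature weights: [open-ended format, includes diagram, requires argument].
eta = [0.5, -0.2, 0.8]

# Hypothetical item-by-feature coding (1 = feature present).
q_matrix = [
    [1, 0, 1],  # open-ended item that requires an argument
    [0, 1, 0],  # multiple-choice item with a diagram
    [0, 0, 0],  # item with none of the coded features
]

difficulties = [lltm_difficulty(row, eta) for row in q_matrix]
for b in difficulties:
    print(f"difficulty={b:+.2f}  P(correct | theta=0)={rasch_prob(0.0, b):.3f}")
```

In an actual analysis the weights would be estimated from response data; here they are fixed only to show how features map to predicted difficulty.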
Peterson, Alexander C; Sutherland, Jason M; Liu, Guiping; Crump, R Trafford; Karimuddin, Ahmer A
2018-06-01
The Fecal Incontinence Quality of Life Scale (FIQL) is a commonly used patient-reported outcome measure for fecal incontinence, often used in clinical trials, yet has not been validated in English since its initial development. This study uses modern methods to thoroughly evaluate the psychometric characteristics of the FIQL and its potential for differential functioning by gender. This study analyzed prospectively collected patient-reported outcome data from a sample of patients prior to colorectal surgery. Patients were recruited from 14 general and colorectal surgeons in Vancouver Coastal Health hospitals in Vancouver, Canada. Confirmatory factor analysis was used to assess construct validity. Item response theory was used to evaluate test reliability, describe item-level characteristics, identify local item dependence, and test for differential functioning by gender. 236 patients were included for analysis, with mean age 58 and approximately half female. Factor analysis failed to identify the lifestyle, coping, depression, and embarrassment domains, suggesting lack of construct validity. Items demonstrated low difficulty, indicating that the test has the highest reliability among individuals who have low quality of life. Five items are suggested for removal or replacement. Differential test functioning was minimal. This study has identified specific improvements that can be made to each domain of the Fecal Incontinence Quality of Life Scale and to the instrument overall. Formatting, scoring, and instructions may be simplified, and items with higher difficulty developed. The lifestyle domain can be used as is. The embarrassment domain should be significantly revised before use.
Haggerty, Jeannie L.; Bouharaoui, Fatima; Santor, Darcy A.
2011-01-01
Evaluating the extent to which groups or subgroups of individuals differ with respect to primary healthcare experience depends on first ruling out the possibility of bias. Objective: To determine whether item or subscale performance differs systematically between French/English, high/low education subgroups and urban/rural residency. Method: A sample of 645 adult users balanced by French/English language (in Quebec and Nova Scotia, respectively), high/low education and urban/rural residency responded to six validated instruments: the Primary Care Assessment Survey (PCAS); the Primary Care Assessment Tool – Short Form (PCAT-S); the Components of Primary Care Index (CPCI); the first version of the EUROPEP (EUROPEP-I); the Interpersonal Processes of Care Survey, version II (IPC-II); and part of the Veterans Affairs National Outpatient Customer Satisfaction Survey (VANOCSS). We normalized subscale scores to a 0-to-10 scale and tested for between-group differences using ANOVA tests. We used a parametric item response model to test for differences between subgroups in item discriminability and item difficulty. We re-examined group differences after removing items with differential item functioning. Results: Experience of care was assessed more positively in the English-speaking (Nova Scotia) than in the French-speaking (Quebec) respondents. We found differential English/French item functioning in 48% of the 153 items: discriminability in 20% and differential difficulty in 28%. English items were more discriminating generally than the French. Removing problematic items did not change the differences in French/English assessments. Differential item functioning by high/low education status affected 27% of items, with items being generally more discriminating in high-education groups. Between-group comparisons were unchanged. In contrast, only 9% of items showed differential item functioning by geography, affecting principally the accessibility attribute. 
Removing problematic items reversed a previously non-significant finding, revealing poorer first-contact access in rural than in urban areas. Conclusion: Differential item functioning does not bias or invalidate French/English comparisons on subscales, but additional development is required to make French and English items equivalent. These instruments are relatively robust by educational status and geography, but results suggest potential differences in the underlying construct in low-education and rural respondents. PMID:23205035
Jafari, Peyman; Sharafi, Zahra; Bagheri, Zahra; Shalileh, Sara
2014-06-01
Measurement equivalence is a necessary assumption for meaningful comparison of pediatric quality of life rated by children and parents. In this study, differential item functioning (DIF) analysis is used to examine whether children and their parents respond consistently to the items in the KINDer Lebensqualitätsfragebogen (KINDL; in German, Children Quality of Life Questionnaire). Two DIF detection methods, graded response model (GRM) and ordinal logistic regression (OLR), were applied for comparability. The KINDL was completed by 1,086 school children and 1,061 of their parents. While the GRM revealed that 12 out of the 24 items were flagged with DIF, the OLR identified 14 out of the 24 items with DIF. Seven items with DIF and five items without DIF were common across the two methods, yielding a total agreement rate of 50%. This study revealed that parent proxy-reports cannot be used as a substitute for a child's ratings in the KINDL.
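The agreement rate reported above can be reproduced from the counts in the abstract. The short check below uses only those counts; the variable names are illustrative.

```python
n_items = 24
grm_flagged = 12   # items flagged with DIF by the GRM
olr_flagged = 14   # items flagged with DIF by OLR
both_dif = 7       # items flagged by both methods
neither_dif = 5    # items flagged by neither method

# Items on which the two methods disagree.
grm_only = grm_flagged - both_dif
olr_only = olr_flagged - both_dif

# The four cells of the 2x2 agreement table must account for every item.
assert both_dif + grm_only + olr_only + neither_dif == n_items

agreement = (both_dif + neither_dif) / n_items
print(f"GRM only: {grm_only}, OLR only: {olr_only}, agreement: {agreement:.0%}")
```

The two methods therefore disagree on 12 of the 24 items, which is the complement of the 50% agreement rate.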
Packham, Tara L; Cappelleri, Joseph C; Sadosky, Alesia; MacDermid, Joy C; Brunner, Florian
2017-03-04
painDETECT (PD-Q) is a self-reported assessment of pain qualities developed as a screening tool for pain of neuropathic origin. Rasch analysis is a strategy for examining the measurement characteristics of a scale using a form of item response theory. We conducted a Rasch analysis to consider if the scoring and measurement properties of PD-Q would support its use as an outcome measure. Rasch analysis was conducted on PD-Q scores drawn from a cross-sectional study of the burden and costs of neuropathic pain (NeP). The analysis followed an iterative process based on recommendations in the literature, including examination of sequential scoring categories, unidimensionality, reliability and differential item functioning. Data from 624 persons with a diagnosis of painful diabetic polyneuropathy, small fibre neuropathy, and neuropathic pain associated with chronic low back pain, spinal cord injury, HIV-related pain, or chronic post-surgical pain were used for this analysis. PD-Q demonstrated fit to the Rasch model after adjustments of scoring categories for four items, and omission of the time course and radiating questions. The resulting seven-item scale of pain qualities demonstrated good reliability with a person-separation index of 0.79. No scoring bias (differential item functioning) was found for this version. Rasch modelling suggests the seven pain-qualities items from PD-Q may be used as an outcome measure. Further research is required to confirm validity and responsiveness in a clinical setting.
Best Design for Multidimensional Computerized Adaptive Testing With the Bifactor Model
Seo, Dong Gi; Weiss, David J.
2015-01-01
Most computerized adaptive tests (CATs) have been studied using the framework of unidimensional item response theory. However, many psychological variables are multidimensional and might benefit from using a multidimensional approach to CATs. This study investigated the accuracy, fidelity, and efficiency of a fully multidimensional CAT algorithm (MCAT) with a bifactor model using simulated data. Four item selection methods in MCAT were examined for three bifactor pattern designs using two multidimensional item response theory models. To compare MCAT item selection and estimation methods, a fixed test length was used. The Ds-optimality item selection improved θ estimates with respect to a general factor, and either D- or A-optimality improved estimates of the group factors in three bifactor pattern designs under two multidimensional item response theory models. The MCAT model without a guessing parameter functioned better than the MCAT model with a guessing parameter. The MAP (maximum a posteriori) estimation method provided more accurate θ estimates than the EAP (expected a posteriori) method under most conditions, and MAP showed lower observed standard errors than EAP under most conditions, except for a general factor condition using Ds-optimality item selection. PMID:29795848
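The difference between the MAP and EAP estimators compared above can be illustrated in a much simpler setting. The sketch below is not the study's bifactor MCAT algorithm: it assumes a unidimensional two-parameter logistic model, a standard normal prior, and a fixed evaluation grid, and the item parameters and function names are hypothetical.

```python
import math


def p_2pl(theta, a, b):
    """Probability of a keyed response under a 2PL model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_posterior(theta, responses, items):
    """Log posterior up to a constant, with a standard normal prior."""
    lp = -0.5 * theta * theta
    for u, (a, b) in zip(responses, items):
        p = p_2pl(theta, a, b)
        lp += math.log(p) if u == 1 else math.log(1.0 - p)
    return lp


def map_and_eap(responses, items, lo=-4.0, hi=4.0, n=801):
    """Return (MAP, EAP) computed on an evenly spaced grid of theta values."""
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    logp = [log_posterior(t, responses, items) for t in grid]
    peak = max(logp)
    # Subtract the peak before exponentiating for numerical stability.
    weights = [math.exp(v - peak) for v in logp]
    total = sum(weights)
    eap = sum(t * w for t, w in zip(grid, weights)) / total  # posterior mean
    map_estimate = grid[logp.index(peak)]                    # posterior mode
    return map_estimate, eap


# Hypothetical (discrimination, difficulty) pairs and a response pattern.
items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]
print(map_and_eap([1, 1, 0], items))
```

MAP returns the mode of the posterior and EAP its mean, so the two differ whenever the posterior is skewed, which is common with short tests.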
Baylor, Carolyn; Yorkston, Kathryn; Eadie, Tanya; Kim, Jiseon; Chung, Hyewon; Amtmann, Dagmar
2015-01-01
Purpose The purpose of this study was to calibrate the items for the Communicative Participation Item Bank (CPIB) using Item Response Theory (IRT). One overriding objective was to examine if the IRT item parameters would be consistent across different diagnostic groups, thereby allowing creation of a disorder-generic instrument. The intended outcomes were the final item bank and a short form ready for clinical and research applications. Methods Self-report data were collected from 701 individuals representing four diagnoses: multiple sclerosis, Parkinson’s disease, amyotrophic lateral sclerosis and head and neck cancer. Participants completed the CPIB and additional self-report questionnaires. CPIB data were analyzed using the IRT Graded Response Model (GRM). Results The initial set of 94 candidate CPIB items was reduced to an item bank of 46 items demonstrating unidimensionality, local independence, good item fit, and good measurement precision. Differential item function (DIF) analyses detected no meaningful differences across diagnostic groups. A 10-item, disorder-generic short form was generated. Conclusions The CPIB provides speech-language pathologists with a unidimensional, self-report outcomes measurement instrument dedicated to the construct of communicative participation. This instrument may be useful to clinicians and researchers wanting to implement measures of communicative participation in their work. PMID:23816661
Sources of Interactional Problems in a Survey of Racial/Ethnic Discrimination
Johnson, Timothy P.; Shariff-Marco, Salma; Willis, Gordon; Cho, Young Ik; Breen, Nancy; Gee, Gilbert C.; Krieger, Nancy; Grant, David; Alegria, Margarita; Mays, Vickie M.; Williams, David R.; Landrine, Hope; Liu, Benmei; Reeve, Bryce B.; Takeuchi, David; Ponce, Ninez A.
2014-01-01
Cross-cultural variability in respondent processing of survey questions may bias results from multiethnic samples. We analyzed behavior codes, which identify difficulties in the interactions of respondents and interviewers, from a discrimination module contained within a field test of the 2007 California Health Interview Survey. In all, 553 (English) telephone interviews yielded 13,999 interactions involving 22 items. Multilevel logistic regression modeling revealed that respondent age and several item characteristics (response format, customized questions, length, and first item with new response format), but not race/ethnicity, were associated with interactional problems. These findings suggest that item function within a multi-cultural, albeit English language, survey may be largely influenced by question features, as opposed to respondent characteristics such as race/ethnicity. PMID:26166949
Arias González, Víctor B; Crespo Sierra, María Teresa; Arias Martínez, Benito; Martínez-Molina, Agustín; Ponce, Fernando P
2015-09-23
The Connor-Davidson Resilience Scale (CD-RISC) is inarguably one of the best-known instruments in the field of resilience assessment. However, the criteria for the psychometric quality of the instrument were based only on classical test theory. The aim of this paper was to calibrate the CD-RISC with a nonclinical sample of 444 adults using the Rasch-Andrich Rating Scale Model, in order to clarify its structure and analyze its psychometric properties at the item level. Two items showed misfit to the model and were eliminated. The remaining 22 items form an essentially unidimensional scale. The CD-RISC has good psychometric properties. The fit of both the items and the persons to the Rasch model was good, and the response categories were functioning properly. Two of the items showed differential item functioning. The CD-RISC has an obvious ceiling effect, which suggests that more difficult items should be included in future versions of the scale.
Reliability and validity of a short form household food security scale in a Caribbean community.
Gulliford, Martin C; Mahabir, Deepak; Rocke, Brian
2004-06-16
We evaluated the reliability and validity of the short form household food security scale in a different setting from the one in which it was developed. The scale was interview administered to 531 subjects from 286 households in north central Trinidad in Trinidad and Tobago, West Indies. We evaluated the six items by fitting item response theory models to estimate item thresholds, estimating agreement among respondents in the same households and estimating the slope index of income-related inequality (SII) after adjusting for age, sex and ethnicity. Item-score correlations ranged from 0.52 to 0.79 and Cronbach's alpha was 0.87. Item responses gave within-household correlation coefficients ranging from 0.70 to 0.78. Estimated item thresholds (standard errors) from the Rasch model ranged from -2.027 (0.063) for the 'balanced meal' item to 2.251 (0.116) for the 'hungry' item. The 'balanced meal' item had the lowest threshold in each ethnic group even though there was evidence of differential functioning for this item by ethnicity. Relative thresholds of other items were generally consistent with US data. Estimation of the SII, comparing those at the bottom with those at the top of the income scale, gave relative odds for an affirmative response of 3.77 (95% confidence interval 1.40 to 10.2) for the lowest severity item, and 20.8 (2.67 to 162.5) for highest severity item. Food insecurity was associated with reduced consumption of green vegetables after additionally adjusting for income and education (0.52, 0.28 to 0.96). The household food security scale gives reliable and valid responses in this setting. Differing relative item thresholds compared with US data do not require alteration to the cut-points for classification of 'food insecurity without hunger' or 'food insecurity with hunger'. The data provide further evidence that re-evaluation of the 'balanced meal' item is required.
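The reported item thresholds can be read through the Rasch item response function, in which the probability of an affirmative response depends only on the difference between a respondent's latent severity and the item threshold. The sketch below is illustrative only: the two thresholds are taken from the abstract, while the latent values at which they are evaluated and the function name are assumptions.

```python
import math


def rasch_probability(theta, threshold):
    """Rasch model: P(affirmative) = 1 / (1 + exp(-(theta - threshold)))."""
    return 1.0 / (1.0 + math.exp(-(theta - threshold)))


# Lowest and highest thresholds reported in the abstract.
thresholds = {"balanced meal": -2.027, "hungry": 2.251}

# Hypothetical latent food-insecurity levels on the same logit scale.
for theta in (-2.0, 0.0, 2.0):
    for item, b in thresholds.items():
        p = rasch_probability(theta, b)
        print(f"theta={theta:+.1f}  {item:<13}  P(affirmative)={p:.3f}")
```

A respondent located exactly at an item's threshold has a probability of 0.5 of answering affirmatively, which is why a low threshold marks an item that is affirmed even at low severity.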
Using a psychometric lens to examine gender differences on the FCI
NASA Astrophysics Data System (ADS)
Lindell, Rebecca; Papak, Alexis; Stewart, John; Traxler, Adrienne
2017-01-01
Multiple research studies show that there appears to be an inherent difference between male and female students' performance on the Force Concept Inventory (FCI). Unlike these studies, we chose to create two different samples, one with only female students and the other with only male students, to reduce the effects of the gender-imbalance inherent in a single sample of all physics students. Using a psychometric lens, we evaluate the differences between the male and female students' performance on the FCI. We utilized classical test theory to flag 13 items on the FCI that were poorly functioning for female students. Notably, most of these items were not flagged when the dataset was aggregated across genders. In the next stage of the research, we utilized Item Response Theory (IRT) to discover if the remaining 17 items on the FCI are also poorly functioning for female students. By eliminating the poorly functioning items on the FCI, we further examined the gender difference of the Force Concept Inventory.
A Multidimensional Ideal Point Item Response Theory Model for Binary Data.
Maydeu-Olivares, Albert; Hernández, Adolfo; McDonald, Roderick P
2006-12-01
We introduce a multidimensional item response theory (IRT) model for binary data based on a proximity response mechanism. Under the model, a respondent at the mode of the item response function (IRF) endorses the item with probability one. The mode of the IRF is the ideal point, or in the multidimensional case, an ideal hyperplane. The model yields closed form expressions for the cell probabilities. We estimate and test the goodness of fit of the model using only information contained in the univariate and bivariate moments of the data. Also, we pit the new model against the multidimensional normal ogive model estimated using NOHARM in four applications involving (a) attitudes toward censorship, (b) satisfaction with life, (c) attitudes of morality and equality, and (d) political efficacy. The normal PDF model is not invariant to simple operations such as reverse scoring. Thus, when there is no natural category to be modeled, as in many personality applications, it should be fit separately with and without reverse scoring for comparisons.
Generalized Full-Information Item Bifactor Analysis
Cai, Li; Yang, Ji Seung; Hansen, Mark
2011-01-01
Full-information item bifactor analysis is an important statistical method in psychological and educational measurement. Current methods are limited to single group analysis and inflexible in the types of item response models supported. We propose a flexible multiple-group item bifactor analysis framework that supports a variety of multidimensional item response theory models for an arbitrary mixing of dichotomous, ordinal, and nominal items. The extended item bifactor model also enables the estimation of latent variable means and variances when data from more than one group are present. Generalized user-defined parameter restrictions are permitted within or across groups. We derive an efficient full-information maximum marginal likelihood estimator. Our estimation method achieves substantial computational savings by extending Gibbons and Hedeker’s (1992) bifactor dimension reduction method so that the optimization of the marginal log-likelihood only requires two-dimensional integration regardless of the dimensionality of the latent variables. We use simulation studies to demonstrate the flexibility and accuracy of the proposed methods. We apply the model to study cross-country differences, including differential item functioning, using data from a large international education survey on mathematics literacy. PMID:21534682
de Sá Junior, Antonio Reis; de Andrade, Arthur Guerra; Andrade, Laura Helena; Gorenstein, Clarice; Wang, Yuan-Pang
2018-07-01
This study examines the response pattern of depressive symptoms in a nationwide student sample, through item analyses of a rating scale by both classical test theory (CTT) and item response theory (IRT). The 21-item Beck Depression Inventory-II (BDI-II) was administered to 12,711 college students. First, the psychometric properties of the scale were described. Thereafter, the endorsement probability of depressive symptoms in each scale item was analyzed through CTT and IRT. Graphical plots depicted the endorsement probability of scale items and intensity of depression. Three items of different difficulty levels were compared through the CTT and IRT approaches. Four in five students reported the presence of depressive symptoms. The BDI-II items presented good reliability and were distributed along the symptomatic continuum of depression. Similarly, in both CTT and IRT approaches, the item 'changes in sleep' was easily endorsed, 'loss of interest' moderately and 'suicidal thoughts' hardly. Graphical representations of the BDI-II from both methods were largely equivalent in terms of item discrimination and item difficulty. The item characteristic curve of the IRT method provided informative evaluation of item performance. The inventory was administered only to college students. Depressive symptoms were frequent psychopathological manifestations among college students. The performance of the BDI-II items indicated convergent results from both methods of analysis. While the CTT was easy to understand and to apply, the IRT was more complex to understand and to implement. Comprehensive assessment of the functioning of each BDI-II item might be helpful in efficient detection of depressive conditions in college students. Copyright © 2018 Elsevier B.V. All rights reserved.
A Rasch Analysis of the Junior Metacognitive Awareness Inventory with Singapore Students
ERIC Educational Resources Information Center
Ning, Hoi Kwan
2018-01-01
The psychometric properties of the 2 versions of the Junior Metacognitive Awareness Inventory were examined with Singapore student samples. Other than 2 misfitting items and an underutilized response scale, Rasch analysis demonstrated that the instruments have good measurement precision, and no differential item functioning was detected across…
The Discriminating Power of Items that Measure More than One Dimension.
ERIC Educational Resources Information Center
Reckase, Mark D.
The work presented in this paper defined conceptually the concepts of multidimensional discrimination and information, derived mathematical expressions for the concepts for a particular multidimensional item response theory (IRT) model, and applied the concepts to actual test data. Multidimensional discrimination was defined as a function of the…
Cognitive Diagnostic Attribute-Level Discrimination Indices
ERIC Educational Resources Information Center
Henson, Robert; Roussos, Louis; Douglas, Jeff; He, Xuming
2008-01-01
Cognitive diagnostic models (CDMs) model the probability of correctly answering an item as a function of an examinee's attribute mastery pattern. Because estimation of the mastery pattern involves more than a continuous measure of ability, reliability concepts introduced by classical test theory and item response theory do not apply. The cognitive…
ERIC Educational Resources Information Center
Instructional Objectives Exchange, Los Angeles, CA.
Ninety objectives and related test items for use in grades 7 through 12 are presented. Each sample contains an objective, test items, and criteria for judging the adequacy of the response. Objectives are organized into the following categories: (1) property of metals; (2) operations and functions; (3) cutting and shearing; (4) filing; (5) cutting…
IRTs of the ABCs: Children's Letter Name Acquisition
ERIC Educational Resources Information Center
Phillips, Beth M.; Piasta, Shayne B.; Anthony, Jason L.; Lonigan, Christopher J.; Francis, David J.
2012-01-01
We examined the developmental sequence of letter name knowledge acquisition by children from 2 to 5 years of age. Data from 2 samples representing diverse regions, ethnicity, and socioeconomic backgrounds (ns=1074 and 500) were analyzed using item response theory (IRT) and differential item functioning techniques. Results from factor analyses…
Ni, Pengsheng; McDonough, Christine M.; Jette, Alan M.; Bogusz, Kara; Marfeo, Elizabeth E.; Rasch, Elizabeth K.; Brandt, Diane E.; Meterko, Mark; Chan, Leighton
2014-01-01
Objectives To develop and test an instrument to assess physical function (PF) for Social Security Administration (SSA) disability programs, the SSA-PF. Item Response Theory (IRT) analyses were used to 1) create a calibrated item bank for each of the factors identified in prior factor analyses, 2) assess the fit of the items within each scale, 3) develop separate Computer-Adaptive Test (CAT) instruments for each scale, and 4) conduct initial psychometric testing. Design Cross-sectional data collection; IRT analyses; CAT simulation. Setting Telephone and internet survey. Participants Two samples: 1,017 SSA claimants, and 999 adults from the US general population. Interventions None. Main Outcome Measure Model fit statistics, correlation and reliability coefficients. Results IRT analyses resulted in five unidimensional SSA-PF scales: Changing & Maintaining Body Position, Whole Body Mobility, Upper Body Function, Upper Extremity Fine Motor, and Wheelchair Mobility for a total of 102 items. High CAT accuracy was demonstrated by strong correlations between simulated CAT scores and those from the full item banks. Comparing the simulated CATs to the full item banks, very little loss of reliability or precision was noted, except at the lower and upper ranges of each scale. No difference in response patterns by age or sex was noted. The distributions of claimant scores were shifted to the lower end of each scale compared to those of a sample of US adults. Conclusions The SSA-PF instrument contributes important new methodology for measuring the physical function of adults applying to the SSA disability programs. Initial evaluation revealed that the SSA-PF instrument achieved considerable breadth of coverage in each content domain and demonstrated noteworthy psychometric properties. PMID:23578594
Doostfatemeh, Marziyeh; Ayatollahi, Seyyed Mohammad Taghi; Jafari, Peyman
2015-08-01
In child-parent agreement studies in the field of paediatric health-related quality of life (HRQoL), little attention has been paid to the effect of gender in parental proxy rating of children's HRQoL. This study aims to test the potential interchangeability of parent dyads in reporting children's HRQoL on both item and scale levels of the PedsQL™ 4.0 instrument, using the approach of differential item functioning (DIF). The PedsQL™ 4.0 Generic Core Scales were completed by 576 father-and-mother dyads. A polytomous item response theory model, graded response model, was used to detect DIF across fathers and mothers. Assessment at item level showed that fathers and mothers perceived the meaning of items of the PedsQL™ 4.0 consistently. Regarding the scale level, a moderate to high level of agreement was observed between mothers' and fathers' reports on all similar subscales. Although the significant mean score differences in total, physical and emotional functioning indicated that fathers gave higher scores to their children, the small effect size implied that this difference may not be practically meaningful. Our findings revealed that discrepancy in parent dyads in rating children's HRQoL is a "real" difference and not an artefact due to measurement non-invariance. Fathers were seen to have slightly different insights into their children, especially for emotional functioning, but overall the results were not all that different. This suggests that paternal proxy-reports can be included in studies along with maternal proxy-reports, and the two may be combined when looking at parent-child agreement. Parent-child agreement studies in Iran are not affected by parents' gender, and therefore, researchers may rely on the assumption of the interchangeability of fathers and mothers in these studies.
Taylor, Fiona; Higgins, Sophie; Carson, Robyn T; Eremenco, Sonya; Foley, Catherine; Lacy, Brian E; Parkman, Henry P; Reasner, David S; Shields, Alan L; Tack, Jan; Talley, Nicholas J
2018-01-01
Objectives: The Functional Dyspepsia Symptom Diary (FDSD) was developed to address the lack of symptom-focused, patient-reported outcome (PRO) measures designed for use in functional dyspepsia (FD) patients and meeting Food and Drug Administration recommendations for PRO instrument development. Methods: Concept elicitation interviews were conducted with FD participants to identify symptoms important and relevant to FD patients. A preliminary version of the FDSD was constructed, then completed by FD participants on an electronic device in cognitive interviews to evaluate the readability, comprehensibility, relevance, and comprehensiveness of the FDSD, and to preliminarily evaluate its measurement properties. Results: During concept elicitation interviews, 45 participants spontaneously reported 19 symptom concepts. Of those, seven symptoms were selected for assessment by the eight-item FDSD. Cognitive interviews with 57 participants confirmed that participants were able to comprehend and provide meaningful responses to the FDSD, and that the handheld electronic FDSD format was suitable for use in the target population. Scores of the FDSD were well-distributed among response options, item discrimination indices suggested that the FDSD items differentiate among patients with varying degrees of FD severity, and inter-item correlations suggested that no items of the FDSD were capturing redundant information. Internal consistency estimates (0.87) and construct-related validity estimates using known-groups methods were within acceptable ranges. Conclusions: The FDSD is a content-valid PRO measure, with preliminary psychometric evidence providing support for the FDSD’s items and total score. Further psychometric evaluations are recommended to more fully test the FDSD’s score performance and other measurement properties in the target patient population. PMID:28925989
Verdam, Mathilde G E; Oort, Frans J; Sprangers, Mirjam A G
2016-06-01
The structural equation modeling (SEM) approach for detection of response shift (Oort in Qual Life Res 14:587-598, 2005. doi: 10.1007/s11136-004-0830-y ) is especially suited for continuous data, e.g., questionnaire scales. The present objective is to explain how the SEM approach can be applied to discrete data and to illustrate response shift detection in items measuring health-related quality of life (HRQL) of cancer patients. The SEM approach for discrete data includes two stages: (1) establishing a model of underlying continuous variables that represent the observed discrete variables, (2) using these underlying continuous variables to establish a common factor model for the detection of response shift and to assess true change. The proposed SEM approach was illustrated with data of 485 cancer patients whose HRQL was measured with the SF-36, before and after start of antineoplastic treatment. Response shift effects were detected in items of the subscales mental health, physical functioning, role limitations due to physical health, and bodily pain. Recalibration response shifts indicated that patients experienced relatively fewer limitations with "bathing or dressing yourself" (effect size d = 0.51) and less "nervousness" (d = 0.30), but more "pain" (d = -0.23) and less "happiness" (d = -0.16) after antineoplastic treatment as compared to the other symptoms of the same subscale. Overall, patients' mental health improved, while their physical health, vitality, and social functioning deteriorated. No change was found for the other subscales of the SF-36. The proposed SEM approach to discrete data enables response shift detection at the item level. This will lead to a better understanding of the response shift phenomena at the item level and therefore enhances interpretation of change in the area of HRQL.
Rasch Analysis of the Power as Knowing Participation in Change Tool--the Brazilian version.
Guedes, Erika de Souza; Orozco-Vargas, Luiz Carlos; Turrini, Ruth Natália Teresa; de Sousa, Regina Márcia Cardoso; dos Santos, Mariana Alvina; da Cruz, Diná de Almeida Lopes Monteiro
2013-01-01
The objective of this study was to evaluate the items contained in the Brazilian version of the Power as Knowing Participation in Change Tool (PKPCT). Investigation of the psychometric properties of the mentioned questionnaire through Rasch analysis. The data from 952 nursing assistants and 627 baccalaureate nurses were analyzed (average age 44.1 (SD=9.5); 13.0% men). The subscales Choices, Awareness, Freedom and Involvement were tested separately and presented unidimensionality; the categories of the responses given to the items were collapsed from 7 to 3 levels and the items fit the model well, except for the following/leading item, in which the infit and outfit values were above 1.4; this item also presented Differential Item Functioning (DIF) according to the participant's role. The reliability of the items was 0.99 and the reliability of the participants ranged from 0.80 to 0.84 in the subscales. Items with extremely high levels of difficulty were not identified. The PKPCT should not be viewed as unidimensional, items with extremely high levels of difficulty in the scale need to be created, and the differential functioning of some items has to be further investigated.
Dirven, Linda; Groenvold, Mogens; Taphoorn, Martin J B; Conroy, Thierry; Tomaszewski, Krzysztof A; Young, Teresa; Petersen, Morten Aa
2017-11-01
The European Organisation of Research and Treatment of Cancer (EORTC) Quality of Life Group is developing computerized adaptive testing (CAT) versions of all EORTC Quality of Life Questionnaire (QLQ-C30) scales with the aim to enhance measurement precision. Here we present the results on the field-testing and psychometric evaluation of the item bank for cognitive functioning (CF). In previous phases (I-III), 44 candidate items were developed measuring CF in cancer patients. In phase IV, these items were psychometrically evaluated in a large sample of international cancer patients. This evaluation included an assessment of dimensionality, fit to the item response theory (IRT) model, differential item functioning (DIF), and measurement properties. A total of 1030 cancer patients completed the 44 candidate items on CF. Of these, 34 items could be included in a unidimensional IRT model, showing an acceptable fit. Although several items showed DIF, these had a negligible impact on CF estimation. Measurement precision of the item bank was much higher than the two original QLQ-C30 CF items alone, across the whole continuum. Moreover, CAT measurement may on average reduce study sample sizes by about 35-40% compared to the original QLQ-C30 CF scale, without loss of power. A CF item bank for CAT measurement consisting of 34 items was established, applicable to various cancer patients across countries. This CAT measurement system will facilitate precise and efficient assessment of HRQOL of cancer patients, without loss of comparability of results.
HIV/AIDS knowledge among men who have sex with men: applying the item response theory.
Gomes, Raquel Regina de Freitas Magalhães; Batista, José Rodrigues; Ceccato, Maria das Graças Braga; Kerr, Lígia Regina Franco Sansigolo; Guimarães, Mark Drew Crosland
2014-04-01
To evaluate the level of HIV/AIDS knowledge among men who have sex with men in Brazil using the latent trait model estimated by Item Response Theory. Multicenter, cross-sectional study, carried out in ten Brazilian cities between 2008 and 2009. Adult men who have sex with men were recruited (n = 3,746) through Respondent Driven Sampling. HIV/AIDS knowledge was ascertained through ten statements by face-to-face interview and latent scores were obtained through two-parameter logistic modeling (difficulty and discrimination) using Item Response Theory. Differential item functioning was used to examine each item characteristic curve by age and schooling. Overall, the HIV/AIDS knowledge scores using Item Response Theory did not exceed 6.0 (scale 0-10), with mean and median values of 5.0 (SD = 0.9) and 5.3, respectively, with 40.7% of the sample with knowledge levels below the average. Some beliefs still exist in this population regarding the transmission of the virus by insect bites, by using public restrooms, and by sharing utensils during meals. With regard to the difficulty and discrimination parameters, eight items were located below the mean of the scale and were considered very easy, and four items presented very low discrimination parameter (< 0.34). The absence of difficult items contributed to the inaccuracy of the measurement of knowledge among those with median level and above. Item Response Theory analysis, which focuses on the individual properties of each item, allows measures to be obtained that do not vary or depend on the questionnaire, which provides better ascertainment and accuracy of knowledge scores. Valid and reliable scales are essential for monitoring HIV/AIDS knowledge among the men who have sex with men population over time and in different geographic regions, and this psychometric model brings this advantage.
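The remark that the absence of difficult items reduced measurement accuracy at median and higher knowledge levels can be illustrated with the item information function of the two-parameter logistic model. The sketch below is a hedged illustration, not the study's analysis: the item parameters are hypothetical, and the function names are introduced here for the example.

```python
import math


def item_information(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def standard_error(theta, items):
    """Asymptotic standard error of theta: 1 / sqrt(total information)."""
    total = sum(item_information(theta, a, b) for a, b in items)
    return 1.0 / math.sqrt(total)


# Hypothetical bank made up only of easy items (difficulty below the mean).
easy_items = [(1.0, -2.0), (1.0, -1.5), (1.0, -1.0)]

for theta in (-1.5, 0.0, 2.0):
    print(f"theta={theta:+.1f}  SE={standard_error(theta, easy_items):.2f}")
```

Each item is most informative near its own difficulty, so a bank of easy items yields larger standard errors for respondents located well above those difficulties. Information also scales with the square of the discrimination parameter, which is why items with very low discrimination contribute little anywhere on the scale.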
Crins, Martine H. P.; Roorda, Leo D.; Smits, Niels; de Vet, Henrica C. W.; Westhovens, Rene; Cella, David; Cook, Karon F.; Revicki, Dennis; van Leeuwen, Jaap; Boers, Maarten; Dekker, Joost; Terwee, Caroline B.
2015-01-01
The Dutch-Flemish PROMIS Group translated the adult PROMIS Pain Interference item bank into Dutch-Flemish. The aims of the current study were to calibrate the parameters of these items using an item response theory (IRT) model, to evaluate the cross-cultural validity of the Dutch-Flemish translations compared to the original English items, and to evaluate their reliability and construct validity. The 40 items in the bank were completed by 1085 Dutch chronic pain patients. Before calibrating the items, IRT model assumptions were evaluated using confirmatory factor analysis (CFA). Items were calibrated using the graded response model (GRM), an IRT model appropriate for items with more than two response options. To evaluate cross-cultural validity, differential item functioning (DIF) for language (Dutch vs. English) was examined. Reliability was evaluated based on standard errors and Cronbach’s alpha. To evaluate construct validity, correlations with scores on legacy instruments (e.g., the Disabilities of the Arm, Shoulder and Hand Questionnaire) were calculated. Unidimensionality of the Dutch-Flemish PROMIS Pain Interference item bank was supported by CFA tests of model fit (CFI = 0.986, TLI = 0.986). Furthermore, the data fit the GRM and showed good coverage across the pain interference continuum (threshold-parameters range: -3.04 to 3.44). The Dutch-Flemish PROMIS Pain Interference item bank has good cross-cultural validity (only two out of 40 items showing DIF), good reliability (Cronbach’s alpha = 0.98), and good construct validity (Pearson correlations between 0.62 and 0.75). A computer adaptive test (CAT) and Dutch-Flemish PROMIS short forms of the Dutch-Flemish PROMIS Pain Interference item bank can now be developed. PMID:26214178
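The graded response model mentioned above handles items with more than two response options by modelling cumulative probabilities of responding in or above each category. The following is a minimal sketch of the category probabilities under Samejima's logistic form; the discrimination and threshold values are hypothetical and are not parameters from the calibrated bank.

```python
import math


def grm_category_probabilities(theta, a, thresholds):
    """Category probabilities for one item under the graded response model.

    thresholds must be in ascending order (b_1 < ... < b_{K-1}) for an
    item with K ordered response categories.
    """
    # Cumulative curves P(X >= k); by definition P(X >= 0) = 1 and P(X >= K) = 0.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical five-category item (four thresholds).
probs = grm_category_probabilities(theta=0.5, a=2.0, thresholds=[-1.5, -0.5, 0.5, 1.5])
print([round(p, 3) for p in probs])
```

Because the thresholds are ordered, the cumulative curves never cross, so every category probability is non-negative and the probabilities sum to one.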
Development of a PROMIS item bank to measure pain interference.
Amtmann, Dagmar; Cook, Karon F; Jensen, Mark P; Chen, Wen-Hung; Choi, Seung; Revicki, Dennis; Cella, David; Rothrock, Nan; Keefe, Francis; Callahan, Leigh; Lai, Jin-Shei
2010-07-01
This paper describes the psychometric properties of the PROMIS-pain interference (PROMIS-PI) bank. An initial candidate item pool (n=644) was developed and evaluated based on the review of existing instruments, interviews with patients, and consultation with pain experts. From this pool, a candidate item bank of 56 items was selected and responses to the items were collected from large community and clinical samples. A total of 14,848 participants responded to all or a subset of candidate items. The responses were calibrated using an item response theory (IRT) model. A final 41-item bank was evaluated with respect to IRT assumptions, model fit, differential item function (DIF), precision, and construct and concurrent validity. Items of the revised bank had good fit to the IRT model (CFI and NNFI/TLI ranged from 0.974 to 0.997), and the data were strongly unidimensional (e.g., ratio of first and second eigenvalue=35). Nine items exhibited statistically significant DIF. However, adjusting for DIF had little practical impact on score estimates and the items were retained without modifying scoring. Scores provided substantial information across levels of pain; for scores in the T-score range 50-80, the reliability was equivalent to 0.96-0.99. Patterns of correlations with other health outcomes supported the construct validity of the item bank. The scores discriminated among persons with different numbers of chronic conditions, disabling conditions, levels of self-reported health, and pain intensity (p<0.0001). The results indicated that the PROMIS-PI items constitute a psychometrically sound bank. Computerized adaptive testing and short forms are available. Copyright 2010 International Association for the Study of Pain. All rights reserved.
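The reliability values quoted for the T-score range can be related to standard errors through a common convention, assumed here rather than stated in the abstract: with scores scaled to a standard deviation of 10, reliability is approximately 1 minus the squared standard error expressed in standard deviation units. The function names below are illustrative.

```python
import math


def reliability_from_se(se_t, sd=10.0):
    """Approximate reliability implied by a standard error on the T-score metric."""
    return 1.0 - (se_t / sd) ** 2


def se_from_reliability(reliability, sd=10.0):
    """Standard error on the T-score metric implied by a given reliability."""
    return sd * math.sqrt(1.0 - reliability)


# Reliability values quoted in the abstract for T-scores between 50 and 80.
for rel in (0.96, 0.99):
    print(f"reliability {rel:.2f} -> SE of about {se_from_reliability(rel):.1f} T-score points")
```

Under this convention, reliabilities of 0.96 to 0.99 correspond to standard errors of roughly 2 to 1 T-score points.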
Crins, Martine H P; Roorda, Leo D; Smits, Niels; de Vet, Henrica C W; Westhovens, Rene; Cella, David; Cook, Karon F; Revicki, Dennis; van Leeuwen, Jaap; Boers, Maarten; Dekker, Joost; Terwee, Caroline B
2015-01-01
The Dutch-Flemish PROMIS Group translated the adult PROMIS Pain Interference item bank into Dutch-Flemish. The aims of the current study were to calibrate the parameters of these items using an item response theory (IRT) model, to evaluate the cross-cultural validity of the Dutch-Flemish translations compared to the original English items, and to evaluate their reliability and construct validity. The 40 items in the bank were completed by 1085 Dutch chronic pain patients. Before calibrating the items, IRT model assumptions were evaluated using confirmatory factor analysis (CFA). Items were calibrated using the graded response model (GRM), an IRT model appropriate for items with more than two response options. To evaluate cross-cultural validity, differential item functioning (DIF) for language (Dutch vs. English) was examined. Reliability was evaluated based on standard errors and Cronbach's alpha. To evaluate construct validity, correlations with scores on legacy instruments (e.g., the Disabilities of the Arm, Shoulder and Hand Questionnaire) were calculated. Unidimensionality of the Dutch-Flemish PROMIS Pain Interference item bank was supported by CFA tests of model fit (CFI = 0.986, TLI = 0.986). Furthermore, the data fit the GRM and showed good coverage across the pain interference continuum (threshold-parameters range: -3.04 to 3.44). The Dutch-Flemish PROMIS Pain Interference item bank has good cross-cultural validity (only two out of 40 items showing DIF), good reliability (Cronbach's alpha = 0.98), and good construct validity (Pearson correlations between 0.62 and 0.75). A computer adaptive test (CAT) and Dutch-Flemish PROMIS short forms of the Dutch-Flemish PROMIS Pain Interference item bank can now be developed.
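For readers unfamiliar with the graded response model named above, the following is a minimal sketch of how it turns one discrimination parameter and a set of ordered thresholds into response-category probabilities. The parameter values in the usage lines are hypothetical and are not estimates from the study:

```python
import math


def grm_category_probs(theta: float, a: float, thresholds: list[float]) -> list[float]:
    """Samejima graded response model: P(X = k | theta) for k = 0..K.

    `a` is the item discrimination; `thresholds` are K strictly increasing
    threshold parameters, giving K + 1 ordered response categories.
    """
    # Cumulative probabilities P(X >= k): 1 for k = 0, a logistic curve for
    # each threshold, and 0 beyond the highest category.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Category probabilities are differences of adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


if __name__ == "__main__":
    # Hypothetical five-category item (four thresholds).
    probs = grm_category_probs(theta=0.5, a=2.0, thresholds=[-1.0, 0.0, 1.0, 2.0])
    print([round(p, 3) for p in probs])
```

Because the thresholds are ordered, every category probability is non-negative and the probabilities sum to one at any trait level.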
Kawaguchi, Hideaki; Taguchi, Masamoto; Sukigara, Masune; Sakuragi, Shoji; Sugiyama, Naoya; Chiba, Hisomu; Kawasaki, Tatsuhito
2017-06-15
We comprehensively evaluated cognitive and social functioning in patients requiring long-term inpatient psychiatric care using the International Classification of Functioning, Disability, and Health. We surveyed 1967 patients receiving long-term inpatient psychiatric care. Patients were further categorized into an old long-stay group (n = 892, >5 years in hospitals) and a new long-stay group (n = 1075, 1-5 years in hospitals). We obtained responses for all the International Classification of Functioning, Disability, and Health items in domain b (Body Functions) and domain d (Activities and Participation). We estimated weighted means for each item using the propensity score to adjust for confounding factors. Responses were received from 307 hospitals (response rate of hospitals: 25.5%). Cognitive and social functioning in the old long-stay group was more severely impaired than in the new long-stay group. No statistically significant differences were observed regarding the International Classification of Functioning, Disability, and Health items associated with basic activities of daily living between the two groups. Combined therapy consisting of cognitive remediation and rehabilitation on social functioning for this patient population should be started from the early stage of hospitalization. Non-restrictive, independent environments may also be optimal for this patient population. Implications for rehabilitation Rehabilitation of cognitive and social functioning for patients requiring long-term inpatient psychiatric care should be started in the early stages of hospitalization. In psychiatric fields, the International Classification of Functioning, Disability, and Health checklist could facilitate individualized rehabilitation planning by allowing healthcare professionals to visually assess the comprehensive functioning of each patient using graphics such as radar charts.
Mueller, Anne E; Segal, Daniel L; Gavett, Brandon; Marty, Meghan A; Yochim, Brian; June, Andrea; Coolidge, Frederick L
2015-07-01
The Geriatric Anxiety Scale (GAS; Segal, D. L., June, A., Payne, M., Coolidge, F. L. and Yochim, B. (2010). Journal of Anxiety Disorders, 24, 709-714. doi:10.1016/j.janxdis.2010.05.002) is a self-report measure of anxiety that was designed to address unique issues associated with anxiety assessment in older adults. This study is the first to use item response theory (IRT) to examine the psychometric properties of a measure of anxiety in older adults. A large sample of older adults (n = 581; mean age = 72.32 years, SD = 7.64 years, range = 60 to 96 years; 64% women; 88% European American) completed the GAS. IRT properties were examined. The presence of differential item functioning (DIF) or measurement bias by age and sex was assessed, and a ten-item short form of the GAS (called the GAS-10) was created. All GAS items had discrimination parameters of 1.07 or greater. Items from the somatic subscale tended to have lower discrimination parameters than items on the cognitive or affective subscales. Two items were flagged for DIF, but the impact of the DIF was negligible. Women scored significantly higher than men on the GAS and its subscales. Participants in the young-old group (60 to 79 years old) scored significantly higher on the cognitive subscale than participants in the old-old group (80 years old and older). Results from the IRT analyses indicated that the GAS and GAS-10 have strong psychometric properties among older adults. We conclude by discussing implications and future research directions.
ERIC Educational Resources Information Center
Monahan, Patrick O.; Ankenmann, Robert D.
2010-01-01
When the matching score is either less than perfectly reliable or not a sufficient statistic for determining latent proficiency in data conforming to item response theory (IRT) models, Type I error (TIE) inflation may occur for the Mantel-Haenszel (MH) procedure or any differential item functioning (DIF) procedure that matches on summed-item…
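As background to the Mantel-Haenszel procedure mentioned above, here is a minimal sketch of the MH common odds ratio computed across strata of the matching (summed) score, together with the conventional ETS delta-scale transformation. The counts in the usage lines are hypothetical, and the sketch does not reproduce the simulation design of the study:

```python
import math

# Each stratum is one level of the matching score, given as a 2x2 table:
# (reference correct, reference incorrect, focal correct, focal incorrect)
Table = tuple[float, float, float, float]


def mh_common_odds_ratio(strata: list[Table]) -> float:
    """Mantel-Haenszel common odds ratio pooled across matching-score strata."""
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        num += a * d / n
        den += b * c / n
    return num / den


def mh_d_dif(alpha_mh: float) -> float:
    """ETS delta-scale index: -2.35 * ln(alpha_MH); 0 means no uniform DIF."""
    return -2.35 * math.log(alpha_mh)


if __name__ == "__main__":
    # Hypothetical counts for three score strata.
    strata = [(30, 10, 20, 20), (40, 10, 30, 15), (45, 5, 40, 8)]
    alpha = mh_common_odds_ratio(strata)
    print(round(alpha, 3), round(mh_d_dif(alpha), 3))
```

The matching on summed score is exactly the step the abstract flags: when that score is unreliable or not sufficient for latent proficiency, examinees in the same stratum are not truly matched, and the pooled odds ratio can drift from 1 even without DIF.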
Hamilton, Clayon B; Chesworth, Bert M
2013-11-01
The original 20-item Upper Extremity Functional Index (UEFI) has not undergone Rasch validation. The purpose of this study was to determine whether Rasch analysis supports the UEFI as a measure of a single construct (ie, upper extremity function) and whether a Rasch-validated UEFI has adequate reproducibility for individual-level patient evaluation. This was a secondary analysis of data from a repeated-measures study designed to evaluate the measurement properties of the UEFI over a 3-week period. Patients (n=239) with musculoskeletal upper extremity disorders were recruited from 17 physical therapy clinics across 4 Canadian provinces. Rasch analysis of the UEFI measurement properties was performed. If the UEFI did not fit the Rasch model, misfitting patients were deleted, items with poor response structure were corrected, and misfitting items and redundant items were deleted. The impact of differential item functioning on the ability estimate of patients was investigated. A 15-item modified UEFI was derived to achieve fit to the Rasch model where the total score was supported as a measure of upper extremity function only. The resultant UEFI-15 interval-level scale (0-100, worst to best state) demonstrated excellent internal consistency (person separation index=0.94) and test-retest reliability (intraclass correlation coefficient [2,1]=.95). The minimal detectable change at the 90% confidence interval was 8.1. Patients who were ambidextrous or bilaterally affected were excluded to allow for the analysis of differential item functioning due to limb involvement and arm dominance. Rasch analysis did not support the validity of the 20-item UEFI. However, the UEFI-15 was a valid and reliable interval-level measure of a single dimension: upper extremity function. Rasch analysis supports using the UEFI-15 in physical therapist practice to quantify upper extremity function in patients with musculoskeletal disorders of the upper extremity.
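The abstract reports a minimal detectable change at the 90% confidence level (MDC90) alongside a test-retest ICC but does not state the formula. A minimal sketch, assuming the conventional definitions SEM = SD * sqrt(1 - ICC) and MDC = z * sqrt(2) * SEM; the standard deviation in the usage lines is hypothetical because the abstract does not report one:

```python
import math
from statistics import NormalDist


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from a score SD and a reliability (ICC)."""
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sem: float, confidence: float = 0.90) -> float:
    """MDC at a two-sided confidence level: z * sqrt(2) * SEM.

    The sqrt(2) reflects measurement error in both of the two scores
    being compared.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * math.sqrt(2.0) * sem


if __name__ == "__main__":
    # Hypothetical SD of 15 scale points with the reported ICC of .95.
    sem = sem_from_icc(sd=15.0, icc=0.95)
    print(round(sem, 2), round(minimal_detectable_change(sem), 2))
```

A change score smaller than the MDC cannot be distinguished from measurement error for an individual patient at the stated confidence level, which is the sense of "reproducibility for individual-level patient evaluation" in the abstract.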
Chesworth, Bert M.
2013-01-01
Background The original 20-item Upper Extremity Functional Index (UEFI) has not undergone Rasch validation. Objective The purpose of this study was to determine whether Rasch analysis supports the UEFI as a measure of a single construct (ie, upper extremity function) and whether a Rasch-validated UEFI has adequate reproducibility for individual-level patient evaluation. Design This was a secondary analysis of data from a repeated-measures study designed to evaluate the measurement properties of the UEFI over a 3-week period. Methods Patients (n=239) with musculoskeletal upper extremity disorders were recruited from 17 physical therapy clinics across 4 Canadian provinces. Rasch analysis of the UEFI measurement properties was performed. If the UEFI did not fit the Rasch model, misfitting patients were deleted, items with poor response structure were corrected, and misfitting items and redundant items were deleted. The impact of differential item functioning on the ability estimate of patients was investigated. Results A 15-item modified UEFI was derived to achieve fit to the Rasch model where the total score was supported as a measure of upper extremity function only. The resultant UEFI-15 interval-level scale (0–100, worst to best state) demonstrated excellent internal consistency (person separation index=0.94) and test-retest reliability (intraclass correlation coefficient [2,1]=.95). The minimal detectable change at the 90% confidence interval was 8.1. Limitations Patients who were ambidextrous or bilaterally affected were excluded to allow for the analysis of differential item functioning due to limb involvement and arm dominance. Conclusion Rasch analysis did not support the validity of the 20-item UEFI. However, the UEFI-15 was a valid and reliable interval-level measure of a single dimension: upper extremity function. 
Rasch analysis supports using the UEFI-15 in physical therapist practice to quantify upper extremity function in patients with musculoskeletal disorders of the upper extremity. PMID:23813086
Paz, Sylvia H; Spritzer, Karen L; Morales, Leo S; Hays, Ron D
2013-09-01
To evaluate the equivalence of the PROMIS® physical functioning item bank by language of administration (English versus Spanish). The PROMIS® wave 1 English-language physical functioning bank consists of 124 items, and 114 of these were translated into Spanish. Item frequencies, means and standard deviations, item-scale correlations, and internal consistency reliability were calculated. The IRT assumption of unidimensionality was evaluated by fitting a single-factor confirmatory factor analytic model. IRT threshold and discrimination parameters were estimated using Samejima's Graded Response Model. DIF by language of administration was evaluated. Item means ranged from 2.53 (SD = 1.36) to 4.62 (SD = 0.82). Coefficient alpha was 0.99, and item-rest correlations ranged from 0.41 to 0.89. A one-factor model fit the data well (CFI = 0.971, TLI = 0.970, and RMSEA = 0.052). The slope parameters ranged from 0.45 ("Are you able to run 10 miles?") to 4.50 ("Are you able to put on a shirt or blouse?"). The threshold parameters ranged from -1.92 ("How much do physical health problems now limit your usual physical activities (such as walking or climbing stairs)?") to 6.06 ("Are you able to run 10 miles?"). Fifty of the 114 items were flagged for DIF based on an R² of 0.02 or above criterion. The expected total score was higher for Spanish- than English-language respondents. English- and Spanish-speaking subjects with the same level of underlying physical function responded differently to 50 of 114 items. This study has important implications in the study of physical functioning among diverse populations.
Falk, Carl F; Cai, Li
2016-06-01
We present a semi-parametric approach to estimating item response functions (IRF) useful when the true IRF does not strictly follow commonly used functions. Our approach replaces the linear predictor of the generalized partial credit model with a monotonic polynomial. The model includes the regular generalized partial credit model at the lowest order polynomial. Our approach extends Liang's (A semi-parametric approach to estimate IRFs, Unpublished doctoral dissertation, 2007) method for dichotomous item responses to the case of polytomous data. Furthermore, item parameter estimation is implemented with maximum marginal likelihood using the Bock-Aitkin EM algorithm, thereby facilitating multiple group analyses useful in operational settings. Our approach is demonstrated on both educational and psychological data. We present simulation results comparing our approach to more standard IRF estimation approaches and other non-parametric and semi-parametric alternatives.
Development and validation of the Overall Depression Severity and Impairment Scale.
Bentley, Kate H; Gallagher, Matthew W; Carl, Jenna R; Barlow, David H
2014-09-01
The need to capture severity and impairment of depressive symptomatology is widespread. Existing depression scales are lengthy and largely focus on individual symptoms rather than resulting impairment. The Overall Depression Severity and Impairment Scale (ODSIS) is a 5-item, continuous measure designed for use across heterogeneous mood disorders and with subthreshold depressive symptoms. This study examined the psychometric properties of the ODSIS in outpatients in a clinic for emotional disorders (N = 100), undergraduate students (N = 566), and community-based adults (N = 189). Internal consistency, latent structure, item response theory, classification accuracy, convergent and discriminant validity, and differential item functioning analyses were conducted. ODSIS scores exhibited excellent internal consistency, and confirmatory factor analyses supported a unidimensional structure. Item response theory results demonstrated that the ODSIS provides more information about individuals with high levels of depression than those with low levels of depression. Responses on the ODSIS discriminated well between individuals with and without a mood disorder and depression-related severity across clinical and subclinical levels. A cut score of 8 correctly classified 82% of outpatients as with or without a mood disorder; it evidenced a favorable balance of sensitivity and specificity and of positive and negative predictive values. The ODSIS demonstrated good convergent and discriminant validity, and results indicate that items function similarly across clinical and nonclinical samples. Overall, findings suggest that the ODSIS is a valid tool for measuring depression-related severity and impairment. The brevity and ease of use of the ODSIS support its utility for screening and monitoring treatment response across a variety of settings. PsycINFO Database Record (c) 2014 APA, all rights reserved.
Does the hippocampus mediate objective binding or subjective remembering?
Slotnick, Scott D
2010-01-15
Human functional magnetic resonance imaging (fMRI) evidence suggests the hippocampus is associated with context memory to a greater degree than item memory (where only context memory requires item-in-context binding). A separate line of fMRI research suggests the hippocampus is associated with "remember" responses to a greater degree than "know" or familiarity based responses (where only remembering reflects the subjective experience of specific detail). Previous studies, however, have confounded context memory with remembering and item memory with knowing. The present fMRI study independently tested the binding hypothesis and remembering hypothesis of hippocampal function by evaluating activity within hippocampal regions-of-interest (ROIs). At encoding, participants were presented with colored and gray abstract shapes and instructed to remember each shape and whether it was colored or gray. At retrieval, old and new shapes were presented in gray and participants classified each shape as "old and previously colored", "old and previously gray", or "new", followed by a "remember" or "know" response. In 3 of 11 hippocampal ROIs, activity was significantly greater for context memory than item memory, the context memory-item memory by remember-know interaction was significant, and activity was significantly greater for context memory-knowing than item memory-remembering. This pattern of activity only supports the binding hypothesis. The analogous pattern of activity that would have supported the remembering hypothesis was never observed in the hippocampus. However, a targeted analysis revealed remembering specific activity in the left inferior parietal cortex. The present results suggest parietal cortex may be associated with subjective remembering while the hippocampus mediates binding.
Lin, Li; Dombeck, Carrie B.; Broderick, Joan E.; Snyder, Denise C.; Williams, Megan S.; Fawzy, Maria R.; Flynn, Kathryn E.
2013-01-01
Introduction Despite the ubiquity of 1-month recall periods for measures of sexual function, there is limited evidence for how well recalled responses correspond to individuals’ actual daily experiences. Aim To characterize the correspondence between daily sexual experiences and 1-month recall of those experiences. Methods Following a baseline assessment of sexual functioning, health, and demographic characteristics, 202 adults from the general population (101 women, 101 men) were recruited to complete daily assessments of their sexual function online for 30 days and a single recall measure of sexual function at day 30. Main Outcome Measures At the baseline and 30-day follow-ups, participants answered items asking about sexual satisfaction, sexual activities, interest, interfering factors, orgasm, sexual functioning, and use of therapeutic aids during the previous 30 days. Participants also completed a measure of positive and negative affect at follow-up. The main outcome measures were agreement between the daily and 1-month recall versions of the sexual function items. Results Accuracy of recall varied depending on the item and on the gender and mood of the respondent. Recall was better (low bias and higher correlations) for sexual activities, vaginal discomfort, erectile function, and more frequently used therapeutic aids. Recall was poorer for interest, affectionate behaviors (eg, kissing), and orgasm-related items. Men more than women overestimated frequency of interest and masturbation. Concurrent mood was related to over- or underreporting for 6 items addressing the frequency of masturbation and vaginal intercourse, erectile function, and orgasm. Conclusions A 1-month recall period seems acceptable for many aspects of sexual function in this population, but recall for some items was poor. Researchers should be aware that concurrent mood can have a powerful biasing effect on reports of sexual function. PMID:23802907
GITLIN, LAURA N.; ROTH, DAVID L.; BURGIO, LOUIS D.; LOEWENSTEIN, DAVID A.; WINTER, LARAINE; NICHOLS, LINDA; ARGÜELLES, SOLEDAD; CORCORAN, MARY; BURNS, ROBERT; MARTINDALE, JENNIFER
2008-01-01
Objective To evaluate psychometric properties and response patterns of the Caregiver Assessment of Function and Upset (CAFU), a 15-item multidimensional measure of dependence in dementia patients and caregiver reaction. Method 640 families were administered the CAFU (53% White, 43% African American, and 4% mixed race and ethnicity). We created a random split of the sample and conducted exploratory factor analyses on Sample 1 and confirmatory factor analyses on Sample 2. Convergent and discriminant validity were evaluated using Spearman rank correlation coefficients. Results A two-factor structure for functional items was derived, and excellent factorial validity was obtained. Convergent and discriminant validity were obtained for function and upset measures. Differential response patterns for dependence and caregiver upset were found for caregiver race, relationship, and care recipient gender but not for caregiver gender. Discussion The CAFU is easily administered, reliable, and valid for evaluating appraisals of dependencies and upsetting care areas. PMID:15750049
Marfeo, Elizabeth E; Ni, Pengsheng; McDonough, Christine; Peterik, Kara; Marino, Molly; Meterko, Mark; Rasch, Elizabeth K; Chan, Leighton; Brandt, Diane; Jette, Alan M
2018-03-01
Purpose To improve the mental health component of the Work Disability Functional Assessment Battery (WD-FAB), developed for the US Social Security Administration's (SSA) disability determination process. Specifically our goal was to expand the WD-FAB scales of mood & emotions, resilience, social interactions, and behavioral control to improve the depth and breadth of the current scales and expand the content coverage to include aspects of cognition & communication function. Methods Data were collected from a random, stratified sample of 1695 claimants applying for the SSA work disability benefits, and a general population sample of 2025 working age adults. 169 new items were developed to replenish the WD-FAB scales and analyzed using factor analysis and item response theory (IRT) analysis to construct unidimensional scales. We conducted computer adaptive test (CAT) simulations to examine the psychometric properties of the WD-FAB. Results Analyses supported the inclusion of four mental health subdomains: Cognition & Communication (68 items), Self-Regulation (34 items), Resilience & Sociability (29 items) and Mood & Emotions (34 items). All scales yielded acceptable psychometric properties. Conclusions IRT methods were effective in expanding the WD-FAB to assess mental health function. The WD-FAB has the potential to enhance work disability assessment both within the context of the SSA disability programs as well as other clinical and vocational rehabilitation settings.
A signal detection-item response theory model for evaluating neuropsychological measures.
Thomas, Michael L; Brown, Gregory G; Gur, Ruben C; Moore, Tyler M; Patt, Virginie M; Risbrough, Victoria B; Baker, Dewleen G
2018-02-05
Models from signal detection theory are commonly used to score neuropsychological test data, especially tests of recognition memory. Here we show that certain item response theory models can be formulated as signal detection theory models, thus linking two complementary but distinct methodologies. We then use the approach to evaluate the validity (construct representation) of commonly used research measures, demonstrate the impact of conditional error on neuropsychological outcomes, and evaluate measurement bias. Signal detection-item response theory (SD-IRT) models were fitted to recognition memory data for words, faces, and objects. The sample consisted of U.S. Infantry Marines and Navy Corpsmen participating in the Marine Resiliency Study. Data comprised item responses to the Penn Face Memory Test (PFMT; N = 1,338), Penn Word Memory Test (PWMT; N = 1,331), and Visual Object Learning Test (VOLT; N = 1,249), and self-report of past head injury with loss of consciousness. SD-IRT models adequately fitted recognition memory item data across all modalities. Error varied systematically with ability estimates, and distributions of residuals from the regression of memory discrimination onto self-report of past head injury were positively skewed towards regions of larger measurement error. Analyses of differential item functioning revealed little evidence of systematic bias by level of education. SD-IRT models benefit from the measurement rigor of item response theory-which permits the modeling of item difficulty and examinee ability-and from signal detection theory-which provides an interpretive framework encompassing the experimentally validated constructs of memory discrimination and response bias. We used this approach to validate the construct representation of commonly used research measures and to demonstrate how nonoptimized item parameters can lead to erroneous conclusions when interpreting neuropsychological test data. 
Future work might include the development of computerized adaptive tests and integration with mixture and random-effects models.
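The two signal detection constructs the abstract names, memory discrimination and response bias, have simple classical estimators. The following is a minimal sketch of the equal-variance Gaussian versions (d' and criterion c) computed from hit and false-alarm rates. It illustrates the interpretive framework only and is not the SD-IRT model fitted in the study; the counts in the usage lines are hypothetical:

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def dprime_and_criterion(hit_rate: float, false_alarm_rate: float) -> tuple[float, float]:
    """Equal-variance signal detection estimates.

    d' = z(H) - z(F) indexes discrimination of old from new items;
    c = -(z(H) + z(F)) / 2 indexes response bias (positive values mean
    a conservative tendency to respond "new").
    Rates must lie strictly between 0 and 1; rates of exactly 0 or 1 need
    a correction before use, which this sketch does not apply.
    """
    z_hit = _STD_NORMAL.inv_cdf(hit_rate)
    z_fa = _STD_NORMAL.inv_cdf(false_alarm_rate)
    return z_hit - z_fa, -0.5 * (z_hit + z_fa)


if __name__ == "__main__":
    # Hypothetical recognition test: 20 old and 20 new items,
    # 16 hits and 4 false alarms.
    d, c = dprime_and_criterion(16 / 20, 4 / 20)
    print(round(d, 3), round(c, 3))
```

Unlike these whole-test summaries, an item-level formulation lets the error of the estimate vary with ability, which is the conditional-error point the abstract makes.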
Systems Analysis Directorate Activities Summary August 1977
1977-09-01
are: a. Cataloging direction b. Requirements computation c. Procurement direction d. Distribution management e. Disposal direction f... "inventory management," as a responsibility of NICP’s, includes cataloging, requirements computation, procurement direction, distribution management, maintenance... functions are cataloging, major item management, secondary item management, procurement direction, distribution management, overhaul and rebuild
Observed-Score Equating as a Test Assembly Problem.
ERIC Educational Resources Information Center
van der Linden, Wim J.; Luecht, Richard M.
1998-01-01
Derives a set of linear conditions of item-response functions that guarantees identical observed-score distributions on two test forms. The conditions can be added as constraints to a linear programming model for test assembly. An example illustrates the use of the model for an item pool from the Law School Admissions Test (LSAT). (SLD)
Strategy Execution in Cognitive Skill Learning: An Item-Level Test of Candidate Models
ERIC Educational Resources Information Center
Rickard, Timothy C.
2004-01-01
This article investigates the transition to memory-based performance that commonly occurs with practice on tasks that initially require use of a multistep algorithm. In an alphabet arithmetic task, item response times exhibited pronounced step-function decreases after moderate practice that were uniquely predicted by T. C. Rickard's (1997)…
Llamas-Ramos, Inés; Llamas-Ramos, Rocío; Buz, José; Cortés-Rodríguez, María; Martín-Nogueras, Ana María
2018-06-01
The Memorial Symptom Assessment Scale (MSAS) is a self-rating instrument for the assessment of symptom distress in cancer patients. The Spanish version of the MSAS has recently been validated. However, we lack evidence of the internal construct validity of the shorter versions (short form [MSAS-SF] and condensed form [CMSAS]). In addition, rigorous testing of these scales with modern psychometric methods is needed. The aim of this study was to evaluate the internal construct validity and reliability of the Spanish versions of the MSAS-SF and CMSAS in oncology outpatients using Rasch analysis. Data from a convenience sample of oncology outpatients receiving chemotherapy (n = 306; mean age 60 years; 63% women) at a university hospital were analyzed. The Rasch unidimensional measurement model was used to examine response category functioning, item hierarchy, targeting, unidimensionality, reliability, and differential item functioning by age, gender, and marital status. The response category structure of the symptom distress items was improved by collapsing two categories. The scales were adequately targeted to the study patients, showed overall Rasch model fit (mean Infit MnSq ranged from 0.98 to 1.05), met criteria for unidimensionality, and the reliability of scores was good (person reliability > 0.80), except for the CMSAS prevalence scale. Only four items showed differential item functioning. The present study demonstrated that the Spanish versions of the MSAS-SF and CMSAS have adequate psychometric properties to evaluate symptom distress in oncology outpatients. Additional studies of the CMSAS are recommended. Copyright © 2018 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.
A model for incomplete longitudinal multivariate ordinal data.
Liu, Li C
2008-12-30
In studies where multiple outcome items are repeatedly measured over time, missing data often occur. A longitudinal item response theory model is proposed for analysis of multivariate ordinal outcomes that are repeatedly measured. Under the MAR assumption, this model accommodates missing data at any level (missing item at any time point and/or missing time point). It allows for multiple random subject effects and the estimation of item discrimination parameters for the multiple outcome items. The covariates in the model can be at any level. Assuming either a probit or logistic response function, maximum marginal likelihood estimation is described utilizing multidimensional Gauss-Hermite quadrature for integration of the random effects. An iterative Fisher-scoring solution, which provides standard errors for all model parameters, is used. A data set from a longitudinal prevention study is used to motivate the application of the proposed model. In this study, multiple ordinal items of health behavior are repeatedly measured over time. Because of a planned missing design, subjects answered only two-thirds of all items at a given point. Copyright 2008 John Wiley & Sons, Ltd.
Krekels, Ehj; Novakovic, A M; Vermeulen, A M; Friberg, L E; Karlsson, M O
2017-08-01
As biomarkers are lacking, multi-item questionnaire-based tools like the Positive and Negative Syndrome Scale (PANSS) are used to quantify disease severity in schizophrenia. Analyzing composite PANSS scores as continuous data discards information and violates the numerical nature of the scale. Here a longitudinal analysis based on Item Response Theory is presented using PANSS data from phase III clinical trials. Latent disease severity variables were derived from item-level data on each of the positive, negative, and general PANSS subscales. On all subscales, the time course of placebo responses was best described with Weibull models, and dose-independent functions with exponential models to describe the onset of the full effect were used to describe paliperidone's effect. Placebo and drug effect were most pronounced on the positive subscale. The final model successfully describes the time course of treatment effects on the individual PANSS item-levels, on all PANSS subscale levels, and on the total score level. © 2017 The Authors CPT: Pharmacometrics & Systems Pharmacology published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics.
Xu, Hui; Tracey, Terence J G
2017-03-01
The current study developed an abbreviated version of the Career Indecision Profile-65 (CIP-65; Hacker, Carr, Abrams, & Brown, 2013) by using item response theory. In order to improve the efficiency of the CIP-65 in measuring career indecision, the individual item performance of the CIP-65 was examined with respect to the ordering of response occurrence and gender differential item functioning. The best 5 items of each scale of the CIP-65 (i.e., neuroticism/negative affectivity, choice/commitment anxiety, lack of readiness, and interpersonal conflicts) were retained in the CIP-Short using a sample of 588 college students. A validation sample (N = 174) supported the reliability and structural validity of the CIP-Short. The convergent and divergent validity of the CIP-Short was additionally supported in the findings of a hypothesized differential relational pattern in a separate sample (N = 360). While the current study supported the CIP-Short being a sound brief measure of career indecision, the limitations of this study and suggestions for future research were discussed as well. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Evaluating linguistic equivalence of patient-reported outcomes in a cancer clinical trial.
Hahn, Elizabeth A; Bode, Rita K; Du, Hongyan; Cella, David
2006-01-01
In order to make meaningful cross-cultural or cross-linguistic comparisons of health-related quality of life (HRQL) or to pool international research data, it is essential to create unbiased measures that can detect clinically important differences. When HRQL scores differ between cultural/linguistic groups, it is important to determine whether this reflects real group differences, or is the result of systematic measurement variability. To investigate the linguistic measurement equivalence of a cancer-specific HRQL questionnaire, and to conduct a sensitivity analysis of treatment differences in HRQL in a clinical trial. Patients with newly diagnosed chronic myelogenous leukemia (n = 1049) completed serial HRQL assessments in an international Phase III trial. Two types of differential item functioning (uniform and non-uniform) were evaluated using item response theory and classical test theory approaches. A sensitivity analysis was conducted to compare HRQL between treatment arms using items without evidence of differential functioning. Among 27 items, nine (33%) did not exhibit any evidence of differential functioning in both linguistic comparisons (English versus French, English versus German). Although 18 items functioned differently, there was no evidence of systematic bias. In a sensitivity analysis, adjustment for differential functioning affected the magnitude, but not the direction or interpretation of clinical trial treatment arm differences. Sufficient sample sizes were available for only three of the eight language groups. Identification of differential functioning in two-thirds of the items suggests that current psychometric methods may be too sensitive. Enhanced methodologies are needed to differentiate trivial from substantive differential item functioning. 
Systematic variability in HRQL across different groups can be evaluated for its effect upon clinical trial results; a practice recommended when data are pooled across cultural or linguistic groups to make conclusions about treatment effects.
Assessing the mechanism of response in the retrosplenial cortex of good and poor navigators
Auger, Stephen D.; Maguire, Eleanor A.
2013-01-01
The retrosplenial cortex (RSC) is consistently engaged by a range of tasks that examine episodic memory, imagining the future, spatial navigation, and scene processing. Despite this, an account of its exact contribution to these cognitive functions remains elusive. Here, using functional MRI (fMRI) and multi-voxel pattern analysis (MVPA) we found that the RSC coded for the specific number of permanent outdoor items that were in view, that is, items which are fixed and never change their location. Moreover, this effect was selective, and was not apparent for other item features such as size and visual salience. This detailed detection of the number of permanent items in view was echoed in the parahippocampal cortex (PHC), although the two brain structures diverged when participants were divided into good and poor navigators. There was no difference in the responsivity of the PHC between the two groups, while significantly better decoding of the number of permanent items in view was possible from patterns of activity in the RSC of good compared to poor navigators. Within good navigators, the RSC also facilitated significantly better prediction of item permanence than the PHC. Overall, these findings suggest that the RSC in particular is concerned with coding the presence of every permanent item that is in view. This mechanism may represent a key building block for spatial and scene representations that are central to episodic memories and imagining the future, and could also be a prerequisite for successful navigation. PMID:24012136
Development of the NIH PROMIS® Sexual Function and Satisfaction Measures in Patients with Cancer
Flynn, Kathryn E.; Lin, Li; Cyranowski, Jill M.; Reeve, Bryce B.; Reese, Jennifer Barsky; Jeffery, Diana D.; Smith, Ashley Wilder; Porter, Laura S.; Dombeck, Carrie B.; Bruner, Deborah Watkins; Keefe, Francis J.; Weinfurt, Kevin P.
2013-01-01
Introduction We describe the development and validation of the PROMIS Sexual Function and Satisfaction (PROMIS SexFS) measures version 1.0 for cancer populations. Aim To develop a customizable self-report measure of sexual function and satisfaction as part of the U.S. National Institutes of Health PROMIS® Network. Methods Our multidisciplinary working group followed a comprehensive protocol for developing psychometrically robust patient reported outcome (PRO) measures including qualitative (scale development) and quantitative (psychometric evaluation) development. We performed an extensive literature review, conducted 16 focus groups with cancer patients and multiple discussions with clinicians, and evaluated candidate items in cognitive testing with patients. We administered items to 819 cancer patients. Items were calibrated using item response theory and evaluated for reliability and validity. Main Outcome Measures The PROMIS Sexual Function and Satisfaction (PROMIS SexFS) measures version 1.0 include 79 items in 11 domains: interest in sexual activity, lubrication, vaginal discomfort, erectile function, global satisfaction with sex life, orgasm, anal discomfort, therapeutic aids, sexual activities, interfering factors, and screener questions. Results In addition to content validity (patients indicate that items cover important aspects of their experiences) and face validity (patients indicate that items measure sexual function and satisfaction), the measure shows evidence for discriminant validity (domains discriminate between groups expected to be different), convergent validity (strong correlations between scores on PROMIS and scores on conceptually-similar older measures of sexual function), as well as favorable test-retest reliability among people not expected to change (inter-class correlations from 2 administrations of the instrument, 1 month apart). 
Conclusions The PROMIS SexFS offers researchers a reliable and valid set of tools to measure self-reported sexual function and satisfaction among diverse men and women. The measures are customizable; researchers can select the relevant domains and items comprising those domains for their study. PMID:23387911
A Mixed Effects Randomized Item Response Model
ERIC Educational Resources Information Center
Fox, J.-P.; Wyrick, Cheryl
2008-01-01
The randomized response technique ensures that individual item responses, denoted as true item responses, are randomized before observing them and so-called randomized item responses are observed. A relationship is specified between randomized item response data and true item response data. True item response data are modeled with a (non)linear…
Factor structure and gender stability in the multidimensional condom attitudes scale.
Starosta, Amy J; Berghoff, Christopher R; Earleywine, Mitch
2015-06-01
Sexually transmitted infections continue to trouble the United States and can be attenuated through increased condom use. Attitudes about condoms are an important multidimensional factor that can affect sexual health choices and have been successfully measured using the Multidimensional Condom Attitudes Scale (MCAS). Such attitudes have the potential to vary between men and women, yet little work has been undertaken to identify if the MCAS accurately captures attitudes without being influenced by underlying gender biases. We examined the factor structure and gender invariance on the MCAS using confirmatory factor analysis and item response theory, within-subscale differential item functioning analyses. More than 770 participants provided data via the Internet. Results of differential item functioning analyses identified three items as differentially functioning between the genders, and removal of these items is recommended. Findings confirmed the previously hypothesized multidimensional nature of condom attitudes and the five-factor structure of the MCAS even after the removal of the three problematic items. In general, comparisons across genders using the MCAS seem reasonable from a methodological standpoint. Results are discussed in terms of improving sexual health research and interventions. © The Author(s) 2014.
Comins, J D; Krogsgaard, M R; Kreiner, S; Brodersen, J
2013-10-01
The benefit of anterior cruciate ligament (ACL) reconstruction has been questioned based on patient-reported outcome measures (PROMs). Valid interpretation of such results requires confirmation of the psychometric properties of the PROM. Rasch analysis is the gold standard for validation of PROMs, yet PROMs used for ACL reconstruction have not been validated using Rasch analysis. We used Rasch analysis to investigate the psychometric properties of the Knee Numeric-Entity Evaluation Score (KNEES-ACL), a newly developed PROM for patients treated for ACL deficiency. Two-hundred forty-two patients pre- and post-ACL reconstruction completed the pilot PROM. Rasch models were used to assess the psychometric properties (e.g., unidimensionality, local response dependency, and differential item functioning). Forty-one items distributed across seven unidimensional constructs measuring impairment, functional limitations, and psychosocial consequences were confirmed to fit Rasch models. Fourteen items were removed because of statistical lack of fit and inadequate face validity. Local response dependency and differential item functioning were identified and adjusted. The KNEES-ACL is the first Rasch-validated condition-specific PROM constructed for patients with ACL deficiency and patients with ACL reconstruction. Thus, this instrument can be used for within- and between-group comparisons. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
2017-01-01
Background The Center for Epidemiologic Studies Depression Scale (CES-D) is a measure of depressive symptomatology which is widely used internationally. Though previous attempts were made to shorten the CES-D scale, few have attempted to develop a Computerized Adaptive Test (CAT) version for the CES-D. Objective The aim of this study was to provide evidence on the efficiency and accuracy of the CES-D when administered as a CAT in an American sample. Methods We obtained a sample of 2060 responses to the CES-D from US participants using the myPersonality application. The average age of participants was 26 years (range 19-77). We randomly split the sample into two groups to evaluate and validate the psychometric models. We used evaluation group data (n=1018) to assess dimensionality with both confirmatory factor and Mokken analysis. We conducted further psychometric assessments using item response theory (IRT), including assessments of item and scale fit to Samejima’s graded response model (GRM), local dependency and differential item functioning. We subsequently conducted two CAT simulations to evaluate the CES-D CAT using the validation group (n=1042). Results Initial CFA results indicated a poor fit to the model and Mokken analysis revealed 3 items which did not conform to the same dimension as the rest of the items. We removed the 3 items and fit the remaining 17 items to GRM. We found no evidence of differential item functioning (DIF) between age and gender groups. CES-D trait scores estimated by the simulated CAT algorithm correlated highly with trait scores derived from the original scale. The second CAT simulation conducted using real participant data demonstrated higher precision at the higher levels of depression spectrum. Conclusions Depression assessments using the CES-D CAT can be more accurate and efficient than those made using the fixed-length assessment. PMID:28931496
Kertesz, Stefan G.; Pollio, David E.; Jones, Richard N.; Steward, Jocelyn; Stringfellow, Erin J.; Gordon, Adam J.; Johnson, Nancy K.; Kim, Theresa A.; Granstaff, Unita; Austin, Erika L.; Young, Alexander S.; Golden, Joya; Davis, Lori L.; Roth, David L.; Holt, Cheryl L.
2015-01-01
Background Homeless patients face unique challenges in obtaining primary care responsive to their needs and context. Patient experience questionnaires could permit assessment of patient-centered medical homes for this population, but standard instruments may not reflect homeless patients' priorities and concerns. Objectives This report describes (a) the content and psychometric properties of a new primary care questionnaire for homeless patients and (b) the methods utilized in its development. Methods Starting with quality-related constructs from the Institute of Medicine, we identified relevant themes by interviewing homeless patients and experts in their care. A multidisciplinary team drafted a preliminary set of 78 items. This was administered to homeless-experienced clients (n=563) across 3 VA facilities and 1 non-VA Health Care for the Homeless Program. Using Item Response Theory, we examined Test Information Function curves to eliminate less informative items and devise plausibly distinct subscales. Results The resulting 33-item instrument (Primary Care Quality-Homeless, PCQ-H) has four subscales: Patient-Clinician Relationship (15 items), Cooperation among Clinicians (3 items), Access/Coordination (11 items) and Homeless-Specific Needs (4 items). Evidence for divergent and convergent validity is provided. Test Information Function (TIF) graphs showed adequate informational value to permit inferences about groups for 3 subscales (Relationship, Cooperation and Access/Coordination). The 3-item Cooperation subscale had lower informational value (TIF<5) but had good internal consistency (alpha=0.75) and patients frequently reported problems in this aspect of care. Conclusions Systematic application of qualitative and quantitative methods supported the development of a brief patient-reported questionnaire focused on the primary care of homeless patients and offers guidance for future population-specific instrument development. PMID:25023918
Kiltz, U; van der Heijde, D; Boonen, A; Bautista-Molano, W; Burgos-Vargas, R; Chiowchanwisawakit, P; Duruoz, T; El-Zorkany, B; Essers, I; Gaydukova, I; Géher, P; Gossec, L; Grazio, S; Gu, J; Khan, M A; Kim, T J; Maksymowych, W P; Marzo-Ortega, H; Navarro-Compán, V; Olivieri, I; Patrikos, D; Pimentel-Santos, F M; Schirmer, M; van den Bosch, F; Weber, U; Zochling, J; Braun, J
2016-01-01
The Assessments of SpondyloArthritis international society Health Index (ASAS HI) measures functioning and health in patients with spondyloarthritis (SpA) across 17 aspects of health and 9 environmental factors (EF). The objective was to translate and adapt the original English version of the ASAS HI, including the EF Item Set, cross-culturally into 15 languages. Translation and cross-cultural adaptation has been carried out following the forward-backward procedure. In the cognitive debriefing, 10 patients/country across a broad spectrum of sociodemographic background, were included. The ASAS HI and the EF Item Set were translated into Arabic, Chinese, Croatian, Dutch, French, German, Greek, Hungarian, Italian, Korean, Portuguese, Russian, Spanish, Thai and Turkish. Some difficulties were experienced with translation of the contextual factors indicating that these concepts may be more culturally-dependent. A total of 215 patients with axial SpA across 23 countries (62.3% men, mean (SD) age 42.4 (13.9) years) participated in the field test. Cognitive debriefing showed that items of the ASAS HI and EF Item Set are clear, relevant and comprehensive. All versions were accepted with minor modifications with respect to item wording and response option. The wording of three items had to be adapted to improve clarity. As a result of cognitive debriefing, a new response option 'not applicable' was added to two items of the ASAS HI to improve appropriateness. This study showed that the items of the ASAS HI including the EFs were readily adaptable throughout all countries, indicating that the concepts covered were comprehensive, clear and meaningful in different cultures.
Kiltz, U; van der Heijde, D; Boonen, A; Bautista-Molano, W; Burgos-Vargas, R; Chiowchanwisawakit, P; Duruoz, T; El-Zorkany, B; Essers, I; Gaydukova, I; Géher, P; Gossec, L; Grazio, S; Gu, J; Khan, M A; Kim, T J; Maksymowych, W P; Marzo-Ortega, H; Navarro-Compán, V; Olivieri, I; Patrikos, D; Pimentel-Santos, F M; Schirmer, M; van den Bosch, F; Weber, U; Zochling, J; Braun, J
2016-01-01
Introduction The Assessments of SpondyloArthritis international society Health Index (ASAS HI) measures functioning and health in patients with spondyloarthritis (SpA) across 17 aspects of health and 9 environmental factors (EF). The objective was to translate and adapt the original English version of the ASAS HI, including the EF Item Set, cross-culturally into 15 languages. Methods Translation and cross-cultural adaptation has been carried out following the forward–backward procedure. In the cognitive debriefing, 10 patients/country across a broad spectrum of sociodemographic background, were included. Results The ASAS HI and the EF Item Set were translated into Arabic, Chinese, Croatian, Dutch, French, German, Greek, Hungarian, Italian, Korean, Portuguese, Russian, Spanish, Thai and Turkish. Some difficulties were experienced with translation of the contextual factors indicating that these concepts may be more culturally-dependent. A total of 215 patients with axial SpA across 23 countries (62.3% men, mean (SD) age 42.4 (13.9) years) participated in the field test. Cognitive debriefing showed that items of the ASAS HI and EF Item Set are clear, relevant and comprehensive. All versions were accepted with minor modifications with respect to item wording and response option. The wording of three items had to be adapted to improve clarity. As a result of cognitive debriefing, a new response option ‘not applicable’ was added to two items of the ASAS HI to improve appropriateness. Discussion This study showed that the items of the ASAS HI including the EFs were readily adaptable throughout all countries, indicating that the concepts covered were comprehensive, clear and meaningful in different cultures. PMID:27752358
A modular approach for item response theory modeling with the R package flirt.
Jeon, Minjeong; Rijmen, Frank
2016-06-01
The new R package flirt is introduced for flexible item response theory (IRT) modeling of psychological, educational, and behavior assessment data. flirt integrates a generalized linear and nonlinear mixed modeling framework with graphical model theory. The graphical model framework allows for efficient maximum likelihood estimation. The key feature of flirt is its modular approach to facilitate convenient and flexible model specifications. Researchers can construct customized IRT models by simply selecting various modeling modules, such as parametric forms, number of dimensions, item and person covariates, person groups, link functions, etc. In this paper, we describe major features of flirt and provide examples to illustrate how flirt works in practice.
NASA Astrophysics Data System (ADS)
Gönülateş, Emre; Kortemeyer, Gerd
2017-04-01
Homework is an important component of most physics courses. One of the functions it serves is to provide meaningful formative assessment in preparation for examinations. However, correlations between homework and examination scores tend to be low, likely due to unproductive student behavior such as copying and random guessing of answers. In this study, we attempt to model these two counterproductive learner behaviors within the framework of Item Response Theory in order to provide an ability measurement that strongly correlates with examination scores. We find that introducing additional item parameters leads to worse predictions of examination grades, while introducing additional learner traits is a more promising approach.
ERIC Educational Resources Information Center
Wu, Li-Tzy; Ringwalt, Christopher L.; Yang, Chongming; Reeve, Bryce B.; Pan, Jeng-Jong; Blazer, Dan G.
2009-01-01
DSM-IV's hierarchical distinction between abuse of and dependence on prescription opioids is not supported since the symptoms of abuse in adolescents are not less severe than dependence. The finding is based on the examination of the DSM-IV criteria for opioid use disorders using item response theory.
ERIC Educational Resources Information Center
Edelen, Maria Orlando; McCaffrey, Daniel F.; Marshall, Grant N.; Jaycox, Lisa H.
2009-01-01
Accurate assessment of attitudes about intimate partner violence is important for evaluation of prevention and early intervention programs. Assessment of attitudes about cross-gender interactions is particularly susceptible to bias because it requires specifying the gender of the perpetrator and the victim. As it is likely that respondents will…
USDA-ARS's Scientific Manuscript database
This study aimed to evaluate the psychometric properties of four self-efficacy scales (i.e., self-efficacy for fruit (FSE), vegetable (VSE), and water (WSE) intakes, and physical activity (PASE)) and to investigate their differences in item functioning across sex, age, and body weight status groups ...
A Comparison of Uniform DIF Effect Size Estimators under the MIMIC and Rasch Models
ERIC Educational Resources Information Center
Jin, Ying; Myers, Nicholas D.; Ahn, Soyeon; Penfield, Randall D.
2013-01-01
The Rasch model, a member of a larger group of models within item response theory, is widely used in empirical studies. Detection of uniform differential item functioning (DIF) within the Rasch model typically employs null hypothesis testing with a concomitant consideration of effect size (e.g., signed area [SA]). Parametric equivalence between…
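To make the signed area (SA) effect size mentioned in this abstract concrete, the sketch below integrates the difference between two Rasch item response functions numerically. The function names, the integration limits, and the sign convention (reference minus focal) are assumptions for illustration and are not taken from the article. Under the Rasch model the two curves differ only by a horizontal shift, so the area should equal the difference in item difficulties.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct/endorsed response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def signed_area(b_reference, b_focal, lower=-30.0, upper=30.0, steps=60000):
    """Trapezoid-rule integral of P_reference(theta) - P_focal(theta)."""
    width = (upper - lower) / steps
    total = 0.0
    for k in range(steps + 1):
        theta = lower + k * width
        diff = rasch_probability(theta, b_reference) - rasch_probability(theta, b_focal)
        weight = 0.5 if k in (0, steps) else 1.0  # half weight at the endpoints
        total += weight * diff
    return total * width


# Hypothetical difficulties: the item is 0.5 logits harder for the focal group.
area = signed_area(b_reference=0.0, b_focal=0.5)
print(f"signed area = {area:.4f}, b_focal - b_reference = {0.5 - 0.0:.4f}")
```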
Measuring Social Well-Being in People with Chronic Illness
ERIC Educational Resources Information Center
Hahn, Elizabeth A.; Cella, David; Bode, Rita K.; Hanrahan, Rachel T.
2010-01-01
Although social well-being (SWB) is recognized as an integral component of health, it is rarely included in health-related quality of life (HRQL) instruments. Two SWB dimensions were identified by literature review: social support (SWB-SS) and social function (SWB-SF). As part of a larger project to develop item response theory-derived item banks…
ERIC Educational Resources Information Center
Lee, Soo; Suh, Youngsuk
2018-01-01
Lord's Wald test for differential item functioning (DIF) has not been studied extensively in the context of the multidimensional item response theory (MIRT) framework. In this article, Lord's Wald test was implemented using two estimation approaches, marginal maximum likelihood estimation and Bayesian Markov chain Monte Carlo estimation, to detect…
Development of the NIH PROMIS® Sexual Function and Satisfaction measures in patients with cancer.
Flynn, Kathryn E; Lin, Li; Cyranowski, Jill M; Reeve, Bryce B; Reese, Jennifer Barsky; Jeffery, Diana D; Smith, Ashley Wilder; Porter, Laura S; Dombeck, Carrie B; Bruner, Deborah Watkins; Keefe, Francis J; Weinfurt, Kevin P
2013-02-01
We describe the development and validation of the Patient-Reported Outcomes Measurement Information System® Sexual Function and Satisfaction (PROMIS® SexFS; National Institutes of Health) measures, version 1.0, for cancer populations. To develop a customizable self-report measure of sexual function and satisfaction as part of the U.S. National Institutes of Health PROMIS Network. Our multidisciplinary working group followed a comprehensive protocol for developing psychometrically robust patient-reported outcome measures including qualitative (scale development) and quantitative (psychometric evaluation) development. We performed an extensive literature review, conducted 16 focus groups with cancer patients and multiple discussions with clinicians, and evaluated candidate items in cognitive testing with patients. We administered items to 819 cancer patients. Items were calibrated using item-response theory and evaluated for reliability and validity. The PROMIS SexFS measures, version 1.0, include 81 items in 11 domains: Interest in Sexual Activity, Lubrication, Vaginal Discomfort, Erectile Function, Global Satisfaction with Sex Life, Orgasm, Anal Discomfort, Therapeutic Aids, Sexual Activities, Interfering Factors, and Screener Questions. In addition to content validity (patients indicate that items cover important aspects of their experiences) and face validity (patients indicate that items measure sexual function and satisfaction), the measure shows evidence for discriminant validity (domains discriminate between groups expected to be different) and convergent validity (strong correlations between scores on PROMIS and scores on conceptually similar older measures of sexual function), as well as favorable test-retest reliability among people not expected to change (interclass correlations from two administrations of the instrument, 1 month apart).
The PROMIS SexFS offers researchers a reliable and valid set of tools to measure self-reported sexual function and satisfaction among diverse men and women. The measures are customizable; researchers can select the relevant domains and items comprising those domains for their study. © 2013 International Society for Sexual Medicine.
Marfeo, Elizabeth E; Ni, Pengsheng; Haley, Stephen M; Bogusz, Kara; Meterko, Mark; McDonough, Christine M; Chan, Leighton; Rasch, Elizabeth K; Brandt, Diane E; Jette, Alan M
2013-09-01
To use item response theory (IRT) data simulations to construct and perform initial psychometric testing of a newly developed instrument, the Social Security Administration Behavioral Health Function (SSA-BH) instrument, that aims to assess behavioral health functioning relevant to the context of work. Cross-sectional survey followed by IRT calibration data simulations. Community. Sample of individuals applying for Social Security Administration disability benefits: claimants (n=1015) and a normative comparative sample of U.S. adults (n=1000). None. SSA-BH measurement instrument. IRT analyses supported the unidimensionality of 4 SSA-BH scales: mood and emotions (35 items), self-efficacy (23 items), social interactions (6 items), and behavioral control (15 items). All SSA-BH scales demonstrated strong psychometric properties including reliability, accuracy, and breadth of coverage. High correlations of the simulated 5- or 10-item computer adaptive tests with the full item bank indicated robust ability of the computer adaptive testing approach to comprehensively characterize behavioral health function along 4 distinct dimensions. Initial testing and evaluation of the SSA-BH instrument demonstrated good accuracy, reliability, and content coverage along all 4 scales. Behavioral function profiles of Social Security Administration claimants were generated and compared with age- and sex-matched norms along 4 scales: mood and emotions, behavioral control, social interactions, and self-efficacy. Using the computer adaptive test-based approach offers the ability to collect standardized, comprehensive functional information about claimants in an efficient way, which may prove useful in the context of the Social Security Administration's work disability programs. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Paap, Muirne C S; Kroeze, Karel A; Terwee, Caroline B; van der Palen, Job; Veldkamp, Bernard P
2017-11-01
Examining item usage is an important step in evaluating the performance of a computerized adaptive test (CAT). We study item usage for a newly developed multidimensional CAT which draws items from three PROMIS domains, as well as a disease-specific one. The multidimensional item bank used in the current study contained 194 items from four domains: the PROMIS domains fatigue, physical function, and ability to participate in social roles and activities, and a disease-specific domain (the COPD-SIB). The item bank was calibrated using the multidimensional graded response model and data of 795 patients with chronic obstructive pulmonary disease. To evaluate the item usage rates of all individual items in our item bank, CAT simulations were performed on responses generated based on a multivariate uniform distribution. The outcome variables included active bank size and item overuse (usage rate larger than the expected item usage rate). For average θ-values, the overall active bank size was 9-10%; this number quickly increased as θ-values became more extreme. For values of -2 and +2, the overall active bank size equaled 39-40%. There was 78% overlap between overused items and active bank size for average θ-values. For more extreme θ-values, the overused items made up a much smaller part of the active bank size: here the overlap was only 35%. Our results strengthen the claim that relatively short item banks may suffice when using polytomous items (and no content constraints/exposure control mechanisms), especially when using MCAT.
Validation of a condition-specific measure for women having an abnormal screening mammography.
Brodersen, John; Thorsen, Hanne; Kreiner, Svend
2007-01-01
The aim of this study is to assess the validity of a new condition-specific instrument measuring psychosocial consequences of abnormal screening mammography (PCQ-DK33). The draft version of the PCQ-DK33 was completed on two occasions by 184 women who had received an abnormal screening mammography and on one occasion by 240 women who had received a normal screening result. Item Response Theories and Classical Test Theories were used to analyze data. Construct validity, concurrent validity, known group validity, objectivity and reliability were established by item analysis examining the fit between item responses and Rasch models. Six dimensions covering anxiety, behavioral impact, sense of dejection, impact on sleep, breast examination, and sexuality were identified. One item belonging to the dejection dimension had uniform differential item functioning. Two items not fitting the Rasch models were retained because of high face validity. A sick leave item added useful information when measuring side effects and socioeconomic consequences of breast cancer screening. Five "poor items" were identified and should be deleted from the final instrument. Preliminary evidence for a valid and reliable condition-specific measure for women having an abnormal screening mammography was established. The measure includes 27 "good" items measuring different attributes of the same overall latent structure-the psychosocial consequences of abnormal screening mammography.
Bernstein, Ira H.; Rush, A. John; Carmody, Thomas J.; Woo, Ada; Trivedi, Madhukar H.
2007-01-01
Objectives Recent work using classical test theory (CTT) and item response theory (IRT) has found that the self-report (QIDS-SR16) and clinician-rated (QIDS-C16) versions of the 16-item Quick Inventory of Depressive Symptomatology were generally comparable in outpatients with nonpsychotic major depressive disorder (MDD). This report extends this comparison to a less well-educated, more treatment-resistant sample that included more ethnic/racial minorities using IRT and selected classical test analyses. Methods The QIDS-SR16 and QIDS-C16 were obtained in a sample of 441 outpatients with nonpsychotic MDD seen in the public sector in the Texas Medication Algorithm Project (TMAP). The Samejima graded response IRT model was used to compare the QIDS-SR16 and QIDS-C16. Results The nine symptom domains in the QIDS-SR16 and QIDS-C16 related well to overall depression. The slopes of the item response functions (a), which index the strength of relationship between overall depression and each symptom, were extremely similar with the two measures. Likewise, the CTT and IRT indices of symptom frequency (item means and locations of the item response functions, b_i) were also similar with these two measures. For example, sad mood and difficulty with concentration/decision making were highly related to the overall depression severity with both the QIDS-C16 and QIDS-SR16. Likewise, sleeping difficulties were commonly reported, even though they were not as strongly related to overall magnitude of depression. Conclusion In this less educated, socially disadvantaged sample, differences between the QIDS-C16 and QIDS-SR16 were minor. The QIDS-SR16 is a satisfactory substitute for the more time-consuming QIDS-C16 in a broad range of adult, nonpsychotic, depressed outpatients. PMID:16716351
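The roles of the slope (a) and location (b_i) parameters in the Samejima graded response model can be sketched as follows. The function name and the parameter values are hypothetical and chosen only to show the mechanics; they are not estimates from this study. The sketch assumes a four-category item, which requires three ordered location parameters.

```python
import math


def grm_category_probabilities(theta, a, thresholds):
    """Category probabilities under Samejima's graded response model.

    thresholds holds the ordered locations b_1 < ... < b_{K-1};
    the function returns K probabilities for categories 0..K-1.
    """
    # Cumulative probabilities P(X >= k), padded with 1 (k = 0) and 0 (k = K).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical item: a larger slope ties the symptom more strongly to the trait,
# and lower locations mean the symptom is reported more frequently.
for theta in (-1.0, 0.0, 1.0):
    probs = grm_category_probabilities(theta, a=1.5, thresholds=[-1.0, 0.0, 1.0])
    print(theta, [round(p, 3) for p in probs])
```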
Bernstein, Ira H; Rush, A John; Carmody, Thomas J; Woo, Ada; Trivedi, Madhukar H
2007-01-01
Recent work using classical test theory (CTT) and item response theory (IRT) has found that the self-report (QIDS-SR16) and clinician-rated (QIDS-C16) versions of the 16-item quick inventory of depressive symptomatology were generally comparable in outpatients with nonpsychotic major depressive disorder (MDD). This report extends this comparison to a less well-educated, more treatment-resistant sample that included more ethnic/racial minorities using IRT and selected classical test analyses. The QIDS-SR16 and QIDS-C16 were obtained in a sample of 441 outpatients with nonpsychotic MDD seen in the public sector in the Texas Medication Algorithm Project (TMAP). The Samejima graded response IRT model was used to compare the QIDS-SR16 and QIDS-C16. The nine symptom domains in the QIDS-SR16 and QIDS-C16 related well to overall depression. The slopes of the item response functions, a, which index the strength of relationship between overall depression and each symptom, were extremely similar with the two measures. Likewise, the CTT and IRT indices of symptom frequency (item means and locations of the item response functions, b_i) were also similar with these two measures. For example, sad mood and difficulty with concentration/decision making were highly related to the overall depression severity with both the QIDS-C16 and QIDS-SR16. Likewise, sleeping difficulties were commonly reported, even though they were not as strongly related to overall magnitude of depression. In this less educated, socially disadvantaged sample, differences between the QIDS-C16 and QIDS-SR16 were minor. The QIDS-SR16 is a satisfactory substitute for the more time-consuming QIDS-C16 in a broad range of adult, nonpsychotic, depressed outpatients.
Constantine, Melissa L; Pauls, Rachel N; Rogers, Rebecca R; Rockwood, Todd H
2017-12-01
The Prolapse/Incontinence Sexual Questionnaire-International Urogynecology Association (IUGA) Revised (PISQ-IR) measures sexual function in women with pelvic floor disorders (PFDs) yet is unwieldy, with six individual subscale scores for sexually active women and four for women who are not. We hypothesized that a valid and responsive summary score could be created for the PISQ-IR. Item response data from participating women who completed a revised version of the PISQ-IR at three clinical sites were used to generate item weights using magnitude estimation (ME) and Q-sort (Q) approaches. Item weights were applied to data from the original PISQ-IR validation to generate summary scores. Correlation and factor analysis methods were used to evaluate validity and responsiveness of summary scores. Weighted and nonweighted summary scores for the sexually active PISQ-IR demonstrated good criterion validity with condition-specific measures: Incontinence Severity Index = 0.12, 0.11, 0.11; Pelvic Floor Distress Inventory-20 = 0.39, 0.39, 0.12; Epidemiology of Prolapse and Incontinence Questionnaire-Q35 = 0.26, 0.25, 0.40; Female Sexual Functioning Index subscale total score = 0.72, 0.75, 0.72 for nonweighted, ME, and Q summary scores, respectively. Responsiveness evaluation showed weighted and nonweighted summary scores detected moderate effect sizes (Cohen's d > 0.5). Weighted items for those not sexually active (NSA) demonstrated significant floor effects and did not meet criterion validity. A PISQ-IR summary score for use with sexually active women, nonweighted or calculated with ME or Q item weights, is a valid and reliable measure for clinical use. The summary scores provide value for assessing clinical treatment of pelvic floor disorders.
Vision and Quality of Life Index: validation of the Indian version using Rasch analysis.
Gothwal, Vijaya K; Bagga, Deepak K
2013-07-18
A multi-attribute utility instrument (MAUI) consists of a descriptive system in which the items and responses seek information about a concept of the universe of health-related quality of life (QoL), and responses to these items then are weighted and combined to produce the index. To our knowledge, the 6-item Vision and Quality of Life Index (VisQoL) is the only available vision-related MAUI, developed and validated in Australia, specifically for visually impaired (VI) populations. To our knowledge, the psychometric properties of the VisQoL have not yet been investigated in an Indian VI sample; this was the aim of our study. The Indian VisQoL was administered to 349 VI adults face-to-face by a trained interviewer at the Vision Rehabilitation Centres of a tertiary eye care facility, South India. Rasch analysis was used to assess the psychometric properties. Rescoring was necessary for all except one item before ordered thresholds were obtained. All items fit the Rasch model and unidimensionality was confirmed. Person separation was acceptable (2.01), indicating that the instrument can discriminate among three strata of participants' vision-related QoL (VRQoL). The VisQoL items were targeted substantially to the participants' VRQoL (-0.69 logits). One item ("ability to have friendships") demonstrated large differential item functioning by work status; working participants reported the item to be more difficult (-1.13 logits) relative to other items when compared to the nonworking participants. The 6-item Indian VisQoL satisfies unidimensional Rasch model expectations in VI patients. Disordering of response categories was evident; replication is required before a common rescoring option should be considered.
Claesson, Margareta; Armitage, W John; Byström, Berit; Montan, Per; Samolov, Branka; Stenvi, Ulf; Lundström, Mats
2017-09-01
Catquest-9SF is a 9-item visual disability questionnaire developed for evaluating patient-reported outcome measures after cataract surgery. The aim of this study was to use Rasch analysis to determine the responsiveness of Catquest-9SF for corneal transplant patients. Patients who underwent corneal transplantation primarily to improve vision were included. One group (n = 199) completed the Catquest-9SF questionnaire before corneal transplantation and a second independent group (n = 199) completed the questionnaire 2 years after surgery. All patients were recorded in the Swedish Cornea Registry, which provided clinical and demographic data for the study. Winsteps software v.3.91.0 (Winsteps.com, Beaverton, OR) was used to assess the fit of the Catquest-9SF data to the Rasch model. Rasch analysis showed that Catquest-9SF applied to corneal transplant patients was unidimensional (infit range, 0.73-1.32; outfit range, 0.81-1.35), and therefore, measured a single underlying construct (visual disability). The Rasch model explained 68.5% of raw variance. The response categories of the 9-item questionnaire were ordered, and the category thresholds were well defined. Item difficulty matched the level of patients' ability (0.36 logit difference between the means). Precision in terms of person separation (3.09) and person reliability (0.91) was good. Differential item functioning was notable for only 1 item (satisfaction with vision), which had a differential item functioning contrast of 1.08 logit. Rasch analysis showed that Catquest-9SF is a valid instrument for measuring visual disability in patients who have undergone corneal transplantation primarily to improve vision.
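The person separation and person reliability values reported above are linked by a conventional Rasch relationship, sketched below. The formulas (reliability = G² / (1 + G²); strata = (4G + 1) / 3) are the usual textbook conventions; that the authors' software computed the figures exactly this way is an assumption, and the function names are hypothetical.

```python
def separation_to_reliability(g):
    """Convert a Rasch separation index G into a reliability coefficient."""
    return g ** 2 / (1.0 + g ** 2)

def separation_to_strata(g):
    """Number of statistically distinct ability strata implied by separation G."""
    return (4.0 * g + 1.0) / 3.0

# Separation of 3.09 (reported above) implies a reliability near 0.91.
reliability = separation_to_reliability(3.09)

# Separation of 2.01 (the VisQoL abstract earlier in this listing) implies about three strata.
strata = separation_to_strata(2.01)
```

Under these conventions the reported pairs are mutually consistent, which is a quick sanity check when reading Rasch summaries.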
Heinemann, Allen W; Kisala, Pamela A; Hahn, Elizabeth A; Tulsky, David S
2015-05-01
To develop a spinal cord injury (SCI)-focused version of PROMIS and Neuro-QOL social domain item banks; evaluate the psychometric properties of items developed for adults with SCI; and report information to facilitate clinical and research use. We used a mixed-methods design to develop and evaluate Ability to Participate in Social Roles and Activities and Satisfaction with Social Roles and Activities items. Focus groups helped define the constructs; cognitive interviews helped revise items; and confirmatory factor analysis and item response theory methods helped calibrate item banks and evaluate differential item functioning related to demographic and injury characteristics. Five SCI Model System sites and one Veterans Administration medical center. The calibration sample consisted of 641 individuals; a reliability sample consisted of 245 individuals residing in the community. A subset of 27 Ability to Participate and 35 Satisfaction items demonstrated good measurement properties and negligible differential item functioning related to demographic and injury characteristics. The SCI-specific measures correlate strongly with the PROMIS and Neuro-QOL versions. Ten item short forms correlate >0.96 with the full banks. Variable-length CATs with a minimum of 4 items, variable-length CATs with a minimum of 8 items, fixed-length CATs of 10 items, and the 10-item short forms demonstrate construct coverage and measurement error that is comparable to the full item bank. The Ability to Participate and Satisfaction with Social Roles and Activities CATs and short forms demonstrate excellent psychometric properties and are suitable for clinical and research applications.
Prado, Jérôme; Noveck, Ira A
2007-04-01
Participants experience difficulty detecting that an item depicting an H-in-a-square confirms the logical rule, "If there is not a T then there is not a circle." Indeed, there is a perceptual conflict between the items mentioned in the rule (T and circle) and in the test item (H and square). Much evidence supports the claim that correct responding depends on detecting and resolving such conflicts. One aim of this study is to find more precise neurological evidence in support of this claim by using a parametric event-related functional magnetic resonance imaging (fMRI) paradigm. We scanned 20 participants while they were required to judge whether or not a conditional rule was verified (or falsified) by a corresponding target item. We found that the right middorsolateral prefrontal cortex (mid-DLPFC) was specifically engaged, together with the medial frontal (anterior cingulate and presupplementary motor area [pre-SMA]) and parietal cortices, when mismatching was present. Activity in these regions was also linearly correlated with the level of mismatch between the rule and the test item. Furthermore, a psychophysiological interaction analysis revealed that activation of the mid-DLPFC, which increases as mismatching does, was accompanied by a decrease in functional integration with the bilateral primary visual cortex and an increase in functional integration with the right parietal cortex. This indicates a need to break away from perceptual cues in order to select an appropriate logical response. These findings strongly indicate that the regions involved in inhibitory control (including the right mid-DLPFC and the medial frontal cortex) are engaged when participants have to overcome perceptual mismatches in order to provide a logical response. These findings are also consistent with neuroimaging studies investigating the belief bias, where prior beliefs similarly interfere with logical reasoning.
Improving Measurement Efficiency of the Inner EAR Scale with Item Response Theory.
Jessen, Annika; Ho, Andrew D; Corrales, C Eduardo; Yueh, Bevan; Shin, Jennifer J
2018-02-01
Objectives (1) To assess the 11-item Inner Effectiveness of Auditory Rehabilitation (Inner EAR) instrument with item response theory (IRT). (2) To determine whether the underlying latent ability could also be accurately represented by a subset of the items for use in high-volume clinical scenarios. (3) To determine whether the Inner EAR instrument correlates with pure tone thresholds and word recognition scores. Design IRT evaluation of prospective cohort data. Setting Tertiary care academic ambulatory otolaryngology clinic. Subjects and Methods Modern psychometric methods, including factor analysis and IRT, were used to assess unidimensionality and item properties. Regression methods were used to assess prediction of word recognition and pure tone audiometry scores. Results The Inner EAR scale is unidimensional, and items varied in their location and information. Information parameter estimates ranged from 1.63 to 4.52, with higher values indicating more useful items. The IRT model provided a basis for identifying 2 sets of items with relatively lower information parameters. Item information functions demonstrated which items added insubstantial value over and above other items; these were removed in stages, creating 8- and 3-item Inner EAR scales for more efficient assessment. The 8-item version accurately reflected the underlying construct. All versions correlated moderately with word recognition scores and pure tone averages. Conclusion The 11-, 8-, and 3-item versions of the Inner EAR scale have strong psychometric properties, and there is correlational validity evidence for the observed scores. Modern psychometric methods can help streamline care delivery by maximizing relevant information per item administered.
Suzukamo, Yoshimi; Oshika, Tetsuro; Yuzawa, Mitsuko; Tokuda, Yoshihiro; Tomidokoro, Atsuo; Oki, Kotaro; Mangione, Carol M; Green, Joseph; Fukuhara, Shunichi
2005-10-26
The importance of evaluating the outcomes of health care from the standpoint of the patient is now widely recognized. The purpose of this study is to develop and test a Japanese version of the National Eye Institute Visual Function Questionnaire (NEI VFQ-25). A Japanese version was developed with a previously standardized method. The questionnaire and optional items were completed by 245 patients with cataracts, glaucoma, or age-related macular degeneration, by 110 others before and after cataract surgery, and by a reference group (n = 31). We computed rates of missing data, measured reproducibility and internal consistency reliability, and tested for convergent and discriminant validity, concurrent validity, known-groups validity, factor structure, and responsiveness to change. Based on information from the participants, some items were changed to 2-step items (asking if an activity was done, and if it was done, then asking how difficult it was). The near-vision and distance-vision subscales each had 1 item that was endorsed by very few participants, so these items were replaced with items that were optional in the English version. For example, more than 60% of participants did not drive, so the driving question was excluded. Reliability and validity were adequate for all subscales except driving, ocular pain, color vision, and peripheral vision. With cataract surgery, most scores improved by at least 20 points. With minor modifications from the English version, the Japanese NEI VFQ-25 can give reliable, valid, responsive data on vision-related quality of life, for group-level comparisons or for tracking therapeutic outcomes.
The Information Function for the One-Parameter Logistic Model: Is it Reliability?
ERIC Educational Resources Information Center
Doran, Harold C.
2005-01-01
The information function is an important statistic in item response theory (IRT) applications. Although the information function is often described as the IRT version of reliability, it differs from the classical notion of reliability from a critical perspective: replication. This article first explores the information function for the…
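For readers unfamiliar with the quantity discussed above, the following is a minimal sketch of the information function under the one-parameter logistic model, assuming the standard form in which item information equals P(1 − P). The function names and item difficulties are hypothetical and do not come from the article.

```python
import math

def p_1pl(theta, b):
    """Probability of a correct response under the one-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Item information at theta: P * (1 - P) for the 1PL."""
    p = p_1pl(theta, b)
    return p * (1.0 - p)

def total_information(theta, difficulties):
    """Test information is the sum of the item informations."""
    return sum(item_information(theta, b) for b in difficulties)

def standard_error(theta, difficulties):
    """Conditional standard error of the ability estimate at theta."""
    return 1.0 / math.sqrt(total_information(theta, difficulties))

# Hypothetical five-item test; information peaks where difficulties cluster.
se_at_zero = standard_error(0.0, [-1.0, -0.5, 0.0, 0.5, 1.0])
```

Because information varies with theta, precision is conditional on ability, unlike a single classical reliability coefficient for the whole test.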
Forrest, Christopher B; Meltzer, Lisa J; Marcus, Carole L; de la Motte, Anna; Kratchman, Amy; Buysse, Daniel J; Pilkonis, Paul A; Becker, Brandon D; Bevans, Katherine B
2018-03-13
To develop and evaluate the measurement properties of child-report and parent-proxy versions of the PROMIS® Pediatric Sleep Disturbance and Sleep-Related Impairment item banks. A national sample of 1,104 children (8-17 years-old) and 1,477 parents of children 5-17 years-old was recruited from an internet panel to evaluate the psychometric properties of 43 sleep health items. A convenience sample of children and parents recruited from a pediatric sleep clinic was obtained to provide evidence of the measures' validity; polysomnography data were collected from a subgroup of these children. Factor analyses suggested two dimensions: sleep disturbance and daytime sleep-related impairment. The final item banks included 15 items for Sleep Disturbance and 13 for Sleep-Related Impairment. Items were calibrated using the graded response model from item response theory. Of the 28 items, 16 are included in the parallel PROMIS adult sleep health measures. Reliability of the measures exceeded 0.90. Validity was supported by correlations with existing measures of pediatric sleep health and higher sleep disturbance and sleep-related impairment scores for children with sleep problems and those with chronic and neurodevelopmental disorders. The sleep health measures were not correlated with results from polysomnography. The PROMIS Pediatric Sleep Disturbance and Sleep-Related Impairment item banks provide subjective assessments of a child's difficulties falling and staying asleep as well as daytime sleepiness and its impact on functioning. They may prove useful in the future for clinical research and practice. Future research should evaluate their responsiveness to clinical change in diverse patient populations.
Kalpakjian, Claire Z.; Tulsky, David S.; Kisala, Pamela A.; Bombardier, Charles H.
2015-01-01
Objective To develop an item response theory (IRT) calibrated Grief and Loss item bank as part of the Spinal Cord Injury – Quality of Life (SCI-QOL) measurement system. Design A literature review guided framework development of grief/loss. New items were created from focus groups. Items were revised based on expert review and patient feedback and were then field tested. Analyses included confirmatory factor analysis (CFA), graded response IRT modeling and evaluation of differential item functioning (DIF). Setting We tested a 20-item pool at several rehabilitation centers across the United States, including the University of Michigan, Kessler Foundation, Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Department of Veterans Affairs hospital. Participants A total of 717 individuals with SCI answered the grief and loss questions. Results The final calibrated item bank resulted in 17 retained items. A unidimensional model was observed (CFI = 0.976; RMSEA = 0.078) and measurement precision was good (theta range from −1.48 to 2.48). Ten items were flagged for DIF; however, examination of effect sizes found the DIF to be negligible, with little practical impact on score estimates. Conclusions This study indicates that the SCI-QOL Grief and Loss item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available. PMID:26010969
The Effects of Item Format and Cognitive Domain on Students' Science Performance in TIMSS 2011
NASA Astrophysics Data System (ADS)
Liou, Pey-Yan; Bulut, Okan
2017-12-01
The purpose of this study was to examine eighth-grade students' science performance in terms of two test design components, item format, and cognitive domain. The portion of Taiwanese data came from the 2011 administration of the Trends in International Mathematics and Science Study (TIMSS), one of the major international large-scale assessments in science. The item difficulty analysis was initially applied to show the proportion of correct items. A regression-based cumulative link mixed modeling (CLMM) approach was further utilized to estimate the impact of item format, cognitive domain, and their interaction on the students' science scores. The results of the proportion-correct statistics showed that constructed-response items were more difficult than multiple-choice items, and that the reasoning cognitive domain items were more difficult compared to the items in the applying and knowing domains. In terms of the CLMM results, students tended to obtain higher scores when answering constructed-response items as well as items in the applying cognitive domain. When the two predictors and the interaction term were included together, the directions and magnitudes of the predictors on student science performance changed substantially. Plausible explanations for the complex nature of the effects of the two test-design predictors on student science performance are discussed. The results provide practical, empirical-based evidence for test developers, teachers, and stakeholders to be aware of the differential function of item format, cognitive domain, and their interaction in students' science performance.
On the Usefulness of a Multilevel Logistic Regression Approach to Person-Fit Analysis
ERIC Educational Resources Information Center
Conijn, Judith M.; Emons, Wilco H. M.; van Assen, Marcel A. L. M.; Sijtsma, Klaas
2011-01-01
The logistic person response function (PRF) models the probability of a correct response as a function of the item locations. Reise (2000) proposed to use the slope parameter of the logistic PRF as a person-fit measure. He reformulated the logistic PRF model as a multilevel logistic regression model and estimated the PRF parameters from this…
Petrillo, Jennifer; Cano, Stefan J; McLeod, Lori D; Coon, Cheryl D
2015-01-01
To provide comparisons and a worked example of item- and scale-level evaluations based on three psychometric methods used in patient-reported outcome development (classical test theory [CTT], item response theory [IRT], and Rasch measurement theory [RMT]) in an analysis of the National Eye Institute Visual Functioning Questionnaire (VFQ-25). Baseline VFQ-25 data from 240 participants with diabetic macular edema from a randomized, double-masked, multicenter clinical trial were used to evaluate the VFQ at the total score level. CTT, RMT, and IRT evaluations were conducted, and results were assessed in a head-to-head comparison. Results were similar across the three methods, with IRT and RMT providing more detailed diagnostic information on how to improve the scale. CTT led to the identification of two problematic items that threaten the validity of the overall scale score, sets of redundant items, and skewed response categories. IRT and RMT additionally identified poor fit for one item, many locally dependent items, poor targeting, and disordering of over half the response categories. Selection of a psychometric approach depends on many factors. Researchers should justify their evaluation method and consider the intended audience. If the instrument is being developed for descriptive purposes and on a restricted budget, a cursory examination of the CTT-based psychometric properties may be all that is possible. In a high-stakes situation, such as the development of a patient-reported outcome instrument for consideration in pharmaceutical labeling, however, a thorough psychometric evaluation including IRT or RMT should be considered, with final item-level decisions made on the basis of both quantitative and qualitative results.
Examining the Measurement Precision and Invariance of the Revised Get Ready to Read!
Farrington, Amber L.; Lonigan, Christopher J.
2016-01-01
Children's emergent literacy skills are highly predictive of later reading abilities. To determine which children have weaker emergent literacy skills and are in need of intervention, it is necessary to assess emergent literacy skills accurately and reliably. In this study, 1,351 children were administered the Revised Get Ready to Read! (GRTR-R), and an item response theory analysis was used to evaluate the item-level reliability of the measure. Differential item functioning (DIF) analyses were conducted to examine whether items function similarly between subpopulations of children. The GRTR-R had acceptable reliability for children whose ability level was just below the mean. DIF for a small number of items was present for only two comparisons—children who were older versus younger and children who were White versus African American. These results demonstrate that the GRTR-R has acceptable reliability and limited DIF, enabling the screener to identify those at risk for developing reading problems. PMID:23851136
ERIC Educational Resources Information Center
Wei, Tianlan; Chesnut, Steven R.; Barnard-Brak, Lucy; Stevens, Tara; Olivárez, Arturo, Jr.
2014-01-01
As the United States has begun to lag behind other developed countries in performance on mathematics and science, researchers have sought to explain this with theories of teaching, knowledge, and motivation. We expand this examination by further analyzing a measure of interest that has been linked to student performance in mathematics and…
Cross-cultural adaptation of the Work Role Functioning Questionnaire 2.0 to Norwegian and Danish.
Johansen, Thomas; Lund, Thomas; Jensen, Chris; Momsen, Anne-Mette Hedeager; Eftedal, Monica; Øyeflaten, Irene; Braathen, Tore N; Stapelfeldt, Christina M; Amick, Ben; Labriola, Merete
2018-01-01
A healthy and productive working life has attracted attention owing to future employment and demographic challenges. The aim was to translate and adapt the Work Role Functioning Questionnaire (WRFQ) 2.0 to Norwegian and Danish. The WRFQ is a self-administered tool developed to identify health-related work limitations. Standardised cross-cultural adaptation procedures were followed in both countries' translation processes. Direct translation, synthesis, back translation and consolidation were carried out successfully. A pre-test among 78 employees who had returned to work after sickness absence found idiomatic issues requiring reformulation in the instructions, four items in the Norwegian version, and three items in the Danish version, respectively. In the final versions, seven items were adjusted in each country. Psychometric properties were analysed for the Norwegian sample (n = 40) and preliminary Cronbach's alpha coefficients were satisfactory. A final consensus process was performed to achieve similar titles and introductions. The WRFQ 2.0 cross-cultural adaptation to Norwegian and Danish was performed and consensus was obtained. Future validation studies will examine validity, reliability, responsiveness and differential item response. The WRFQ can be used to elucidate both individual and work environmental factors leading to a more holistic approach in work rehabilitation.
Crins, Martine H P; van der Wees, Philip J; Klausch, Thomas; van Dulmen, Simone A; Roorda, Leo D; Terwee, Caroline B
2018-01-01
The Patient-Reported Outcomes Measurement Information System (PROMIS) is a universally applicable set of instruments, including item banks, short forms and computer adaptive tests (CATs), measuring patient-reported health across different patient populations. PROMIS CATs are highly efficient and their use in practice is considered feasible with little administration time, offering standardized and routine patient monitoring. Before an item bank can be used as CAT, the psychometric properties of the item bank have to be examined. Therefore, the objective was to assess the psychometric properties of the Dutch-Flemish PROMIS Physical Function item bank (DF-PROMIS-PF) in Dutch patients receiving physical therapy. Cross-sectional study. 805 patients >18 years, who received any kind of physical therapy in primary care in the past year, completed the full DF-PROMIS-PF (121 items). Unidimensionality was examined by Confirmatory Factor Analysis and local dependence and monotonicity were evaluated. A Graded Response Model was fitted. Construct validity was examined with correlations between DF-PROMIS-PF T-scores and scores on two legacy instruments (SF-36 Health Survey Physical Functioning scale [SF36-PF10] and the Health Assessment Questionnaire Disability-Index [HAQ-DI]). Reliability (standard errors of theta) was assessed. The results for unidimensionality were mixed (scaled CFI = 0.924, TLI = 0.923, RMSEA = 0.045, first factor explained 61.5% of variance). Some local dependence was found (8.2% of item pairs). The item bank showed a broad coverage of the physical function construct (threshold-parameters range: -4.28 to 2.33) and good construct validity (correlation with SF36-PF10 = 0.84 and HAQ-DI = -0.85). Furthermore, the DF-PROMIS-PF showed greater reliability over a broader score-range than the SF36-PF10 and HAQ-DI. The psychometric properties of the DF-PROMIS-PF item bank are sufficient. 
The DF-PROMIS-PF can now be used as short forms or CAT to measure the level of physical function of physiotherapy patients.
Pilcher, June J; Switzer, Fred S; Munc, Alec; Donnelly, Janet; Jellen, Julia C; Lamm, Claus
2018-04-01
The purpose of this study is to examine the psychometric properties of the Epworth Sleepiness Scale (ESS) in two languages, German and English. Students from a university in Austria (N = 292; 55 males; mean age = 18.71 ± 1.71 years; 237 females; mean age = 18.24 ± 0.88 years) and a university in the US (N = 329; 128 males; mean age = 18.71 ± 0.88 years; 201 females; mean age = 21.59 ± 2.27 years) completed the ESS. An exploratory-factor analysis was completed to examine dimensionality of the ESS. Item response theory (IRT) analyses were used to provide information about the response rates on the items on the ESS and provide differential item functioning (DIF) analyses to examine whether the items were interpreted differently between the two languages. The factor analyses suggest that the ESS measures two distinct sleepiness constructs. These constructs indicate that the ESS is probing sleepiness in settings requiring active versus passive responding. The IRT analyses found that overall, the items on the ESS perform well as a measure of sleepiness. However, Item 8 and to a lesser extent Item 6 were being interpreted differently by respondents in comparison to the other items. In addition, the DIF analyses showed that the responses between German and English were very similar indicating that there are only minor measurement differences between the two language versions of the ESS. These findings suggest that the ESS provides a reliable measure of propensity to sleepiness; however, it does convey a two-factor approach to sleepiness. Researchers and clinicians can use the German and English versions of the ESS but may wish to exclude Item 8 when calculating a total sleepiness score.
Monclús Cols, Ester; Nicolás Ocejo, David; Sánchez Sánchez, Miquel; Ortega Romero, Mar
2015-02-01
To detect the problems hospital emergency room staff have when prescribing and administering antibiotics. A 14-item questionnaire was designed to assess staff members' knowledge of the importance of starting antibiotic treatment promptly, assigning appropriate dosing intervals, adjusting for renal function, and switching to oral therapy. Agreement with each item was expressed on a 5-point Likert scale. Items with a rate of appropriate response of less than 75% were targeted for specific attention. Two hundred questionnaires were distributed to the staff and 150 were returned completed (response rate, 75%). The following items were targeted for attention based on rates of appropriate response of less than 75%: clear medical orders (65%), understanding the implication of early empirical antibiotic therapy on prognosis in serious infections (67%), estimation of the prevalence of renal insufficiency (42%), assumption that a serum creatinine level < 1.6 mg/dL is safe (33%), use of glomerular filtration rate to adjust dose according to renal function (47%), and an understanding of switching from intravenous to oral treatment (60%). This study revealed the difficulties medical and nursing staff have in prescribing and administering antibiotics in a hospital emergency department. The results can facilitate improvements in antibiotic therapy by pinpointing areas to target for specific training interventions or the design of electronic prescribing aids.
Reeve, Bryce B; Stover, Angela M; Alfano, Catherine M; Smith, Ashley Wilder; Ballard-Barbash, Rachel; Bernstein, Leslie; McTiernan, Anne; Baumgartner, Kathy B; Piper, Barbara F
2012-11-01
Brief, valid measures of fatigue, a prevalent and distressing cancer symptom, are needed for use in research. This study's primary aim was to create a shortened version of the revised Piper Fatigue Scale (PFS-R) based on data from a diverse cohort of breast cancer survivors. A secondary aim was to determine whether the PFS captured multiple distinct aspects of fatigue (a multidimensional model) or a single overall fatigue factor (a unidimensional model). Breast cancer survivors (n = 799; stages in situ through IIIa; ages 29-86 years) were recruited through three SEER registries (New Mexico, Western Washington, and Los Angeles, CA) as part of the Health, Eating, Activity, and Lifestyle (HEAL) study. Fatigue was measured approximately 3 years post-diagnosis using the 22-item PFS-R that has four subscales (Behavior, Affect, Sensory, and Cognition). Confirmatory factor analysis was used to compare unidimensional and multidimensional models. Six criteria were used to make item selections to shorten the PFS-R: scale's content validity, items' relationship with fatigue, content redundancy, differential item functioning by race and/or education, scale reliability, and literacy demand. Factor analyses supported the original 4-factor structure. There was also evidence from the bi-factor model for a dominant underlying fatigue factor. Six items tested positive for differential item functioning between African-American and Caucasian survivors. Four additional items either showed poor association, local dependence, or content validity concerns. After removing these 10 items, the reliability of the PFS-12 subscales ranged from 0.87 to 0.89, compared to 0.90-0.94 prior to item removal. The newly developed PFS-12 can be used to assess fatigue in African-American and Caucasian breast cancer survivors and reduces response burden without compromising reliability or validity. 
This is the first study to determine PFS literacy demand and to compare PFS-R responses in African-American and Caucasian breast cancer survivors. Further testing in diverse populations is warranted.
Hassett, Afton L; Li, Tracy; Buyske, Steven; Savage, Shantal V; Gignac, Monique A M
2008-05-01
To consider the feasibility of assessing multiple facets of independence in rheumatoid arthritis (RA) using a measure developed from existing items and examining its face validity, construct validity and responsiveness to change. The ATTAIN (Abatacept Trial in Treatment of Anti-tumor necrosis factor [TNF] Inadequate responders) database was used. Patients with RA were randomized 2:1, abatacept (n = 258) and placebo (n = 133). A multi-faceted scale to measure physical and psychosocial independence was constructed using items from the Health Assessment Questionnaire (HAQ) and Short Form 36 Health Survey (SF-36). Questions assessing activity limitations and need for outside caregiver help were also examined. Interviews with 20 RA patients assessed face validity. Item Response Theory analysis yielded two traits - 'Psychosocial Independence', derived from the number of days with activity limitations plus the Role Emotional, Social Functioning and Role Physical subscale items from the SF-36; and 'Physical Independence', derived from 15 HAQ items assessing need for help from another. The two traits showed no significant differential item functioning for age or gender and demonstrated good face validity. Changes over 169 days on Psychosocial Independence were greater (mean 0.46 units, 95% confidence interval [CI]: 0.17-0.75) for the abatacept group than for placebo (p = 0.002). Changes in Physical Independence were greater (mean 0.59 units, 95% CI: 0.35-0.82) for the abatacept group than for placebo (p < 0.001). The multi-faceted assessment of independence in RA based on items from commonly used instruments is feasible suggesting promise for evaluating independence in future clinical trials. This approach demonstrated good face and construct validity and responsiveness in RA patients who had previously failed anti-TNF therapy. 
However, we caution against an interpretation that these data suggest that abatacept improves independence because the component parts of this assessment came from instruments used in the ATTAIN trial where data had been previously analyzed.
A mixed-effects regression model for longitudinal multivariate ordinal data.
Liu, Li C; Hedeker, Donald
2006-03-01
A mixed-effects item response theory model that allows for three-level multivariate ordinal outcomes and accommodates multiple random subject effects is proposed for analysis of multivariate ordinal outcomes in longitudinal studies. This model allows for the estimation of different item factor loadings (item discrimination parameters) for the multiple outcomes. The covariates in the model do not have to follow the proportional odds assumption and can be at any level. Assuming either a probit or logistic response function, maximum marginal likelihood estimation is proposed utilizing multidimensional Gauss-Hermite quadrature for integration of the random effects. An iterative Fisher scoring solution, which provides standard errors for all model parameters, is used. An analysis of a longitudinal substance use data set, where four items of substance use behavior (cigarette use, alcohol use, marijuana use, and getting drunk or high) are repeatedly measured over time, is used to illustrate application of the proposed model.
Testing manifest monotonicity using order-constrained statistical inference.
Tijmstra, Jesper; Hessen, David J; van der Heijden, Peter G M; Sijtsma, Klaas
2013-01-01
Most dichotomous item response models share the assumption of latent monotonicity, which states that the probability of a positive response to an item is a nondecreasing function of a latent variable intended to be measured. Latent monotonicity cannot be evaluated directly, but it implies manifest monotonicity across a variety of observed scores, such as the restscore, a single item score, and in some cases the total score. In this study, we show that manifest monotonicity can be tested by means of the order-constrained statistical inference framework. We propose a procedure that uses this framework to determine whether manifest monotonicity should be rejected for specific items. This approach provides a likelihood ratio test for which the p-value can be approximated through simulation. A simulation study is presented that evaluates the Type I error rate and power of the test, and the procedure is applied to empirical data.
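The restscore idea in the abstract above can be illustrated with a small descriptive check. This is only a sketch under stated assumptions: it tabulates the observed proportion of positive responses per restscore group and flags any decrease, and it is not the order-constrained likelihood ratio test the authors propose. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def restscore_proportions(responses, item):
    """Return {restscore: proportion positive on `item`} for rows of 0/1 scores."""
    counts = defaultdict(lambda: [0, 0])  # restscore -> [positives, respondents]
    for row in responses:
        rest = sum(row) - row[item]  # total score excluding the studied item
        counts[rest][0] += row[item]
        counts[rest][1] += 1
    return {r: pos / n for r, (pos, n) in sorted(counts.items())}


def violates_monotonicity(proportions):
    """True if the observed proportions ever decrease as the restscore increases."""
    values = list(proportions.values())
    return any(later < earlier for earlier, later in zip(values, values[1:]))


# Hypothetical toy data: 6 respondents, 3 dichotomous items.
data = [
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [1, 1, 1],
    [1, 0, 1],
]
props = restscore_proportions(data, item=0)
print(props, violates_monotonicity(props))
```

An observed decrease in a sample this small would not by itself justify rejecting manifest monotonicity, which is why a formal test with a simulated p-value is needed.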
Measuring Workplace Climate in Community Clinics and Health Centers.
Friedberg, Mark W; Rodriguez, Hector P; Martsolf, Grant R; Edelen, Maria O; Vargas Bustamante, Arturo
2016-10-01
The effectiveness of community clinics and health centers' efforts to improve the quality of care might be modified by clinics' workplace climates. Several surveys to measure workplace climate exist, but their relationships to each other and to distinguishable dimensions of workplace climate are unknown. To assess the psychometric properties of a survey instrument combining items from several existing surveys of workplace climate and to generate a shorter instrument for future use. We fielded a 106-item survey, which included items from 9 existing instruments, to all clinicians and staff members (n=781) working in 30 California community clinics and health centers, receiving 628 responses (80% response rate). We performed exploratory factor analysis of survey responses, followed by confirmatory factor analysis of 200 reserved survey responses. We generated a new, shorter survey instrument of items with strong factor loadings. Six factors, including 44 survey items, emerged from the exploratory analysis. Two factors (Clinic Workload and Teamwork) were independent from the others. The remaining 4 factors (staff relationships, quality improvement orientation, managerial readiness for change, and staff readiness for change) were highly correlated, indicating that these represented dimensions of a higher-order factor we called "Clinic Functionality." This 2-level, 6-factor model fit the data well in the exploratory and confirmatory samples. For all but 1 factor, fewer than 20 survey responses were needed to achieve clinic-level reliability >0.7. Survey instruments designed to measure workplace climate have substantial overlap. The relatively parsimonious item set we identified might help target and tailor clinics' quality improvement efforts.
Measuring Workplace Climate in Community Clinics and Health Centers
Friedberg, Mark W.; Rodriguez, Hector P.; Martsolf, Grant; Edelen, Maria Orlando; Vargas-Bustamante, Arturo
2018-01-01
Background The effectiveness of community clinics and health centers’ efforts to improve the quality of care might be modified by clinics’ workplace climates. Several surveys to measure workplace climate exist, but their relationships to each other and to distinguishable dimensions of workplace climate are unknown. Objective To assess the psychometric properties of a survey instrument combining items from several existing surveys of workplace climate and to generate a shorter instrument for future use. Methods We fielded a 106-item survey, which included items from 9 existing instruments, to all clinicians and staff members (n=781) working in 30 California community clinics and health centers, receiving 628 responses (80% response rate). We performed exploratory factor analysis of survey responses, followed by confirmatory factor analysis of 200 reserved survey responses. We generated a new, shorter survey instrument of items with strong factor loadings. Results Six factors, including 44 survey items, emerged from the exploratory analysis. Two factors (Clinic Workload and Teamwork) were independent from the others. The remaining 4 factors (Staff Relationships, Quality Improvement Orientation, Managerial Readiness for Change, and Staff Readiness for Change) were highly correlated, indicating that these represented dimensions of a higher-order factor we called “Clinic Functionality.” This two-level, six-factor model fit the data well in the exploratory and confirmatory samples. For all but one factor, fewer than 20 survey responses were needed to achieve clinic-level reliability >0.7. Conclusion Survey instruments designed to measure workplace climate have substantial overlap. The relatively parsimonious item set we identified might help target and tailor clinics’ quality improvement efforts. PMID:27326549
Janulis, Patrick; Newcomb, Michael E; Sullivan, Patrick; Mustanski, Brian
2018-01-01
Knowledge about the transmission, prevention, and treatment of HIV remains a critical element in psychosocial models of HIV risk behavior and is commonly used as an outcome in HIV prevention interventions. However, most HIV knowledge questions have not undergone rigorous psychometric testing such as using item response theory. The current study used data from six studies of men who have sex with men (MSM; n = 3565) to (1) examine the item properties of HIV knowledge questions, (2) test for differential item functioning on commonly studied characteristics (i.e., age, race/ethnicity, and HIV risk behavior), (3) select items with the optimal item characteristics, and (4) leverage this combined dataset to examine the potential moderating effect of age on the relationship between condomless anal sex (CAS) and HIV knowledge. Findings indicated that existing questions tend to poorly differentiate those with higher levels of HIV knowledge, but items were relatively robust across diverse individuals. Furthermore, age moderated the relationship between CAS and HIV knowledge with older MSM having the strongest association. These findings suggest that additional items are required in order to capture a more nuanced understanding of HIV knowledge and that the association between CAS and HIV knowledge may vary by age.
Acculturation and the Center For Epidemiological Studies-Depression Scale for Hispanic women.
McCabe, Brian E; Vermeesch, Amber L; Hall, Rosemary F; Peragallo, Nilda P; Mitrani, Victoria B
2011-01-01
Culturally valid measures of depression for Spanish-speaking Hispanic women are important for developing and implementing effective interventions to reduce health disparities. The Center for Epidemiological Studies-Depression Scale (CES-D) is a widely used measure of depression. Differential item functioning has been studied using language preference as a proxy for acculturation, but it is unknown if the results were due to acculturation or the language of administration. The aim of this study was to evaluate the relationship of acculturation, defined with a dimensional measure, to Spanish CES-D item responses. Spanish-speaking Hispanic women (n = 504) were recruited for a randomized controlled trial of Salud, Educación, Prevención y Autocuidado (Health, Education, Prevention, and Self-Care). Acculturation, an important dimension of variation within the diverse U.S. Hispanic community, was defined by high or low scores on the Americanism subscale of the Bidimensional Acculturation Scale. Differential item functioning for each of the 20 CES-D items between more acculturated and less acculturated women was tested using ordinal logistic regression. No items on the Depressed Affect, Somatic Activity, or Positive Affect subscales showed meaningful differential item functioning, but 1 item ("People were unfriendly") on the Interpersonal subscale had small results (R = 1.1%). The majority of CES-D items performed similarly for Spanish-speaking Hispanic women with high and low acculturation. Less acculturated women responded more positively to "People were unfriendly," despite having an equivalent level of depression, than did more acculturated women. Possibilities for improving this item are proposed.
Electronic Quality of Life Assessment Using Computer-Adaptive Testing
2016-01-01
Background Quality of life (QoL) questionnaires are desirable for clinical practice but can be time-consuming to administer and interpret, making their widespread adoption difficult. Objective Our aim was to assess the performance of the World Health Organization Quality of Life (WHOQOL)-100 questionnaire as four item banks to facilitate adaptive testing using simulated computer adaptive tests (CATs) for physical, psychological, social, and environmental QoL. Methods We used data from the UK WHOQOL-100 questionnaire (N=320) to calibrate item banks using item response theory, which included psychometric assessments of differential item functioning, local dependency, unidimensionality, and reliability. We simulated CATs to assess the number of items administered before prespecified levels of reliability were met. Results The item banks (40 items) all displayed good model fit (P>.01) and were unidimensional (fewer than 5% of t tests significant), reliable (Person Separation Index>.70), and free from differential item functioning (no significant analysis of variance interaction) or local dependency (residual correlations < +.20). When matched for reliability, the item banks were between 45% and 75% shorter than paper-based WHOQOL measures. Across the four domains, a high standard of reliability (alpha>.90) could be gained with a median of 9 items. Conclusions Using CAT, simulated assessments were as reliable as paper-based forms of the WHOQOL with a fraction of the number of items. These properties suggest that these item banks are suitable for computerized adaptive assessment. These item banks have the potential for international development using existing alternative language versions of the WHOQOL items. PMID:27694100
Teresi, Jeanne A.; Ocepek-Welikson, Katja; Kleinman, Marjorie; Ramirez, Mildred; Kim, Giyeon
2017-01-01
Short form measures from the Patient Reported Outcomes Measurement Information System® (PROMIS®) are used widely. The present study was among the first to examine differential item functioning (DIF) in the PROMIS Depression short form scales in a sample of over 5000 racially/ethnically diverse patients with cancer. DIF analyses were conducted across different racial/ethnic, educational, age, gender and language groups. Methods DIF hypotheses, generated by content experts, informed the evaluation of the DIF analyses. The graded item response theory (IRT) model was used to evaluate the five-level ordinal items. The primary tests of DIF were Wald tests; sensitivity analyses were conducted using the IRT ordinal logistic regression procedure. Magnitude was evaluated using expected item score functions, and the non-compensatory differential item functioning (NCDIF) and T1 indexes, both based on group differences in the item curves. Aggregate impact was evaluated with expected scale score (test) response functions; individual impact was assessed through examination of differences in DIF adjusted and unadjusted depression estimates. Results Many items evidenced DIF; however, only a few had slightly elevated magnitude. No items evidenced salient DIF with respect to NCDIF and the scale-level impact was minimal for all group comparisons. The following short form items might be targeted for further study because they were also hypothesized to evidence DIF. One item showed slightly higher magnitude of DIF for age: nothing to look forward to; conditional on depression, this item was more likely to be endorsed in the depressed direction by individuals in older groups as contrasted with the cohort aged 21 to 49. This item was also hypothesized to show age DIF. Only one item (failure) showed DIF of slightly higher magnitude (just above threshold) for Whites vs. Asians/Pacific Islanders in the direction of higher likelihood of endorsement for Asians/Pacific Islanders. 
This item was also hypothesized to show DIF for minority groups. The impact of DIF was negligible. Conditional on depression, the items worthless and hopeless were more likely to be endorsed in the depressed direction by respondents with less than high school education vs. those with a graduate degree; the magnitude of DIF was slightly above the T1 threshold, but not that of NCDIF. These items were also hypothesized to show DIF in the direction of more feelings of worthlessness by groups with lower education. While the magnitude and aggregate impact of DIF were small, in a few instances, individual impact was observed. Information provided was relatively high, particularly in the middle upper (depressed) tail of the distribution. Reliability estimates were high (> 0.90) across all studied groups, regardless of estimation method. Conclusions This was the first study to evaluate measurement equivalence of the PROMIS Depression short forms across large samples of ethnically diverse groups. There were few items with DIF, and none of high magnitude, thus supporting the use of PROMIS Depression short form measures across such groups. These results could be informative for those using the short forms in minority populations or clinicians evaluating individuals with the depression short forms. PMID:28553573
Item response analysis of the Positive and Negative Syndrome Scale
Santor, Darcy A; Ascher-Svanum, Haya; Lindenmayer, Jean-Pierre; Obenchain, Robert L
2007-01-01
Background Statistical models based on item response theory were used to examine (a) the performance of individual Positive and Negative Syndrome Scale (PANSS) items and their options, (b) the effectiveness of various subscales to discriminate among individual differences in symptom severity, and (c) the appropriateness of cutoff scores recently recommended by Andreasen and her colleagues (2005) to establish symptom remission. Methods Option characteristic curves were estimated using a nonparametric item response model to examine the probability of endorsing each of 7 options within each of 30 PANSS items as a function of standardized, overall symptom severity. Our data were baseline PANSS scores from 9205 patients with schizophrenia or schizoaffective disorder who were enrolled between 1995 and 2003 in either a large, naturalistic, observational study or else in 1 of 12 randomized, double-blind, clinical trials comparing olanzapine to other antipsychotic drugs. Results Our analyses show that the majority of items forming the Positive and Negative subscales of the PANSS perform very well. We also identified key areas for improvement or revision in items and options within the General Psychopathology subscale. The Positive and Negative subscale scores are not only more discriminating of individual differences in symptom severity than the General Psychopathology subscale score, but are also more efficient on average than the 30-item total score. Of the 8 items recently recommended to establish symptom remission, 1 performed markedly different from the 7 others and should either be deleted or rescored requiring that patients achieve a lower score of 2 (rather than 3) to signal remission. Conclusion This first item response analysis of the PANSS supports its sound psychometric properties; most PANSS items were either very good or good at assessing overall severity of illness. 
These analyses did identify some items which might be further improved for measuring individual severity differences or for defining remission thresholds. Findings also suggest that the Positive and Negative subscales are more sensitive to change than the PANSS total score and, thus, may constitute a "mini PANSS" that may be more reliable, require shorter administration and training time, and possibly reduce sample sizes needed for future research. PMID:18005449
Self-reported walking ability predicts functional mobility performance in frail older adults.
Alexander, N B; Guire, K E; Thelen, D G; Ashton-Miller, J A; Schultz, A B; Grunawalt, J C; Giordani, B
2000-11-01
To determine how self-reported physical function relates to performance in each of three mobility domains: walking, stance maintenance, and rising from chairs. Cross-sectional analysis of older adults. University-based laboratory and community-based congregate housing facilities. Two hundred twenty-one older adults (mean age, 79.9 years; range, 60-102 years) without clinical evidence of dementia (mean Folstein Mini-Mental State score, 28; range, 24-30). We compared the responses of these older adults on a questionnaire battery used by the Established Populations for the Epidemiologic Study of the Elderly (EPESE) project, to performance on mobility tasks of graded difficulty. Responses to the EPESE battery included: (1) whether assistance was required to perform seven Katz activities of daily living (ADL) items, specifically with walking and transferring; (2) three Rosow-Breslau items, including the ability to walk up stairs and walk a half mile; and (3) five Nagi items, including difficulty stooping, reaching, and lifting objects. The performance measures included the ability to perform, and time taken to perform, tasks in three summary score domains: (1) walking ("Walking," seven tasks, including walking with an assistive device, turning, stair climbing, tandem walking); (2) stance maintenance ("Stance," six tasks, including unipedal, bipedal, tandem, and maximum lean); and (3) chair rise ("Chair Rise," six tasks, including rising from a variety of seat heights with and without the use of hands for assistance). A total score combines scores in each of the Walking, Stance, and Chair Rise domains. We also analyzed how cognitive/behavioral factors such as depression and self-efficacy related to the residuals from the self-report and performance-based ANOVA models. Rosow-Breslau items have the strongest relationship with the three performance domains, Walking, Stance, and Chair Rise (eta-squared ranging from 0.21 to 0.44). 
These three performance domains are as strongly related to one Katz ADL item, walking (eta-squared ranging from 0.15 to 0.33) as all of the Katz ADL items combined (eta-squared ranging from 0.21 to 0.35). Tests of problem solving and psychomotor speed, the Trails A and Trails B tests, are significantly correlated with the residuals from the self-report and performance-based ANOVA models. Compared with the rest of the EPESE self-report items, self-report items related to walking (such as Katz walking and Rosow-Breslau items) are better predictors of functional mobility performance on tasks involving walking, stance maintenance, and rising from chairs. Compared with other self-report items, self-reported walking ability may be the best predictor of overall functional mobility.
1981-11-01
very little effort has been put upon the model validation, which is essential in any scientific research. The orientation we aim at in the present...better than the former to the target function. This implies that, although the interval of ability θ of our interest is even a little smaller than [-3.0...approaches turned out to be similar, with some deviations, i.e., some of them are a little closer to the theoretical density function, and some of
Profile-likelihood Confidence Intervals in Item Response Theory Models.
Chalmers, R Philip; Pek, Jolynn; Liu, Yang
2017-01-01
Confidence intervals (CIs) are fundamental inferential devices which quantify the sampling variability of parameter estimates. In item response theory, CIs have been primarily obtained from large-sample Wald-type approaches based on standard error estimates, derived from the observed or expected information matrix, after parameters have been estimated via maximum likelihood. An alternative approach to constructing CIs is to quantify sampling variability directly from the likelihood function with a technique known as profile-likelihood confidence intervals (PL CIs). In this article, we introduce PL CIs for item response theory models, compare PL CIs to classical large-sample Wald-type CIs, and demonstrate important distinctions among these CIs. CIs are then constructed for parameters directly estimated in the specified model and for transformed parameters which are often obtained post-estimation. Monte Carlo simulation results suggest that PL CIs perform consistently better than Wald-type CIs for both non-transformed and transformed parameters.
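The contrast between Wald-type and likelihood-based intervals can be sketched on a much simpler problem than an IRT model. The example below is an illustration under stated assumptions: it uses a single binomial proportion (so the "profile" likelihood is just the ordinary likelihood, with no nuisance parameters to maximize over), a chi-square(1) cutoff of about 3.84 for a 95% interval, and hypothetical function names. It is not the authors' procedure.

```python
import math


def log_lik(p, successes, n):
    """Binomial log-likelihood up to an additive constant (requires 0 < p < 1)."""
    return successes * math.log(p) + (n - successes) * math.log(1 - p)


def wald_ci(successes, n, z=1.959964):
    """Symmetric interval: estimate plus or minus z standard errors."""
    p_hat = successes / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def likelihood_ci(successes, n, crit=3.841459):
    """Bounds where 2 * (loglik(p_hat) - loglik(p)) equals the chi-square(1) cutoff.

    Assumes 0 < successes < n so the maximum lies strictly inside (0, 1).
    """
    p_hat = successes / n
    top = log_lik(p_hat, successes, n)

    def gap(p):
        return 2 * (top - log_lik(p, successes, n)) - crit

    def bisect(lo, hi):
        # gap() changes sign exactly once between lo and hi
        for _ in range(200):
            mid = (lo + hi) / 2
            if (gap(lo) > 0) == (gap(mid) > 0):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    eps = 1e-12
    return bisect(eps, p_hat), bisect(p_hat, 1 - eps)


# Hypothetical data: 2 successes in 20 trials.
wald = wald_ci(2, 20)
lik = likelihood_ci(2, 20)
print("Wald:", wald)
print("Likelihood-based:", lik)
```

With a small sample and an estimate near the boundary, the Wald interval extends below zero while the likelihood-based interval stays inside the parameter space and is asymmetric around the estimate.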
On the Complexity of Item Response Theory Models.
Bonifay, Wes; Cai, Li
2017-01-01
Complexity in item response theory (IRT) has traditionally been quantified by simply counting the number of freely estimated parameters in the model. However, complexity is also contingent upon the functional form of the model. We examined four popular IRT models (exploratory factor analytic, bifactor, DINA, and DINO) with different functional forms but the same number of free parameters. In comparison, a simpler (unidimensional 3PL) model was specified such that it had 1 more parameter than the previous models. All models were then evaluated according to the minimum description length principle. Specifically, each model was fit to 1,000 data sets that were randomly and uniformly sampled from the complete data space and then assessed using global and item-level fit and diagnostic measures. The findings revealed that the factor analytic and bifactor models possess a strong tendency to fit any possible data. The unidimensional 3PL model displayed minimal fitting propensity, despite the fact that it included an additional free parameter. The DINA and DINO models did not demonstrate a proclivity to fit any possible data, but they did fit well to distinct data patterns. Applied researchers and psychometricians should therefore consider functional form, and not goodness-of-fit alone, when selecting an IRT model.
Mulcahey, M J; Merenda, Lisa; Tian, Feng; Kozin, Scott; James, Michelle; Gogola, Gloria; Ni, Pengsheng
2013-01-01
This study examined the psychometric properties of item pools relevant to upper-extremity function and activity performance and evaluated simulated 5-, 10-, and 15-item computer adaptive tests (CATs). In a multicenter, cross-sectional study of 200 children and youth with brachial plexus birth palsy (BPBP), parents responded to upper-extremity (n = 52) and activity (n = 34) items using a 5-point response scale. We used confirmatory and exploratory factor analysis, ordinal logistic regression, item maps, and standard errors to evaluate the psychometric properties of the item banks. Validity was evaluated using analysis of variance and Pearson correlation coefficients. Results show that the two item pools have acceptable model fit, scaled well for children and youth with BPBP, and had good validity, content range, and precision. Simulated CATs performed comparably to the full item banks, suggesting that a reduced number of items provide similar information to the entire set of items. Copyright © 2013 by the American Occupational Therapy Association, Inc.
ERIC Educational Resources Information Center
Ögretmen, Tuncay
2015-01-01
The purpose of this study is to carry out differential item functioning (DIF) analysis for content areas of a reading comprehension subtest using four area indices within Item Response Theory (IRT) framework. The differences in the magnitudes of the area indices were compared based on the subject areas. The DIF analysis was carried out across…
The Work Instability Scale for Rheumatoid Arthritis (RA-WIS): Does it work in osteoarthritis?
Tang, Kenneth; Beaton, Dorcas E; Lacaille, Diane; Gignac, Monique A M; Zhang, Wei; Anis, Aslam H; Bombardier, Claire
2010-09-01
To validate the 23-item Work Instability Scale for Rheumatoid Arthritis (RA-WIS) for use in osteoarthritis (OA) using both classical test theory and item response theory approaches. Baseline and 12-month follow-up data were collected from workers with OA recruited from community and clinical settings (n = 130). Fit of RA-WIS data to the Rasch model was evaluated by item- and person-fit statistics (size of residual, chi-sq), assessments of differential item functioning, and tests of unidimensionality and local independence. Internal consistency was assessed by KR-20. Convergent construct validity (Spearman r, known-groups) was evaluated against theoretical constructs that assess impact of health on work. Responsiveness to global indicators of change was assessed by standardized response means (SRM) and area under the receiver operating characteristic curves. Data structure of the RA-WIS showed adequate fit to the Rasch model (chi-sq = 83.2, P = 0.03) after addressing local dependency in three item pairs by creating testlets. High internal consistency (KR-20 = 0.93) and convergent validity with work-oriented constructs (|r| = 0.55-0.77) were evident. The RA-WIS correlated most strongly with the concept of illness intrusiveness (r = 0.77) and was highly responsive to changes (SRM = 1.05 [deterioration]; -0.78 [improvement]). Although developed for RA, the RA-WIS is psychometrically sound for OA and demonstrates interval-level property.
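Two of the statistics named in the abstract above, KR-20 and the standardized response mean (SRM), have simple closed forms that can be sketched as follows. The conventions here are assumptions rather than the authors' stated choices: KR-20 uses population variances throughout, SRM divides the mean change by the sample standard deviation of the change scores, and the function names and data are hypothetical.

```python
from statistics import mean, pvariance, stdev


def kr20(responses):
    """KR-20 for rows of 0/1 item scores: k/(k-1) * (1 - sum(p*q) / var(total))."""
    k = len(responses[0])
    n = len(responses)
    item_means = [sum(row[i] for row in responses) / n for i in range(k)]
    pq_sum = sum(p * (1 - p) for p in item_means)
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - pq_sum / total_var)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the standard deviation of the change scores."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical toy data: 4 respondents, 3 dichotomous items.
items = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(items))
print(standardized_response_mean([10, 12, 14], [12, 13, 17]))
```

The sign of the SRM follows the direction of change, which is why the abstract reports a positive value for deterioration and a negative value for improvement on the same scale.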
Obbarius, Nina; Fischer, Felix; Obbarius, Alexander; Nolte, Sandra; Liegl, Gregor; Rose, Matthias
2018-04-10
To develop the first item bank to measure Stress Resilience (SR) in clinical populations. Qualitative item development resulted in an initial pool of 131 items covering a broad theoretical SR concept. These items were tested in n=521 patients at a psychosomatic outpatient clinic. Exploratory and Confirmatory Factor Analysis (CFA), as well as other state-of-the-art item analyses and IRT were used for item evaluation and calibration of the final item bank. Out of the initial item pool of 131 items, we excluded 64 items (54 factor loading <.5, 4 residual correlations >.3, 2 non-discriminative Item Response Curves, 4 Differential Item Functioning). The final set of 67 items indicated sufficient model fit in CFA and IRT analyses. Additionally, a 10-item short form with high measurement precision (SE≤.32 in a theta range between -1.8 and +1.5) was derived. Both the SR item bank and the SR short form were highly correlated with an existing static legacy tool (Connor-Davidson Resilience Scale). The final SR item bank and 10-item short form showed good psychometric properties. When further validated, they will be ready to be used within a framework of Computer-Adaptive Tests for a comprehensive assessment of the Stress-Construct. Copyright © 2018. Published by Elsevier Inc.
Tulsky, David S; Kisala, Pamela A; Tate, Denise G; Spungen, Ann M; Kirshblum, Steven C
2015-05-01
To describe the development and psychometric properties of the Spinal Cord Injury--Quality of Life (SCI-QOL) Bladder Management Difficulties and Bowel Management Difficulties item banks and Bladder Complications scale. Using a mixed-methods design, a pool of items assessing bladder and bowel-related concerns were developed using focus groups with individuals with spinal cord injury (SCI) and SCI clinicians, cognitive interviews, and item response theory (IRT) analytic approaches, including tests of model fit and differential item functioning. Thirty-eight bladder items and 52 bowel items were tested at the University of Michigan, Kessler Foundation Research Center, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital, and the James J. Peters VA Medical Center, Bronx, NY. Seven hundred fifty-seven adults with traumatic SCI. The final item banks demonstrated unidimensionality (Bladder Management Difficulties CFI=0.965; RMSEA=0.093; Bowel Management Difficulties CFI=0.955; RMSEA=0.078) and acceptable fit to a graded response IRT model. The final calibrated Bladder Management Difficulties bank includes 15 items, and the final Bowel Management Difficulties item bank consists of 26 items. Additionally, 5 items related to urinary tract infections (UTI) did not fit with the larger Bladder Management Difficulties item bank but performed relatively well independently (CFI=0.992, RMSEA=0.050) and were thus retained as a separate scale. The SCI-QOL Bladder Management Difficulties and Bowel Management Difficulties item banks are psychometrically robust and are available as computer adaptive tests or short forms. The SCI-QOL Bladder Complications scale is a brief, fixed-length outcomes instrument for individuals with a UTI.
Chuang, I-Ching; Lin, Keh-Chung; Wu, Ching-Yi; Hsieh, Yu-Wei; Liu, Chien-Ting; Chen, Chia-Ling
2017-10-01
The Motor Activity Log (MAL) and Lower-Functioning MAL (LF-MAL) are used to assess the amount of use of the more impaired arm and the quality of movement during activities in real-life situations for patients with stroke. This study used Rasch analysis to examine the psychometric properties of the MAL and LF-MAL in patients with stroke. This is a methodological study. The MAL and LF-MAL include 2 scales: the amount of use (AOU) and the quality of movement (QOM). Rasch analysis was used to examine the unidimensionality, item difficulty hierarchy, targeting, reliability, and differential item functioning (DIF) of the MAL and LF-MAL. A total of 403 patients with mild or moderate stroke completed the MAL, and 134 patients with moderate/severe stroke finished the LF-MAL. Evidence of disordered thresholds and poor model fit were found both in the MAL and LF-MAL. After the rating categories were collapsed and misfit items were deleted, all items of the revised MAL and LF-MAL exhibited ordering and constituted unidimensional constructs. The person-item map showed that these assessments were difficult for our participants. The person reliability coefficients of these assessments ranged from .79 to .87. No items in the revised MAL and LF-MAL exhibited bias related to patients' characteristics. One limitation is the recruited patients, who have relatively high-functioning ability in the LF-MAL. The revised MAL and LF-MAL are unidimensional scales and have good reliability. The categories function well, and responses to all items in these assessments are not biased by patients' characteristics. However, the revised MAL and LF-MAL both showed floor effect. Further study might add easy items for assessing the performance of activity in real-life situations for patients with stroke. © 2017 American Physical Therapy Association
Edjolo, Arlette; Proust-Lima, Cécile; Delva, Fleur; Dartigues, Jean-François; Pérès, Karine
2016-02-15
We aimed to describe the hierarchical structure of Instrumental Activities of Daily Living (IADL) and basic Activities of Daily Living (ADL) and trajectories of dependency before death in an elderly population using item response theory methodology. Data were obtained from a population-based French cohort study, the Personnes Agées QUID (PAQUID) Study, of persons aged ≥65 years at baseline in 1988 who were recruited from 75 randomly selected areas in Gironde and Dordogne. We evaluated IADL and ADL data collected at home every 2-3 years over a 24-year period (1988-2012) for 3,238 deceased participants (43.9% men). We used a longitudinal item response theory model to investigate the item sequence of 11 IADL and ADL combined into a single scale and functional trajectories adjusted for education, sex, and age at death. The findings confirmed the earliest losses in IADL (shopping, transporting, finances) at the partial limitation level, and then an overlapping of concomitant IADL and ADL, with bathing and dressing being the earliest ADL losses, and finally total losses for toileting, continence, eating, and transferring. Functional trajectories were sex-specific, with a benefit of high education that persisted until death in men but was only transient in women. An in-depth understanding of this sequence provides an early warning of functional decline for better adaptation of medical and social care in the elderly. © The Author 2016. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Should the SCOPA-COG be modified? A Rasch analysis perspective.
Forjaz, M J; Frades-Payo, B; Rodriguez-Blazquez, C; Ayala, A; Martinez-Martin, P
2010-02-01
The SCales for Outcomes in PArkinson's disease-Cognition (SCOPA-COG) is a specific measure of cognitive function for Parkinson's disease (PD) patients. Previous studies, under the frame of the classic test theory, indicate satisfactory psychometric properties. The Rasch model, an item response theory approach, provides new information about the scale, as well as results in a linear scale. This study aims at analysing the SCOPA-COG according to the Rasch model and, on the basis of results, suggesting modification to the SCOPA-COG. Fit to the Rasch model was analysed using a sample of 384 PD patients. A good fit was obtained after rescoring for disordered thresholds. The person separation index, a reliability measure, was 0.83. Differential item functioning was observed by age for three items and by gender for one item. The SCOPA-COG is a unidimensional measure of global cognitive function in PD patients, with good scale targeting and no empirical evidence for use of the subscale scores. Its adequate reliability and internal construct validity were supported. The SCOPA-COG, with the proposed scoring scheme, generates true linear interval scores.
Developing an item bank and short forms that assess the impact of asthma on quality of life.
Stucky, Brian D; Edelen, Maria Orlando; Sherbourne, Cathy D; Eberhart, Nicole K; Lara, Marielena
2014-02-01
The present work describes the process of developing an item bank and short forms that measure the impact of asthma on quality of life (QoL) that avoids confounding QoL with asthma symptomatology and functional impairment. Using a diverse national sample of adults with asthma (N = 2032) we conducted exploratory and confirmatory factor analyses, and item response theory and differential item functioning analyses to develop a 65-item unidimensional item bank and separate short form assessments. A psychometric evaluation of the RAND Impact of Asthma on QoL item bank (RAND-IAQL) suggests that though the concept of asthma impact on QoL is multi-faceted, it may be measured as a single underlying construct. The performance of the bank was then evaluated with a real-data simulated computer adaptive test. From the RAND-IAQL item bank we then developed two short forms consisting of 4 and 12 items (reliability = 0.86 and 0.93, respectively). A real-data simulated computer adaptive test suggests that as few as 4-5 items from the bank are needed to obtain highly precise scores. Preliminary validity results indicate that the RAND-IAQL measures distinguish between levels of asthma control. To measure the impact of asthma on QoL, users of these items may choose from two highly reliable short forms, computer adaptive test administration, or content-specific subsets of items from the bank tailored to their specific needs. Copyright © 2013 Elsevier Ltd. All rights reserved.
Magis, David
2014-11-01
In item response theory, the classical estimators of ability are highly sensitive to response disturbances and can return strongly biased estimates of the true underlying ability level. Robust methods were introduced to lessen the impact of such aberrant responses on the estimation process. The computation of asymptotic (i.e., large-sample) standard errors (ASE) for these robust estimators, however, has not yet been fully considered. This paper focuses on a broad class of robust ability estimators, defined by an appropriate selection of the weight function and the residual measure, for which the ASE is derived from the theory of estimating equations. The maximum likelihood (ML) and the robust estimators, together with their estimated ASEs, are then compared in a simulation study by generating random guessing disturbances. It is concluded that both the estimators and their ASE perform similarly in the absence of random guessing, while the robust estimator and its estimated ASE are less biased and outperform their ML counterparts in the presence of random guessing with large impact on the item response process. © 2013 The British Psychological Society.
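The contrast between ML and weighted (robust) ability estimation can be sketched as follows. This is a minimal illustration, not the paper's implementation: the two-parameter logistic model, the Huber-type weight with tuning constant c = 1, the residual a_j(theta - b_j), the simple reweighted Newton-type iteration, and all item parameters are assumptions chosen for the example.

```python
import math

def p2pl(theta, a, b):
    """Probability of a correct response under a 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def huber_weight(r, c=1.0):
    """Huber-type weight (assumed form): full weight for small residuals,
    weight c/|r| for residuals larger than c in absolute value."""
    return 1.0 if abs(r) <= c else c / abs(r)

def estimate_theta(x, a, b, weight=None, start=0.0, tol=1e-8, max_iter=500):
    """Solve sum_j w_j * a_j * (x_j - P_j(theta)) = 0 by repeated Newton-type
    steps, recomputing the weights at the current theta on every pass.
    With weight=None all weights are 1 and the result is the ML estimate."""
    theta = start
    for _ in range(max_iter):
        num = 0.0
        den = 0.0
        for xj, aj, bj in zip(x, a, b):
            p = p2pl(theta, aj, bj)
            w = 1.0 if weight is None else weight(aj * (theta - bj))
            num += w * aj * (xj - p)
            den += w * aj * aj * p * (1.0 - p)
        step = num / den
        theta += step
        if abs(step) < tol:
            break
    return theta

# Hypothetical 5-item test: the examinee misses the easiest item only,
# which looks like an aberrant (e.g., careless) response.
a = [1.0, 1.0, 1.0, 1.0, 1.0]
b = [-3.0, -1.0, 0.0, 1.0, 2.0]
x = [0, 1, 1, 1, 1]

theta_ml = estimate_theta(x, a, b)
theta_robust = estimate_theta(x, a, b, weight=huber_weight)
print(theta_ml, theta_robust)
```

In this toy pattern the weighted estimator gives little weight to the missed easy item, so its estimate is pulled down less than the ML estimate. Standard errors are not computed in this sketch.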
College Board Response to "Harvard Educational Review" Article by Santelices and Wilson
ERIC Educational Resources Information Center
College Board, 2010
2010-01-01
This is the College Board's response to a research article by Drs. Maria Veronica Santelices and Mark Wilson in the Harvard Educational Review, entitled "Unfair Treatment? The Case of Freedle, the SAT, and the Standardization Approach to Differential Item Functioning" (see EJ930622).
2000-12-01
A SKIP FLAG INDICATING THE RESULT OF CHECKING THE RESPONSE ON THE PARENT (SCREENING) ITEM AGAINST THE RESPONSE(S) ON THE ITEMS WITHIN THE SKIP PATTERN. SEE TABLE D-5, NOTE 2, IN APPENDIX D.
Robust Measurement via A Fused Latent and Graphical Item Response Theory Model.
Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Ying, Zhiliang
2018-03-12
Item response theory (IRT) plays an important role in psychological and educational measurement. Unlike the classical testing theory, IRT models aggregate the item level information, yielding more accurate measurements. Most IRT models assume local independence, an assumption not likely to be satisfied in practice, especially when the number of items is large. Results in the literature and simulation studies in this paper reveal that misspecifying the local independence assumption may result in inaccurate measurements and differential item functioning. To provide more robust measurements, we propose an integrated approach by adding a graphical component to a multidimensional IRT model that can offset the effect of unknown local dependence. The new model contains a confirmatory latent variable component, which measures the targeted latent traits, and a graphical component, which captures the local dependence. An efficient proximal algorithm is proposed for the parameter estimation and structure learning of the local dependence. This approach can substantially improve the measurement, given no prior information on the local dependence structure. The model can be applied to measure both a unidimensional latent trait and multidimensional latent traits.
Fajrianthi; Zein, Rizqy Amelia
2017-01-01
This study aimed to develop an emotional intelligence (EI) test that is suitable to the Indonesian workplace context. Airlangga Emotional Intelligence Test (Tes Kecerdasan Emosi Airlangga [TKEA]) was designed to measure three EI domains: 1) emotional appraisal, 2) emotional recognition, and 3) emotional regulation. TKEA consisted of 120 items with 40 items for each subset. TKEA was developed based on the Situational Judgment Test (SJT) approach. To ensure its psychometric qualities, categorical confirmatory factor analysis (CCFA) and item response theory (IRT) were applied to test its validity and reliability. The study was conducted on 752 participants, and the results showed that test information function (TIF) was 3.414 (ability level = 0) for subset 1, 12.183 for subset 2 (ability level = -2), and 2.398 for subset 3 (ability level = -2). It is concluded that TKEA performs very well to measure individuals with a low level of EI ability. It is worth noting that TKEA is currently at the development stage; therefore, in this study, we investigated TKEA's item analysis and dimensionality test of each TKEA subset.
Huang, Chun-Jen; Chen, Cheng-Chung
2018-01-01
Abstract Background The burden of major depressive disorder includes suffering due to symptom severity, functional impairment, and quality of life deficits. The aim of this study was to compare the differences between electroconvulsive therapy and pharmacotherapy in reducing such burdens. Methods This was a pooled analysis study including 2 open-label trials for major depressive disorder inpatients receiving either standard bitemporal and modified electroconvulsive therapy with a maximum of 12 sessions or 20 mg/d of fluoxetine for 6 weeks. Symptom severity, functioning, and quality of life were assessed using the 17-item Hamilton Rating Scale for Depression, the Modified Work and Social Adjustment Scale, and SF-36. Side effects following treatment, including subjective memory impairment, nausea/vomiting, and headache, were recorded. The differences between these 2 groups in 17-item Hamilton Rating Scale for Depression, Modified Work and Social Adjustment Scale, quality of life, side effects, and time to response (at least a 50% reduction of 17-item Hamilton Rating Scale for Depression) and remission (17-item Hamilton Rating Scale for Depression ≤7) following treatment were analyzed. Results Electroconvulsive therapy (n=116) showed a significantly greater reduction in 17-item Hamilton Rating Scale for Depression, Modified Work and Social Adjustment Scale, and quality of life deficits and had significantly shorter time to response/remission than fluoxetine (n=126). However, the electroconvulsive therapy group was more likely to experience subjective memory impairment and headache. Conclusions Compared with fluoxetine, electroconvulsive therapy was more effective in alleviating the burden of major depressive disorder and had a substantially increased speed of response/remission in the acute phase. Increased education and information about electroconvulsive therapy for clinicians, patients, and their families and the general public is warranted. PMID:29228200
Hart, Dennis L; Werneke, Mark W; George, Steven Z; Matheson, James W; Wang, Ying-Chih; Cook, Karon F; Mioduski, Jerome E; Choi, Seung W
2009-08-01
Screening people for elevated levels of fear-avoidance beliefs is uncommon, but elevated levels of fear could worsen outcomes. Developing short screening tools might reduce the data collection burden and facilitate screening, which could prompt further testing or management strategy modifications to improve outcomes. The purpose of this study was to develop efficient yet accurate screening methods for identifying elevated levels of fear-avoidance beliefs regarding work or physical activities in people receiving outpatient rehabilitation. A secondary analysis of data collected prospectively from people with a variety of common neuromusculoskeletal diagnoses was conducted. Intake Fear-Avoidance Beliefs Questionnaire (FABQ) data were collected from 17,804 people who had common neuromusculoskeletal conditions and were receiving outpatient rehabilitation in 121 clinics in 26 states (in the United States). Item response theory (IRT) methods were used to analyze the FABQ data, with particular emphasis on differential item functioning among clinically logical groups of subjects, and to identify screening items. The accuracy of screening items for identifying subjects with elevated levels of fear was assessed with receiver operating characteristic analyses. Three items for fear of physical activities and 10 items for fear of work activities represented unidimensional scales with adequate IRT model fit. Differential item functioning was negligible for variables known to affect functional status outcomes: sex, age, symptom acuity, surgical history, pain intensity, condition severity, and impairment. Items that provided maximum information at the median for the FABQ scales were selected as screening items to dichotomize subjects by high versus low levels of fear. The accuracy of the screening items was supported for both scales. This study represents a retrospective analysis, which should be replicated using prospective designs. 
Future prospective studies should assess the reliability and validity of using one FABQ item to screen people for high levels of fear-avoidance beliefs. The lack of differential item functioning in the FABQ scales in the sample tested in this study suggested that FABQ screening could be useful in routine clinical practice and allowed the development of single-item screening for fear-avoidance beliefs that accurately identified subjects with elevated levels of fear. Because screening was accurate and efficient, single IRT-based FABQ screening items are recommended to facilitate improved evaluation and care of heterogeneous populations of people receiving outpatient rehabilitation.
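The accuracy assessment of a single screening item can be sketched as below. This is a minimal illustration under stated assumptions: the item scores, the reference classification of elevated fear, and the cutoff of 4 are hypothetical, and the area under the ROC curve is computed with the standard pairwise-comparison (Mann-Whitney) formulation rather than any procedure confirmed by the abstract.

```python
def sensitivity_specificity(item_scores, elevated, cutoff):
    """Classify a person as 'high fear' when the item score is >= cutoff,
    then compare against the reference classification."""
    pairs = list(zip(item_scores, elevated))
    tp = sum(1 for s, y in pairs if s >= cutoff and y)
    fn = sum(1 for s, y in pairs if s < cutoff and y)
    tn = sum(1 for s, y in pairs if s < cutoff and not y)
    fp = sum(1 for s, y in pairs if s >= cutoff and not y)
    return tp / (tp + fn), tn / (tn + fp)

def auc(item_scores, elevated):
    """Area under the ROC curve: the proportion of (elevated, not elevated)
    pairs in which the elevated person has the higher item score, counting
    ties as one half."""
    pos = [s for s, y in zip(item_scores, elevated) if y]
    neg = [s for s, y in zip(item_scores, elevated) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical responses to one screening item (0-6 scale) and a
# hypothetical reference classification based on the full scale.
item_scores = [1, 2, 2, 4, 5, 6]
elevated = [False, False, True, False, True, True]

sens, spec = sensitivity_specificity(item_scores, elevated, cutoff=4)
area = auc(item_scores, elevated)
print(sens, spec, area)
```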
Item response theory and the measurement of motor behavior.
Safrit, M J; Cohen, A S; Costa, M G
1989-12-01
Item response theory (IRT) has been the focus of intense research and development activity in educational and psychological measurement during the past decade. Because this theory can provide more precise information about test items than other theories usually used in measuring motor behavior, the application of IRT in physical education and exercise science merits investigation. In IRT, the difficulty level of each item (e.g., trial or task) can be estimated and placed on the same scale as the ability of the examinee. Using this information, the test developer can determine the ability levels at which the test functions best. Equating the scores of individuals on two or more items or tests can be handled efficiently by applying IRT. The precision of the identification of performance standards in a mastery test context can be enhanced, as can adaptive testing procedures. In this tutorial, several potential benefits of applying IRT to the measurement of motor behavior were described. An example is provided using bowling data and applying the graded-response form of the Rasch IRT model. The data were calibrated and the goodness of fit was examined. This analysis is described in a step-by-step approach. Limitations to using an IRT model with a test consisting of repeated measures were noted.
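The statement that item difficulty and examinee ability sit on the same scale can be made concrete with the dichotomous Rasch model. The tutorial itself uses a graded-response form; the simpler two-category case below is shown only for illustration, with theta_i denoting the ability of examinee i and b_j the difficulty of item j:

```latex
P(X_{ij} = 1 \mid \theta_i, b_j) = \frac{\exp(\theta_i - b_j)}{1 + \exp(\theta_i - b_j)},
\qquad
\log \frac{P(X_{ij} = 1)}{1 - P(X_{ij} = 1)} = \theta_i - b_j
```

Because only the difference between ability and difficulty enters the model, both are expressed in the same logit units; an examinee whose ability equals the item difficulty has a success probability of 0.5.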
An item response curves analysis of the Force Concept Inventory
NASA Astrophysics Data System (ADS)
Morris, Gary A.; Harshman, Nathan; Branum-Martin, Lee; Mazur, Eric; Mzoughi, Taha; Baker, Stephen D.
2012-09-01
Several years ago, we introduced the idea of item response curves (IRC), a simplistic form of item response theory (IRT), to the physics education research community as a way to examine item performance on diagnostic instruments such as the Force Concept Inventory (FCI). We noted that a full-blown analysis using IRT would be a next logical step, which several authors have since taken. In this paper, we show that our simple approach not only yields similar conclusions in the analysis of the performance of items on the FCI to the more sophisticated and complex IRT analyses but also permits additional insights by characterizing both the correct and incorrect answer choices. Our IRC approach can be applied to a variety of multiple-choice assessments but, as applied to a carefully designed instrument such as the FCI, allows us to probe student understanding as a function of ability level through an examination of each answer choice. We imagine that physics teachers could use IRC analysis to identify prominent misconceptions and tailor their instruction to combat those misconceptions, fulfilling the FCI authors' original intentions for its use. Furthermore, the IRC analysis can assist test designers to improve their assessments by identifying nonfunctioning distractors that can be replaced with distractors attractive to students at various ability levels.
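An item response curve of the kind described above can be tabulated directly from raw data. The sketch below is one plausible reading of the approach, assuming that total test score serves as the ability proxy and that each curve is the fraction of students at a given score who chose a given answer option; the function name and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

def item_response_curves(choices, scores):
    """For one item, return {total_score: {option: fraction of students at
    that score who chose the option}}.

    choices: the option each student selected on the item
    scores:  each student's total test score (used as the ability proxy)
    """
    counts = defaultdict(Counter)
    for choice, score in zip(choices, scores):
        counts[score][choice] += 1
    curves = {}
    for score in sorted(counts):
        n = sum(counts[score].values())
        curves[score] = {opt: k / n for opt, k in counts[score].items()}
    return curves

# Hypothetical data for a single item whose correct answer is "A".
choices = ["A", "B", "A", "C", "A", "A"]
scores = [1, 1, 2, 2, 3, 3]

curves = item_response_curves(choices, scores)
for score, fractions in curves.items():
    print(score, fractions)
```

Plotting each option's fraction against total score shows which distractors attract students at which ability levels; an option that is almost never chosen at any score would be a candidate nonfunctioning distractor.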
Khan, Anzalee; Lindenmayer, Jean-Pierre; Opler, Mark; Yavorsky, Christian; Rothman, Brian; Lucic, Luka
2013-10-01
Debate persists with regard to how best to categorize the syndromal dimension of negative symptoms in schizophrenia. The aim was to first review published Principal Components Analysis (PCA) of the PANSS, and extract items most frequently included in the negative domain, and secondly, to examine the quality of items using Item Response Theory (IRT) to select items that best represent a measurable dimension (or dimensions) of negative symptoms. First, 22 factor analyses and PCAs were included. Second, using a large dataset (n=7187) of participants in clinical trials with chronic schizophrenia, we extracted items loading on one or more PCA. Third, items not loading with a value of ≥ 0.5, or loading on more than one component with values of ≥ 0.5 were discarded. Fourth, resulting items were included in a non-parametric IRT and retained based on Option Characteristic Curves (OCCs) and Item Characteristic Curves (ICCs). 15 items loaded on a negative domain in at least one study, with Emotional Withdrawal loading on all studies. Non-parametric IRT retained nine items as an Integrated Negative Factor: Emotional Withdrawal, Blunted Affect, Passive/Apathetic Social Withdrawal, Poor Rapport, Lack of Spontaneity/Conversation Flow, Active Social Avoidance, Disturbance of Volition, Stereotyped Thinking and Difficulty in Abstract Thinking. This is the first study to use a psychometric IRT process to arrive at a set of negative symptom items. Future steps will include further examination of these nine items in terms of their stability, sensitivity to change, and correlations with functional and cognitive outcomes. © 2013 Elsevier B.V. All rights reserved.
Dikken, Jeroen; Hoogerduijn, Jita G; Kruitwagen, Cas; Schuurmans, Marieke J
2016-11-01
To assess the content validity and psychometric characteristics of the Knowledge about Older Patients Quiz (KOP-Q), which measures nurses' knowledge regarding older hospitalized adults and their certainty regarding this knowledge. Cross-sectional. Content validity: general hospitals. Psychometric characteristics: nursing school and general hospitals in the Netherlands. Content validity: 12 nurse specialists in geriatrics. Psychometric characteristics: 107 first-year and 78 final-year bachelor of nursing students, 148 registered nurses, and 20 nurse specialists in geriatrics. Content validity: The nurse specialists rated each item of the initial KOP-Q (52 items) on relevance. Ratings were used to calculate Item-Content Validity Index and average Scale-Content Validity Index (S-CVI/ave) scores. Items with insufficient content validity were removed. Psychometric characteristics: Ratings of students, nurses, and nurse specialists were used to test for differential item functioning (DIF) and unidimensionality before item characteristics (discrimination and difficulty) were examined using Item Response Theory. Finally, norm references were calculated and nomological validity was assessed. Content validity: Forty-three items remained after assessing content validity (S-CVI/ave = 0.90). Psychometric characteristics: Of the 43 items, two demonstrating ceiling effects and 11 distorting ability estimates (DIF) were subsequently excluded. Item characteristics were assessed for the remaining 30 items, all of which demonstrated good discrimination and difficulty parameters. Knowledge was positively correlated with certainty about this knowledge. The final 30-item KOP-Q is a valid, psychometrically sound, comprehensive instrument that can be used to assess the knowledge of nursing students, hospital nurses, and nurse specialists in geriatrics regarding older hospitalized adults. 
It can identify knowledge and certainty deficits for research purposes or serve as a tool in educational or quality improvement programs. © 2016, Copyright the Authors Journal compilation © 2016, The American Geriatrics Society.
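The content validity indices mentioned above can be computed as in the following sketch. It assumes the common convention, not confirmed by the abstract, that experts rate relevance on a 4-point scale and that ratings of 3 or 4 count as "relevant"; the ratings shown are hypothetical.

```python
def item_cvi(ratings, relevant_min=3):
    """Item-level CVI: proportion of experts rating the item as relevant
    (rating >= relevant_min on an assumed 4-point scale)."""
    return sum(1 for r in ratings if r >= relevant_min) / len(ratings)

def scale_cvi_ave(rating_matrix, relevant_min=3):
    """S-CVI/ave: the mean of the item-level CVIs across all items."""
    cvis = [item_cvi(row, relevant_min) for row in rating_matrix]
    return sum(cvis) / len(cvis)

# Hypothetical ratings from 12 experts for two items.
ratings = [
    [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4],  # all 12 experts rate it relevant
    [4, 3, 3, 4, 2, 3, 4, 1, 3, 4, 2, 3],  # 9 of 12 experts rate it relevant
]

i_cvis = [item_cvi(row) for row in ratings]
s_cvi = scale_cvi_ave(ratings)
print(i_cvis, s_cvi)
```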
Modeling Fan Effects on the Time Course of Associative Recognition
Schneider, Darryl W.; Anderson, John R.
2011-01-01
We investigated the time course of associative recognition using the response signal procedure, whereby a stimulus is presented and followed after a variable lag by a signal indicating that an immediate response is required. More specifically, we examined the effects of associative fan (the number of associations that an item has with other items in memory) on speed–accuracy tradeoff functions obtained in a previous response signal experiment involving briefly studied materials and in a new experiment involving well-learned materials. High fan lowered asymptotic accuracy or the rate of rise in accuracy across lags, or both. We developed an Adaptive Control of Thought–Rational (ACT-R) model for the response signal procedure to explain these effects. The model assumes that high fan results in weak associative activation that slows memory retrieval, thereby decreasing the probability that retrieval finishes in time and producing a speed–accuracy tradeoff function. The ACT-R model provided an excellent account of the data, yielding quantitative fits that were as good as those of the best descriptive model for response signal data. PMID:22197797
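The two ways in which fan can affect a speed-accuracy tradeoff function (a lower asymptote or a slower rise) can be illustrated with a shifted exponential, a descriptive form commonly used for response signal data. Whether this is the exact descriptive model the authors compared against is an assumption, and the parameter values below are hypothetical.

```python
import math

def sat_accuracy(t, asymptote, rate, intercept):
    """Shifted-exponential speed-accuracy tradeoff function.

    Accuracy is 0 until processing time t exceeds the intercept, then rises
    toward the asymptote at the given rate.
    """
    if t <= intercept:
        return 0.0
    return asymptote * (1.0 - math.exp(-rate * (t - intercept)))

# Hypothetical parameters: high fan lowers the asymptote and slows the rise.
low_fan = dict(asymptote=3.0, rate=4.0, intercept=0.3)
high_fan = dict(asymptote=2.5, rate=2.5, intercept=0.3)

for lag in (0.2, 0.5, 1.0, 3.0):
    print(lag, sat_accuracy(lag, **low_fan), sat_accuracy(lag, **high_fan))
```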
Nielsen, Marie Germund; Ørnbøl, Eva; Vestergaard, Mogens; Bech, Per; Christensen, Kaj Sparle
2017-06-01
We aimed to assess the measurement properties of the ten-item Major Depression Inventory when used on clinical suspicion in general practice by performing a Rasch analysis. General practitioners asked consecutive persons to respond to the web-based Major Depression Inventory on clinical suspicion of depression. We included 22 practices and 245 persons. Rasch analysis was performed using RUMM2030 software. The Rasch model fit suggests that all items contribute to a single underlying trait (defined as internal construct validity). Mokken analysis was used to test dimensionality and scalability. Our Rasch analysis showed misfit concerning the sleep and appetite items (items 9 and 10). The response categories were disordered for eight items. After modifying the original six-point to a four-point scoring system for all items, we achieved ordered response categories for all ten items. The person separation reliability was acceptable (0.82) for the initial model. Dimensionality testing did not support combining the ten items to create a total score. The scale appeared to be well targeted to this clinical sample. No significant differential item functioning was observed for gender, age, work status and education. The Rasch and Mokken analyses revealed two dimensions, but the Major Depression Inventory showed fit to one scale if items 9 and 10 were excluded. Our study indicated scalability problems in the current version of the Major Depression Inventory. The conducted analysis revealed better statistical fit when items 9 and 10 were excluded. Copyright © 2017 Elsevier Inc. All rights reserved.
78 FR 18349 - Federal Acquisition Regulation; Information Collection; Commercial Item Acquisitions
Federal Register 2010, 2011, 2012, 2013, 2014
2013-03-26
... Certification Application (ORCA) function of the System for Award Management (SAM) database. Because an offeror..., use of the ORCA function by prospective contractors decreases the number of responses per respondent per year for purposes of this information collection. ORCA was developed to eliminate the...
The Curiosity and Exploration Inventory-II: Development, Factor Structure, and Psychometrics
Kashdan, Todd B.; Gallagher, Matthew W.; Silvia, Paul J.; Winterstein, Beate P.; Breen, William E.; Terhar, Daniel; Steger, Michael F.
2009-01-01
Given curiosity’s fundamental role in motivation, learning, and well-being, we sought to refine the measurement of trait curiosity with an improved version of the Curiosity and Exploration Inventory (CEI; Kashdan, Rose, & Fincham, 2004). A preliminary pool of 36 items was administered to 311 undergraduate students, who also completed measures of emotion, emotion regulation, personality, and well-being. Factor analyses indicated a two factor model—motivation to seek out knowledge and new experiences (Stretching; 5 items) and a willingness to embrace the novel, uncertain, and unpredictable nature of everyday life (Embracing; 5 items). In two additional samples (ns = 150 and 119), we cross-validated this factor structure and provided initial evidence for construct validity. This includes positive correlations with personal growth, openness to experience, autonomy, purpose in life, self-acceptance, psychological flexibility, positive affect, and positive social relations, among others. Applying item response theory (IRT) to these samples (n = 578), we showed that the items have good discrimination and a desirable breadth of difficulty. The item information functions and test information function were centered near zero, indicating that the scale assesses the mid-range of the latent curiosity trait most reliably. The findings thus far provide good evidence for the psychometric properties of the 10-item CEI-II. PMID:20160913
Jafari, Peyman; Bagheri, Zahra; Hashemi, Seyyedeh Zahra; Shalileh, Keivan
2013-06-06
Limited studies have examined the effect of differential item functioning (DIF) on comparing health related quality of life (HRQoL) scores across child self-reports and parent proxy-reports. This study aims to determine whether parents and children respond differently to the items in the Persian version of the PedsQL™ 4.0 measure. The PedsQL™ 4.0 Generic Core Scales were completed by 938 child-parent dyads. The graded response model (GRM) was used to detect DIF between parents and children. The IRT analyses were conducted using IRTPRO 2.1. On the whole, our findings showed that 50% (4 out of 8) of the items in the physical subscale and 40% (2 out of 5) in both emotional and school subscales were flagged with DIF. Among the DIF items, 62.5% (5 out of 8) were uniform and the remaining 37.5% (3 out of 8) were non-uniform. Parents and children interpret certain items of the PedsQL™ 4.0 in different ways, except for the social subscale. Hence, we should be cautious about using parent proxy-report as a substitute for a child's ratings.
ERIC Educational Resources Information Center
van der Linden, Wim J.; Boekkooi-Timminga, Ellen
A "maximin" model for item response theory based test design is proposed. In this model only the relative shape of the target test information function is specified. It serves as a constraint subject to which a linear programming algorithm maximizes the information in the test. In the practice of test construction there may be several…
ERIC Educational Resources Information Center
Choi, Youn-Jeng; Alexeev, Natalia; Cohen, Allan S.
2015-01-01
The purpose of this study was to explore what may be contributing to differences in performance in mathematics on the Trends in International Mathematics and Science Study 2007. This was done by using a mixture item response theory modeling approach to first detect latent classes in the data and then to examine differences in performance on items…
Examining the Validity of the Adapted Alabama Parenting Questionnaire Parent Global Report Version
Maguin, Eugene; Nochajski, Thomas; Dewit, David; Safyer, Andrew
2015-01-01
The purpose of the present study was to comprehensively examine the validity of an adapted version of the parent global report form of the Alabama Parenting Questionnaire (APQ) with respect to its factor structure, relationships with demographic and response style covariates, and differential item functioning (DIF). The APQ was adapted by omitting the Corporal Punishment and the other discipline items. The sample consisted of 674 Canadian and United States families having a 9–12 year old child and at least one parent-figure who had received treatment within the past five years for alcohol problems or met criteria for alcohol abuse or dependence. The primary parent in each family completed the APQ. The four factor CFA model of the four published scales used and the three factor CFA model of those scales from prior research were rejected. Exploratory structural equation modeling was then used. The final three factor model combined the author-defined Involvement and Positive Parenting scales and retained the original Poor Monitoring/Supervision and Inconsistent Discipline scales. However, there were substantial numbers of moderate magnitude cross-loadings and large magnitude residual covariances. Differential item functioning (DIF) was observed for a number of APQ items. Controlling for DIF, response style and demographic variables were related significantly to the factors. PMID:26348028
Molenaar, Dylan; de Boeck, Paul
2018-06-01
In item response theory modeling of responses and response times, it is commonly assumed that the item responses have the same characteristics across the response times. However, heterogeneity might arise in the data if subjects resort to different response processes when solving the test items. These differences may be within-subject effects, that is, a subject might use a certain process on some of the items and a different process with different item characteristics on the other items. If the probability of using one process over the other process depends on the subject's response time, within-subject heterogeneity of the item characteristics across the response times arises. In this paper, the method of response mixture modeling is presented to account for such heterogeneity. Contrary to traditional mixture modeling where the full response vectors are classified, response mixture modeling involves classification of the individual elements in the response vector. In a simulation study, the response mixture model is shown to be viable in terms of parameter recovery. In addition, the response mixture model is applied to a real dataset to illustrate its use in investigating within-subject heterogeneity in the item characteristics across response times.
Kennedy, Vanessa; Abramsohn, Emily; Makelarski, Jennifer; Barber, Rachel; Wroblewski, Kristen; Tenney, Meaghan; Lee, Nita Karnik; Yamada, S. Diane; Lindau, Stacy Tessler
2015-01-01
Objectives To describe patterns of response to, and assess sexual function and activity elicited by, a self-administered assessment incorporated into a new patient intake form for gynecologic oncology consultation. Methods A cross-sectional study of patients presenting to a single urban academic medical center between January 2010 and September 2012. New patients completed a self-administered intake form, including six brief sexual activity and function items. These items, along with abstracted medical record data, were descriptively analyzed. Logistic regression was used to assess the association between sexual activity and function and disease status, adjusting for age. Results Median age was 50 years (range 18–91, N = 499); more than half had a final diagnosis of cancer. Most patients completed all sex-related items on the intake form; 98% answered at least one. Among patients who were sexually active in the prior 12 months (57% with cancer, 64% with benign disease), 52% indicated on the intake form having, during that period, a sexual problem lasting several months or more. Of these, 15% had physician documentation of the sexual problem. Eighteen women were referred for care. Providers reported no patient complaints about the inclusion of sexual items on the intake form. Conclusions Nearly all new patients presenting for gynecologic oncology consultation answered self-administered items to assess sexual activity and function. Further study is needed to determine the role of pretreatment identification of sexual function concerns in improving sexual outcomes associated with cancer diagnosis and treatment. PMID:25582823
Kennedy, Vanessa; Abramsohn, Emily; Makelarski, Jennifer; Barber, Rachel; Wroblewski, Kristen; Tenney, Meaghan; Lee, Nita Karnik; Yamada, S Diane; Lindau, Stacy Tessler
2015-04-01
To describe patterns of response to, and assess sexual function and activity elicited by, a self-administered assessment incorporated into a new patient intake form for gynecologic oncology consultation. A cross-sectional study of patients presenting to a single urban academic medical center between January 2010 and September 2012. New patients completed a self-administered intake form, including six brief sexual activity and function items. These items, along with abstracted medical record data, were descriptively analyzed. Logistic regression was used to assess the association between sexual activity and function and disease status, adjusting for age. Median age was 50 years (range 18-91, N=499); more than half had a final diagnosis of cancer. Most patients completed all sex-related items on the intake form; 98% answered at least one. Among patients who were sexually active in the prior 12 months (57% with cancer, 64% with benign disease), 52% indicated on the intake form having, during that period, a sexual problem lasting several months or more. Of these, 15% had physician documentation of the sexual problem. Eighteen women were referred for care. Providers reported no patient complaints about the inclusion of sexual items on the intake form. Nearly all new patients presenting for gynecologic oncology consultation answered self-administered items to assess sexual activity and function. Further study is needed to determine the role of pre-treatment identification of sexual function concerns in improving sexual outcomes associated with cancer diagnosis and treatment. Copyright © 2015 Elsevier Inc. All rights reserved.
Mamikonian-Zarpas, Ani; Laganá, Luciana
2016-01-01
Functional status is often defined by cumulative scores across indices of independence in performing basic and instrumental activities of daily living (ADL/IADL), but little is known about the unique relationship of each daily activity item with the fall outcome. The purpose of this retrospective study was to examine the level of relative risk for a future fall associated with difficulty with performing various tasks of normal daily functioning among older adults who had fallen at least once in the past 12 months. The sample was comprised of community-dwelling individuals 70 years and older from the 1984–1990 Longitudinal Study of Aging by Kovar, Fitti, and Chyba (1992). Risk analysis was performed on individual items quantifying 6 ADLs and 7 IADLs, as well as 10 items related to mobility limitations. Within a subsample of 1,675 older adults with a history of at least one fall within the past year, the responses of individuals who reported multiple falls were compared to the responses of participants who had a single fall and reported 1) difficulty with walking and/or balance (FRAIL group, n = 413) vs. 2) no difficulty with walking or dizziness (NDW+ND group, n = 415). The items that had the strongest relationships and highest risk ratios for the FRAIL group (which had the highest probabilities for a future fall) included difficulty with: eating (73%); managing money (70%); biting or chewing food (66%); walking a quarter of a mile (65%); using fingers to grasp (65%); and dressing without help (65%). For the NDW+ND group, the most noteworthy items included difficulty with: bathing or showering (79%); managing money (77%); shopping for personal items (75%); walking up 10 steps without rest (72%); difficulty with walking a quarter of a mile (72%); and stooping/crouching/kneeling (70%). 
These findings suggest that individual items quantifying specific ADLs and IADLs have substantive relationships with the fall outcome among older adults who have difficulty with walking and balance, as well as among older individuals without dizziness or difficulty with walking. Furthermore, the examination of the relationships between items that are related to more challenging activities and the fall outcome revealed that higher functioning older adults who reported difficulty with the 6 items that yielded the highest risk ratios may also be at elevated risk for a fall. PMID:27200366
Answering the call: a tool that measures functional breast cancer literacy.
Williams, Karen Patricia; Templin, Thomas N; Hines, Resche D
2013-01-01
There is a need for health care providers and health care educators to ensure that the messages they communicate are understood. The purpose of this research was to test the reliability and validity, in a culturally diverse sample of women, of a revised Breast Cancer Literacy Assessment Tool (Breast-CLAT) designed to measure functional understanding of breast cancer in English, Spanish, and Arabic. Community health workers verbally administered the 35-item Breast-CLAT to 543 Black, Latina, and Arab American women. A confirmatory factor analysis using a 2-parameter item response theory model was used to test the proposed 3-factor Breast-CLAT (awareness, screening and knowledge, and prevention and control). The confirmatory factor analysis using a 2-parameter item response theory model had a good fit (TLI = .91, RMSEA = .04) to the proposed 3-factor structure. The total scale reliability ranged from .80 for Black participants to .73 for total culturally diverse sample. The three subscales were differentially predictive of family history of cancer. The revised Breast-CLAT scales demonstrated internal consistency reliability and validity in this multiethnic, community-based sample.
ERIC Educational Resources Information Center
Magis, David
2015-01-01
The purpose of this note is to study the equivalence of observed and expected (Fisher) information functions with polytomous item response theory (IRT) models. It is established that observed and expected information functions are equivalent for the class of divide-by-total models (including partial credit, generalized partial credit, rating…
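As a minimal illustration of the kind of equivalence described above, consider the dichotomous two-parameter logistic model. It is used here only as the simplest familiar case (the note itself concerns polytomous models), and the symbols a, b, θ, and x are notation assumed for this sketch rather than taken from the note.

```latex
\begin{align*}
P(\theta) &= \frac{1}{1 + e^{-a(\theta - b)}}, \\
\ell(\theta; x) &= x\,a(\theta - b) - \log\bigl(1 + e^{a(\theta - b)}\bigr), \qquad x \in \{0, 1\}, \\
-\frac{\partial^2 \ell}{\partial \theta^2} &= a^2\,P(\theta)\bigl(1 - P(\theta)\bigr).
\end{align*}
```

Because the second derivative does not involve the response x, taking its expectation over x changes nothing, so the observed and expected information coincide in this special case.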
The second version of the L. V. Prasad-functional vision questionnaire.
Gothwal, Vijaya K; Sumalini, Rebecca; Bharani, Seelam; Reddy, Shailaja P; Bagga, Deepak K
2012-11-01
The L. V. Prasad-Functional Vision Questionnaire (LVP-FVQ) was developed using Rasch analysis to assess self-reported difficulties in performing daily tasks in school children with visual impairment (VI) in India. However, the LVP-FVQ has psychometric problems of inadequate measurement precision and lack of detailed assessment of dimensionality. Furthermore, items pertaining to use of technology are lacking. The aim of this study was to present the development and validation of the second version of LVP-FVQ (LVP-FVQ II). Development of LVP-FVQ II involved extracting items from other similar questionnaires (albeit developed for Western populations) and focus group discussions of children with VI and their parents that resulted in a 32-item pilot questionnaire. Overall, six items from the LVP-FVQ were retained. The questionnaire underwent pilot testing in 25 such children, following which a 27-item LVP-FVQ II emerged, and this was administered to 150 children with VI. Response to each item was rated on a three-category scale. Rasch analysis was used to validate the LVP-FVQ II. Participants used the rating scale as intended. Four mobility-related items required deletion, as these did not contribute toward measurement of a single construct, indicating a secondary dimension. Deletion of the four items resulted in the 23-item unidimensional LVP-FVQ II, with good measurement precision, effective targeting of item difficulty to participant ability, and lack of notable differential item functioning. The LVP-FVQ II has high reliability, indicating that it is effectively able to discriminate between visual disability of school children in India, and is valid across age, gender, duration of VI, and location of residence. Given the superior measurement properties and the interval-level scores, the LVP-FVQ II appears to offer advantages over LVP-FVQ in assessment of difficulties in performing daily tasks in this population. 
It can be adapted for use in other developing countries.
Garcia-Barrera, Mauricio A; Karr, Justin E; Duran, Victor; Direnfeld, Esther; Pineda, David A
2015-12-01
Garcia-Barrera, Kamphaus, and Bandalos (2011) derived a 25-item executive functioning screener from the Behavior Assessment System for Children (BASC), measuring 4 latent executive constructs: problem solving, attentional control, behavioral control, and emotional control. The current study included a cross-cultural examination of this screener in Colombian children with and without attention-deficit/hyperactivity disorder (ADHD). BASC teacher ratings were collected for Colombian children ages 6-11 years (848 healthy children [53% boys] and 155 children with ADHD [76% boys]). To examine the psychometric properties of the screener, a multistep procedure was implemented, including (a) confirmatory factor analysis (CFA) and factorial invariance testing across gender, age group (6-8 years, 9-11 years), and ADHD status to replicate and extend the original derivation; (b) item response theory (IRT) analysis to evaluate the information provided by individual items; and (c) given IRT results, a repeated CFA and invariance testing after the exclusion of 1 item from the problem-solving factor. The 24-item 4-factor model fit was adequate for controls and for ADHD participants. Results support the use of the 24-item executive functioning screener in a cross-cultural context. In turn, in supplemental material, normative data for the Colombian sample are reported along with bilingual guidelines (i.e., Spanish/English) for implementing the screener in clinical practice. Even though the screener is useful when examining executive functions, it was not designed as a diagnostic measure for developmental disorders such as ADHD; as such, it should only inform about status of executive functioning. (c) 2015 APA, all rights reserved.
Kisala, Pamela A.; Victorson, David; Pace, Natalie; Heinemann, Allen W.; Choi, Seung W.; Tulsky, David S.
2015-01-01
Objective To describe the development and psychometric properties of the SCI-QOL Psychological Trauma item bank and short form. Design Using a mixed-methods design, we developed and tested a Psychological Trauma item bank with patient and provider focus groups, cognitive interviews, and item response theory based analytic approaches, including tests of model fit, differential item functioning (DIF) and precision. Setting We tested a 31-item pool at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Veterans Administration hospital. Participants A total of 716 individuals with SCI completed the trauma items. Results The 31 items fit a unidimensional model (CFI=0.952; RMSEA=0.061) and demonstrated good precision (theta range between 0.6 and 2.5). Nine items demonstrated negligible DIF with little impact on score estimates. The final calibrated item bank contains 19 items. Conclusion The SCI-QOL Psychological Trauma item bank is a psychometrically robust measurement tool from which a short form and a computer adaptive test (CAT) version are available. PMID:26010967
Conjunctive and Disjunctive Item Response Functions.
1984-10-01
...fixed set of values of a, b, A1, B1, A2, B2, A3, and B3, the f's, g's, and h's in (7) are fixed. Equation (7) must still hold for ... Thus... for item i is ... (22) where a and pi are arbitrary constants. These constants must be the same for all items in a given...
ERIC Educational Resources Information Center
Webster, Raymond E.
1980-01-01
A significant two-way input modality by output modality interaction suggested that short term memory capacity among the groups differed as a function of the modality used to present the items in combination with the output response required. (Author/CL)
Introduction to bifactor polytomous item response theory analysis.
Toland, Michael D; Sulis, Isabella; Giambona, Francesca; Porcu, Mariano; Campbell, Jonathan M
2017-02-01
A bifactor item response theory model can be used to aid in the interpretation of the dimensionality of a multifaceted questionnaire that assumes continuous latent variables underlying the propensity to respond to items. This model can be used to describe the locations of people on a general continuous latent variable as well as on continuous orthogonal specific traits that characterize responses to groups of items. The bifactor graded response (bifac-GR) model is presented in contrast to a correlated traits (or multidimensional GR model) and unidimensional GR model. Bifac-GR model specification, assumptions, estimation, and interpretation are demonstrated with a reanalysis of data (Campbell, 2008) on the Shared Activities Questionnaire. We also show the importance of marginalizing the slopes for interpretation purposes and we extend the concept to the interpretation of the information function. To go along with the illustrative example analyses, we have made available supplementary files that include command file (syntax) examples and outputs from flexMIRT, IRTPRO, R, Mplus, and STATA. Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.jsp.2016.11.001. Data needed to reproduce analyses in this article are available as supplemental materials (online only) in the Appendix of this article. Copyright © 2016 Society for the Study of School Psychology. Published by Elsevier Ltd. All rights reserved.
Psychometric properties of a revised version of the Assisting Hand Assessment (Kids-AHA 5.0).
Holmefur, Marie M; Krumlinde-Sundholm, Lena
2016-06-01
The aim of this study was to scrutinize the Assisting Hand Assessment (AHA) version 4.4 for possible improvements and to evaluate the psychometric properties regarding internal scale validity and aspects of reliability of a revised version of the AHA. In collaboration with experts, scoring criteria were changed for four items, and one fully new item was constructed. Twenty-two original, one new, and four revised items were scored for 164 assessments of children with unilateral cerebral palsy aged 18 months to 12 years. Rasch measurement analysis was used to evaluate internal scale validity by exploring rating-scale functioning, item and person goodness-of-fit, and principal component analysis. Targeting and scale reliability were also evaluated. After removal of misfitting items, a 20-item scale showed satisfactory goodness-of-fit. Unidimensionality was confirmed by principal component analysis. The rating scale functioned well for the 20 items, and the item difficulty was well suited to the ability level of the sample. The person reliability coefficient was 0.98, indicating high separation ability of the scale. A conversion table of AHA scores between the previous version (4.4) and the new version (5.0) was constructed. The new, 20-item version of the Kids-AHA (version 5.0), demonstrated excellent internal scale validity, suggesting improved responsiveness to changes and shortened scoring time. For comparison of scores from version 4.4 to 5.0, a transformation table is presented. © 2015 Mac Keith Press.
Crane, Paul K; Gibbons, Laura E; Jolley, Lance; van Belle, Gerald
2006-11-01
We present an ordinal logistic regression model for identification of items with differential item functioning (DIF) and apply this model to a Mini-Mental State Examination (MMSE) dataset. We employ item response theory ability estimation in our models. Three nested ordinal logistic regression models are applied to each item. Model testing begins with examination of the statistical significance of the interaction term between ability and the group indicator, consistent with nonuniform DIF. Then we turn our attention to the coefficient of the ability term in models with and without the group term. If including the group term has a marked effect on that coefficient, we declare that it has uniform DIF. We examined DIF related to language of test administration in addition to self-reported race, Hispanic ethnicity, age, years of education, and sex. We used PARSCALE for IRT analyses and STATA for ordinal logistic regression approaches. We used an iterative technique for adjusting IRT ability estimates on the basis of DIF findings. Five items were found to have DIF related to language. These same items also had DIF related to other covariates. The ordinal logistic regression approach to DIF detection, when combined with IRT ability estimates, provides a reasonable alternative for DIF detection. There appear to be several items with significant DIF related to language of test administration in the MMSE. More attention needs to be paid to the specific criteria used to determine whether an item has DIF, not just the technique used to identify DIF.
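The decision sequence described above (test the ability-by-group interaction first, then compare the ability coefficient with and without the group term) can be sketched as follows. This is a minimal sketch only: the function names, the 0.05 significance level, and the 10% change threshold are assumptions made for illustration and are not stated in the abstract, and the coefficients and p-value would come from ordinal logistic models fitted elsewhere.

```python
def relative_change(beta_without_group: float, beta_with_group: float) -> float:
    """Proportional change in the ability coefficient when the group term is added."""
    if beta_without_group == 0:
        raise ValueError("ability coefficient in the reduced model must be nonzero")
    return abs(beta_with_group - beta_without_group) / abs(beta_without_group)


def classify_dif(
    p_interaction: float,
    beta_without_group: float,
    beta_with_group: float,
    alpha: float = 0.05,             # assumed significance level (hypothetical)
    change_threshold: float = 0.10,  # assumed cutoff for a "marked" change (hypothetical)
) -> str:
    """Apply the two-step rule to results from already fitted nested models."""
    # Step 1: a significant ability-by-group interaction indicates nonuniform DIF.
    if p_interaction < alpha:
        return "nonuniform"
    # Step 2: a marked shift in the ability coefficient indicates uniform DIF.
    if relative_change(beta_without_group, beta_with_group) > change_threshold:
        return "uniform"
    return "none"


# Made-up inputs, for illustration only.
print(classify_dif(p_interaction=0.40, beta_without_group=1.00, beta_with_group=1.20))
```

Keeping the thresholds as parameters reflects the abstract's closing point that the criteria used to declare DIF deserve as much attention as the detection technique.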
Development of the Parent Responses to School Functioning Questionnaire.
Barber Garcia, Brittany N; Gray, Laura S; Simons, Laura E; Logan, Deirdre E
2017-10-01
Parents play an important role in supporting school functioning in youth with chronic pain, but no validated tool exists to assess parental responses to child and adolescent pain behaviors in the school context. Such a tool would be useful in identifying targets of change to reduce pain-related school impairment. The goal of this study was to develop and preliminarily validate the Parent Responses to School Functioning Questionnaire (PRSF), a parent self-report measure of this construct. After initial expert review and pilot testing, the measure was administered to 418 parents of children (ages 6-17 years) seen for initial multidisciplinary chronic pain clinic evaluation. The final 16-item PRSF showed evidence of good internal consistency (α = .82) and 2-week test-retest reliability (intraclass correlation coefficient = .87). Criterion validity was demonstrated by significant correlations with school absence rates and overall school functioning, and construct validity was demonstrated by correlations with general parental responses to pain. Three subscales emerged capturing parents' personal distress, parents' level of distrust of the school, and parents' expectations and behaviors related to their child's management of challenging school situations. These results provide preliminary support for the PRSF as a psychometrically sound tool to assess parents' responses to child pain in the school setting. The 16-item PRSF measures parental responses to their child's chronic pain in the school context. The clinically useful measure can inform interventions aimed at reducing functional disability in children with chronic pain by enhancing parents' ability to respond adaptively to child pain behaviors. Copyright © 2017 American Pain Society. Published by Elsevier Inc. All rights reserved.
Kozlowski, Allan J; Singh, Ritika; Victorson, David; Miskovic, Ana; Lai, Jin-Shei; Harvey, Richard L; Cella, David; Heinemann, Allen W
2015-11-01
To examine agreement between patient and proxy responses on the Quality of Life in Neurological Disorders (Neuro-QoL) instruments after stroke. Cross-sectional observational substudy of the longitudinal, multisite, multicondition Neuro-QoL validation study. In-person, interview-guided, patient-reported outcomes. Convenience sample of dyads (N=86) of community-dwelling persons with stroke and their proxy respondents. Not applicable. Dyads concurrently completed short forms of 8 or 9 items for the 13 Neuro-QoL adult domains using the patient-proxy perspective. Agreement was examined at the scale-level with difference scores, intraclass correlation coefficients (ICCs), effect size statistics, and Bland-Altman plots, and at the item-level with kappa coefficients. We found no mean differences between patients and proxies on the Applied Cognition-General Concerns, Depression, Satisfaction With Social Roles and Activities, Stigma, and Upper Extremity Function (Fine Motor, activities of daily living) short forms. Patients rated themselves more favorably on the Applied Cognition-Executive Function, Ability to Participate in Social Roles and Activities, Lower Extremity Function (Mobility), Positive Affect and Well-Being, Anxiety, Emotional and Behavioral Dyscontrol, and Fatigue short forms. The largest mean patient-proxy difference observed was 3 T-score points on the Lower Extremity Function (Mobility). ICCs ranged from .34 to .59. However, limits of agreement showed dyad differences exceeding ±20 T-score points, and item-level agreement ranged from not significant to weighted kappa=.34. Proxy responses on Neuro-QoL short forms can complement responses of moderate- to high-functioning community-dwelling persons with stroke and augment group-level analyses, but do not substitute for individual patient ratings. Validation is needed for other stroke populations. Copyright © 2015 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
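The scale-level agreement statistics mentioned above include Bland-Altman limits of agreement. A minimal sketch of that calculation follows, assuming the conventional definition (mean difference plus or minus 1.96 standard deviations of the differences). The function name and the T-scores are hypothetical and are not data from the study.

```python
import statistics


def bland_altman_limits(patient_scores, proxy_scores, z=1.96):
    """Return (bias, lower limit, upper limit) for paired scores."""
    if len(patient_scores) != len(proxy_scores):
        raise ValueError("scores must be paired")
    # Difference within each dyad: patient rating minus proxy rating.
    diffs = [p - q for p, q in zip(patient_scores, proxy_scores)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - z * sd, bias + z * sd


# Hypothetical T-scores for five dyads, for illustration only.
patient = [50, 45, 60, 55, 40]
proxy = [48, 47, 55, 50, 45]
bias, lower, upper = bland_altman_limits(patient, proxy)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

A small mean difference can coexist with wide limits, which is the pattern the abstract reports: group-level bias of a few T-score points but individual dyads differing by far more.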
Paap, Muirne C S; Braeken, Johan; Pedersen, Geir; Urnes, Øyvind; Karterud, Sigmund; Wilberg, Theresa; Hummelen, Benjamin
2017-12-01
This study aims at evaluating the psychometric properties of the antisocial personality disorder (ASPD) criteria in a large sample of patients, most of whom had one or more personality disorders (PD). PD diagnoses were assessed by experienced clinicians using the Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders, 4th edition, Axis II PDs. Analyses were performed within an item response theory framework. Results of the analyses indicated that ASPD is a unidimensional construct that can be measured reliably at the upper range of the latent trait scale. Differential item functioning across gender was restricted to two criteria and had little impact on the latent ASPD trait level. Patients fulfilling both the adult ASPD criteria and the conduct disorder criteria had similar latent trait distributions as patients fulfilling only the adult ASPD criteria. Overall, the ASPD items fit the purpose of a diagnostic instrument well, that is, distinguishing patients with moderate from those with high antisocial personality scores.
Using item response theory to address vulnerabilities in FFQ.
Kazman, Josh B; Scott, Jonathan M; Deuster, Patricia A
2017-09-01
The limitations for self-reporting of dietary patterns are widely recognised as a major vulnerability of FFQ and the dietary screeners/scales derived from FFQ. Such instruments can yield inconsistent results to produce questionable interpretations. The present article discusses the value of psychometric approaches and standards in addressing these drawbacks for instruments used to estimate dietary habits and nutrient intake. We argue that a FFQ or screener that treats diet as a 'latent construct' can be optimised for both internal consistency and the value of the research results. Latent constructs, a foundation for item response theory (IRT)-based scales (e.g. Patient Reported Outcomes Measurement Information System) are typically introduced in the design stage of an instrument to elicit critical factors that cannot be observed or measured directly. We propose an iterative approach that uses such modelling to refine FFQ and similar instruments. To that end, we illustrate the benefits of psychometric modelling by using items and data from a sample of 12 370 Soldiers who completed the 2012 US Army Global Assessment Tool (GAT). We used factor analysis to build the scale incorporating five out of eleven survey items. An IRT-driven assessment of response category properties indicates likely problems in the ordering or wording of several response categories. Group comparisons, examined with differential item functioning (DIF), provided evidence of scale validity across each Army sub-population (sex, service component and officer status). Such an approach holds promise for future FFQ.
Shaw, Amanda M; Rogge, Ronald D
2016-02-01
This study took a critical look at the construct of sexual quality. The 65 items of four well-validated self-report measures of sexual satisfaction (the Index of Sexual Satisfaction [ISS], Hudson, Harrison, & Crosscup, 1981; the Global Measure of Sexual Satisfaction [GMSEX], Lawrance & Byers, 1995; the Pinney Sexual Satisfaction Inventory [PSSI], Pinney, Gerrard, & Denney, 1987; the Young Sexual Satisfaction Scale [YSSS], Young, Denny, Luquis, & Young, 1998) and an additional 74 potential sexual quality items were given to 3060 online participants. Using Item Response Theory (IRT), we demonstrated that the ISS, YSSS, and PSSI scales provided suboptimal levels of precision in assessing sexual quality, particularly given the length of those scales. Exploratory factor analyses, IRT, differential item functioning analyses, and longitudinal responsiveness analyses were used to develop and evaluate the Quality of Sex Inventory. Results suggested that, in comparison to existing scales, the QSI (1) offers investigators and clinicians more theoretically focused scales, (2) distinguishes sexual satisfaction from sexual dissatisfaction, and (3) offers greater precision and power for detecting differences with (4) comparably high levels of responsiveness for detecting change over time despite being notably shorter than most of the existing scales. The QSI-satisfaction subscales demonstrated strong convergent validity with other measures of sexual satisfaction and excellent construct validity with anchor scales from the nomological net surrounding that construct, suggesting that they continue to assess the same theoretical construct as prior scales. Implications for research are discussed.
Tasca, Giorgio A; Cabrera, Christine; Kristjansson, Elizabeth; MacNair-Semands, Rebecca; Joyce, Anthony S; Ogrodniczuk, John S
2016-01-01
We tested a very brief version of the 23-item Therapeutic Factors Inventory-Short Form (TFI-S), and describe the use of Item Response Theory (IRT) for the purpose of developing short and reliable scales for group psychotherapy. Group therapy patients (N = 578) completed the TFI-S on one occasion, and their data were used for the IRT analysis. Of those, 304 completed the TFI-S and other measures on more than one occasion to assess sensitivity to change, concurrent, and predictive validity of the brief version. Results suggest that the new TFI-8 is a brief, reliable, and valid measure of a higher-order group therapeutic factor. The TFI-8 may be used for continuous process measurement and feedback to improve the functioning of therapy groups.
A psychometric evaluation of the Arm Motor Ability Test.
O'Dell, Michael W; Kim, Grace; Rivera, Lisa; Fieo, Robert; Christos, Paul; Polistena, Caitlin; Fitzgerald, Kerri; Gorga, Delia
2013-06-01
To further examine the psychometric properties of a 9-item version of the Arm Motor Ability Test (AMAT-9) in persons with stroke. Thirty-two community-dwelling persons > 6 months post-stroke undergoing robotics treatment (mean age = 56.0 years, time post-stroke = 4.1 years, National Institutes of Health Stroke Scale score = 4.1, and AMAT-9 score = 1.22). Construct validity (including Rasch analyses) used baseline data prior to treatment (n = 32). Standardized response mean was calculated for subjects completing the protocol (n = 29). The Wolf Motor Function Test (WMFT), Fugl-Meyer Assessment (FMA), Action Research Arm Test (ARAT), and Stroke Impact Scale (SIS) were also administered. Spearman-rank correlation coefficients between AMAT-9 and the WMFT, FMA, and ARAT were strong (0.78-0.79, all p < 0.001). The correlation between the AMAT-9 and SIS Hand Function sub-score was stronger than that between the AMAT-9 and the Communication sub-score (0.40, p = 0.025 and -0.16, p = 0.39, respectively). Rasch analyses provided evidence for an appropriate hierarchical structure of item difficulties, unidimensionality, and good reliability. The AMAT demonstrated a comparable standardized response mean of 0.98. The AMAT-9 is valid and responsive among subjects scoring in the lower range of the scale. It has the advantage of assessing function and by eliminating the standing item from the previous iteration, it may be more easily used with severely impaired patients.
Goetz, Christopher G; Liu, Yuanyuan; Stebbins, Glenn T; Wang, Lu; Tilley, Barbara C; Teresi, Jeanne A; Merkitch, Douglas; Luo, Sheng
2016-12-01
Assess MDS-UPDRS items for gender-, age-, and race/ethnicity-based differential item functioning. Assessing differential item functioning is a core rating scale validation step. For the MDS-UPDRS, differential item functioning occurs if item-score probability among people with similar levels of parkinsonism differ according to selected covariates (gender, age, race/ethnicity). If the magnitude of differential item functioning is clinically relevant, item-score interpretation must consider influences by these covariates. Differential item functioning can be nonuniform (covariate variably influences an item-score across different levels of parkinsonism) or uniform (covariate influences an item-score consistently over all levels of parkinsonism). Using the MDS-UPDRS translation database of more than 5,000 PD patients from 14 languages, we tested gender-, age-, and race/ethnicity-based differential item functioning. To designate an item as having clinically relevant differential item functioning, we required statistical confirmation by 2 independent methods, along with a McFadden pseudo-R² magnitude statistic greater than "negligible." Most items showed no gender-, age- or race/ethnicity-based differential item functioning. When differential item functioning was identified, the magnitude statistic was always in the "negligible" range, and the scale-level impact was minimal. The absence of clinically relevant differential item functioning across all items and all parts of the MDS-UPDRS is strong evidence that the scale can be used confidently. As studies of Parkinson's disease increasingly involve multinational efforts and the MDS-UPDRS has several validated non-English translations, the findings support the scale's broad applicability in populations with varying gender, age, and race/ethnicity distributions. © 2016 International Parkinson and Movement Disorder Society.
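The magnitude statistic named above, McFadden's pseudo-R², is commonly computed from model log-likelihoods as one minus the ratio of the fitted model's log-likelihood to that of an intercept-only model. The sketch below assumes that DIF magnitude is summarized as the change in this statistic when covariate terms are added. The function names and log-likelihood values are hypothetical, and no cutoff for "negligible" is assumed because the abstract does not give one.

```python
def mcfadden_r2(ll_model: float, ll_null: float) -> float:
    """McFadden pseudo-R^2: 1 - (model log-likelihood / null log-likelihood)."""
    if ll_null == 0:
        raise ValueError("null log-likelihood must be nonzero")
    return 1.0 - ll_model / ll_null


def r2_change(ll_base: float, ll_augmented: float, ll_null: float) -> float:
    """Gain in pseudo-R^2 when covariate (DIF) terms are added to the base model."""
    return mcfadden_r2(ll_augmented, ll_null) - mcfadden_r2(ll_base, ll_null)


# Hypothetical log-likelihoods, for illustration only.
ll_null = -1000.0       # intercept-only model
ll_base = -800.0        # trait level only
ll_augmented = -790.0   # trait level plus covariate terms
print(round(r2_change(ll_base, ll_augmented, ll_null), 4))
```

Reporting the change rather than a p-value alone matters in very large samples such as the one described, where trivial effects can reach statistical significance.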
ERIC Educational Resources Information Center
Kim, Sooyeon; Walker, Michael E.
2011-01-01
This study examines the use of subpopulation invariance indices to evaluate the appropriateness of using a multiple-choice (MC) item anchor in mixed-format tests, which include both MC and constructed-response (CR) items. Linking functions were derived in the nonequivalent groups with anchor test (NEAT) design using an MC-only anchor set for 4…
Derivation of the MMPI-2-RF Henry-Heilbronner Index-r (HHI-r) scale.
Henry, George K; Heilbronner, Robert L; Algina, James; Kaya, Yasemin
2013-01-01
The 15-item Henry-Heilbronner Index (HHI) was published in 2006 as an MMPI-2 embedded measure of psychological response validity. When the MMPI-2 was revised in 2008 only 11 of the 15 original HHI items were retained on the MMPI-2-RF, prohibiting use of the HHI as an embedded validity indicator on the MMPI-2-RF. Using the original HHI sample an 11-item version of the HHI, the HHI-r, was evaluated for use as an embedded measure of psychological response validity for the MMPI-2-RF. The 11-item HHI-r was very similar to the HHI in classification accuracy. An HHI-r cutoff score of ≥7 was associated with a classification accuracy rate of 84.0%, good sensitivity (68.9%), and high specificity (93.2%) in identifying symptom exaggeration in personal injury and disability litigants versus non-litigating head-injured patients. These preliminary results suggest the HHI-r functions in a manner similar to the original HHI as a measure of psychological response validity, and may be used by psychologists and neuropsychologists as an MMPI-2-RF embedded validity indicator.
Listen to their answers! Response behaviour in the measurement of physical and role functioning
Hak, Tony; Sprangers, Mirjam A. G.; Groen, Harry J. M.; van der Wal, Gerrit; The, Anne-Mei
2008-01-01
Background Quality of life (QoL) is considered to be an indispensable outcome measure of curative and palliative treatment. However, QoL research often yields findings that raise questions about what QoL measurement instruments actually assess and how the scores should be interpreted. Objective To investigate how patients interpret and respond to questions on the EORTC-QLQ-C30 over time and to find explanations to account for counterintuitive findings in QoL measurement. Methods Qualitative investigation was made of the response behaviour of small-cell lung cancer patients (n = 23) in the measurement of QoL with the European Organization for Research and Treatment of Cancer Core Quality of Life Questionnaire (EORTC QLQ-C30). Focus was on physical functioning (PF, items 1 to 5), role functioning (RF, items 6 and 7), global health and QoL rating (GH/QOL, items 29 and 30). Interviews were held at four points: at the start of the chemotherapy, 4 weeks later, at the end, and 6 weeks after the end of chemotherapy. Patients were asked to ‘think aloud’ when filling in the questionnaire. Results Patients used various response strategies when answering questions about problems and limitations in functioning, which impacted the accuracy of the scale. Patients had scores suggesting they were less limited than they actually were by taking the wording of questions literally, by guessing their functioning in activities that they did not perform, and by ignoring or excluding certain activities that they could not perform. Conclusion Terminally ill patients evaluate their functioning in terms of what they perceive to be normal under the circumstances. Their answers can be interpreted in terms of change in the appraisal process (Rapkin and Schwartz 2004; Health and Quality of Life Outcomes, 2, 14). More care should be taken in assessing the quality of a set of questions about physical and role functioning. PMID:18389384
Listen to their answers! Response behaviour in the measurement of physical and role functioning.
Westerman, Marjan J; Hak, Tony; Sprangers, Mirjam A G; Groen, Harry J M; van der Wal, Gerrit; The, Anne-Mei
2008-05-01
Quality of life (QoL) is considered to be an indispensable outcome measure of curative and palliative treatment. However, QoL research often yields findings that raise questions about what QoL measurement instruments actually assess and how the scores should be interpreted. To investigate how patients interpret and respond to questions on the EORTC-QLQ-C30 over time and to find explanations to account for counterintuitive findings in QoL measurement. Qualitative investigation was made of the response behaviour of small-cell lung cancer patients (n = 23) in the measurement of QoL with the European Organization for Research and Treatment of Cancer Core Quality of Life Questionnaire (EORTC QLQ-C30). Focus was on physical functioning (PF, items 1 to 5), role functioning (RF, items 6 and 7), global health and QoL rating (GH/QOL, items 29 and 30). Interviews were held at four points: at the start of the chemotherapy, 4 weeks later, at the end, and 6 weeks after the end of chemotherapy. Patients were asked to 'think aloud' when filling in the questionnaire. Patients used various response strategies when answering questions about problems and limitations in functioning, which impacted the accuracy of the scale. Patients had scores suggesting they were less limited than they actually were by taking the wording of questions literally, by guessing their functioning in activities that they did not perform, and by ignoring or excluding certain activities that they could not perform. Terminally ill patients evaluate their functioning in terms of what they perceive to be normal under the circumstances. Their answers can be interpreted in terms of change in the appraisal process (Rapkin and Schwartz 2004; Health and Quality of Life Outcomes, 2, 14). More care should be taken in assessing the quality of a set of questions about physical and role functioning.
Khan, Anzalee; Lewis, Charles; Lindenmayer, Jean-Pierre
2011-11-16
Nonparametric item response theory (IRT) was used to examine (a) the performance of the 30 Positive and Negative Syndrome Scale (PANSS) items and their options (levels of severity), (b) the effectiveness of various subscales to discriminate among differences in symptom severity, and (c) the development of an abbreviated PANSS (Mini-PANSS) based on IRT and a method to link scores to the original PANSS. Baseline PANSS scores from 7,187 patients with Schizophrenia or Schizoaffective disorder who were enrolled between 1995 and 2005 in psychopharmacology trials were obtained. Option characteristic curves (OCCs) and Item Characteristic Curves (ICCs) were constructed to examine the probability of rating each of seven options within each of 30 PANSS items as a function of subscale severity, and summed-score linking was applied to items selected for the Mini-PANSS. The majority of items forming the Positive and Negative subscales (i.e. 19 items) performed very well and discriminated better along symptom severity compared to the General Psychopathology subscale. Six of the seven Positive Symptom items, six of the seven Negative Symptom items, and seven out of the 16 General Psychopathology items were retained for inclusion in the Mini-PANSS. Summed score linking and linear interpolation were able to produce a translation table for comparing total subscale scores of the Mini-PANSS to total subscale scores on the original PANSS. Results show scores on the subscales of the Mini-PANSS can be linked to scores on the original PANSS subscales, with very little bias. The study demonstrated the utility of non-parametric IRT in examining the item properties of the PANSS and to allow selection of items for an abbreviated PANSS scale. The comparisons between the 30-item PANSS and the Mini-PANSS revealed that the shorter version is comparable to the 30-item PANSS, but when applying IRT, the Mini-PANSS is also a good indicator of illness severity.
Hagman, Brett T; Kuerbis, Alexis N; Morgenstern, Jon; Bux, Donald A; Parsons, Jeffrey T; Heidinger, Bram E
2009-11-01
The Short Inventory of Problems-Alcohol and Drugs (SIP-AD) is a 15-item measure that assesses concurrently negative consequences associated with alcohol and illicit drug use. Current psychometric evaluation has been limited to classical test theory (CTT) statistics, and it has not been validated among non-treatment seeking men-who-have-sex-with-men (MSM). Methods from Item Response Theory (IRT) can improve upon CTT by providing an in-depth analysis of how each item performs across the underlying latent trait that it is purported to measure. The present study examined the psychometric properties of the SIP-AD using methods from both IRT and CTT among a non-treatment seeking MSM sample (N=469). Participants were recruited from the New York City area and were asked to participate in a series of studies examining club drug use. Results indicated that five items on the SIP-AD demonstrated poor item misfit or significant differential item functioning (DIF) across race/ethnicity and HIV status. These five items were dropped and two-parameter IRT analyses were conducted on the remaining 10 items, which indicated a restricted range of item location parameters (-.15 to -.99) plotted at the lower end of the latent negative consequences severity continuum, and reasonably high discrimination parameters (1.30 to 2.22). Additional CTT statistics were compared between the original 15-item SIP-AD and the refined 10-item SIP-AD and suggest that the differences were negligible with the refined 10-item SIP-AD indicating a high degree of reliability and validity. Findings suggest the SIP-AD can be shortened to 10 items and appears to be a non-biased reliable and valid measure among non-treatment seeking MSM.
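The two-parameter analysis described in the abstract above can be made concrete with a short sketch of the item response function. It assumes the standard logistic 2PL form, P(θ) = 1 / (1 + exp(-a(θ - b))), without a scaling constant; the abstract does not say which parameterization was used. The function name `irf_2pl` is hypothetical, and the parameter values are simply the endpoints of the ranges reported in the abstract, not estimates for any particular item.

```python
import math


def irf_2pl(theta, a, b):
    """Probability of endorsing an item under the logistic 2PL model.

    theta: latent trait level; a: discrimination; b: location (severity).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Endpoints of the reported ranges: locations -0.99 to -0.15,
# discriminations 1.30 to 2.22 (pairing them here is an assumption).
p_low_location = irf_2pl(theta=0.0, a=1.30, b=-0.99)
p_high_location = irf_2pl(theta=0.0, a=1.30, b=-0.15)
```

Because every location in the reported range is negative, a respondent at the average trait level (θ = 0) has a better than even chance of endorsing each item under this form, which matches the abstract's remark that the items sit at the lower end of the severity continuum.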
2011-01-01
Background Nonparametric item response theory (IRT) was used to examine (a) the performance of the 30 Positive and Negative Syndrome Scale (PANSS) items and their options (levels of severity), (b) the effectiveness of various subscales to discriminate among differences in symptom severity, and (c) the development of an abbreviated PANSS (Mini-PANSS) based on IRT and a method to link scores to the original PANSS. Methods Baseline PANSS scores from 7,187 patients with Schizophrenia or Schizoaffective disorder who were enrolled between 1995 and 2005 in psychopharmacology trials were obtained. Option characteristic curves (OCCs) and Item Characteristic Curves (ICCs) were constructed to examine the probability of rating each of seven options within each of 30 PANSS items as a function of subscale severity, and summed-score linking was applied to items selected for the Mini-PANSS. Results The majority of items forming the Positive and Negative subscales (i.e. 19 items) performed very well and discriminated better along symptom severity compared to the General Psychopathology subscale. Six of the seven Positive Symptom items, six of the seven Negative Symptom items, and seven out of the 16 General Psychopathology items were retained for inclusion in the Mini-PANSS. Summed score linking and linear interpolation were able to produce a translation table for comparing total subscale scores of the Mini-PANSS to total subscale scores on the original PANSS. Results show scores on the subscales of the Mini-PANSS can be linked to scores on the original PANSS subscales, with very little bias. Conclusions The study demonstrated the utility of non-parametric IRT in examining the item properties of the PANSS and to allow selection of items for an abbreviated PANSS scale. The comparisons between the 30-item PANSS and the Mini-PANSS revealed that the shorter version is comparable to the 30-item PANSS, but when applying IRT, the Mini-PANSS is also a good indicator of illness severity.
PMID:22087503
Health- and vision-related quality of life in intellectually disabled children.
Cui, Yu; Stapleton, Fiona; Suttle, Catherine; Bundy, Anita
2010-01-01
To investigate the psychometric properties of instruments for the assessment of self-reported functional vision performance and health-related quality of life in children with intellectual disabilities (IDs). Two instruments [Autoquestionnaire Enfant Image (AUQUEI), LV Prasad-Functional Vision Questionnaire (LVP-FVQ)] designed for the assessment of functional vision and health-related quality of life were adapted and administered to 168 school children with ID, aged 8 to 18 years. Rasch analysis was used to determine the appropriateness of the rating scales of these instruments and to identify any redundant items. Redundant items were excluded based on descriptive statistics and Rasch analysis, leaving 17 of 23 items in the revised AUQUEI and 16 of 22 in the LVP-FVQ. The AUQUEI items showed disordered thresholds on the rating scale. A modified step calibration (collapsed from four categories to three categories) resulted in ordered response thresholds for all items. The adjusted instrument produced an overall fit to the model (mean item infit = 1.06, SD = 0.32; mean item outfit = 1.11, SD = 0.35), indicating good construct validity. After Rasch analysis, the AUQUEI showed good content validity (person separation = 2.18; item reliability = 0.99; Cronbach alpha = 0.89). Increased similarity of person and item means and SDs on the logit scale after modification would indicate that the instrument was more applicable to the target population in its modified form. In contrast, the LVP-FVQ had a low person separation (1.35), suggesting that a more appropriate instrument is needed for assessment of vision-related quality of life in children with ID. The psychometric properties of two instruments were explored using Rasch analysis. By rescaling and reduction of items, the instruments were modified for use in a population of children with at least mild to moderate ID. 
However, an alternative instrument is needed for the assessment of vision-related quality of life in intellectually disabled children with normal vision or mild visual abnormalities.
Brod, Meryl; Højbjerre, Lise; Adalsteinsson, Johan Erpur; Rasmussen, Michael Højby
2014-04-01
Approximately 50 000 adults in the United States are diagnosed with GH deficiency, which has negative impacts on cognitive functioning, psychological well-being, and quality of life. This paper presents development and validation of a patient-reported outcome measure (PRO), the Treatment-Related Impact Measure-Adult Growth Hormone Deficiency (TRIM-AGHD). The TRIM-AGHD was developed to measure the impact of GH deficiency and its treatment. The development and validation of the TRIM-AGHD was conducted according to the Food and Drug Administration guidance on the development of PROs. Concept elicitation, conducted in three countries included interviews with patients, clinical experts, and literature review. Qualitative data were analyzed based on grounded theory principles, and draft items were cognitively debriefed. The measure underwent psychometric validation in a US clinic-based population. An a priori statistical analysis plan included assessment of the measurement model, reliability, and validity. Item functioning was reviewed using item response theory analyses. Forty-eight patients and six clinical experts participated in concept elicitation and 169 patients completed the validation study. TRIM-AGHD was measured. Factor analysis resulted in four domains: energy level, physical health, emotional health, and cognitive ability. The item response theory confirmed adequate item fit and placement within their domain. Internal consistency ranged from 0.82 to 0.95 and test-retest ranged from 0.80 to 0.92. All prespecified hypotheses for convergent validity and all but two for discriminant validity were met. The final 26-item TRIM-AGHD can be considered a reliable and valid PRO of the impact of disease and treatment for adult GH deficiency.
Functional recovery in patients with schizophrenia: recommendations from a panel of experts.
Lahera, Guillermo; Gálvez, José L; Sánchez, Pedro; Martínez-Roig, Miguel; Pérez-Fuster, J V; García-Portilla, Paz; Herrera, Berta; Roca, Miquel
2018-06-05
The management of schizophrenia is evolving towards a more comprehensive model based on functional recovery. The concept of functional recovery goes beyond clinical remission and encompasses multiple aspects of the patient's life, making it difficult to settle on a definition and to develop reliable assessment criteria. In this consensus process based on a panel of experts in schizophrenia, we aimed to provide useful insights on functional recovery and its involvement in clinical practice and clinical research. After a literature review of functional recovery in schizophrenia, a scientific committee of 8 members prepared a 75-item questionnaire, including 6 sections: (I) the concept of functional recovery (9 items), (II) assessment of functional recovery (23 items), (III) factors influencing functional recovery (16 items), (IV) psychosocial interventions and functional recovery (8 items), (V) pharmacological treatment and functional recovery (14 items), and (VI) the perspective of patients and their relatives on functional recovery (5 items). The questionnaire was sent to a panel of 53 experts, who rated each item on a 9-point Likert scale. Consensus was achieved in a 2-round Delphi dynamics, using the median (interquartile range) scores to consider consensus in either agreement (scores 7-9) or disagreement (scores 1-3). Items not achieving consensus in the first round were sent back to the experts for a second consideration. After the two recursive rounds, consensus was achieved in 64 items (85.3%): 61 items (81.3%) in agreement and 3 (4.0%) in disagreement, all of them from section II (assessment of functional recovery). Items not reaching consensus were related to the concepts of functional recovery (1 item, 1.3%), functional assessment (5 items, 6.7%), factors influencing functional recovery (3 items, 4.0%), and psychosocial interventions (2 items, 5.6%). 
Despite the lack of a well-defined concept of functional recovery, we identified a trend towards a common archetype of the definition and factors associated with functional recovery, as well as its applicability in clinical practice and clinical research.
Covic, Tanya; Pallant, Julie F; Conaghan, Philip G; Tennant, Alan
2007-01-01
Background The aim of this study was to test the internal validity of the total Center for Epidemiologic Studies-Depression (CES-D) scale using Rasch analysis in a rheumatoid arthritis (RA) population. Methods CES-D was administered to 157 patients with RA over three time points within a 12 month period. Rasch analysis was applied using RUMM2020 software to assess the overall fit of the model, the response scale used, individual item fit, differential item functioning (DIF) and person separation. Results Pooled data across three time points was shown to fit the Rasch model with removal of seven items from the original 20-item CES-D scale. It was necessary to rescore the response format from four to three categories in order to improve the scale's fit. Two items demonstrated some DIF for age and gender but were retained within the 13-item CES-D scale. A new cut point for depression score of 9 was found to correspond to the original cut point score of 16 in the full CES-D scale. Conclusion This Rasch analysis of the CES-D in a longstanding RA cohort resulted in the construction of a modified 13-item scale with good internal validity. Further validation of the modified scale is recommended particularly in relation to the new cut point for depression. PMID:17629902
Psychometric Principles in Measurement for Geoscience Education Research: A Climate Change Example
NASA Astrophysics Data System (ADS)
Libarkin, J. C.; Gold, A. U.; Harris, S. E.; McNeal, K.; Bowles, R.
2015-12-01
Understanding learning in geoscience classrooms requires that we use valid and reliable instruments aligned with intended learning outcomes. Nearly one hundred instruments assessing conceptual understanding in undergraduate science and engineering classrooms (often called concept inventories) have been published and are actively being used to investigate learning. The techniques used to develop these instruments vary widely, often with little attention to psychometric principles of measurement. This paper will discuss the importance of using psychometric principles to design, evaluate, and revise research instruments, with particular attention to the validity and reliability steps that must be undertaken to ensure that research instruments are providing meaningful measurement. An example from a climate change inventory developed by the authors will be used to exemplify the importance of validity and reliability, including the value of item response theory for instrument development. A 24-item instrument was developed based on published items, conceptions research, and instructor experience. Rasch analysis of over 1000 responses provided evidence for the removal of 5 items for misfit and one item for potential bias as measured via differential item functioning. The resulting 18-item instrument can be considered a valid and reliable measure based on pre- and post-implementation metrics. Consideration of the relationship between respondent demographics and concept inventory scores provides unique insight into the relationship between gender, religiosity, values and climate change understanding.
Debast, Inge; Rossi, Gina; Feenstra, Dineke; Hutsebaut, Joost
2017-04-01
Criterion D of the Diagnostic and Statistical Manual of Mental Disorders (5th ed.; DSM-5; American Psychiatric Association [APA], 2013) refers to a possible onset of personality disorders (PDs) in adolescence and in Section II the development/course in adolescence is described by some typical characteristics for several PDs. Yet, age-specific expressions of PDs are lacking in Section III. We urgently need a developmentally sensitive assessment instrument that differentiates developmental and contextual changes on the one hand from expressions of personality pathology on the other hand. Therefore we investigated which items of the Severity Indices for Personality Problems-118 (SIPP-118) were developmentally sensitive throughout adolescence and adulthood and which could be considered more age-specific markers requiring other content or thresholds over age groups. Applying item response theory (IRT) we detected differential item functioning (DIF) in 36% of the items in matched samples of 639 adolescents versus 639 adults. The DIF across age groups mainly reflected a different degree of symptom expressions for the same underlying level of functioning. The threshold for exhibiting symptoms given a certain degree of personality dysfunction was lower in adolescence for areas of personality functioning related to the Self and Interpersonal domains. Some items also measured a latent construct of personality functioning differently across adolescents and adults. This suggests that several facets of the SIPP-118 do not solely measure aspects of personality pathology in adolescents, but likely include more developmental issues. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Saltychev, Mikhail; Mattie, Ryan; McCormick, Zachary; Laimi, Katri
2017-05-13
The Neck Disability Index (NDI) is commonly used for clinical and research assessment for chronic neck pain, yet the original version of this tool has not undergone significant validity testing, and in particular, there has been minimal assessment using Item Response Theory. The goal of the present study was to investigate the psychometric properties of the original version of the NDI in a large sample of individuals with chronic neck pain by defining its internal consistency, construct structure and validity, and its ability to discriminate between different degrees of functional limitation. This is a cross-sectional cohort study of 585 consecutive patients with chronic neck pain seen in a university hospital rehabilitation clinic. Internal consistency was evaluated using Cronbach's alpha, construct structure was evaluated by exploratory factor analysis, and discrimination ability was determined by Item Response Theory. The NDI demonstrated good internal consistency assessed by Cronbach's alpha (0.87). The exploratory factor analysis identified only one factor with eigenvalue considered significant (cutoff 1.0). When analyzed by Item Response Theory, eight out of 10 items demonstrated almost ideal difficulty parameter estimates. In addition, eight out of 10 items showed high to perfect estimates of discrimination ability (overall range 0.8 to 2.9). Amongst patients with chronic neck pain, the NDI was found to have good internal consistency, have unidimensional properties, and an excellent ability to distinguish patients with different levels of perceived disability. Implications for Rehabilitation The Neck Disability Index has good internal consistency, unidimensional properties, and an excellent ability to distinguish patients with different levels of perceived disability. The Neck Disability Index is recommended for use when selecting patients for rehabilitation, setting rehabilitation goals, and measuring the outcome of intervention.
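The internal-consistency statistic reported in the abstract above, Cronbach's alpha, can be sketched in a few lines. This is an illustrative implementation of the textbook formula, alpha = k/(k-1) × (1 − Σ item variances / variance of total scores), and not the software the authors used; the function name `cronbach_alpha` and the tiny score matrix are hypothetical. Sample variances are used for both the items and the totals.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: three respondents, two items.
toy_scores = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(toy_scores)
```

The function needs at least two items and two respondents, and it is undefined when all total scores are identical, since the total variance is then zero.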
The Effect of Missing Data Treatment on Mantel-Haenszel DIF Detection
ERIC Educational Resources Information Center
Emenogu, Barnabas C.; Falenchuk, Olesya; Childs, Ruth A.
2010-01-01
Most implementations of the Mantel-Haenszel differential item functioning procedure delete records with missing responses or replace missing responses with scores of 0. These treatments of missing data make strong assumptions about the causes of the missing data. Such assumptions may be particularly problematic when groups differ in their patterns…
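The two missing-data treatments named in the abstract above (deleting records with missing responses, or scoring them as 0) can be compared with a small sketch of the Mantel-Haenszel common odds ratio. It is a simplified illustration under stated assumptions: the function names and the record layout `(group, stratum, response)` are hypothetical, the matching stratum is taken as given rather than recomputed from total scores, and the conversion to the delta scale uses the conventional −2.35 × ln(α) transformation.

```python
import math
from collections import defaultdict


def mh_odds_ratio(records, missing="delete"):
    """Mantel-Haenszel common odds ratio for one studied item.

    records: iterable of (group, stratum, response), where group is "ref"
    (any other value is treated as the focal group) and response is 1, 0,
    or None for a missing response.
    missing: "delete" drops records with a missing response;
             "zero" scores a missing response as 0.
    """
    # Per stratum: [ref correct, ref incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, stratum, resp in records:
        if resp is None:
            if missing == "delete":
                continue
            resp = 0
        idx = (0 if group == "ref" else 2) + (0 if resp == 1 else 1)
        tables[stratum][idx] += 1
    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den  # raises ZeroDivisionError if no stratum is informative


def mh_d_dif(alpha_mh):
    """Odds ratio expressed on the delta scale: -2.35 * ln(alpha)."""
    return -2.35 * math.log(alpha_mh)
```

With one stratum holding 3 correct and 1 incorrect reference responses, 2 correct and 2 incorrect focal responses, and one missing focal response, deletion gives an odds ratio of 3.0, while scoring the missing response as 0 gives 4.5. The same data can therefore yield different DIF estimates depending only on the treatment chosen.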
Locally Dependent Latent Trait Model and the Dutch Identity Revisited.
ERIC Educational Resources Information Center
Ip, Edward H.
2002-01-01
Proposes a class of locally dependent latent trait models for responses to psychological and educational tests. Focuses on models based on a family of conditional distributions, or kernel, that describes joint multiple item responses as a function of student latent trait, not assuming conditional independence. Also proposes an EM algorithm for…
Bravini, Elisabetta; Franchignoni, Franco; Giordano, Andrea; Sartorio, Francesco; Ferriero, Giorgio; Vercelli, Stefano; Foti, Calogero
2015-01-01
To perform a comprehensive analysis of the psychometric properties and dimensionality of the Upper Limb Functional Index (ULFI) using both classical test theory and Rasch analysis (RA). Prospective, single-group observational design. Freestanding rehabilitation center. Convenience sample of Italian-speaking subjects with upper limb musculoskeletal disorders (N=174). Not applicable. The Italian version of the ULFI. Data were analyzed using parallel analysis, exploratory factor analysis, and RA for evaluating dimensionality, functioning of rating scale categories, item fit, hierarchy of item difficulties, and reliability indices. Parallel analysis revealed 2 factors explaining 32.5% and 10.7% of the response variance. RA confirmed the failure of the unidimensionality assumption, and 6 items out of the 25 misfitted the Rasch model. When the analysis was rerun excluding the misfitting items, the scale showed acceptable fit values, loading meaningfully to a single factor. Item separation reliability and person separation reliability were .98 and .89, respectively. Cronbach alpha was .92. RA revealed weakness of the scale concerning dimensionality and internal construct validity. However, a set of 19 ULFI items defined through the statistical process demonstrated a unidimensional structure, good psychometric properties, and clinical meaningfulness. These findings represent a useful starting point for further analyses of the tool (based on modern psychometric approaches and confirmatory factor analysis) in larger samples, including different patient populations and nationalities. Copyright © 2015 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
An NCME Instructional Module on Polytomous Item Response Theory Models
ERIC Educational Resources Information Center
Penfield, Randall David
2014-01-01
A polytomous item is one for which the responses are scored according to three or more categories. Given the increasing use of polytomous items in assessment practices, item response theory (IRT) models specialized for polytomous items are becoming increasingly common. The purpose of this ITEMS module is to provide an accessible overview of…
Johansson, Mikael; Mecklinger, Axel
2003-10-01
The focus of the present paper is a late posterior negative slow wave (LPN) that has frequently been reported in event-related potential (ERP) studies of memory. An overview of these studies suggests that two broad classes of experimental conditions tend to elicit this component: (a) item recognition tasks associated with enhanced action monitoring demands arising from response conflict and (b) memory tasks that require the binding of items with contextual information specifying the study episode. A combined stimulus- and response-locked analysis of data from two studies mapping onto these classes allowed a temporal and functional decomposition of the LPN. While only the LPN observed in the item recognition task could be attributed to the involvement of a posteriorly distributed response-locked error-related negativity (or error negativity; ERN/Ne) occurring immediately after the response, the source-memory task was associated with a stimulus-locked negative slow wave occurring prior and during response execution that was evident when data were matched for response latencies. We argue that the presence of the former reflects action monitoring due to high levels of response conflict, whereas the latter reflects retrieval processes that may act to reconstruct the prior study episode when task-relevant attribute conjunctions are not readily recovered or need continued evaluation.
Sharp, J L; Gough, K; Pascoe, M C; Drosdowsky, A; Chang, V T; Schofield, P
2018-07-01
The Memorial Symptom Assessment Scale Short Form (MSAS-SF) is a widely used symptom assessment instrument. Patients who self-complete the MSAS-SF have difficulty following the two-part response format, resulting in incorrectly completed responses. We describe modifications to the response format to improve useability, and rational scoring rules for incorrectly completed items. The modified MSAS-SF was completed by 311 women in our Peer and Nurse support Trial to Assist women in Gynaecological Oncology; the PeNTAGOn study. Descriptive statistics were used to summarise completion of the modified MSAS-SF, and provide symptom statistics before and after applying the rational scoring rules. Spearman's correlations with the Functional Assessment for Cancer Therapy-General (FACT-G) and Hospital Anxiety and Depression Scale (HADS) were assessed. Correct completion of the modified MSAS-SF items ranged from 91.5 to 98.7%. The rational scoring rules increased the percentage of useable responses on average 4% across all symptoms. MSAS-SF item statistics were similar with and without the scoring rules. The pattern of correlations with FACT-G and HADS was compatible with prior research. The modified MSAS-SF was useable for self-completion and responses demonstrated validity. The rational scoring rules can minimise loss of data from incorrectly completed responses. Further investigation is recommended.
Development and validation of a measure of pediatric oral health-related quality of life: the POQL
Huntington, Noelle L; Spetter, Dante; Jones, Judith A.; Rich, Sharon E.; Garcia, Raul I.; Spiro, Avron
2011-01-01
Objective To develop a brief measure of oral health-related quality of life in children and demonstrate its reliability and validity in a diverse population. Methods We administered the initial 20-item POQL to children (Child Self-Report) and parents (Parent Report on Child) from diverse populations in both school-based and clinic-based settings. Clinical oral health status was measured on a subset of children. We used factor analysis to determine the underlying scales and then reduced the measure to 10 items based on several considerations. Multitrait analysis on the resulting 10-item POQL was used to reaffirm the discrimination of scales and assess the measure’s internal consistency and interscale correlations. We established discriminant and convergent validity with clinical status, perceived oral health and responses on the PedsQL and determined sensitivity to change with children undergoing ECC surgical repair. Results Factor analysis returned a four-scale solution for the initial items – Physical Functioning, Role Functioning, Social Functioning and Emotional Functioning. The reduced items represented the same four scales – two each on Physical and Role and three each on Social and Emotional. Good reliability and validity were shown for the POQL as a whole and for each of the scales. Conclusions The POQL is a valid and reliable measure of oral health-related quality of life for use in pre-school and school-aged children, with high utility for both clinical assessments and large-scale population studies. PMID:21972458
Nayak, Madhabika B; Bond, Jason C; Greenfield, Thomas K
2015-01-01
Efficient alcohol screening measures are important to prevent or treat alcohol use disorders (AUDs). We studied different versions of the Alcohol Use Disorders Identification Test (AUDIT) comparing their performance to the full AUDIT and an AUD measure as screeners for alcohol use problems in Goa, India. Data from a general population study on 743 male drinkers aged 18-49 years are reported. Drinkers completed the AUDIT and an AUD measure. We created shorter versions of the AUDIT by (a) collapsing AUDIT item responses into three and two categories and (b) deleting two items with the lowest factor loadings. Each version was evaluated using factor, reliability and validity, and differential item functioning (DIF) analysis by age, education, standard of living index (SLI), and area of residence. A single factor solution was found for each version with lower factor loadings for items on guilt and concern. There were no significant differences among the different AUDIT versions in predicting AUD. No significant DIF was found by education, SLI or area of residence. DIF was observed for the alcohol frequency item by age. The AUDIT may be used with dichotomized response options without loss of predictive validity. A shortened eight-item dichotomized scale can adequately screen for AUDs in Goa when brevity is of paramount importance, although with lower predictive validity. Although the frequency item was endorsed more by older men, there is no evidence that the AUDIT items perform differently in other groups of male drinkers in Goa.
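The response-collapsing step described above can be illustrated with a short Python sketch. The abstract does not say where the cut points were placed, so the thresholds, the `collapse` helper, and the sample scores below are hypothetical; the sketch assumes only that each AUDIT item is scored 0 to 4.

```python
def collapse(score, cutpoints):
    """Map an ordinal item score to a collapsed category index.

    The result is the number of cut points the score meets or exceeds,
    so k cut points yield k + 1 ordered categories.
    """
    return sum(score >= c for c in cutpoints)


# Hypothetical item scores covering the full 0-4 range.
responses = [0, 1, 2, 3, 4]

# Two categories (assumed cut point: any non-zero response).
dichotomized = [collapse(r, (1,)) for r in responses]

# Three categories (assumed cut points at 1 and 3).
trichotomized = [collapse(r, (1, 3)) for r in responses]

print(dichotomized)
print(trichotomized)
```

Keeping the cut points as an argument makes it easy to compare alternative collapsing schemes on the same data before choosing one.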
Ramsay-Curve Item Response Theory for the Three-Parameter Logistic Item Response Model
ERIC Educational Resources Information Center
Woods, Carol M.
2008-01-01
In Ramsay-curve item response theory (RC-IRT), the latent variable distribution is estimated simultaneously with the item parameters of a unidimensional item response model using marginal maximum likelihood estimation. This study evaluates RC-IRT for the three-parameter logistic (3PL) model with comparisons to the normal model and to the empirical…
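For reference, the 3PL item response function and the marginal likelihood it enters can be written in their standard textbook form. The symbols below (discrimination a_i, difficulty b_i, lower asymptote c_i, latent density g, item responses u_{ni}) follow common notation and are assumptions, not taken from the abstract; the Ramsay-curve parameterization of g is not reproduced here.

```latex
P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + \exp\bigl[-a_i(\theta - b_i)\bigr]}

L = \prod_{n=1}^{N} \int \prod_{i=1}^{I}
    P_i(\theta)^{u_{ni}} \bigl[1 - P_i(\theta)\bigr]^{1 - u_{ni}}
    \, g(\theta)\, d\theta
```

In marginal maximum likelihood with a fixed normal model, g is the standard normal density; in RC-IRT, as the abstract describes, g is estimated along with the item parameters.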
ERIC Educational Resources Information Center
Preston, Kathleen; Reise, Steven; Cai, Li; Hays, Ron D.
2011-01-01
The authors used a nominal response item response theory model to estimate category boundary discrimination (CBD) parameters for items drawn from the Emotional Distress item pools (Depression, Anxiety, and Anger) developed in the Patient-Reported Outcomes Measurement Information Systems (PROMIS) project. For polytomous items with ordered response…
Three Classes of Nonparametric Differential Step Functioning Effect Estimators
ERIC Educational Resources Information Center
Penfield, Randall D.
2008-01-01
The examination of measurement invariance in polytomous items is complicated by the possibility that the magnitude and sign of lack of invariance may vary across the steps underlying the set of polytomous response options, a concept referred to as differential step functioning (DSF). This article describes three classes of nonparametric DSF effect…
An Alternative Methodology for Creating Parallel Test Forms Using the IRT Information Function.
ERIC Educational Resources Information Center
Ackerman, Terry A.
The purpose of this paper is to report results on the development of a new computer-assisted methodology for creating parallel test forms using the item response theory (IRT) information function. Recently, several researchers have approached test construction from a mathematical programming perspective. However, these procedures require…
Fajrianthi; Zein, Rizqy Amelia
2017-01-01
This study aimed to develop an emotional intelligence (EI) test that is suitable for the Indonesian workplace context. The Airlangga Emotional Intelligence Test (Tes Kecerdasan Emosi Airlangga [TKEA]) was designed to measure three EI domains: 1) emotional appraisal, 2) emotional recognition, and 3) emotional regulation. TKEA consisted of 120 items with 40 items for each subset. TKEA was developed based on the Situational Judgment Test (SJT) approach. To ensure its psychometric qualities, categorical confirmatory factor analysis (CCFA) and item response theory (IRT) were applied to test its validity and reliability. The study was conducted on 752 participants, and the results showed that the test information function (TIF) was 3.414 for subset 1 (ability level = 0), 12.183 for subset 2 (ability level = −2), and 2.398 for subset 3 (ability level = −2). It is concluded that TKEA performs very well in measuring individuals with a low level of EI ability. It is worth noting that TKEA is currently at the development stage; therefore, in this study, we investigated the item analysis and dimensionality of each TKEA subset. PMID:29238234
Computer-adaptive test to measure community reintegration of Veterans.
Resnik, Linda; Tian, Feng; Ni, Pengsheng; Jette, Alan
2012-01-01
The Community Reintegration of Injured Service Members (CRIS) measure consists of three scales measuring extent of, perceived limitations in, and satisfaction with community reintegration. Length of the CRIS may be a barrier to its widespread use. Using item response theory (IRT) and computer-adaptive test (CAT) methodologies, this study developed and evaluated a briefer community reintegration measure called the CRIS-CAT. Large item banks for each CRIS scale were constructed. A convenience sample of 517 Veterans responded to all items. Exploratory and confirmatory factor analyses (CFAs) were used to identify the dimensionality within each domain, and IRT methods were used to calibrate items. Accuracy and precision of CATs of different lengths were compared with the full-item bank, and data were examined for differential item functioning (DIF). CFAs supported unidimensionality of scales. Acceptable item fit statistics were found for final models. Accuracy of 10-, 15-, 20-, and variable-item CATs for all three scales was 0.88 or above. CAT precision increased with number of items administered and decreased at the upper ranges of each scale. Three items exhibited moderate DIF by sex. The CRIS-CAT demonstrated promising measurement properties and is recommended for use in community reintegration assessment.
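A common way for a CAT to choose its next item is to pick the unadministered item with the most Fisher information at the current ability estimate. The Python sketch below illustrates that rule under simplifying assumptions: dichotomous 2PL items, a hypothetical three-item bank, and hypothetical function names. The abstract does not state which selection rule or IRT model the CRIS-CAT uses.

```python
import math


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def next_item(theta, bank, administered):
    """Return the id of the most informative item not yet administered."""
    candidates = [item_id for item_id in bank if item_id not in administered]
    return max(candidates, key=lambda i: item_information(theta, *bank[i]))


# Hypothetical bank: item id -> (discrimination, difficulty).
bank = {"q1": (1.0, -1.0), "q2": (1.0, 0.0), "q3": (1.0, 1.0)}

first = next_item(0.0, bank, set())      # start at an average ability estimate
second = next_item(0.9, bank, {first})   # estimate moved up after a response
print(first, second)
```

A fixed-length CAT repeats this selection a set number of times (for example 10, 15, or 20 items), while a variable-length CAT stops once a precision target is reached.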
Reeve, Bryce B.; Stover, Angela M.; Alfano, Catherine M.; Smith, Ashley Wilder; Ballard-Barbash, Rachel; Bernstein, Leslie; McTiernan, Anne; Baumgartner, Kathy B.; Piper, Barbara F.
2013-01-01
Purpose Brief, valid measures of fatigue, a prevalent and distressing cancer symptom, are needed for use in research. This study’s primary aim was to create a shortened version of the revised Piper Fatigue Scale (PFS-R) based on data from a diverse cohort of breast cancer survivors. A secondary aim was to determine whether the PFS captured multiple distinct aspects of fatigue (a multidimensional model) or a single overall fatigue factor (a unidimensional model). Methods Breast cancer survivors (n=799; stages in situ through IIIa; ages 29–86 yrs) were recruited through 3 SEER registries (New Mexico, Western Washington, and Los Angeles, CA) as part of the Health, Eating, Activity, and Lifestyle (HEAL) study. Fatigue was measured approximately 3 years post-diagnosis using the 22-item PFS-R that has 4 subscales (Behavior, Affect, Sensory, and Cognition). Confirmatory factor analysis was used to compare unidimensional and multidimensional models. Six criteria were used to make item selections to shorten the PFS-R: scale’s content validity, items’ relationship with fatigue, content redundancy, differential item functioning by race and/or education, scale reliability, and literacy demand. Results Factor analyses supported the original 4-factor structure. There was also evidence from the bi-factor model for a dominant underlying fatigue factor. Six items tested positive for differential item functioning between African-American and Caucasian survivors. Four additional items either showed poor association, local dependence, or content validity concerns. After removing these 10 items, the reliability of the PFS-12 subscales ranged from 0.87–0.89, compared to 0.90–0.94 prior to item removal. Conclusion The newly developed PFS-12 can be used to assess fatigue in African-American and Caucasian breast cancer survivors and reduces response burden without compromising reliability or validity. 
This is the first study to determine PFS literacy demand and to compare PFS-R responses in African-American and Caucasian breast cancer survivors. Further testing in diverse populations is warranted. PMID:22933027
Pedersen, Eric R; Huang, Wenjing; Dvorak, Robert D; Prince, Mark A; Hummer, Justin F
2017-08-01
Given recent state legislation legalizing marijuana for recreational purposes and majority popular opinion favoring these laws, we developed the Protective Behavioral Strategies for Marijuana scale (PBSM) to identify strategies that may mitigate the harms related to marijuana use among those young people who choose to use the drug. In the current study, we expand on the initial exploratory study of the PBSM to further validate the measure with a large and geographically diverse sample (N = 2,117; 60% women, 30% non-White) of college students from 11 different universities across the United States. We sought to develop a psychometrically sound item bank for the PBSM and to create a short assessment form that minimizes respondent burden and time. Quantitative item analyses, including exploratory and confirmatory factor analyses with item response theory (IRT) and evaluation of differential item functioning (DIF), revealed an item bank of 36 items that was examined for unidimensionality and good content coverage, as well as a short form of 17 items that is free of bias in terms of gender (men vs. women), race (White vs. non-White), ethnicity (Hispanic vs. non-Hispanic), and recreational marijuana use legal status (state recreational marijuana was legal for 25.5% of participants). We also provide a scoring table for easy transformation from sum scores to IRT scale scores. The PBSM item bank and short form associated strongly and negatively with past month marijuana use and consequences. The measure may be useful to researchers and clinicians conducting intervention and prevention programs with young adults. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Devine, J; Otto, C; Rose, M; Barthel, D; Fischer, F; Mühlan, H; Nolte, S; Schmidt, S; Ottova-Jordan, V; Ravens-Sieberer, U
2015-04-01
Assessing health-related quality of life (HRQoL) via Computerized Adaptive Tests (CAT) provides greater measurement precision coupled with a lower test burden compared to conventional tests. Currently, there are no European pediatric HRQoL CATs available. This manuscript aims to describe the development of an HRQoL CAT for children and adolescents: the Kids-CAT, which was developed based on the established KIDSCREEN-27 HRQoL domain structure. The Kids-CAT was developed combining classical test theory and item response theory methods and using large archival data of European KIDSCREEN norm studies (n = 10,577-19,580). Methods were applied in line with the US PROMIS project. Item bank development included the investigation of unidimensionality, local independence, exploration of Differential Item Functioning (DIF), evaluation of Item Response Curves (IRCs), estimation and norming of item parameters as well as first CAT simulations. The Kids-CAT was successfully built covering five item banks (with 26-46 items each) to measure physical well-being, psychological well-being, parent relations, social support and peers, and school well-being. The Kids-CAT item banks demonstrated excellent psychometric properties: high content validity, unidimensionality, local independence, low DIF, and model-conforming IRCs. In CAT simulations, seven items were needed to achieve a measurement precision between .8 and .9 (reliability). It has a child-friendly design, is easily accessible online and gives immediate feedback reports of scores. The Kids-CAT has the potential to advance pediatric HRQoL measurement by making it less burdensome and enhancing patient-doctor communication.
Item Response Models for Examinee-Selected Items
ERIC Educational Resources Information Center
Wang, Wen-Chung; Jin, Kuan-Yu; Qiu, Xue-Lan; Wang, Lei
2012-01-01
In some tests, examinees are required to choose a fixed number of items from a set of given items to answer. This practice creates a challenge to standard item response models, because more capable examinees may have an advantage by making wiser choices. In this study, we developed a new class of item response models to account for the choice…
ERIC Educational Resources Information Center
Lee, Woo-yeol; Cho, Sun-Joo
2017-01-01
Cross-level invariance in a multilevel item response model can be investigated by testing whether the within-level item discriminations are equal to the between-level item discriminations. Testing the cross-level invariance assumption is important to understand constructs in multilevel data. However, in most multilevel item response model…
Guenole, Nigel; Brown, Anna A; Cooper, Andrew J
2018-06-01
This article describes an investigation of whether Thurstonian item response modeling is a viable method for assessment of maladaptive traits. Forced-choice responses from 420 working adults to a broad-range personality inventory assessing six maladaptive traits were considered. The Thurstonian item response model's fit to the forced-choice data was adequate, while the fit of a counterpart item response model to responses to the same items but arranged in a single-stimulus design was poor. Monotrait heteromethod correlations indicated corresponding traits in the two formats overlapped substantially, although they did not measure equivalent constructs. A better goodness of fit and higher factor loadings for the Thurstonian item response model, coupled with a clearer conceptual alignment to the theoretical trait definitions, suggested that the single-stimulus item responses were influenced by biases that the independent clusters measurement model did not account for. Researchers may wish to consider forced-choice designs and appropriate item response modeling techniques such as Thurstonian item response modeling for personality questionnaire applications in industrial psychology, especially when assessing maladaptive traits. We recommend further investigation of this approach in actual selection situations and with different assessment instruments.
NASA Astrophysics Data System (ADS)
Qian, Xiaoyu
Science is an area where a large achievement gap has been observed between White and minority, and between male and female students. The science minority gap has continued as indicated by the National Assessment of Educational Progress and the Trends in International Mathematics and Science Study (TIMSS). TIMSS also shows a gender gap favoring males emerging at the eighth grade. Both gaps continue to be wider in the number of doctoral degrees and full professorships awarded (NSF, 2008). The current study investigated both minority and gender achievement gaps in science utilizing a multi-level differential item functioning (DIF) methodology (Kamata, 2001) within a fully Bayesian framework. All dichotomously coded items from the TIMSS 2007 science assessment at eighth grade were analyzed. Both gender DIF and minority DIF were studied. Multi-level models were employed to identify DIF items and sources of DIF at both student and teacher levels. The study found that several student variables were potential sources of achievement gaps. It was also found that gender DIF favoring male students was more noticeable in the content areas of physics and earth science than biology and chemistry. In terms of item type, the majority of these gender DIF items were multiple choice rather than constructed response items. Female students also performed less well on items requiring visual-spatial ability. Minority students performed significantly worse on physics and earth science items as well. A higher percentage of minority DIF items in earth science and biology were constructed response rather than multiple choice items, indicating that literacy may be the cause of minority DIF. Three-level model results suggested that some teacher variables may be the cause of DIF variations from teacher to teacher. 
It is essential for both middle school science teachers and science educators to find instructional methods that work more effectively to improve science achievement of both female and minority students. Physics and earth science are two areas to be improved for both groups. Curriculum and instruction need to enhance female students' learning interests and give them opportunities to improve their visual perception skills. Science instruction should address improving minority students' literacy skills while teaching science.
Age and sex differences in paranormal beliefs: a response to Vitulli, Tipton, and Rowe (1999)
Irwin, H J
2000-04-01
Vitulli, Tipton, and Rowe (1999) report evidence of age and sex differences in the strength of paranormal beliefs. An alternative interpretation of their data is offered in terms of differential item functioning. It is suggested that respondents' interpretation of paranormal belief test items may vary with age and sex, and that such differences in the strength with which such beliefs are endorsed have not been conclusively established by Vitulli, et al.
Inchausti, Felix; Mole, Joe; Fonseca-Pedrero, Eduardo; Ortuño-Sierra, Javier
2015-01-01
The aim of this study was to analyse the psychometric properties of the Spanish NEO Five Factor Inventory–Revised (NEO-FFI-R) using Rasch analyses, in order to test its rating scale functioning, the reliability of scores, internal structure, and differential item functioning (DIF) by gender in a psychiatric sample. The NEO-FFI-R responses of 433 Spanish adults (154 males) with an anxiety disorder as primary diagnosis were analysed using the Rasch model for rating scales. Two intermediate categories of response (‘neutral’ and ‘agree’) malfunctioned in the Neuroticism and Conscientiousness scales. In addition, model reliabilities were lower than expected in Agreeableness and Neuroticism, and the item fit values indicated each scale had items that did not achieve moderate to high discrimination on its dimension, particularly in the Agreeableness scale. Concerning unidimensionality, the five NEO-FFI-R scales showed large first components of unexplained variance. Finally, DIF by gender was detected in many items. The results suggest that the scores of the Spanish NEO-FFI-R are unreliable in psychiatric samples and cannot be generalized between males and females, especially in the Openness, Conscientiousness, and Agreeableness scales. Future directions for testing and refinement should be developed before the NEO-FFI-R can be used reliably in clinical samples. PMID:25954224
Iidaka, Tetsuya; Matsumoto, Atsushi; Nogawa, Junpei; Yamamoto, Yukiko; Sadato, Norihiro
2006-09-01
The neural basis for successful recognition of previously studied items, referred to as "retrieval success," has been investigated using either neuroimaging or brain potentials; however, few studies have used both modalities. Our study combined event-related functional magnetic resonance imaging (fMRI) and event-related potential (ERP) in separate groups of subjects. The neural responses were measured while the subjects performed an old/new recognition task with pictures that had been previously studied in either a deep- or shallow-encoding condition. The fMRI experiment showed that among the frontoparietal regions involved in retrieval success, the inferior frontal gyrus and intraparietal sulcus were crucial to conscious recollection because the activity of these regions was influenced by the depth of memory at encoding. The activity of the right parietal region in response to a repeated item was modulated by the repetition lag, indicating that this area would be critical to familiarity-based judgment. The results of structural equation modeling revealed that the functional connectivity among the regions in the left hemisphere was more significant than that in the right hemisphere. The results of the ERP experiment and independent component analysis paralleled those of the fMRI experiment and demonstrated that the repeated item produced an earlier peak than the hit item by approximately 50 ms.
Examining the validity of the adapted Alabama Parenting Questionnaire-Parent Global Report Version.
Maguin, Eugene; Nochajski, Thomas H; De Wit, David J; Safyer, Andrew
2016-05-01
The purpose of the present study was to comprehensively examine the validity of an adapted version of the parent global report form of the Alabama Parenting Questionnaire (APQ) with respect to its factor structure, relationships with demographic and response style covariates, and differential item functioning (DIF). The APQ was adapted by omitting the corporal punishment and the other discipline items. The sample consisted of 674 Canadian and United States families having a 9- to 12-year-old child and at least 1 parent figure who had received treatment within the past 5 years for alcohol problems or met criteria for alcohol abuse or dependence. The primary parent in each family completed the APQ. The 4-factor CFA model of the 4 published scales used and the 3-factor CFA model of those scales from prior research were rejected. Exploratory structural equation modeling was then used. The final 3-factor model combined the author-defined Involvement and Positive Parenting scales and retained the original Poor Monitoring/Supervision and Inconsistent Discipline scales. However, there were substantial numbers of moderate magnitude cross-loadings and large magnitude residual covariances. Differential item functioning (DIF) was observed for a number of APQ items. Controlling for DIF, response style and demographic variables were related significantly to the factors. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Validation of a Health Literacy Measure for Adolescents and Young Adults Diagnosed with Cancer.
McDonald, Fiona E J; Patterson, Pandora; Costa, Daniel S J; Shepherd, Heather L
2016-03-01
Health literacy can influence long-term health outcomes. This study aimed to validate an adapted version of the Functional, Communicative and Critical Health Literacy measure for adolescent and young adult (AYA) cancer patients and survivors (N = 105; age 12-24 years). Exploratory factor analysis was used to validate the measure, and indicated that a slightly modified item structure better fit the results. Furthermore, item response theory analysis highlighted location and discrimination parameter differences among items. Acceptability of the measure was high. This is the first validation of a health literacy measure among AYAs with an illness such as cancer.
Land, Stephanie R; Warren, Graham W; Crafts, Jennifer L; Hatsukami, Dorothy K; Ostroff, Jamie S; Willis, Gordon B; Chollette, Veronica Y; Mitchell, Sandra A; Folz, Jasmine N M; Gulley, James L; Szabo, Eva; Brandon, Thomas H; Duffy, Sonia A; Toll, Benjamin A
2016-06-01
To the authors' knowledge, there are currently no standardized measures of tobacco use and secondhand smoke exposure in patients diagnosed with cancer, and this gap hinders the conduct of studies examining the impact of tobacco on cancer treatment outcomes. The objective of the current study was to evaluate and refine questionnaire items proposed by an expert task force to assess tobacco use. Trained interviewers conducted cognitive testing with cancer patients aged ≥21 years with a history of tobacco use and a cancer diagnosis of any stage and organ site who were recruited at the National Institutes of Health Clinical Center in Bethesda, Maryland. Iterative rounds of testing and item modification were conducted to identify and resolve cognitive issues (comprehension, memory retrieval, decision/judgment, and response mapping) and instrument navigation issues until no items warranted further significant modification. Thirty participants (6 current cigarette smokers, 1 current cigar smoker, and 23 former cigarette smokers) were enrolled from September 2014 to February 2015. The majority of items functioned well. However, qualitative testing identified wording ambiguities related to cancer diagnosis and treatment trajectory, such as "treatment" and "surgery"; difficulties with lifetime recall; errors in estimating quantities; and difficulties with instrument navigation. Revisions to item wording, format, order, response options, and instructions resulted in a questionnaire that demonstrated navigational ease as well as good question comprehension and response accuracy. The Cancer Patient Tobacco Use Questionnaire (C-TUQ) can be used as a standardized item set to accelerate the investigation of tobacco use in the cancer setting. Cancer 2016;122:1728-34. © 2016 American Cancer Society.
Leidy, Nancy Kline; Hamilton, Alan; Becker, Karin
2012-01-01
Purpose The performance of daily activities is a major challenge for people with chronic obstructive pulmonary disease (COPD). The Functional Performance Inventory (FPI) was developed based on an analytical framework of functional status and qualitative interviews with COPD patients describing these difficulties. The 65-item FPI was reduced to a 32-item short form (SF) through a systematic process of qualitative and quantitative item reduction and formatted for greater clarity and ease of use. This study examined the content validity of the reduced, reformatted form of the instrument, the FPI-SF. Patients and methods Qualitative cognitive interviews were conducted with COPD patients recruited from three geographically diverse pulmonary clinics in the United States. Interviews were designed to assess respondent interpretation of the instrument, evaluate clarity and ease of completion, and identify any new activities participants found important and difficult to perform that were not represented by the existing items. Results Twenty subjects comprised the sample; 12 (60%) were male, 14 (70%) were Caucasian, the mean age was 63.0 ± 11.3 years, 12 (60%) were retired, the mean forced expiratory volume in 1 second (FEV1) was 1.5 ± 0.5 L, and the mean percent predicted FEV1 was 48.4% ± 13.1%. Participants understood the FPI-SF as intended, including instructions, items, and response options. Two minor formatting changes were suggested to improve clarity of presentation. Participants found the content of the FPI-SF to be comprehensive, with items covering activities they felt were important and often difficult to perform. Conclusion These results, together with its development history and previously tested quantitative properties, suggest that the FPI-SF is content valid for use in clinical studies of COPD. PMID:22969295
Anorexia/cachexia-related quality of life for children with cancer.
Lai, Jin-Shei; Cella, David; Peterman, Amy; Barocas, Joshua; Goldman, Stewart
2005-10-01
Anorexia is a common symptom in patients with cancer, which can lead to poor tolerance of treatment and can contribute to cachexia in extreme cases. Children with advanced-stage cancer are especially vulnerable to malnutrition resulting from anorexia and cachexia. Currently, there are no instruments that measure common concerns specifically associated with anorexia and cachexia in children with cancer. The purpose of the current article was to test the psychometric properties of a newly developed pediatric Functional Assessment of Anorexia and Cachexia Therapy (peds-FAACT) for children with cancer. Ninety-six patients (ages 7-17 yrs) receiving cancer treatment and their parents were asked to complete the 12-item peds-FAACT. The authors implemented both classical test theory and item response theory to evaluate the agreement between parents and patients, internal consistency and unidimensionality of the scale, and stability of items across subgroups. As a result, a patient-reported six-item scale was recommended as the core measure for all pediatric patients with cancer and four additional peripheral items were recommended for adolescent patients. The peds-FAACT demonstrated good psychometric properties, differentiated patients with different functional performance status, and was determined to be a useful tool for future clinical trials.
Towse, John N; Cowan, Nelson; Hitch, Graham J; Horton, Neil J
2008-01-01
We describe and evaluate a recall reconstruction hypothesis for working memory (WM), according to which items can be recovered from multiple memory representations. Across four experiments, participants recalled memoranda that were either integrated with or independent of the sentence content. We found consistently longer pauses accompanying the correct recall of integrated compared with independent words, supporting the argument that sentence memory could scaffold the access of target items. Integrated words were also more likely to be recalled correctly, dependent on the details of the task. Experiment 1 investigated the chronometry of spoken recall for word span and reading span, with participants completing an unfinished sentence in the latter case. Experiments 2 and 3 confirm recall time differences without using word generation requirements, while Experiment 4 used an item and order response choice paradigm with nonspoken responses. Data emphasise the value of recall timing in constraining theories of WM functioning.
Qualitative Development of the PROMIS® Pediatric Stress Response Item Banks
Gardner, William; Pajer, Kathleen; Riley, Anne W.; Forrest, Christopher B.
2013-01-01
Objective To describe the qualitative development of the Patient-Reported Outcome Measurement Information System (PROMIS®) Pediatric Stress Response item banks. Methods Stress response concepts were specified through a literature review and interviews with content experts, children, and parents. A library comprising 2,677 items derived from 71 instruments was developed. Items were classified into conceptual categories; new items were written and redundant items were removed. Items were then revised based on cognitive interviews (n = 39 children), readability analyses, and translatability reviews. Results 2 pediatric Stress Response sub-domains were identified: somatic experiences (43 items) and psychological experiences (64 items). Final item pools cover the full range of children’s stress experiences. Items are comprehensible among children aged ≥8 years and ready for translation. Conclusions Child- and parent-report versions of the item banks assess children’s somatic and psychological states when demands tax their adaptive capabilities. PMID:23124904
Welcome, Suzanne E; Paivio, Allan; McRae, Ken; Joanisse, Marc F
2011-07-01
We examined ERP responses during the generation of word associates or mental images in response to concrete and abstract concepts. Of interest were the predictions of dual coding theory (DCT), which proposes that processing lexical concepts depends on functionally independent but interconnected verbal and nonverbal systems. ERP responses were time-locked to either stimulus onset or response to compensate for potential latency differences across conditions. During word associate generation, but not mental imagery, concrete items elicited a greater N400 than abstract items. A concreteness effect emerged at a later time point during the mental imagery task. Data were also analyzed using time-frequency analysis that investigated synchronization of neuronal populations over time during processing. Concrete words elicited an enhanced late going desynchronization of theta-band power (723-938 ms post stimulus onset) during associate generation. During mental imagery, abstract items elicited greater delta-band power from 800 to 1,000 ms following stimulus onset, theta-band power from 350 to 205 ms before response, and alpha-band power from 900 to 800 ms before response. Overall, the findings support DCT in suggesting that lexical concepts are not amodal and that concreteness effects are modulated by tasks that focus participants on verbal versus nonverbal, imagery-based knowledge.
Irrational Delay Revisited: Examining Five Procrastination Scales in a Global Sample
Svartdal, Frode; Steel, Piers
2017-01-01
Scales attempting to measure procrastination focus on different facets of the phenomenon, yet they share a common understanding of procrastination as an unnecessary, unwanted, and disadvantageous delay. The present paper examines in a global sample (N = 4,169) five different procrastination scales: Decisional Procrastination Scale (DPS), Irrational Procrastination Scale (IPS), Pure Procrastination Scale (PPS), Adult Inventory of Procrastination Scale (AIP), and General Procrastination Scale (GPS), focusing on factor structures and item functioning using Confirmatory Factor Analysis and Item Response Theory. The results indicated that the PPS (12 items selected from DPS, AIP, and GPS) measures different facets of procrastination even better than the three scales it is based on. An even shorter version of the PPS (5 items focusing on irrational delay) corresponds well to the nine-item IPS. Both scales demonstrate good psychometric properties and appear to be better measures of core procrastination attributes than alternative procrastination scales. PMID:29163302
Kawasaki, Yohei; Ide, Kazuki; Akutagawa, Maiko; Yamada, Hiroshi; Furukawa, Toshiaki A.; Ono, Yutaka
2016-01-01
Background Several studies have shown that total depressive symptom scores in the general population approximate an exponential pattern, except for the lower end of the distribution. The Center for Epidemiologic Studies Depression Scale (CES-D) consists of 20 items, each of which may take on four scores: “rarely,” “some,” “occasionally,” and “most of the time.” Recently, we reported that the item responses for 16 negative affect items commonly exhibit exponential patterns, except for the level of “rarely,” leading us to hypothesize that the item responses at the level of “rarely” may be related to the non-exponential pattern typical of the lower end of the distribution. To verify this hypothesis, we investigated how the item responses contribute to the distribution of the sum of the item scores. Methods Data collected from 21,040 subjects who had completed the CES-D questionnaire as part of a Japanese national survey were analyzed. To assess the item responses of negative affect items, we used a parameter r, which denotes the ratio of “rarely” to “some” in each item response. The distributions of the sum of negative affect items in various combinations were analyzed using log-normal scales and curve fitting. Results The sum of the item scores approximated an exponential pattern regardless of the combination of items, whereas, at the lower end of the distributions, there was a clear divergence between the actual data and the predicted exponential pattern. At the lower end of the distributions, the sum of the item scores with high values of r exhibited higher scores compared to those predicted from the exponential pattern, whereas the sum of the item scores with low values of r exhibited lower scores compared to those predicted. Conclusions The distributional pattern of the sum of the item scores could be predicted from the item responses of such items. PMID:27806132
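The parameter r described in the abstract above can be illustrated with a minimal sketch. It assumes that r is the ratio of the number of "rarely" responses to the number of "some" responses for a single item; the function name and the response data are hypothetical and are not taken from the study.

```python
from collections import Counter


def rarely_to_some_ratio(responses):
    """Return r = count("rarely") / count("some") for one item.

    Assumption: r is a simple ratio of response frequencies.
    """
    counts = Counter(responses)
    if counts["some"] == 0:
        raise ValueError("r is undefined when no respondent chose 'some'")
    return counts["rarely"] / counts["some"]


# Hypothetical responses to one negative affect item (not study data).
item_responses = (
    ["rarely"] * 6 + ["some"] * 3 + ["occasionally"] * 2 + ["most of the time"]
)
r = rarely_to_some_ratio(item_responses)
print(r)
```

Under this reading, items with a high r are those where "rarely" is chosen far more often than "some".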
Pollard, Beth; Dixon, Diane; Dieppe, Paul; Johnston, Marie
2009-01-01
Background The International Classification of Functioning, Disability and Health (ICF) proposes three main health outcomes, Impairment (I), Activity Limitation (A) and Participation Restriction (P), but good measures of these constructs are needed. The aim of this study was to use both Classical Test Theory (CTT) and Item Response Theory (IRT) methods to carry out an item analysis to improve measurement of these three components in patients having joint replacement surgery mainly for osteoarthritis (OA). Methods A geographical cohort of patients about to undergo lower limb joint replacement was invited to participate. Five hundred and twenty four patients completed ICF items that had been previously identified as measuring only a single ICF construct in patients with osteoarthritis. There were 13 I, 26 A and 20 P items. The SF-36 was used to explore the construct validity of the resultant I, A and P measures. The CTT and IRT analyses were run separately to identify items for inclusion or exclusion in the measurement of each construct. The results from both analyses were compared and contrasted. Results Overall, the item analysis resulted in the removal of 4 I items, 9 A items and 11 P items. CTT and IRT identified the same 14 items for removal, with CTT additionally excluding 3 items, and IRT a further 7 items. In a preliminary exploration of reliability and validity, the new measures appeared acceptable. Conclusion New measures were developed that reflect the ICF components of Impairment, Activity Limitation and Participation Restriction for patients with advanced arthritis. The resulting Aberdeen IAP measures (Ab-IAP) comprising I (Ab-I, 9 items), A (Ab-A, 17 items), and P (Ab-P, 9 items) met the criteria of conventional psychometric (CTT) analyses and the additional criteria (information and discrimination) of IRT. The use of both methods was more informative than the use of only one of these methods.
Thus combining CTT and IRT appears to be a valuable tool in the development of measures. PMID:19422677
Vastamäki, Heidi; Vastamäki, Martti; Laimi, Katri; Saltychev, Michail
2017-07-01
Poorly functioning work environments may lead to dissatisfaction for the employees and financial loss for the employers. The Job Content Questionnaire (JCQ) was designed to measure social and psychological characteristics of work environments. The aim was to investigate the factor construct of the Finnish 14-item version of JCQ when applied to professional orchestra musicians. In a cross-sectional survey, the questionnaire was sent by mail to 1550 orchestra musicians and students. 630 responses were received. Full data were available for 590 respondents (response rate 38%). The questionnaire also contained questions on demographics, job satisfaction, health status, health behaviors, and intensity of playing music. Confirmatory factor analysis of the 2-factor model of JCQ was conducted. Of the 5 estimates, JCQ items in the "job demand" construct, the "conflicting demands" (question 5) explained most of the total variance in this construct (79%) demonstrating almost perfect correlation of 0.63. In the construct of "job control," "opinions influential" (question 10) demonstrated a perfect correlation index of 0.84 and the items "little decision freedom" (question 14) and "allows own decisions" (question 6) showed substantial correlations of 0.77 and 0.65. The 2-factor model of the Finnish 14-item version of JCQ proposed in this study fitted the observed data well. The "conflicting demands," "opinions influential," "little decision freedom," and "allows own decisions" items demonstrated the strongest correlations with latent factors, suggesting that in a population similar to the studied one, especially these items should be taken into account when observed in the response of a population.
Validation of the Spanish Short Self-Regulation Questionnaire (SSSRQ) through Rasch Analysis
Garzón Umerenkova, Angélica; de la Fuente Arias, Jesús; Martínez-Vicente, José Manuel; Zapata Sevillano, Lucía; Pichardo, Mari Carmen; García-Berbén, Ana Belén
2017-01-01
Background: The aim of the study was to psychometrically characterize the Spanish Short Self-Regulation Questionnaire (SSSRQ) through Rasch analysis. Materials and Methods: 831 Spanish university students (262 men), between 17 and 39 years of age and ranging from the first to the 5th year of studies, completed the SSSRQ questionnaire. Confirmatory factor analysis (CFA) was carried out in order to establish structural adequacy. Afterward, by means of the Rasch model, a study of each subscale was conducted to test for dimensionality, fit of the sample questions, functionality of the response categories, reliability and estimation of Differential Item Functioning by gender and course. Results: The four subscales comply with the unidimensionality criteria, the questions are in line with the model, the response categories operate properly and the reliability of the sample is acceptable. Nonetheless, the test could benefit from the inclusion of additional items of both high and low difficulty in order to increase construct validity, discrimination and reliability for the respondents. Several items with differences in gender and course were also identified. Discussion: The results show the need for, and adequacy of, this psychometric analysis strategy as a complement to the CFA for enhancing the instrument. PMID:28298898
NASA Astrophysics Data System (ADS)
Ilich, Maria O.
Psychometricians and test developers evaluate standardized tests for potential bias against groups of test-takers by using differential item functioning (DIF). English language learners (ELLs) are a diverse group of students whose native language is not English. While they are still learning the English language, they must take their standardized tests for their school subjects, including science, in English. In this study, linguistic complexity was examined as a possible source of DIF that may result in test scores that confound science knowledge with a lack of English proficiency among ELLs. Two years of fifth-grade state science tests were analyzed for evidence of DIF using two DIF methods, Simultaneous Item Bias Test (SIBTest) and logistic regression. The tests presented a unique challenge in that the test items were grouped together into testlets, that is, groups of items referring to a scientific scenario to measure knowledge of different science content or skills. Very large samples of 10,256 students in 2006 and 13,571 students in 2007 were examined. Half of each sample was composed of Spanish-speaking ELLs; the balance comprised native English speakers. The two DIF methods were in agreement about the items that favored non-ELLs and the items that favored ELLs. Logistic regression effect sizes were all negligible, while SIBTest flagged items with low to high DIF. A decrease in socioeconomic status and Spanish-speaking ELL diversity may have led to inconsistent SIBTest effect sizes for items used in both testing years. The DIF results for the testlets suggested that ELLs lacked sufficient opportunity to learn science content. The DIF results further suggest that those constructed response test items requiring the student to draw a conclusion about a scientific investigation or to plan a new investigation tended to favor ELLs.
Development and validation of a vision-specific quality-of-life questionnaire for Timor-Leste.
du Toit, Rènée; Palagyi, Anna; Ramke, Jacqueline; Brian, Garry; Lamoureux, Ecosse L
2008-10-01
To develop and determine the reliability and validity of a vision-specific quality-of-life instrument (TL-VSQOL) designed to assess the impact of distance and near vision impairment in adults living in Timor-Leste. A vision-specific quality-of-life questionnaire was developed, piloted, and administered to 704 Timorese aged ≥40 years during a population-based eye health rapid assessment. Rasch analysis was performed on the data of 457 participants with presenting near vision worse than N8 (78.5%) and/or distance vision worse than 6/18 (69.8%). Unidimensionality, item fit to the model, response category performance, differential item functioning, and targeting of items to participants were assessed. Initially, the questionnaire lacked fit to the Rasch model. Removal of two items concerning emotional well-being resulted in a fit of the data (overall item-trait interaction: χ² (df) = 81 (51); mean (SD) person and item fit residual values: -0.30 (1.02) and -0.32 (1.46)), and good targeting of person ability and item difficulty was evident. Poorer distance and near visual acuities were significantly associated with worse quality-of-life scores (P < 0.001). Person separation reliability was substantial (0.93), indicating that the instrument can discriminate between groups with normal and impaired vision. All 17 items were free of differential item functioning, and there was no evidence of multidimensionality. This 17-item TL-VSQOL has high reliability, construct, and criterion validity and effective targeting. It can effectively assess the impact on quality of life of adult Timorese with distance and near vision impairment. The TL-VSQOL could be adapted for use in other low-resource settings.
Development of a vision-targeted health-related quality of life item measure
Slotkin, Jerry; McKean-Cowdin, Roberta; Lee, Paul; Owsley, Cynthia; Vitale, Susan; Varma, Rohit; Gershon, Richard; Hays, Ron D.
2013-01-01
Purpose To develop a vision-targeted health-related quality of life (HRQOL) measure for the NIH Toolbox for the Assessment of Neurological and Behavioral Function. Methods We conducted a review of existing vision-targeted HRQOL surveys and identified color vision, low luminance vision, distance vision, general vision, near vision, ocular symptoms, psychosocial well-being, and role performance domains. Items in existing survey instruments were sorted into these domains. We selected non-redundant items and revised them to improve clarity and to limit the number of different response options. We conducted 10 cognitive interviews to evaluate the items. Finally, we revised the items and administered them to 819 individuals to calibrate the items and estimate the measure’s reliability and validity. Results The field test provided support for the 53-item vision-targeted HRQOL measure encompassing 6 domains: color vision, distance vision, near vision, ocular symptoms, psychosocial well-being, and role performance. The domain scores had high levels of reliability (coefficient alphas ranged from 0.848 to 0.940). Validity was supported by high correlations between National Eye Institute Visual Function Questionnaire scales and the new-vision-targeted scales (highest values were 0.771 between psychosocial well-being and mental health, and 0.729 between role performance and role difficulties), and by lower mean scores in those groups self-reporting eye disease (F statistic with p < 0.01 for all comparisons except cataract with ocular symptoms, psychosocial well-being, and role performance scales). Conclusions This vision-targeted HRQOL measure provides a basis for comprehensive assessment of the impact of eye diseases and treatments on daily functioning and well-being in adults. PMID:23475688
Ayala, Alba; Bilbao, Amaia; Garcia-Perez, Sonia; Escobar, Antonio; Forjaz, Maria João
2018-03-01
The Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) measures the quality of life of patients with osteoarthritis (OA), and there is a specific scale for the physical functioning dimension, the seven-item short version WOMAC-pf. This study describes the application of the Rasch model to explore scale invariance and response stability of the WOMAC-pf short version across affected joint and over time. A sample of 884 patients with OA, from 15 hospitals in Spain, completed the WOMAC-pf before surgery (baseline) and at 3, 6 and 12 months post-surgery of hip or knee. The invariance by joint was explored through the differential item functioning (DIF) analysis of the Rasch model using baseline data, and time stability (DIF by time) was evaluated in stacked data (each participant is represented four times, once per time point). Mean age of the patients was 69.13 years (SD 10.01), 59.3% of them were women (n = 524), 59.2% had knee OA (n = 523) and 40.8% hip OA (n = 361). Item "putting on socks" showed DIF by joint and time. Fit to the Rasch model using stacked data improved when this item was removed. Good reliability for individual use, local independence and unidimensionality of the models were confirmed. WOMAC-pf 7-item short version was invariant over time and joint when item "putting on socks" was removed. Researchers should carefully evaluate this item as it presents problems in scale invariance and stability, which could affect results when comparing data by joint or when computing change scores.
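The stacking step mentioned in the abstract above, in which each participant appears once per time point, can be sketched as follows. This is only an illustration of the data layout, assuming one summary score per patient per assessment; the function name, patient identifiers, time labels, and scores are hypothetical, and the Rasch and DIF analyses themselves are not shown.

```python
def stack_by_time(wide_scores, time_points):
    """Convert {patient: {time_point: score}} into one row per patient per time point."""
    rows = []
    for patient_id, scores in wide_scores.items():
        for time_point in time_points:
            rows.append(
                {"patient": patient_id, "time": time_point, "score": scores[time_point]}
            )
    return rows


# Hypothetical labels for the four assessments (baseline, 3, 6 and 12 months).
TIME_POINTS = ["baseline", "3m", "6m", "12m"]

# Hypothetical scores for two patients (not study data).
wide = {
    "P01": {"baseline": 18, "3m": 10, "6m": 7, "12m": 5},
    "P02": {"baseline": 22, "3m": 15, "6m": 12, "12m": 9},
}

stacked = stack_by_time(wide, TIME_POINTS)
print(len(stacked))  # two patients x four time points
```

In the stacked layout, time point becomes an ordinary grouping variable, which is what allows DIF by time to be tested in the same way as DIF by joint.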
Stochastic Approximation Methods for Latent Regression Item Response Models
ERIC Educational Resources Information Center
von Davier, Matthias; Sinharay, Sandip
2010-01-01
This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…
Rasch Analysis of the Student Refractive Error and Eyeglass Questionnaire
Crescioni, Mabel; Messer, Dawn H.; Warholak, Terri L.; Miller, Joseph M.; Twelker, J. Daniel; Harvey, Erin M.
2014-01-01
Purpose To evaluate and refine a newly developed instrument, the Student Refractive Error and Eyeglasses Questionnaire (SREEQ), designed to measure the impact of uncorrected and corrected refractive error on vision-related quality of life (VRQoL) in school-aged children. Methods A 38-statement instrument consisting of two parts was developed: Part A relates to perceptions regarding uncorrected vision and Part B relates to perceptions regarding corrected vision and includes other statements regarding VRQoL with spectacle correction. The SREEQ was administered to 200 Native American 6th through 12th grade students known to have previously worn and who currently require eyeglasses. Rasch analysis was conducted to evaluate the functioning of the SREEQ. Statements on Part A and Part B were analyzed to examine the dimensionality and constructs of the questionnaire, how well the items functioned, and the appropriateness of the response scale used. Results Rasch analysis suggested two items be eliminated and the measurement scale for matching items be reduced from a 4-point response scale to a 3-point response scale. With these modifications, categorical data were converted to interval level data, to conduct an item and person analysis. A shortened version of the SREEQ was constructed with these modifications, the SREEQ-R, which included the statements that were able to capture changes in VRQoL associated with spectacle wear for those with significant refractive error in our study population. Conclusions While the SREEQ Part B appears to have less than optimal reliability to assess the impact of spectacle correction on VRQoL in our student population, it is also able to detect statistically significant differences from pretest to posttest on both the group and individual levels to show that the instrument can assess the impact that glasses have on VRQoL.
Further modifications to the questionnaire, such as those included in the SREEQ-R, could enhance its functionality. PMID:24811844
Negative Symptom Dimensions of the Positive and Negative Syndrome Scale Across Geographical Regions
Liharska, Lora; Harvey, Philip D.; Atkins, Alexandra; Ulshen, Daniel; Keefe, Richard S.E.
2017-01-01
Objective: Recognizing the discrete dimensions that underlie negative symptoms in schizophrenia and how these dimensions are understood across localities might result in better understanding and treatment of these symptoms. To this end, the objectives of this study were to 1) identify the Positive and Negative Syndrome Scale negative symptom dimensions of expressive deficits and experiential deficits and 2) analyze performance on these dimensions over 15 geographical regions to determine whether the items defining them manifest similar reliability across these regions. Design: Data were obtained for the baseline Positive and Negative Syndrome Scale visits of 6,889 subjects across 15 geographical regions. Using confirmatory factor analysis, we examined whether a two-factor negative symptom structure that is found in schizophrenia (experiential deficits and expressive deficits) would be replicated in our sample, and using differential item functioning, we tested the degree to which specific items from each negative symptom subfactor performed across geographical regions in comparison with the United States. Results: The two-factor negative symptom solution was replicated in this sample. Most geographical regions showed moderate-to-large differential item functioning for Positive and Negative Syndrome Scale expressive deficit items, especially N3 Poor Rapport, as compared with Positive and Negative Syndrome Scale experiential deficit items, showing that these items might be interpreted or scored differently in different regions. Across countries, except for India, the differential item functioning values did not favor raters in the United States. Conclusion: These results suggest that the Positive and Negative Syndrome Scale negative symptom factor can be better represented by a two-factor model than by a single-factor model. 
Additionally, the results show significant differences in responses to items representing the Positive and Negative Syndrome Scale expressive factors, but not the experiential factors, across regions. This could be due to a lack of equivalence between the original and translated versions, cultural differences with the interpretation of items, dissimilarities in rater training, or diversity in the understanding of scoring anchors. Knowing which items are challenging for raters across regions can help to guide Positive and Negative Syndrome Scale training and improve the results of international clinical trials aimed at negative symptoms. PMID:29410935
Computerized Adaptive Test (CAT) Applications and Item Response Theory Models for Polytomous Items
ERIC Educational Resources Information Center
Aybek, Eren Can; Demirtasli, R. Nukhet
2017-01-01
This article aims to provide a theoretical framework for computerized adaptive tests (CAT) and item response theory models for polytomous items. Besides that, it aims to introduce the simulation and live CAT software to the related researchers. Computerized adaptive test algorithm, assumptions of item response theory models, nominal response…
Bayesian inference in an item response theory model with a generalized student t link function
NASA Astrophysics Data System (ADS)
Azevedo, Caio L. N.; Migon, Helio S.
2012-10-01
In this paper we introduce a new item response theory (IRT) model with a generalized Student t-link function with unknown degrees of freedom (df), named generalized t-link (GtL) IRT model. In this model we consider only the difficulty parameter in the item response function. GtL is an alternative to the two parameter logit and probit models, since the degrees of freedom (df) play a similar role to the discrimination parameter. However, the behavior of the curves of the GtL is different from those of the two parameter models and the usual Student t link, since in GtL the curve obtained from different df's can cross the probit curves in more than one latent trait level. The GtL model has similar properties to the generalized linear mixed models, such as the existence of sufficient statistics and easy parameter interpretation. Also, many techniques of parameter estimation, model fit assessment and residual analysis developed for those models can be used for the GtL model. We develop fully Bayesian estimation and model fit assessment tools through a Metropolis-Hastings step within Gibbs sampling algorithm. We consider a prior sensitivity choice concerning the degrees of freedom. The simulation study indicates that the algorithm recovers all parameters properly. In addition, some Bayesian model fit assessment tools are considered. Finally, a real data set is analyzed using our approach and other usual models. The results indicate that our model fits the data better than the two parameter models.
ERIC Educational Resources Information Center
Ito, Kyoko; Sykes, Robert C.
This study investigated the practice of weighting a type of test item, such as constructed response, more than other types of items, such as selected response, to compute student scores for a mixed-item type of test. The study used data from statewide writing field tests in grades 3, 5, and 8 and considered two contexts, that in which a single…
ERIC Educational Resources Information Center
Willoughby, Michael T.; Wirth, R. J.; Blair, Clancy B.
2011-01-01
This study demonstrates the merits of evaluating a newly developed battery of executive function tasks, designed for use in early childhood, from the perspective of item response theory (IRT). The battery was included in the 48-month assessment of the Family Life Project, a prospective longitudinal study of 1292 children oversampled from…
ERIC Educational Resources Information Center
Montpetit, Kathleen; Haley, Stephen; Bilodeau, Nathalie; Ni, Pengsheng; Tian, Feng; Gorton, George, III; Mulcahey, M. J.
2011-01-01
This article reports on the content range and measurement precision of an upper extremity (UE) computer adaptive testing (CAT) platform of physical function in children with cerebral palsy. Upper extremity items representing skills of all abilities were administered to 305 parents. These responses were compared with two traditional standardized…
Lin, Ching-Hua; Huang, Chun-Jen; Chen, Cheng-Chung
2018-01-01
The burden of major depressive disorder includes suffering due to symptom severity, functional impairment, and quality of life deficits. The aim of this study was to compare the differences between electroconvulsive therapy and pharmacotherapy in reducing such burdens. This was a pooled analysis study including 2 open-label trials for major depressive disorder inpatients receiving either standard bitemporal and modified electroconvulsive therapy with a maximum of 12 sessions or 20 mg/d of fluoxetine for 6 weeks. Symptom severity, functioning, and quality of life were assessed using the 17-item Hamilton Rating Scale for Depression, the Modified Work and Social Adjustment Scale, and SF-36. Side effects following treatment, including subjective memory impairment, nausea/vomiting, and headache, were recorded. The differences between these 2 groups in 17-item Hamilton Rating Scale for Depression, Modified Work and Social Adjustment Scale, quality of life, side effects, and time to response (at least a 50% reduction of 17-item Hamilton Rating Scale for Depression) and remission (17-item Hamilton Rating Scale for Depression ≤7) following treatment were analyzed. Electroconvulsive therapy (n=116) showed a significantly greater reduction in 17-item Hamilton Rating Scale for Depression, Modified Work and Social Adjustment Scale, and quality of life deficits and had significantly shorter time to response/remission than fluoxetine (n=126). However, the electroconvulsive therapy group was more likely to experience subjective memory impairment and headache. Compared with fluoxetine, electroconvulsive therapy was more effective in alleviating the burden of major depressive disorder and had a substantially increased speed of response/remission in the acute phase. Increased education and information about electroconvulsive therapy for clinicians, patients, and their families and the general public is warranted. © The Author(s) 2017. 
Published by Oxford University Press on behalf of CINP.
Hung, Man; Baumhauer, Judith F; Latt, L Daniel; Saltzman, Charles L; SooHoo, Nelson F; Hunt, Kenneth J
2013-11-01
In 2012, the American Orthopaedic Foot & Ankle Society® established a national network for collecting and sharing data on treatment outcomes and improving patient care. One of the network's initiatives is to explore the use of computerized adaptive tests (CATs) for patient-level outcome reporting. We determined whether the CAT from the NIH Patient Reported Outcome Measurement Information System® (PROMIS®) Physical Function (PF) item bank provides efficient, reliable, valid, precise, and adequately covered point estimates of patients' physical function. After informed consent, 288 patients with a mean age of 51 years (range, 18-81 years) undergoing surgery for common foot and ankle problems completed a web-based questionnaire. Efficiency was determined by time for test administration. Reliability was assessed with person and item reliability estimates. Validity evaluation included content validity from expert review and construct validity measured against the PROMIS® Pain CAT and patient responses based on tradeoff perceptions. Precision was assessed by standard error of measurement (SEM) across patients' physical function levels. Instrument coverage was based on a person-item map. Average time of test administration was 47 seconds. Reliability was 0.96 for person and 0.99 for item. Construct validity against the Pain CAT had an r value of -0.657 (p < 0.001). Precision had an SEM of less than 3.3 (equivalent to a Cronbach's alpha of ≥ 0.90) across a broad range of function. Concerning coverage, the ceiling effect was 0.32% and there was no floor effect. The PROMIS® PF CAT appears to be an excellent method for measuring outcomes for patients with foot and ankle surgery. Further validation of the PROMIS® item banks may ultimately provide a valid and reliable tool for measuring patient-reported outcomes after injuries and treatment.
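The link between an SEM of about 3.3 and a reliability near 0.90 can be checked with the classical relation reliability = 1 - (SEM / SD)^2. The sketch below is illustrative only and assumes scores are reported on a T-score metric with a standard deviation of 10; that scale and the function name `reliability_from_sem` are assumptions, not details given in the abstract.

```python
def reliability_from_sem(sem: float, sd: float = 10.0) -> float:
    """Classical test theory relation: reliability = 1 - (SEM / SD)^2.

    sd=10.0 is an assumed T-score standard deviation, not a value
    stated in the abstract.
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    return 1.0 - (sem / sd) ** 2


if __name__ == "__main__":
    # An SEM of 3.3 on the assumed metric gives a reliability close to 0.90.
    print(round(reliability_from_sem(3.3), 3))
```

Under this assumption, smaller SEM values map to higher reliability, which is why a bound on the SEM can be restated as a lower bound on reliability.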
2013-01-01
Introduction: Craving is useful in the diagnosis of drug dependence, but it is unclear how various items used to assess craving might influence the diagnostic performance of craving measures. This study determined the diagnostic performance of individual items and item subgroups of the 32-item Questionnaire on Smoking Urges (QSU) as a function of item wording, level of craving intensity, and item stability. Methods: Nondaily and daily smokers (n = 222) completed the QSU on 6 separate occasions, and item responses were averaged across the administrations. Nicotine dependence was assessed with the Wisconsin Inventory of Smoking Dependence Motives. The discriminative performance of the QSU items was evaluated with receiver-operating characteristic curves and area under the curve statistics. Results: Although each of the QSU items and selected subgroups of items significantly discriminated dependent from nondependent smokers, certain item subgroups outperformed others. There was no difference in discriminative performance between use of the specific terms urge and crave or between items assessing intention to smoke relative to those assessing desire to smoke, but there were significant differences in the two major factors represented on the QSU and in craving items reflecting more intense relative to less intense craving. Stability of the item scores was strongly related to the discriminative performance of craving. Conclusions: Items indexing stable, high-intensity aspects of craving that reflect the negative reinforcing effects of smoking will likely be most useful for diagnostic purposes. Future directions and implications are discussed. PMID:23817585
Germeroth, Lisa J; Wray, Jennifer M; Gass, Julie C; Tiffany, Stephen T
2013-12-01
Craving is useful in the diagnosis of drug dependence, but it is unclear how various items used to assess craving might influence the diagnostic performance of craving measures. This study determined the diagnostic performance of individual items and item subgroups of the 32-item Questionnaire on Smoking Urges (QSU) as a function of item wording, level of craving intensity, and item stability. Nondaily and daily smokers (n = 222) completed the QSU on 6 separate occasions, and item responses were averaged across the administrations. Nicotine dependence was assessed with the Wisconsin Inventory of Smoking Dependence Motives. The discriminative performance of the QSU items was evaluated with receiver-operating characteristic curves and area under the curve statistics. Although each of the QSU items and selected subgroups of items significantly discriminated dependent from nondependent smokers, certain item subgroups outperformed others. There was no difference in discriminative performance between use of the specific terms urge and crave or between items assessing intention to smoke relative to those assessing desire to smoke, but there were significant differences in the two major factors represented on the QSU and in craving items reflecting more intense relative to less intense craving. Stability of the item scores was strongly related to the discriminative performance of craving. Items indexing stable, high-intensity aspects of craving that reflect the negative reinforcing effects of smoking will likely be most useful for diagnostic purposes. Future directions and implications are discussed.
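The area under the ROC curve used to compare items can be read as the probability that a randomly chosen dependent smoker scores higher on an item than a randomly chosen nondependent smoker. The following is a minimal sketch of that pairwise definition; the function name `auc` and the example scores are hypothetical and are not taken from the study.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    scores_pos: item scores for the positive group (e.g. dependent smokers).
    scores_neg: item scores for the negative group (e.g. nondependent smokers).
    Ties between a positive and a negative score count as half a win.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    # Hypothetical averaged item scores for two small groups.
    print(auc([3, 4, 5], [1, 2, 3]))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, so items or item subgroups can be ranked by this value.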
An Effect Size Measure for Raju's Differential Functioning for Items and Tests
ERIC Educational Resources Information Center
Wright, Keith D.; Oshima, T. C.
2015-01-01
This study established an effect size measure for differential functioning for items and tests' noncompensatory differential item functioning (NCDIF). The Mantel-Haenszel parameter served as the benchmark for developing NCDIF's effect size measure for reporting moderate and large differential item functioning in test items. The effect size of…
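For context on the Mantel-Haenszel benchmark mentioned above, the sketch below computes the Mantel-Haenszel common odds ratio across matched score strata and converts it to the delta scale (MH D-DIF = -2.35 ln alpha). It does not implement the NCDIF effect size itself; the function names and the example counts are hypothetical.

```python
import math


def mantel_haenszel_alpha(strata):
    """Mantel-Haenszel common odds ratio.

    strata: iterable of (ref_correct, ref_wrong, focal_correct, focal_wrong)
    counts, one tuple per matched ability stratum.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty strata
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ValueError("odds ratio undefined: denominator is zero")
    return num / den


def mh_d_dif(alpha):
    """Delta-scale transformation; negative values indicate an item that is
    harder for the focal group after matching."""
    return -2.35 * math.log(alpha)


if __name__ == "__main__":
    example = [(40, 10, 30, 20), (30, 20, 20, 30)]  # hypothetical counts
    alpha = mantel_haenszel_alpha(example)
    print(alpha, mh_d_dif(alpha))
```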
Novakovic, A M; Krekels, E H J; Munafo, A; Ueckert, S; Karlsson, M O
2017-01-01
In this study, we report the development of the first item response theory (IRT) model within a pharmacometrics framework to characterize the disease progression in multiple sclerosis (MS), as measured by Expanded Disability Status Score (EDSS). Data were collected quarterly from a 96-week phase III clinical study by a blinded rater, involving 104,206 item-level observations from 1319 patients with relapsing-remitting MS (RRMS), treated with placebo or cladribine. Observed scores for each EDSS item were modeled describing the probability of a given score as a function of patients' (unobserved) disability using a logistic model. Longitudinal data from placebo arms were used to describe the disease progression over time, and the model was then extended to cladribine arms to characterize the drug effect. Sensitivity with respect to patient disability was calculated as Fisher information for each EDSS item, and the items were ranked according to the amount of information they contained. The IRT model was able to describe baseline and longitudinal EDSS data on item and total level. The final model suggested that cladribine treatment significantly slows disease-progression rate, with a 20% decrease in disease-progression rate compared to placebo, irrespective of exposure, and effects an additional exposure-dependent reduction in disability progression. Four out of eight items contained 80% of information for the given range of disabilities. This study has illustrated that IRT modeling is specifically suitable for accurate quantification of disease status and description and prediction of disease progression in phase 3 studies on RRMS, by integrating EDSS item-level data in a meaningful manner.
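The idea of ranking items by Fisher information can be illustrated with a simplified sketch. It assumes dichotomous items under a two-parameter logistic model, whereas the EDSS items in the study are ordered-categorical; the item names, parameter values, and grid of disability levels below are hypothetical.

```python
import math


def p_2pl(theta, a, b):
    """Probability of a positive response under a two-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at latent level theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def rank_items_by_information(items, grid):
    """Rank item names by total information summed over a grid of theta values.

    items: dict mapping item name -> (discrimination a, location b).
    """
    totals = {
        name: sum(info_2pl(t, a, b) for t in grid)
        for name, (a, b) in items.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


if __name__ == "__main__":
    items = {"item_A": (2.0, 0.0), "item_B": (0.5, 0.0)}  # hypothetical
    grid = [x / 2.0 for x in range(-6, 7)]  # theta from -3 to 3
    print(rank_items_by_information(items, grid))
```

Summing information over the range of disability levels of interest gives one way to express what share of the total information a subset of items contributes.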
A Non-Parametric Item Response Theory Evaluation of the CAGE Instrument Among Older Adults.
Abdin, Edimansyah; Sagayadevan, Vathsala; Vaingankar, Janhavi Ajit; Picco, Louisa; Chong, Siow Ann; Subramaniam, Mythily
2018-02-23
The validity of the CAGE using item response theory (IRT) has not yet been examined in an older adult population. This study aims to investigate the psychometric properties of the CAGE using both non-parametric and parametric IRT models, assess whether there is any differential item functioning (DIF) by age, gender and ethnicity and examine the measurement precision at the cut-off scores. We used data from the Well-being of the Singapore Elderly study to conduct Mokken scaling analysis (MSA), dichotomous Rasch and 2-parameter logistic IRT models. The measurement precision at the cut-off scores was evaluated using classification accuracy (CA) and classification consistency (CC). The MSA showed the overall scalability H index was 0.459, indicating a medium performing instrument. All items were found to be homogenous, measuring the same construct and able to discriminate well between respondents with high levels of the construct and the ones with lower levels. The item discrimination ranged from 1.07 to 6.73 while the item difficulty ranged from 0.33 to 2.80. Significant DIF was found for 2 items across ethnic groups. More than 90% (CC and CA ranged from 92.5% to 94.3%) of the respondents were consistently and accurately classified by the CAGE cut-off scores of 2 and 3. The current study provides new evidence on the validity of the CAGE from the IRT perspective. This study provides valuable information on each item in the assessment of the overall severity of alcohol problem and the precision of the cut-off scores in the older adult population.
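The scalability index H reported by Mokken scaling can be sketched for dichotomous items as the ratio of summed observed inter-item covariances to the summed maximum covariances attainable given the item marginals (Loevinger's H). The code below is an illustrative implementation under that definition; the function name and the tiny response matrix are hypothetical and unrelated to the study data.

```python
def loevinger_h(data):
    """Overall scalability coefficient H for dichotomous (0/1) items.

    data: list of respondents, each a list of 0/1 item responses.
    H = sum of observed pairwise covariances / sum of maximum possible
    covariances given the item proportions.
    """
    n = len(data)
    k = len(data[0])
    p = [sum(row[i] for row in data) / n for i in range(k)]
    cov_sum = 0.0
    max_sum = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            p_ij = sum(row[i] * row[j] for row in data) / n
            cov_sum += p_ij - p[i] * p[j]
            max_sum += min(p[i], p[j]) - p[i] * p[j]
    if max_sum == 0:
        raise ValueError("H undefined: items have no variance")
    return cov_sum / max_sum


if __name__ == "__main__":
    # A perfect Guttman pattern, for which H equals 1.
    guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
    print(loevinger_h(guttman))
```

By a common rule of thumb, values of H from 0.3 to below 0.4 are read as weak, 0.4 to below 0.5 as medium, and 0.5 or above as strong scalability, which is consistent with the reported 0.459 being described as medium.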
Nayak, Madhabika B.; Bond, Jason C.; Greenfield, Thomas K.
2015-01-01
Background Efficient alcohol screening measures are important to prevent or treat alcohol use disorders (AUDs). Objectives We studied different versions of the Alcohol Use Disorders Identification Test (AUDIT) comparing their performance to the full AUDIT and an AUD measure as screeners for alcohol use problems in Goa, India. Methods Data from a general population study on 743 male drinkers aged 18 to 49 years are reported. Drinkers completed the AUDIT and an AUD measure. We created shorter versions of the AUDIT by a) collapsing AUDIT item responses into 3 and 2 categories and b) deleting 2 items with the lowest factor loadings. Each version was evaluated using factor, reliability and validity, and differential item functioning (DIF) analysis by age, education, standard of living index (SLI), and area of residence. Results A single factor solution was found for each version with lower factor loadings for items on guilt and concern. There were no significant differences among the different AUDIT versions in predicting AUD. No significant DIF was found by education, SLI or area of residence. DIF was observed for the alcohol frequency item by age. Conclusions/Importance The AUDIT may be used with dichotomized response options without loss of predictive validity. A shortened 8-item dichotomized scale can adequately screen for AUDs in Goa when brevity is of paramount importance, although with lower predictive validity. Although the frequency item was endorsed more by older men, there is no evidence that the AUDIT items perform differently in other groups of male drinkers in Goa. PMID:26549791
Writing, Evaluating and Assessing Data Response Items in Economics.
ERIC Educational Resources Information Center
Trotman-Dickenson, D. I.
1989-01-01
Describes some of the problems in writing data response items in economics for use by A Level and General Certificate of Secondary Education (GCSE) students. Examines the experience of two series of workshops on writing items, evaluating them and assessing responses from schools. Offers suggestions for producing packages of data response items as…
Item Response Modeling with Sum Scores
ERIC Educational Resources Information Center
Johnson, Timothy R.
2013-01-01
One of the distinctions between classical test theory and item response theory is that the former focuses on sum scores and their relationship to true scores, whereas the latter concerns item responses and their relationship to latent scores. Although item response theory is often viewed as the richer of the two theories, sum scores are still…
Osman, Augustine; Lamis, Dorian A; Bagge, Courtney L; Freedenthal, Stacey; Barnes, Sean M
2016-01-01
We examined the factor structure and psychometric properties of the Mindful Attention Awareness Scale (MAAS) in a sample of 810 undergraduate students. Using common exploratory factor analysis (EFA), we obtained evidence for a 1-factor solution (41.84% common variance). To confirm unidimensionality of the 15-item MAAS, we conducted a 1-factor confirmatory factor analysis (CFA). Results of the EFA and CFA, respectively, provided support for a unidimensional model. Using differential item functioning analysis methods within item response theory modeling (IRT-based DIF), we found that individuals with high and low levels of nonattachment responded similarly to the MAAS items. Following a detailed item analysis, we proposed a 5-item short version of the instrument and present descriptive statistics and composite score reliability for the short and full versions of the MAAS. Finally, correlation analyses showed that scores on the full and short versions of the MAAS were associated with measures assessing related constructs. The 5-item MAAS is as useful as the original MAAS in enhancing our understanding of the mindfulness construct.
Item Information and Discrimination Functions for Trinary PCM Items.
ERIC Educational Resources Information Center
Akkermans, Wies; Muraki, Eiji
1997-01-01
For trinary partial credit items, the shape of the item information and item discrimination functions is examined in relation to the item parameters. Conditions under which these functions are unimodal and bimodal are discussed, and the locations and values of maxima are derived. Practical relevance of the results is discussed. (SLD)
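The shape of the information function for a three-category partial credit item can be explored numerically. The sketch below assumes the standard partial credit model with unit discrimination, for which item information equals the conditional variance of the item score; the step parameters used in the example are hypothetical.

```python
import math


def pcm_probs(theta, deltas):
    """Category probabilities under the partial credit model.

    deltas: step parameters (two values for a trinary item).
    """
    exponents = [0.0]
    for d in deltas:
        exponents.append(exponents[-1] + (theta - d))
    m = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - m) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def pcm_information(theta, deltas):
    """Item information = variance of the item score at theta."""
    probs = pcm_probs(theta, deltas)
    mean = sum(k * p for k, p in enumerate(probs))
    second = sum(k * k * p for k, p in enumerate(probs))
    return second - mean * mean


if __name__ == "__main__":
    # Widely separated steps: information dips between the two steps.
    for theta in (-3.0, 0.0, 3.0):
        print(theta, round(pcm_information(theta, [-3.0, 3.0]), 4))
```

With the hypothetical steps at -3 and 3, information is higher near each step than midway between them, which is the bimodal case; when the steps are close together, the function has a single peak.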
2012-01-01
Background The mini-Mental Adjustment to Cancer Scale (mini-MAC) is a well-recognised, popular measure of coping in psycho-oncology and assesses five cancer-specific coping strategies. It has been suggested that these five subscales could be grouped to form the over-arching adaptive and maladaptive coping subscales to facilitate the interpretation and clinical application of the scale. Despite the popularity of the mini-MAC, few studies have examined its psychometric properties among long-term cancer survivors, and further validation of the mini-MAC is needed to substantiate its use with the growing population of survivors. Therefore, this study examined the psychometric properties and dimensionality of the mini-MAC in a sample of long-term cancer survivors using Rasch analysis. Methods RUMM 2030 was used to analyse the mini-MAC data (n=851). Separate Rasch analyses were conducted for each of the original mini-MAC subscales as well as the over-arching adaptive and maladaptive coping subscales to examine summary and individual model fit statistics, person separation index (PSI), response format, local dependency, targeting, item bias (or differential item functioning, DIF), and dimensionality. Results For the fighting spirit, fatalism, and helplessness-hopelessness subscales, a revised three-point response format seemed more optimal than the original four-point response. To achieve model fit, items were deleted from four of the five subscales – Anxious Preoccupation items 7, 25, and 29; Cognitive Avoidance items 11 and 17; Fighting Spirit item 18; and Helplessness-Hopelessness items 16 and 20. For those subscales with sufficient items, analyses supported unidimensionality. Combining items to form the adaptive and maladaptive subscales was partially supported. Conclusions The original five subscales required item deletion and/or rescaling to improve goodness of fit to the Rasch model. 
While evidence was found for overarching subscales of adaptive and maladaptive coping, extensive modifications were necessary to achieve this result. Further exploration and validation of over-arching subscales assessing adaptive and maladaptive coping is necessary with cancer survivors. PMID:22607052
ERIC Educational Resources Information Center
Thurman, Carol
2009-01-01
The increased use of polytomous item formats has led assessment developers to pay greater attention to the detection of differential item functioning (DIF) in these items. DIF occurs when an item performs differently for two contrasting groups of respondents (e.g., males versus females) after controlling for differences in the abilities of the…
Galindo-Garre, Francisca; Hidalgo, María Dolores; Guilera, Georgina; Pino, Oscar; Rojo, J Emilio; Gómez-Benito, Juana
2015-03-01
The World Health Organization Disability Assessment Schedule II (WHO-DAS II) is a multidimensional instrument developed for measuring disability. It comprises six domains (getting around, self-care, getting along with others, life activities and participation in society). The main purpose of this paper is the evaluation of the psychometric properties for each domain of the WHO-DAS II with parametric and non-parametric Item Response Theory (IRT) models. A secondary objective is to assess whether the WHO-DAS II items within each domain form a hierarchy of invariantly ordered severity indicators of disability. A sample of 352 patients with a schizophrenia spectrum disorder is used in this study. The 36-item WHO-DAS II was administered during the consultation. Partial Credit and Mokken scale models are used to study the psychometric properties of the questionnaire. The psychometric properties of the WHO-DAS II scale are satisfactory for all the domains. However, we identify a few items that do not discriminate satisfactorily between different levels of disability and cannot be invariantly ordered in the scale. In conclusion, the WHO-DAS II can be used to assess overall disability in patients with schizophrenia, but some domains are too general to assess functionality in these patients because they contain items that are not applicable to this pathology. Copyright © 2014 John Wiley & Sons, Ltd.
Grigg, Kaine; Manderson, Lenore
2016-03-17
Racism and associated discrimination are pervasive and persistent challenges with multiple cumulative deleterious effects contributing to inequities in various health outcomes. Globally, research over the past decade has shown consistent associations between racism and negative health concerns. Such research confirms that race endures as one of the strongest predictors of poor health. Due to the lack of validated Australian measures of racist attitudes, RACES (Racism, Acceptance, and Cultural-Ethnocentrism Scale) was developed. Here, we examine RACES' psychometric properties, including the latent structure, utilising Item Response Theory (IRT). Unidimensional and Multidimensional Rating Scale Model (RSM) Rasch analyses were utilised with 296 Victorian primary school students and 182 adolescents and 220 adults from the Australian community. RACES was demonstrated to be a robust 24-item three-dimensional scale of Accepting Attitudes (12 items), Racist Attitudes (8 items), and Ethnocentric Attitudes (4 items). RSM Rasch analyses provide strong support for the instrument as a robust measure of racist attitudes in the Australian context, and for the overall factorial and construct validity of RACES across primary school children, adolescents, and adults. RACES provides a reliable and valid measure that can be utilised across the lifespan to evaluate attitudes towards all racial, ethnic, cultural, and religious groups. A core function of RACES is to assess the effectiveness of interventions to reduce community levels of racism and in turn inequities in health outcomes within Australia.
Development and assessment of floor and ceiling items for the PROMIS physical function item bank
2013-01-01
Introduction Disability and Physical Function (PF) outcome assessment has had limited ability to measure functional status at the floor (very poor functional abilities) or the ceiling (very high functional abilities). We sought to identify, develop and evaluate new floor and ceiling items to enable broader and more precise assessment of PF outcomes for the NIH Patient-Reported-Outcomes Measurement Information System (PROMIS). Methods We conducted two cross-sectional studies using NIH PROMIS item improvement protocols with expert review, participant survey and focus group methods. In Study 1, respondents with low PF abilities evaluated new floor items, and those with high PF abilities evaluated new ceiling items for clarity, importance and relevance. In Study 2, we compared difficulty ratings of new floor items by low functioning respondents and ceiling items by high functioning respondents to reference PROMIS PF-10 items. We used frequencies, percentages, means and standard deviations to analyze the data. Results In Study 1, low (n = 84) and high (n = 90) functioning respondents were mostly White, women, 70 years old, with some college, and disability scores of 0.62 and 0.30. More than 90% of the 31 new floor and 31 new ceiling items were rated as clear, important and relevant, leaving 26 ceiling and 30 floor items for Study 2. Low (n = 246) and high (n = 637) functioning Study 2 respondents were mostly White, women, 70 years old, with some college, and Health Assessment Questionnaire (HAQ) scores of 1.62 and 0.003. Compared to difficulty ratings of reference items, ceiling items were rated to be 10% more to greater than 40% more difficult to do, and floor items were rated to be about 12% to nearly 90% less difficult to do. Conclusions These new floor and ceiling items considerably extend the measurable range of physical function at either extreme. 
They will help improve instrument performance in populations with broad functional ranges and those concentrated at one or the other extreme ends of functioning. Optimal use of these new items will be assisted by computerized adaptive testing (CAT), reducing questionnaire burden and ensuring item administration to appropriate individuals. PMID:24286166
Further evaluation of leisure items in the attention condition of functional analyses.
Roscoe, Eileen M; Carreau, Abbey; MacDonald, Jackie; Pence, Sacha T
2008-01-01
Research suggests that including leisure items in the attention condition of a functional analysis may produce engagement that masks sensitivity to attention. In this study, 4 individuals' initial functional analyses indicated that behavior was maintained by nonsocial variables (n = 3) or by attention (n = 1). A preference assessment was used to identify items for subsequent functional analyses. Four conditions were compared, attention with and without leisure items and control with and without leisure items. Following this, either high- or low-preference items were included in the attention condition. Problem behavior was more probable during the attention condition when no leisure items or low-preference items were included, and lower levels of problem behavior were observed during the attention condition when high-preference leisure items were included. These findings suggest how preferred items may hinder detection of behavioral function.
Vaingankar, Janhavi Ajit; Subramaniam, Mythily; Chong, Siow Ann; Abdin, Edimansyah; Orlando Edelen, Maria; Picco, Louisa; Lim, Yee Wei; Phua, Mei Yen; Chua, Boon Yiang; Tee, Joseph Y S; Sherbourne, Cathy
2011-10-31
Instruments to measure mental health and well-being are largely developed and often used within Western populations and this compromises their validity in other cultures. A previous qualitative study in Singapore demonstrated the relevance of spiritual and religious practices to mental health, a dimension currently not included in existing multi-dimensional measures. The objective of this study was to develop a self-administered measure that covers all key and culturally appropriate domains of mental health, which can be applied to compare levels of mental health across different age, gender and ethnic groups. We present the item reduction and validation of the Positive Mental Health (PMH) instrument in a community-based adult sample in Singapore. Surveys were conducted among adult (21-65 years) residents belonging to Chinese, Malay and Indian ethnicities. Exploratory and confirmatory factor analysis (EFA, CFA) were conducted and items were reduced using item response theory tests (IRT). The final version of the PMH instrument was tested for internal consistency and criterion validity. Items were tested for differential item functioning (DIF) to check if items functioned in the same way across all subgroups. EFA and CFA identified a six-factor first-order structure (General coping, Personal growth and autonomy, Spirituality, Interpersonal skills, Emotional support, and Global affect) under one higher-order dimension of Positive Mental Health (RMSEA=0.05, CFI=0.96, TLI=0.96). A 47-item self-administered multi-dimensional instrument with a six-point Likert response scale was constructed. The slope estimates and strength of the relation to the theta for all items in each of the six PMH subscales were high (range: 1.39 to 5.69), suggesting good discrimination properties. The threshold estimates for the instrument ranged from -3.45 to 1.61, indicating that the instrument covers the entire spectrum for each of the six dimensions. 
The instrument demonstrated high internal consistency and had significant and expected correlations with other well-being measures. Results confirmed absence of DIF. The PMH instrument is a reliable and valid instrument that can be used to measure and compare level of mental health across different age, gender and ethnic groups in Singapore.
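The roles of the slope and threshold estimates can be illustrated with a graded response model, a common IRT model for Likert-type items. The abstract does not name the model that was fitted, so the choice of the graded response model here is an assumption, and the parameter values in the example are hypothetical (chosen only to fall inside the reported ranges).

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under a graded response model.

    a: slope (discrimination).
    thresholds: ascending threshold parameters; a six-point item has five.
    Returns len(thresholds) + 1 probabilities, one per response category.
    """
    # Cumulative probabilities of responding in category k or higher.
    cumulative = [1.0]
    for b in thresholds:
        cumulative.append(1.0 / (1.0 + math.exp(-a * (theta - b))))
    cumulative.append(0.0)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


if __name__ == "__main__":
    # Hypothetical six-point item: slope 1.39, five ascending thresholds.
    probs = grm_category_probs(0.0, 1.39, [-3.45, -2.0, -1.0, 0.5, 1.61])
    print([round(p, 3) for p in probs])
```

A higher slope makes the category probabilities change more sharply with theta, and the spread of the thresholds determines the range of theta over which the item distinguishes respondents.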
Walton, David M; Beattie, Tyler; Putos, Joseph; MacDermid, Joy C
2016-06-01
The Brief Pain Inventory is composed of two quantifiable scales: pain severity and pain interference. The reported factor structure of the interference subscale is not consistent in the extant literature, with no clear choice between a single- or two-factor structure. Here, we report on the results of Rasch-based analysis of the interference subscale using a large population-based ambulatory patient database (the Quebec Pain Registry). Observational cohort. A total of 1,000 responses were randomly drawn from a total database of 5,654 for this analysis. Both the original 7-item and an expanded 10-item version (Tyler 2002) of the interference subscale were evaluated. Rasch analysis revealed significant misfit of both versions of the scale, with the original 7-item version outperforming the expanded 10-item version. Analysis of dimensionality revealed that both versions showed improved model fit when considered as two subscales (affective and physical interference) with the item on sleep interference removed or considered separately. Additionally, significant uniform differential item functioning was identified for 6 of the 7 original items when the sample was stratified by age above or below 55 years. The interference subscale achieved adequate model fit when considered as two separate subscales with age as a mediator of response, while interpreting the sleep interference item separately. A transformation matrix revealed that in all cases, ordinal-level change at the extreme ends of the scale appears to be more meaningful than does a similar change at the midpoints. The Interference subscale of the BPI should be interpreted as two separate subscales (Affective Interference, Physical Interference) with the sleep item removed or interpreted separately for optimal fit to the Rasch model. Implications for research and clinical use are discussed. Copyright © 2016 Elsevier Inc. All rights reserved.
Lynch, Andrew D; Dodds, Nathan E; Yu, Lan; Pilkonis, Paul A; Irrgang, James J
2016-05-11
The content and wording of the Patient Reported Outcome Measurement Information System (PROMIS) Physical Function and Pain Interference item banks have not been qualitatively assessed by individuals with knee joint impairments. The purpose of this investigation was to identify items in the PROMIS Physical Function and Pain Interference Item Banks that are irrelevant, unclear, or otherwise difficult to respond to for individuals with impairment of the knee and to suggest modifications based on cognitive interviews. Twenty-nine individuals with knee joint impairments qualitatively assessed items in the Pain Interference and Physical Function Item Banks in a mixed-methods cognitive interview. Field notes were analyzed to identify themes and frequency counts were calculated to identify items not relevant to individuals with knee joint impairments. Issues with clarity were identified in 23 items in the Physical Function Item Bank, resulting in the creation of 43 new or modified items, typically changing words within the item to be clearer. Interpretation issues included whether or not the knee joint played a significant role in overall health and age/gender differences in items. One quarter of the original items (31 of 124) in the Physical Function Item Bank were identified as irrelevant to the knee joint. All 41 items in the Pain Interference Item Bank were identified as clear, although individuals without significant pain substituted other symptoms which interfered with their life. The Physical Function Item Bank would benefit from additional items that are relevant to individuals with knee joint impairments and, by extension, to other lower extremity impairments. Several issues in clarity were identified that are likely to be present in other patient cohorts as well.
2016-01-01
Reports an error in "A violation of the conditional independence assumption in the two-high-threshold model of recognition memory" by Tina Chen, Jeffrey J. Starns and Caren M. Rotello (Journal of Experimental Psychology: Learning, Memory, and Cognition, 2015[Jul], Vol 41[4], 1215-1222). In the article, Chen et al. compared three models: a continuous signal detection model (SDT), a standard two-high-threshold discrete-state model in which detect states always led to correct responses (2HT), and a full-mapping version of the 2HT model in which detect states could lead to either correct or incorrect responses. After publication, Rani Moran (personal communication, April 21, 2015) identified two errors that impact the reported fit statistics for the Bayesian information criterion (BIC) metric of all models as well as the Akaike information criterion (AIC) results for the full-mapping model. The errors are described in the erratum. (The following abstract of the original article appeared in record 2014-56216-001.) The 2-high-threshold (2HT) model of recognition memory assumes that test items result in distinct internal states: they are either detected or not, and the probability of responding at a particular confidence level that an item is "old" or "new" depends on the state-response mapping parameters. The mapping parameters are independent of the probability that an item yields a particular state (e.g., both strong and weak items that are detected as old have the same probability of producing a highest-confidence "old" response). We tested this conditional independence assumption by presenting nouns 1, 2, or 4 times. To maximize the strength of some items, "superstrong" items were repeated 4 times and encoded in conjunction with pleasantness, imageability, anagram, and survival processing tasks. The 2HT model failed to simultaneously capture the response rate data for all item classes, demonstrating that the data violated the conditional independence assumption. 
In contrast, a Gaussian signal detection model, which posits that the level of confidence that an item is "old" or "new" is a function of its continuous strength value, provided a good account of the data. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
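The two fit statistics named in the erratum follow standard definitions: AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of free parameters, n the number of observations, and ln L the maximized log-likelihood. The sketch below only restates these formulas; the numeric inputs are hypothetical and are not the values reported or corrected in the article.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    if n_obs <= 0:
        raise ValueError("n_obs must be positive")
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


if __name__ == "__main__":
    # Hypothetical fit: log-likelihood -100 with 3 parameters and 100 observations.
    print(aic(-100.0, 3), bic(-100.0, 3, 100))
```

Lower values indicate a better trade-off between fit and complexity, and BIC penalizes additional parameters more heavily than AIC once n exceeds about 7 observations.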
Dodeen, Hamzeh; Al-Darmaki, Fatima
2016-12-01
The aim of this study was to determine the feasibility of generating a shorter version of the Emirati Marital Satisfaction Scale (EMSS) using item response theory (IRT)-based methodology. The EMSS is the first national scale used to provide an understanding of the family function and level of marital satisfaction within the cultural context of the United Arab Emirates. A sample of 1,049 Emirati married individuals from different ages, genders, places of residence, and monthly incomes participated in this study. The IRT was calibrated using X-Calibre 4.2 and the graded response model. The analysis was developed on the basis of a short form of the EMSS (7 items), which constitutes a promising alternative to the original scale for practitioners and researchers. This short version is reliable, valid, and it gives results very similar to the original scale. The results of this study confirmed the usefulness of IRT-based methodology for developing psychological and counseling scales. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Ventura, Joseph; Reise, Steven P; Keefe, Richard S E; Baade, Lyle E; Gold, James M; Green, Michael F; Kern, Robert S; Mesholam-Gately, Raquelle; Nuechterlein, Keith H; Seidman, Larry J; Bilder, Robert M
2010-08-01
Practical, reliable "real world" measures of cognition are needed to supplement neurocognitive performance data to evaluate possible efficacy of new drugs targeting cognitive deficits associated with schizophrenia. Because interview-based measures of cognition offer one possible approach, data from the MATRICS initiative (n=176) were used to examine the psychometric properties of the Schizophrenia Cognition Rating Scale (SCoRS) and the Clinical Global Impression of Cognition in Schizophrenia (CGI-CogS). We used classical test theory methods and item response theory to derive the 10-item Cognitive Assessment Interview (CAI) from the SCoRS and CGI-CogS ("parent instruments"). Sources of information for CAI ratings included the patient and an informant. Validity analyses examined the relationship between the CAI and objective measures of cognitive functioning, intermediate measures of cognition, and functional outcome. The rater's scores from the newly derived CAI (10 items) correlated highly (r=.87) with those from the combined set of the SCoRS and CGI-CogS (41 items). Both the patient (r=.82) and the informant (r=.95) data were highly correlated with the rater's score. The CAI was modestly correlated with objectively measured neurocognition (r=-.32), functional capacity (r=-.44), and functional outcome (r=-.32), which was comparable to the parent instruments. The CAI allows for expert judgment in evaluating a patient's cognitive functioning and was modestly correlated with neurocognitive functioning, functional capacity, and functional outcome. The CAI is a brief, repeatable, and potentially valuable tool for rating cognition in schizophrenia patients who are participating in clinical trials. Copyright 2010 Elsevier B.V. All rights reserved.
Ventura, Joseph; Reise, Steven P.; Keefe, Richard S. E.; Baade, Lyle E.; Gold, James M.; Green, Michael F.; Kern, Robert S.; Mesholam-Gately, Raquelle; Nuechterlein, Keith H.; Seidman, Larry J.; Bilder, Robert M.
2011-01-01
Background Practical, reliable “real world” measures of cognition are needed to supplement neurocognitive performance data to evaluate possible efficacy of new drugs targeting cognitive deficits associated with schizophrenia. Because interview-based measures of cognition offer one possible approach, data from the MATRICS initiative (n=176) were used to examine the psychometric properties of the Schizophrenia Cognition Rating Scale (SCoRS) and the Clinical Global Impression of Cognition in Schizophrenia (CGI-CogS). Method We used classical test theory methods and item response theory to derive the 10-item Cognitive Assessment Interview (CAI) from the SCoRS and CGI-CogS (“parent instruments”). Sources of information for CAI ratings included the patient and an informant. Validity analyses examined the relationship between the CAI and objective measures of cognitive functioning, intermediate measures of cognition, and functional outcome. Results The rater’s scores from the newly derived CAI (10 items) correlated highly (r = .87) with those from the combined set of the SCoRS and CGI-CogS (41 items). Both the patient (r = .82) and the informant (r = .95) data were highly correlated with the rater’s score. The CAI was modestly correlated with objectively measured neurocognition (r = −.32), functional capacity (r = −.44), and functional outcome (r = −.32), which was comparable to the parent instruments. Conclusions The CAI allows for expert judgment in evaluating a patient’s cognitive functioning and was modestly correlated with neurocognitive functioning, functional capacity, and functional outcome. The CAI is a brief, repeatable, and potentially valuable tool for rating cognition in schizophrenia patients who are participating in clinical trials. PMID:20542412
Paap, Muirne C S; Lenferink, Lonneke I M; Herzog, Nadine; Kroeze, Karel A; van der Palen, Job
2016-06-27
Health-related quality of life (HRQoL) is widely used as an outcome measure in the evaluation of treatment interventions in patients with chronic obstructive pulmonary disease (COPD). In order to address challenges associated with existing fixed-length measures (e.g., too long to be used routinely, too short to ensure both content validity and reliability), a COPD-specific item bank (COPD-SIB) was developed. Items were selected based on literature review and interviews with Dutch COPD patients, with a strong focus on both content validity and item comprehension. The psychometric quality of the item bank was evaluated using Mokken Scale Analysis and parametric Item Response Theory, using data of 666 COPD patients. The final item bank contains 46 items that form a strong scale, tapping into eight important themes that were identified based on literature review and patient interviews: Coping with disease/symptoms, adaptability; Autonomy; Anxiety about the course/end-state of the disease, hopelessness; Positive psychological functioning; Situations triggering or enhancing breathing problems; Symptoms; Activity; Impact. The 46-item COPD-SIB has good psychometric properties and content validity. Items are available in Dutch and English. The COPD-SIB can be used as a stand-alone instrument, or to inform computerised adaptive testing.
Daly, Justine B; Campbell, Elizabeth M; Wiggers, John H; Considine, Robyn J
2002-06-01
This study aimed to determine the prevalence of responsible hospitality policies in a group of licensed premises associated with alcohol-related harm. During March 1999, 108 licensed premises with one or more police-identified alcohol-related incidents in the previous 3 months received a visit from a police officer. A 30-item audit checklist was used to determine the responsible hospitality policies being undertaken by each premises within eight policy domains: display required signage (three items); responsible host practices to prevent intoxication and under-age drinking (five items); written policies and guidelines for responsible service (three items); discouraging inappropriate promotions (three items); safe transport (two items); responsible management issues (seven items); physical environment (three items) and entry conditions (four items). No premises were undertaking all 30 items. Eighty per cent of the premises were undertaking 20 of the 30 items. All premises were undertaking at least 17 of the items. The proportion of premises undertaking individual items ranged from 16% to 100%. Premises were less likely to report having and providing written responsible hospitality documentation to staff, using door charges and having entry/re-entry rules. Significant differences between rural and urban premises were evident for four policies. Clubs were significantly more likely than hotels to have a written responsible service of alcohol policy and to clearly display codes of dress and conditions of entry. This study provides an indication of the extent and nature of responsible hospitality policies in a sample of licensed premises that are associated with a broad range of alcohol related harms. The finding that a large majority of such premises appear to adopt responsible hospitality policies suggests a need to assess the validity and reliability of tools used in the routine assessment of such policies, and of the potential for harm from licensed premises.
Item Response Data Analysis Using Stata Item Response Theory Package
ERIC Educational Resources Information Center
Yang, Ji Seung; Zheng, Xiaying
2018-01-01
The purpose of this article is to introduce and review the capability and performance of the Stata item response theory (IRT) package that is available from Stata v.14, 2015. Using a simulated data set and a publicly available item response data set extracted from Programme of International Student Assessment, we review the IRT package from…
Item Response Models for Local Dependence among Multiple Ratings
ERIC Educational Resources Information Center
Wang, Wen-Chung; Su, Chi-Ming; Qiu, Xue-Lan
2014-01-01
Ratings given to the same item response may have a stronger correlation than those given to different item responses, especially when raters interact with one another before giving ratings. The rater bundle model was developed to account for such local dependence by forming multiple ratings given to an item response as a bundle and assigning…
Harpole, Jared K; Levinson, Cheri A; Woods, Carol M; Rodebaugh, Thomas L; Weeks, Justin W; Brown, Patrick J; Heimberg, Richard G; Menatti, Andrew R; Blanco, Carlos; Schneier, Franklin; Liebowitz, Michael
2015-06-01
The Brief Fear of Negative Evaluation Scale (BFNE; Leary, 1983, Personality and Social Psychology Bulletin, 9, 371-375) assesses fear and worry about receiving negative evaluation from others. Rodebaugh et al. (2004, Psychological Assessment, 16, 169-181) found that the BFNE is composed of a reverse-worded factor (BFNE-R) and a straightforwardly worded factor (BFNE-S). Further, they found the BFNE-S to have better psychometric properties and provide more information than the BFNE-R. Currently there is a lack of research regarding the measurement invariance of the BFNE-S across gender and ethnicity with respect to item thresholds. The present study uses item response theory (IRT) to test the BFNE-S for differential item functioning (DIF) related to gender and ethnicity (White, Asian, and Black). Six data sets consisting of clinical, community, and undergraduate participants were utilized (N = 2,109). The factor structure of the BFNE-S was confirmed using categorical confirmatory factor analysis, IRT model assumptions were tested, and the BFNE-S was evaluated for DIF. Item nine demonstrated significant non-uniform DIF between White and Black participants. No other items showed significant uniform or non-uniform DIF across gender or ethnicity. Results suggest the BFNE-S can be used reliably with men and women and Asian and White participants. More research is needed to understand the implications of using the BFNE-S with Black participants.
Pedraza, Otto; Graff-Radford, Neill R.; Smith, Glenn E.; Ivnik, Robert J.; Willis, Floyd B.; Petersen, Ronald C.; Lucas, John A.
2010-01-01
Scores on the Boston Naming Test (BNT) are frequently lower for African American adults than for Caucasian adults. Although demographically-based norms can mitigate the impact of this discrepancy on the likelihood of erroneous diagnostic impressions, a growing consensus suggests that group norms do not sufficiently address or advance our understanding of the underlying psychometric and sociocultural factors that lead to between-group score discrepancies. Using item response theory and methods to detect differential item functioning (DIF), the current investigation moves beyond comparisons of the summed total score to examine whether the conditional probability of responding correctly to individual BNT items differs between African American and Caucasian adults. Participants included 670 adults age 52 and older who took part in Mayo's Older Americans and Older African Americans Normative Studies. Under a 2-parameter logistic IRT framework and after correction for the false discovery rate, 12 items were shown to demonstrate DIF. Six of these 12 items (“dominoes,” “escalator,” “muzzle,” “latch,” “tripod,” and “palette”) were also identified in additional analyses using hierarchical logistic regression models and represent the strongest evidence for race/ethnicity-based DIF. These findings afford a finer characterization of the psychometric properties of the BNT and expand our understanding of between-group performance. PMID:19570311
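The abstract above does not say which false discovery rate procedure was applied. As one common possibility, the sketch below shows the Benjamini-Hochberg step-up procedure applied to per-item DIF p-values; the function name, the q level, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values: list[float], q: float = 0.05) -> list[bool]:
    """Return one flag per test: True if it is rejected at FDR level q."""
    m = len(p_values)
    # Indices of the tests ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is at most (k / m) * q.
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            largest_rank = rank
    # Reject every test ranked at or below that cutoff.
    flagged = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_rank:
            flagged[idx] = True
    return flagged

# Hypothetical DIF p-values for four items.
flags = benjamini_hochberg([0.001, 0.04, 0.03, 0.5], q=0.05)
```

Because the procedure is step-up, an item with a p-value above its own threshold can still be flagged if a higher-ranked item passes.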
Romero, Dulce; Ricarte, Jorge J.; Serrano, Juan P.; Nieto, Marta; Latorre, Jose M.
2018-01-01
The Autobiographical Memory Test (AMT) is the most widely used measure of overgeneral autobiographical memory (OGM). The AMT appears to have good psychometric properties, but more research is needed on the influence and applicability of individual cue words in different languages and populations. To date, no studies have evaluated its usefulness as a measure of OGM in Spanish or older populations. This work aims to analyze the applicability of the AMT in young and older Spanish samples. We administered a Spanish version of the AMT to samples of young (N = 520) and older adults (N = 155). We conducted confirmatory factor analysis (CFA), item response theory-based analysis (IRT) and differential item functioning (DIF). Results confirm the one-factor structure for the AMT. IRT analysis suggests that both groups find the AMT easy given that they generally perform well, and that it is more precise in individuals who score low on memory specificity. DIF analysis finds three items differ in their functioning depending on age group. This differential functioning of these items affects the overall AMT scores and, thus, they should be excluded from the AMT in studies comparing young and older samples. We discuss the possible implications of the samples and cue words used. PMID:29672583
Ros, Laura; Romero, Dulce; Ricarte, Jorge J; Serrano, Juan P; Nieto, Marta; Latorre, Jose M
2018-01-01
The Autobiographical Memory Test (AMT) is the most widely used measure of overgeneral autobiographical memory (OGM). The AMT appears to have good psychometric properties, but more research is needed on the influence and applicability of individual cue words in different languages and populations. To date, no studies have evaluated its usefulness as a measure of OGM in Spanish or older populations. This work aims to analyze the applicability of the AMT in young and older Spanish samples. We administered a Spanish version of the AMT to samples of young (N = 520) and older adults (N = 155). We conducted confirmatory factor analysis (CFA), item response theory-based analysis (IRT) and differential item functioning (DIF). Results confirm the one-factor structure for the AMT. IRT analysis suggests that both groups find the AMT easy given that they generally perform well, and that it is more precise in individuals who score low on memory specificity. DIF analysis finds three items differ in their functioning depending on age group. This differential functioning of these items affects the overall AMT scores and, thus, they should be excluded from the AMT in studies comparing young and older samples. We discuss the possible implications of the samples and cue words used.
Cook, Karon F; Kallen, Michael A; Bombardier, Charles; Bamer, Alyssa M; Choi, Seung W; Kim, Jiseon; Salem, Rana; Amtmann, Dagmar
2017-01-01
To evaluate whether items of three measures of depressive symptoms function differently in persons with spinal cord injury (SCI) than in persons from a primary care sample. This study was a retrospective analysis of responses to the Patient Health Questionnaire depression scale, the Center for Epidemiological Studies Depression scale, and the National Institutes of Health Patient-Reported Outcomes Measurement Information System (PROMIS®) version 1.0 eight-item depression short form 8b (PROMIS-D). The presence of differential item functioning (DIF) was evaluated using ordinal logistic regression. No items of any of the three target measures were flagged for DIF based on standard criteria. In a follow-up sensitivity analysis, the criterion was changed to make the analysis more sensitive to potential DIF. Scores were corrected for DIF flagged under this criterion. Minimal differences were found between the original scores and those corrected for DIF under the sensitivity criterion. The three depression screening measures evaluated in this study did not perform differently in samples of individuals with SCI compared to general and community samples. Transdiagnostic symptoms did not appear to spuriously inflate depression severity estimates when administered to people with SCI.
Item response theory - A first approach
NASA Astrophysics Data System (ADS)
Nunes, Sandra; Oliveira, Teresa; Oliveira, Amílcar
2017-07-01
Item Response Theory (IRT) has become one of the most popular scoring frameworks for measurement data, frequently used in computerized adaptive testing, cognitively diagnostic assessment and test equating. According to Andrade et al. (2000), IRT can be defined as a set of mathematical models (Item Response Models - IRM) constructed to represent the probability of an individual giving the right answer to an item of a particular test. The number of Item Response Models available for measurement analysis has increased considerably in the last fifteen years due to increasing computer power and due to a demand for accuracy and more meaningful inferences grounded in complex data. The developments in modeling with Item Response Theory were related to developments in estimation theory, most remarkably Bayesian estimation with Markov chain Monte Carlo algorithms (Patz & Junker, 1999). The popularity of Item Response Theory has also led to numerous overviews in books and journals, and many connections between IRT and other statistical estimation procedures, such as factor analysis and structural equation modeling, have been made repeatedly (van der Linden & Hambleton, 1997). As stated before, Item Response Theory covers a variety of measurement models, ranging from basic one-dimensional models for dichotomously and polytomously scored items and their multidimensional analogues to models that incorporate information about cognitive sub-processes which influence the overall item response process. The aim of this work is to introduce the main concepts associated with one-dimensional models of Item Response Theory, to specify the logistic models with one, two and three parameters, to discuss some properties of these models and to present the main estimation procedures.
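The logistic models with one, two and three parameters named above can be sketched as nested cases of a single function. This is a minimal illustration using the common parameterization (discrimination a, difficulty b, pseudo-guessing c) without the optional scaling constant; the function names and example values are assumptions and are not taken from the paper.

```python
import math

def irf_3pl(theta: float, a: float = 1.0, b: float = 0.0, c: float = 0.0) -> float:
    """Three-parameter logistic model: probability of a correct response.

    theta -- latent ability of the respondent
    a     -- item discrimination (slope)
    b     -- item difficulty (location)
    c     -- pseudo-guessing lower asymptote
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def irf_2pl(theta: float, a: float, b: float) -> float:
    """Two-parameter model: the 3PL with no guessing (c = 0)."""
    return irf_3pl(theta, a=a, b=b, c=0.0)

def irf_1pl(theta: float, b: float) -> float:
    """One-parameter model: the 2PL with discrimination fixed at 1."""
    return irf_2pl(theta, a=1.0, b=b)

# When ability equals difficulty, the 1PL and 2PL give 0.5,
# while the 3PL gives the midpoint between c and 1.
p_1pl = irf_1pl(0.0, b=0.0)
p_3pl = irf_3pl(0.0, a=1.0, b=0.0, c=0.2)
```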
The Development of a Nystagmus-Specific Quality-of-Life Questionnaire.
McLean, Rebecca J; Maconachie, Gail D E; Gottlob, Irene; Maltby, John
2016-09-01
To develop a nystagmus-specific quality-of-life (QOL) questionnaire derived from patient concerns based on eudaimonic aspects of well-being. Cross-sectional study. A total of 206 participants with nystagmus for factor analysis phase and an additional 42 participants with nystagmus for construct validity phase. Questionnaire items were written on the basis of the 6 domains of everyday living affected by nystagmus that were elicited by previous semistructured interviews conducted with 21 people with nystagmus. After consultation with 8 nystagmus experts, 37 items were administered to 206 people with nystagmus. Factor analysis was used to identify latent factors among the items and identify items to propose new nystagmus QOL scales. Cronbach's alpha was used to assess the internal reliability of the new scales. To assess for discriminant and concurrent validity between the new nystagmus scales and an existing vision-related QOL tool, the Visual Function Questionnaire-25 (VFQ-25) was administered to 42 additional participants. Questionnaire response scores on nystagmus-specific QOL items. The factor analysis revealed the retention of 29 items to form a measure comprising 2 distinct subscales reflecting "personal and social" and "physical and environmental" functioning as relating to nystagmus-specific QOL. The Cronbach's alpha coefficients for the "personal and social" functioning scale and "physical and environmental" functioning were 0.95 and 0.93, respectively. Tests for validity of the measure, consistent with a priori predictions, when compared with the VFQ-25, revealed the "physical and environmental" subscale showed concurrent validity (0.88), whereas the "personal and social" subscale was demonstrated to have discriminative validity (0.81). We have developed a 29-item, nystagmus-specific QOL questionnaire (NYS-29) based on eudaimonic aspects of well-being with subscales that address not only physical functioning but also psycho-social issues. 
The NYS-29 is grounded in the perspectives and concerns of those who have nystagmus and can be used to determine the impact of nystagmus on daily living in terms of both physical and psychosocial aspects. Copyright © 2016 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.
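Cronbach's alpha, used above to assess internal reliability, can be computed from a respondents-by-items score table. The sketch below uses the standard formula with sample variances; the function name and the toy data are hypothetical and unrelated to the NYS-29 responses.

```python
from statistics import variance

def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a table with one row per respondent, one column per item.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)

# Toy data: two items that agree perfectly across three respondents.
alpha = cronbach_alpha([[1, 1], [2, 2], [3, 3]])
```

The function assumes at least two items, at least two respondents, and non-zero variance in the total scores.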
Kalpakjian, Claire Z.; Tate, Denise G.; Kisala, Pamela A.; Tulsky, David S.
2015-01-01
Objective To describe the development and psychometric properties of the Spinal Cord Injury-Quality of Life (SCI-QOL) Self-esteem item bank. Design Using a mixed-methods design, we developed and tested a self-esteem item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory- (IRT) based analytic approaches, including tests of model fit, differential item functioning (DIF) and precision. Setting We tested a pool of 30 items at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital, and the James J. Peters/Bronx Department of Veterans Affairs hospital. Participants A total of 717 individuals with SCI completed the self-esteem items. Results A unidimensional model was observed (CFI = 0.946; RMSEA = 0.087) and measurement precision was good (theta range between −2.7 and 0.7). Eleven items were flagged for DIF; however, effect sizes were negligible with little practical impact on score estimates. The final calibrated item bank resulted in 23 retained items. Conclusion This study indicates that the SCI-QOL Self-esteem item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available. PMID:26010972
Kalpakjian, Claire Z; Tate, Denise G; Kisala, Pamela A; Tulsky, David S
2015-05-01
To describe the development and psychometric properties of the Spinal Cord Injury-Quality of Life (SCI-QOL) Self-esteem item bank. Using a mixed-methods design, we developed and tested a self-esteem item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory-(IRT) based analytic approaches, including tests of model fit, differential item functioning (DIF) and precision. We tested a pool of 30 items at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital, and the James J. Peters/Bronx Department of Veterans Affairs hospital. A total of 717 individuals with SCI completed the self-esteem items. A unidimensional model was observed (CFI=0.946; RMSEA=0.087) and measurement precision was good (theta range between -2.7 and 0.7). Eleven items were flagged for DIF; however, effect sizes were negligible with little practical impact on score estimates. The final calibrated item bank resulted in 23 retained items. This study indicates that the SCI-QOL Self-esteem item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available.
Victorson, David; Tulsky, David S; Kisala, Pamela A; Kalpakjian, Claire Z; Weiland, Brian; Choi, Seung W
2015-05-01
To describe the development and psychometric properties of the Spinal Cord Injury-Quality of Life (SCI-QOL) Resilience item bank and short form. Using a mixed-methods design, we developed and tested a resilience item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory based analytic approaches, including tests of model fit and differential item functioning (DIF). We tested a 32-item pool at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Department of Veterans Affairs medical center. A total of 717 individuals with SCI completed the Resilience items. A unidimensional model was observed (CFI=0.968; RMSEA=0.074) and measurement precision was good (theta range between -3.1 and 0.9). Ten items were flagged for DIF; however, after examination of effect sizes we found this to be negligible with little practical impact on score estimates. The final calibrated item bank resulted in 21 retained items. This study indicates that the SCI-QOL Resilience item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available.
Victorson, David; Tulsky, David S.; Kisala, Pamela A.; Kalpakjian, Claire Z.; Weiland, Brian; Choi, Seung W.
2015-01-01
Objective To describe the development and psychometric properties of the Spinal Cord Injury-Quality of Life (SCI-QOL) Resilience item bank and short form. Design Using a mixed-methods design, we developed and tested a resilience item bank through the use of focus groups with individuals with SCI and clinicians with expertise in SCI, cognitive interviews, and item-response theory based analytic approaches, including tests of model fit and differential item functioning (DIF). Setting We tested a 32-item pool at several medical institutions across the United States, including the University of Michigan, Kessler Foundation, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital and the James J. Peters/Bronx Department of Veterans Affairs medical center. Participants A total of 717 individuals with SCI completed the Resilience items. Results A unidimensional model was observed (CFI = 0.968; RMSEA = 0.074) and measurement precision was good (theta range between −3.1 and 0.9). Ten items were flagged for DIF; however, after examination of effect sizes we found this to be negligible with little practical impact on score estimates. The final calibrated item bank resulted in 21 retained items. Conclusion This study indicates that the SCI-QOL Resilience item bank represents a psychometrically robust measurement tool. Short form items are also suggested and computer adaptive tests are available. PMID:26010971
Fieo, Robert; Ocepek-Welikson, Katja; Kleinman, Marjorie; Eimicke, Joseph P.; Crane, Paul K.; Cella, David; Teresi, Jeanne A.
2017-01-01
Aims The goals of these analyses were to examine the psychometric properties and measurement equivalence of a self-reported cognition measure, the Patient Reported Outcome Measurement Information System® (PROMIS®) Applied Cognition – General Concerns short form. These items are also found in the PROMIS Cognitive Function (version 2) item bank. This scale consists of eight items related to subjective cognitive concerns. Differential item functioning (DIF) analyses of gender, education, race, age, and (Spanish) language were performed using an ethnically diverse sample (n = 5,477) of individuals with cancer. This is the first analysis examining DIF in this item set across ethnic and racial groups. Methods DIF hypotheses were derived by asking content experts to indicate whether they posited DIF for each item and to specify the direction. The principal DIF analytic model was item response theory (IRT) using the graded response model for polytomous data, with accompanying Wald tests and measures of magnitude. Sensitivity analyses were conducted using ordinal logistic regression (OLR) with a latent conditioning variable. IRT-based reliability, precision and information indices were estimated. Results DIF was identified consistently only for the item, brain not working as well as usual. After correction for multiple comparisons, this item showed significant DIF for both the primary and sensitivity analyses. Black respondents and Hispanics in comparison to White non-Hispanic respondents evidenced a lower conditional probability of endorsing the item, brain not working as well as usual. The same pattern was observed for the education grouping variable: as compared to those with a graduate degree, conditioning on overall level of subjective cognitive concerns, those with less than high school education also had a lower probability of endorsing this item. 
DIF was also observed for age for two items after correction for multiple comparisons for both the IRT and OLR-based models: “I have had to work really hard to pay attention or I would make a mistake” and “I have had trouble shifting back and forth between different activities that require thinking”. For both items, conditional on cognitive complaints, older respondents had a higher likelihood than younger respondents of endorsing the item in the cognitive complaints direction. The magnitude and impact of DIF was minimal. The scale showed high precision along much of the subjective cognitive concerns continuum; the overall IRT-based reliability estimate for the total sample was 0.88 and the estimates for subgroups ranged from 0.87 to 0.92. Conclusion Little DIF of high magnitude or impact was observed in the PROMIS Applied Cognition – General Concerns short form item set. One item, “It has seemed like my brain was not working as well as usual” might be singled out for further study. However, in general the short form item set was highly reliable, informative, and invariant across differing race/ethnic, educational, age, gender, and language groups. PMID:28523238
Fieo, Robert; Ocepek-Welikson, Katja; Kleinman, Marjorie; Eimicke, Joseph P; Crane, Paul K; Cella, David; Teresi, Jeanne A
2016-01-01
The goals of these analyses were to examine the psychometric properties and measurement equivalence of a self-reported cognition measure, the Patient Reported Outcome Measurement Information System® (PROMIS®) Applied Cognition - General Concerns short form. These items are also found in the PROMIS Cognitive Function (version 2) item bank. This scale consists of eight items related to subjective cognitive concerns. Differential item functioning (DIF) analyses of gender, education, race, age, and (Spanish) language were performed using an ethnically diverse sample (n = 5,477) of individuals with cancer. This is the first analysis examining DIF in this item set across ethnic and racial groups. DIF hypotheses were derived by asking content experts to indicate whether they posited DIF for each item and to specify the direction. The principal DIF analytic model was item response theory (IRT) using the graded response model for polytomous data, with accompanying Wald tests and measures of magnitude. Sensitivity analyses were conducted using ordinal logistic regression (OLR) with a latent conditioning variable. IRT-based reliability, precision and information indices were estimated. DIF was identified consistently only for the item, brain not working as well as usual. After correction for multiple comparisons, this item showed significant DIF for both the primary and sensitivity analyses. Black respondents and Hispanics in comparison to White non-Hispanic respondents evidenced a lower conditional probability of endorsing the item, brain not working as well as usual. The same pattern was observed for the education grouping variable: as compared to those with a graduate degree, conditioning on overall level of subjective cognitive concerns, those with less than high school education also had a lower probability of endorsing this item. 
DIF was also observed for age for two items after correction for multiple comparisons for both the IRT and OLR-based models: "I have had to work really hard to pay attention or I would make a mistake" and "I have had trouble shifting back and forth between different activities that require thinking". For both items, conditional on cognitive complaints, older respondents had a higher likelihood than younger respondents of endorsing the item in the cognitive complaints direction. The magnitude and impact of DIF was minimal. The scale showed high precision along much of the subjective cognitive concerns continuum; the overall IRT-based reliability estimate for the total sample was 0.88 and the estimates for subgroups ranged from 0.87 to 0.92. Little DIF of high magnitude or impact was observed in the PROMIS Applied Cognition - General Concerns short form item set. One item, "It has seemed like my brain was not working as well as usual" might be singled out for further study. However, in general the short form item set was highly reliable, informative, and invariant across differing race/ethnic, educational, age, gender, and language groups.